Apress SQL Server 2012 Data Integration Recipes (2012).pdf

(50124 KB) Pobierz
1015627694.002.png
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
1015627694.003.png 1015627694.004.png
Contents at a Glance
About the Author �������������������������������������������������������������������������������������������������������� xlv
About the Technical Reviewers �������������������������������������������������������������������������������� xlvii
Acknowledgments ����������������������������������������������������������������������������������������������������� xlix
Introduction ������������������������������������������������������������������������������������������������������������������ �� li
Chapter 1: Sourcing Data from MS Ofice Applications
���������������������������������������������� 1
Chapter 2: Flat File Data Sources
����������������������������������������������������������������������������� 61
Chapter 3: XML Data Sources
��������������������������������������������������������������������������������� 133
Chapter 4: SQL Databases
�������������������������������������������������������������������������������������� 179
Chapter 5: SQL Server Sources
������������������������������������������������������������������������������� 241
Chapter 6: Miscellaneous Data Sources
����������������������������������������������������������������� 285
Chapter 7: Exporting Data from SQL Server
����������������������������������������������������������� 343
Chapter 8: Metadata
����������������������������������������������������������������������������������������������� 425
Chapter 9: Data Transformation
����������������������������������������������������������������������������� 481
Chapter 10: Data Proiling
�������������������������������������������������������������������������������������� 559
Chapter 11: Delta Data Management
���������������������������������������������������������������������� 619
Chapter 12: Change Tracking and Change Data Capture
���������������������������������������� 681
Chapter 13: Organising And Optimizing Data Loads
����������������������������������������������� 731
v
1015627694.005.png
Contents at a GlanCe
Chapter 14: ETL Process Acceleration
�������������������������������������������������������������������� 801
Chapter 15: Logging and Auditing
�������������������������������������������������������������������������� 853
Appendix A: Data Types
������������������������������������������������������������������������������������������ 931
Appendix B: Sample Databases and Scripts
����������������������������������������������������������� 973
Index�������������������������������������������������������������������������������������������������������������������������� 989
vi
Introduction
Microsoft SQL Server 2012 is a vast subject. One part of the ecosystem of this powerful and comprehensive
database which has evolved considerably over many years is data integration – or ETL if you want to use another
virtually synonymous term. Long gone are the days when BCP was the only available tool to load or export data.
Even DTS is now a distant memory. Today the user is spoilt for choice when it comes to the plethora of tools and
options available to get data into and out of the Microsoft RDBMS. his book is an attempt to shed some light on
many of the ways in which data can be both loaded into SQL Server and sent from it into the outside world. I also
try to give some ideas as to which techniques are the most appropriate to use when faced with various diferent
challenges and situations.
his book is not, however, just an SSIS manual. I have a profound respect for this excellent product, but
do not believe that it is the “one stop shop” which some developers take it to be. I wanted to show readers that
there are frequently alternative technologies which can be applied fruitfully in many ETL scenarios. Indeed my
philosophy is that when dealing with data you should always apply the right solution, and never believe that
there is only one answer. Consequently this book includes recipes on many of the other tools in the SQL Server
universe. Sometimes I have deliberately shown varied ways of dealing with essentially the same challenge. I
hope by doing this to arouse your curiosity and also to provide some practical examples of ways to get data from
myriad sources into SQL Server databases cleanly and eiciently.
Although this book speciically targets users of SQL Server 2012 I try, wherever feasible, to say if a recipe can
be applied to previous versions of the database. I also try and highlight any new features and diferences between
SQL Server 2012 and older versions. his is because it is unlikely that users will only ever deal with the latest
version of this RDBMS, and are likely to have multiple versions in production on most sites. I only ever go back to
SQL Server 2005 when pointing out how the database has evolved, as this was the version which introduced SSIS -
which was the major turning point in SQL Server-based ETL.
As the book is focused on SQL Server nearly all the code used is T-SQL. Some of the samples given are
extremely simple, others are more complex. All of it is concentrated on ETL requirements. Consequently you will
ind no OLTP or DBA-based examples in this book. You will ind a few touches of MDX where handling Analysis
Services data is concerned and some VB.Net where SSIS script tasks are used. I have chosen to use VB.Net in
nearly all the SSIS script tasks described in this book as it is, in my experience, the .Net language that many
T-SQL programmers are most familiar with. Nonetheless I have added one or two snippets of C# (particularly
where CLR assemblies are used) to avoid accusations of neglecting this particular language.
Data integration is a vast subject. Consequently, in an attempt to apply a little structure to a potentially
enormous and disparate domain, this book is divided into two main parts.
he irst part —Chapters 1 through 7—deals with the mechanics of getting data into and out of SQL Server.
Here you will ind the essential details of how to connect to various data sources, and then ingurgitate the data.
As many potential pitfalls and traps as possible are brought to your attention for each data source.
he second part —Chapters 8 through 15—deal with the wider ETL environment. Here we progress from the
nuts and bolts to the coordinated whole of extracting, transforming, and (eiciently) loading data. hese chapters
take the reader on a trip through the process of metadata analysis, data transformation, proiling source data,
logging data processes, and some of the ways of optimizing data loads.
For this book I decided to avoid the ubiquitous AdventureWorks, and use my own sample database. here
are a few reasons for this. Firstly, I thought that AdventureWorks was so large and complex that it could divert
attention from some of the techniques which I wanted to explain. I prefer to use an extremely simplistic data
li
1015627694.001.png
Zgłoś jeśli naruszono regulamin