Apress Pro SQL Server 2012 Integration Services (2012).pdf

(24543 KB) Pobierz
1015627444.009.png
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
1015627444.010.png 1015627444.011.png
Contents at a Glance
About the Authors............................................................................................... xvii
About the Technical Reviewer........................................................................... xviii
Chapter 1: Introducing Integration Services........................................................... 1
Chapter 2: BIDS and SSMS.................................................................................... 11
Chapter 3: Hello World—Your First SSIS 2012 Package...................................... 43
Chapter 4: Connection Managers.......................................................................... 83
Chapter 5: Control Flow Basics........................................................................... 107
Chapter 6: Advanced Control Flow Tasks........................................................... 163
Chapter 7: Source and Destination Adapters...................................................... 203
Chapter 8: Data Flow Transformations............................................................... 245
Chapter 9: Variables, Parameters, and Expressions.......................................... 325
Chapter 10: Scripting.......................................................................................... 361
Chapter 11: Events and Error Handling............................................................... 405
Chapter 12: Data Profiling and Scrubbing.......................................................... 427
Chapter 13: Logging and Auditing...................................................................... 465
Chapter 14: Heterogeneous Sources and Destinations....................................... 487
Chapter 15: Data Flow Tuning and Optimization................................................ 511
Chapter 16: Parent-Child Design Pattern............................................................ 525
Chapter 17: Dimensional Data ETL...................................................................... 543
Chapter 18: Building Robust Solutions............................................................... 561
Chapter 19: Deployment Model........................................................................... 579
Index................................................................................................................... 605
iv
1015627444.012.png 1015627444.001.png 1015627444.002.png 1015627444.003.png
C H A P T E R 1
Introducing Integration Services
I’m the glue that holds everything together.
—Singer Otis Williams
Your business analysts have finished gathering business requirements. The database architect has
designed and built a database that can be described only as a work of art. The BI architects are designing
their OLAP cubes and the dimensional data marts that feed them. On the other hand, maybe you’re a
one-man show and have designed and built everything yourself. Either way, the only piece that’s missing
is a tool to bring it all together. Enter SQL Server Integration Services (SSIS).
Like Otis Williams, cofounder of the Motown group the Temptations, SSIS is the glue that holds it all
together. More than that, SSIS is the circulatory system of your data warehousing and BI solutions. SSIS
breathes life into your technical solutions by moving data—the lifeblood of your organization—from
disparate sources, along well-known paths, and injecting it directly into the heart of your system. Along
the way, SSIS can validate, cleanse, manipulate, transform, and enrich your data for maximum
effectiveness.
In this book, we’ll take you on a tour of SSIS, from building your very first SSIS package to
implementing complex multipackage design patterns seamlessly. This chapter introduces you to SSIS
and the concepts behind extract, transform, and load (ETL) processes in general. We begin at the
beginning, with a brief history of ETL, Microsoft-style.
A Brief History of Microsoft ETL
Before we dive headfirst into the details of Microsoft’s current-generation ETL processing solution, it’s
important to understand just what ETL is. As we have stated, ETL is an acronym for extract, transform,
and load , which is a very literal description of modern data manipulation and movement processes.
When we talk about ETL, we are specifically talking about (1) extracting data from a source, such as a
database or flat files; (2) transforming data, or manipulating and enriching it en route to its destination;
and (3) loading data into its destination, often a database.
Over the years, business requirements for data processing in nearly any industry you can point to
have grown more complex, even as the amount of data that needs to be processed has increased
exponentially. Unfortunately, the number of hours in a day has remained fairly constant over the same
time period, meaning you’re stuck with the same limited processing window each day to transport and
manipulate an ever-growing magnitude of data. ETL solutions have become increasingly sophisticated
and robust in response to these increased data processing demands of performance, flexibility, and
quality.
1
1015627444.004.png
CHAPTER 1 INTRODUCING INTEGRATION SERVICES
So we’ll begin our journey into SSIS by looking at how ETL has evolved in the SQL Server world. Up
to SQL Server 6.5, the bulk copy program (bcp) was the primary tool for loading data into SQL Server
databases. A command-line utility, bcp made loading basic text files into database tables fairly simple.
Unfortunately, the flip side of that simplicity was that you could use bcp only to load data from flat files,
and you couldn’t perform additional validations or transformations on the data during the load. A
common database-to-database ETL scenario with bcp might include extracting data from a database
server to a delimited text file, importing the file into a SQL Server database, and finally using T-SQL to
perform transformations on the data in the database. The bcp utility is still provided with all versions of
SQL Server, and is still used for simple one-off data loads from flat files on occasion.
In response to the increasing demands of ETL processing, Data Transformation Services (DTS)
made its first appearance in SQL Server 7. With DTS, you could grab data from a variety of sources,
transform it on the fly, and load it into the database. Although DTS was a much more sophisticated tool
than bcp, it still lacked much of the functionality required to develop enterprise-class ETL solutions.
With the release of SQL Server 2005, Microsoft replaced DTS with SQL Server Integration Services
(SSIS). SSIS is a true enterprise ETL solution with several advancements over its predecessors, including
built-in logging; support for a wide variety of complex transformation, data validation, and data
cleansing components; separation of process control from data flow; support for several types of data
sources and destinations; and the ability to create custom components, to name a few.
SSIS in SQL Server 12 represents the first major enhancement to SSIS since its introduction way
back in 2005. In this newest release, Microsoft has implemented major improvements in functionality
and usability. Some of the new goodness includes the ability to move ETL packages seamlessly between
environments, centralized storage and administration of SSIS packages, and a host of usability
enhancements. In this book, you’ll explore the core functionality you need to get up and running with
SSIS and the advanced functionality you need to implement the most complex ETL processing.
ETL: THE LOST YEARS
Although bcp is efficient, many developers and DBAs over the years found the need for solutions that can
perform more-complex solutions. During the “lost years” of SQL Server ETL, a large number of home-
grown ETL applications began to sprout up in shops all over the world. Many of these solutions were very
inefficient, featuring hard-coded sources and destinations and inflexible transformations. Even in the 21st
century, there are quite a few of these legacy home-brewed ETL applications running at some of the
world’s largest corporations. Building and maintaining in-house ETL applications from scratch can be an
interesting academic exercise, but it’s terribly inefficient. The extra time and money spent trying to
maintain and administer the code base for these applications can take a significant chunk of the resources
you could otherwise devote to designing, developing, and building out actual ETL solutions with an
enterprise ETL platform.
What Can SSIS Do for You?
SSIS provides a wide array of out-of-the-box functionality to accomplish common ETL-related tasks. The
major tasks you’ll encounter during most ETL processing include the following:
2
1015627444.005.png 1015627444.006.png 1015627444.007.png 1015627444.008.png
 
Zgłoś jeśli naruszono regulamin