PATTERNS OF PARALLEL PROGRAMMING
UNDERSTANDING AND APPLYING PARALLEL PATTERNS
WITH THE .NET FRAMEWORK 4 AND VISUAL C#
Stephen Toub
Parallel Computing Platform
Microsoft Corporation
Abstract
This document provides an in-depth tour of support in the Microsoft® .NET Framework 4 for parallel programming.
This includes an examination of common parallel patterns and how they’re implemented without and with this
new support, as well as best practices for developing parallel components utilizing parallel patterns.
Last Updated:
July 1, 2010
This material is provided for informational purposes only. Microsoft makes no warranties, express or implied.
©2010 Microsoft Corporation.
TABLE OF CONTENTS
Patterns of Parallel Programming
Page 2
INTRODUCTION
Patterns are everywhere, yielding software development best practices and helping to seed new generations of
developers with immediate knowledge of established directions on a wide array of problem spaces. Patterns
represent successful (or in the case of anti-patterns, unsuccessful) repeated and common solutions developers
have applied time and again in particular architectural and programming domains. Over time, these tried and true
practices find themselves with names, stature, and variations, helping further to proliferate their application and
to jumpstart many a project.
Patterns don’t just manifest at the macro level. Whereas design patterns typically cover architectural structure or
methodologies, coding patterns and building blocks also emerge, representing typical ways of implementing a
specific mechanism. Such patterns typically become ingrained in our psyche, and we code with them on a daily
basis without even thinking about it. These patterns represent solutions to common tasks we encounter
repeatedly.
Of course, finding good patterns can happen only after many successful and failed attempts at solutions. Thus for
new problem spaces, it can take some time for them to gain a reputation. Such is where our industry lies today
with regards to patterns for parallel programming. While developers in high-performance computing have had to
develop solutions for supercomputers and clusters for decades, the need for such experiences has only recently
found its way to personal computing, as multi-core machines have become the norm for everyday users. As we
move forward with multi-core into the manycore era, ensuring that all software is written with as much parallelism
and scalability in mind is crucial to the future of the computing industry. This makes patterns in the parallel
computing space critical to that same future.
“In general, a ‘multi-core’ chip refers to eight or fewer homogeneous cores in one
microprocessor package, whereas a ‘manycore’ chip has more than eight possibly
heterogeneous cores in one microprocessor package. In a manycore system, all cores
share the resources and services, including memory and disk access, provided by the
operating system.” –The Manycore Shift, (Microsoft Corp., 2007)
In the .NET Framework 4, a slew of new support has been added to handle common needs in parallel
programming, to help developers tackle the difficult problem that is programming for multi-core and manycore.
Parallel programming is difficult for many reasons and is fraught with perils most developers haven’t had to
experience. Issues of races, deadlocks, livelocks, priority inversions, two-step dances, and lock convoys typically
have no place in a sequential world, and avoiding such issues makes quality patterns all the more important. This
new support in the .NET Framework 4 provides support for key parallel patterns along with building blocks to help
enable implementations of new ones that arise.
To that end, this document provides an in-depth tour of support in the .NET Framework 4 for parallel
programming, common parallel patterns and how they’re implemented without and with this new support, and
best practices for developing parallel components in this brave new world.
This document only minimally covers the subject of asynchrony for scalable, I/O-bound applications: instead, it
focuses predominantly on applications of CPU-bound workloads and of workloads with a balance of both CPU and
I/O activity. This document also does not cover Visual F# in Visual Studio 2010, which includes language-based
support for several key parallel patterns.
DELIGHTFULLY PARALLEL LOOPS
Arguably the most well-known parallel pattern is that befitting “Embarrassingly Parallel” algorithms. Programs that
fit this pattern are able to run well in parallel because the many individual operations being performed may
operate in relative independence, with few or no dependencies between operations such that they can be carried
out in parallel efficiently. It’sunfortunatethatthe“embarrassing”monikerhasbeenappliedtosuchprogramsas
there’snothingatallembarrassingaboutthem. In fact, if more algorithms and problem domains mapped to the
embarrassing parallel domain, the software industry would be in a much better state of affairs. For this reason,
manyfolkshavestartedusingalternativenamesforthispatternsuchas“convenientlyparallel”“pleasantly
parallel”and“delightfullyparallel”in order to exemplify the true nature of these problems. If you find yourself
trying to parallelize a problem that fits this pattern, consider yourself fortunate, and expect that your
parallelization job will be much easier than it otherwise could have beenpotentiallyevena“delightful”activity
A significant majority of the work in many applications and algorithms is done through loop control constructs.
Loops, after all, often enable the application to execute a set of instructions over and over, applying logic to
discrete entities, whether those entities are integral values, such as in the case of a for loop, or sets of data, such
as in the case of a for each loop. Many languages have built-in control constructs for these kinds of loops,
Microsoft Visual C#® and Microsoft Visual Basic® being among them, the former with for and foreach keywords,
and the latter with For and For Each keywords. For problems that may be considered delightfully parallel, the
entities to be processed by individual iterations of the loops may execute concurrently: thus, we need a
mechanism to enable such parallel processing.
IMPLEMENTING A PARALLEL LOOPING CONSTRUCT
As delightfully parallel loops are such a predominant pattern, it’s really important to understand the ins and outs
of how they work, and all of the tradeoffs implicit to the pattern. To understand these concepts further, we’ll build
a simple parallelized loop using support in the .NET Framework 3.5, prior to the inclusion of the more
comprehensive parallelization support introduced in the .NET Framework 4.
First, we need a signature. To parallelize a for loop, we’ll implement a method that takes three parameters: a
lower-bound, an upper-bound, and a delegate for the loop body that accepts as a parameter an integral value to
represent the current iteration index (that delegate will be invoked once for each iteration). Note that we have
several options for the behavior of these parameters. With C# and Visual Basic, the vast majority of for loops are
written in a manner similar to the following:
C#
for (int i = 0; i < upperBound; i++)
{
// ... loop body here
}
Visual Basic
For i As Integer = 0 To upperBound
' ... loop body here
Next
Contrary to what a cursory read may tell you, these two loops are not identical: the Visual Basic loop will execute
one more iteration than will the C# loop. This is because Visual Basic treats the supplied upper-bound as inclusive,
whereas we explicitly specified it in C# to be exclusive through our use of the less-than operator. For our purposes
here, we’ll follow suit with the C# implementation and have the upper-bound parameter to our parallelized
loop method represent an exclusive upper-bound:
C#
public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body);
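Before building the parallel version, a quick sketch of how callers would use this signature may help. The array and loop body below are purely illustrative, and the sequential stand-in for MyParallelFor exists only to demonstrate the intended exclusive-upper-bound semantics:

```csharp
using System;

public static class ParallelForUsage
{
    // Sequential stand-in with the same signature and semantics as the
    // parallel version developed in this section (illustrative only).
    public static void MyParallelFor(
        int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
    {
        for (int i = inclusiveLowerBound; i < exclusiveUpperBound; i++)
        {
            body(i);
        }
    }

    public static void Main()
    {
        int[] squares = new int[10];

        // Passing squares.Length as the exclusive upper bound processes
        // indices 0 through 9, mirroring the canonical C# for loop.
        MyParallelFor(0, squares.Length, i => squares[i] = i * i);

        Console.WriteLine(string.Join(",", squares));
        // Prints: 0,1,4,9,16,25,36,49,64,81
    }
}
```

Note that a body that writes to a distinct element per iteration, as here, is exactly the kind of independent work a delightfully parallel loop can safely run concurrently.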
Our implementation of this method will invoke the body of the loop once per element in the range
[inclusiveLowerBound, exclusiveUpperBound), and will do so with as much parallelization as it can muster. To
accomplish that, we first need to understand how much parallelization is possible.
Wisdom in parallel circles often suggests that a good parallel implementation will use one thread per core. After
all, with one thread per core, we can keep all cores fully utilized. Any more threads, and the operating system will
need to context switch between them, resulting in wasted overhead spent on such activities; any fewer threads,
and there’s no chance we can take advantage of all that the machine has to offer, as at least one core will be
guaranteed to go unutilized. This logic has some validity, at least for certain classes of problems. But the logic is
also predicated on an idealized and theoretical concept of the machine. As an example of where this notion may
break down, to do anything useful threads involved in the parallel processing need to access data, and accessing
data requires trips to caches or main memory or disk or the network or other stores that can cost considerably in
terms of access times; while such activities are in flight, a CPU may be idle. As such, while a good parallel
implementation may assume a default of one-thread-per-core, an open-mindedness to other mappings can be
beneficial. For our initial purposes here, however, we’ll stick with the one-thread-per-core notion.
With the .NET Framework, retrieving the number of logical processors is achieved
using the System.Environment class, and in particular its ProcessorCount property.
Under the covers, .NET retrieves the corresponding value by delegating to the
GetSystemInfo native function exposed from kernel32.dll.
This value doesn’t necessarily correlate to the number of physical processors or even
to the number of physical cores in the machine. Rather, it takes into account the
number of hardware threads available. As an example, on a machine with two
sockets, each with four cores, each with two hardware threads (sometimes referred
to as hyperthreads), Environment.ProcessorCount would return 16.
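Querying this value is a one-liner; the printed value is machine-dependent, so no particular output should be expected:

```csharp
using System;

public static class ProcessorCountDemo
{
    public static void Main()
    {
        // The number of logical processors (hardware threads) visible
        // to this process; always at least 1.
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
    }
}
```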
Starting with Windows 7 and Windows Server 2008 R2, the Windows operating
system supports greater than 64 logical processors, and by default (largely for legacy
application reasons), access to these cores is exposed to applications through a new
concept known as “processor groups.” The .NET Framework does not provide
managed access to the processor group APIs, and thus Environment.ProcessorCount
will return a value capped at 64 (the maximum size of a processor group), even if the
machine has a larger number of processors. Additionally, in a 32-bit process,
ProcessorCount will be capped further to 32, in order to map well to the 32-bit mask
used to represent processor affinity (a requirement that a particular thread be
scheduled for execution on only a specific subset of processors).
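With the processor count in hand, a first-cut MyParallelFor can statically partition the iteration range into one contiguous chunk per worker, one worker per logical processor. The following is only a sketch under the one-thread-per-core assumption discussed above; it ignores exception propagation and load imbalance, and it is not the implementation the .NET Framework 4 provides:

```csharp
using System;
using System.Threading;

public static class MyParallel
{
    public static void MyParallelFor(
        int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
    {
        // One worker per logical processor, per the discussion above.
        int numWorkers = Environment.ProcessorCount;
        int size = exclusiveUpperBound - inclusiveLowerBound;

        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            int remaining = numWorkers;
            for (int t = 0; t < numWorkers; t++)
            {
                // Compute this worker's contiguous [start, end) chunk.
                int start = inclusiveLowerBound + (size * t) / numWorkers;
                int end = inclusiveLowerBound + (size * (t + 1)) / numWorkers;
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // Run the statically assigned chunk sequentially.
                    for (int i = start; i < end; i++) body(i);

                    // Last worker to finish signals completion.
                    if (Interlocked.Decrement(ref remaining) == 0) done.Set();
                });
            }
            done.WaitOne();
        }
    }
}
```

Static range partitioning keeps per-iteration overhead minimal, but when iteration costs vary widely, some workers may finish long before others; dynamic partitioning schemes trade extra synchronization for better load balancing.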