PATTERNS OF PARALLEL PROGRAMMING
UNDERSTANDING AND APPLYING PARALLEL PATTERNS
WITH THE .NET FRAMEWORK 4 AND VISUAL C#
Stephen Toub
Parallel Computing Platform
Microsoft Corporation
Abstract
This document provides an in-depth tour of support in the Microsoft® .NET Framework 4 for parallel programming.
This includes an examination of common parallel patterns and how they’re implemented without and with this
new support, as well as best practices for developing parallel components utilizing parallel patterns.
Last Updated:
July 1, 2010
This material is provided for informational purposes only. Microsoft makes no warranties, express or implied.
©2010 Microsoft Corporation.
TABLE OF CONTENTS
Patterns of Parallel Programming
Page 2
INTRODUCTION
Patterns are everywhere, yielding software development best practices and helping to seed new generations of
developers with immediate knowledge of established directions on a wide array of problem spaces. Patterns
represent successful (or in the case of anti-patterns, unsuccessful) repeated and common solutions developers
have applied time and again in particular architectural and programming domains. Over time, these tried and true
practices find themselves with names, stature, and variations, helping further to proliferate their application and
to jumpstart many a project.
Patterns don’t just manifest at the macro level. Whereas design patterns typically cover architectural structure or
methodologies, coding patterns and building blocks also emerge, representing typical ways of implementing a
specific mechanism. Such patterns typically become ingrained in our psyche, and we code with them on a daily
basis without even thinking about it. These patterns represent solutions to common tasks we encounter
repeatedly.
Of course, finding good patterns can happen only after many successful and failed attempts at solutions. Thus for
new problem spaces, it can take some time for them to gain a reputation. Such is where our industry lies today
with regards to patterns for parallel programming. While developers in high-performance computing have had to
develop solutions for supercomputers and clusters for decades, the need for such experiences has only recently
found its way to personal computing, as multi-core machines have become the norm for everyday users. As we
move forward with multi-core into the manycore era, ensuring that all software is written with as much parallelism
and scalability in mind is crucial to the future of the computing industry. This makes patterns in the parallel
computing space critical to that same future.
“In general, a ‘multi-core’ chip refers to eight or fewer homogeneous cores in one
microprocessor package, whereas a ‘manycore’ chip has more than eight possibly
heterogeneous cores in one microprocessor package. In a manycore system, all cores
share the resources and services, including memory and disk access, provided by the
operating system.” –The Manycore Shift, (Microsoft Corp., 2007)
In the .NET Framework 4, a slew of new support has been added to handle common needs in parallel
programming, to help developers tackle the difficult problem that is programming for multi-core and manycore.
Parallel programming is difficult for many reasons and is fraught with perils most developers haven’t had to
experience. Issues of races, deadlocks, livelocks, priority inversions, two-step dances, and lock convoys typically
have no place in a sequential world, and avoiding such issues makes quality patterns all the more important. This
new support in the .NET Framework 4 provides support for key parallel patterns along with building blocks to help
enable implementations of new ones that arise.
To that end, this document provides an in-depth tour of support in the .NET Framework 4 for parallel
programming, common parallel patterns and how they’re implemented without and with this new support, and
best practices for developing parallel components in this brave new world.
This document only minimally covers the subject of asynchrony for scalable, I/O-bound applications: instead, it
focuses predominantly on applications of CPU-bound workloads and of workloads with a balance of both CPU and
I/O activity. This document also does not cover Visual F# in Visual Studio 2010, which includes language-based
support for several key parallel patterns.
DELIGHTFULLY PARALLEL LOOPS
Arguably the most well-known parallel pattern is that befitting “Embarrassingly Parallel” algorithms. Programs that
fit this pattern are able to run well in parallel because the many individual operations being performed may
operate in relative independence, with few or no dependencies between operations such that they can be carried
out in parallel efficiently. It’sunfortunatethatthe“embarrassing”monikerhasbeenappliedtosuchprogramsas
there’snothingatallembarrassingaboutthem. In fact, if more algorithms and problem domains mapped to the
embarrassing parallel domain, the software industry would be in a much better state of affairs. For this reason,
manyfolkshavestartedusingalternativenamesforthispatternsuchas“convenientlyparallel”“pleasantly
parallel”and“delightfullyparallel”in order to exemplify the true nature of these problems. If you find yourself
trying to parallelize a problem that fits this pattern, consider yourself fortunate, and expect that your
parallelization job will be much easier than it otherwise could have beenpotentiallyevena“delightful”activity
A significant majority of the work in many applications and algorithms is done through loop control constructs.
Loops, after all, often enable the application to execute a set of instructions over and over, applying logic to
discrete entities, whether those entities are integral values, such as in the case of a for loop, or sets of data, such
as in the case of a for each loop. Many languages have built-in control constructs for these kinds of loops,
Microsoft Visual C#® and Microsoft Visual Basic® being among them, the former with for and foreach keywords,
and the latter with For and For Each keywords. For problems that may be considered delightfully parallel, the
entities to be processed by individual iterations of the loops may execute concurrently: thus, we need a
mechanism to enable such parallel processing.
IMPLEMENTING A PARALLEL LOOPING CONSTRUCT
As delightfully parallel loops are such a predominant pattern, it’s really important to understand the ins and outs
of how they work, and all of the tradeoffs implicit to the pattern. To understand these concepts further, we’ll build
a simple parallelized loop using support in the .NET Framework 3.5, prior to the inclusion of the more
comprehensive parallelization support introduced in the .NET Framework 4.
First, we need a signature. To parallelize a for loop, we’ll implement a method that takes three parameters: a
lower-bound, an upper-bound, and a delegate for the loop body that accepts as a parameter an integral value to
represent the current iteration index (that delegate will be invoked once for each iteration). Note that we have
several options for the behavior of these parameters. With C# and Visual Basic, the vast majority of for loops are
written in a manner similar to the following:
C#
for (int i = 0; i < upperBound; i++)
{
// ... loop body here
}
Visual Basic
For i As Integer = 0 To upperBound
' ... loop body here
Next
Contrary to what a cursory read may tell you, these two loops are not identical: the Visual Basic loop will execute
one more iteration than will the C# loop. This is because Visual Basic treats the supplied upper-bound as inclusive,
whereas we explicitly specified it in C# to be exclusive through our use of the less-than operator. For our purposes
here, we’ll follow suit with the C# implementation and have the upper-bound parameter to our parallelized
loop method represent an exclusive upper-bound:
C#
public static void MyParallelFor(
    int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body);
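Before building the parallel version, a quick sketch of how callers would use this signature may help. The array and loop body below are purely illustrative, and the sequential stand-in for MyParallelFor exists only to demonstrate the intended exclusive-upper-bound semantics:

```csharp
using System;

public static class ParallelForUsage
{
    // Sequential stand-in with the same signature and semantics as the
    // parallel version developed in this section (illustrative only).
    public static void MyParallelFor(
        int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
    {
        for (int i = inclusiveLowerBound; i < exclusiveUpperBound; i++)
        {
            body(i);
        }
    }

    public static void Main()
    {
        int[] squares = new int[10];

        // Passing squares.Length as the exclusive upper bound processes
        // indices 0 through 9, mirroring the canonical C# for loop.
        MyParallelFor(0, squares.Length, i => squares[i] = i * i);

        Console.WriteLine(string.Join(",", squares));
        // Prints: 0,1,4,9,16,25,36,49,64,81
    }
}
```

Note that a body that writes to a distinct element per iteration, as here, is exactly the kind of independent work a delightfully parallel loop can safely run concurrently.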
Our implementation of this method will invoke the body of the loop once per element in the range
[inclusiveLowerBound, exclusiveUpperBound), and will do so with as much parallelization as it can muster. To
accomplish that, we first need to understand how much parallelization is possible.
Wisdom in parallel circles often suggests that a good parallel implementation will use one thread per core. After
all, with one thread per core, we can keep all cores fully utilized. Any more threads, and the operating system will
need to context switch between them, resulting in wasted overhead spent on such activities; any fewer threads,
and there’s no chance we can take advantage of all that the machine has to offer, as at least one core will be
guaranteed to go unutilized. This logic has some validity, at least for certain classes of problems. But the logic is
also predicated on an idealized and theoretical concept of the machine. As an example of where this notion may
break down, to do anything useful threads involved in the parallel processing need to access data, and accessing
data requires trips to caches or main memory or disk or the network or other stores that can cost considerably in
terms of access times; while such activities are in flight, a CPU may be idle. As such, while a good parallel
implementation may assume a default of one-thread-per-core, an open-mindedness to other mappings can be
beneficial. For our initial purposes here, however, we’ll stick with the one-thread-per-core notion.
With the .NET Framework, retrieving the number of logical processors is achieved
using the System.Environment class, and in particular its ProcessorCount property.
Under the covers, .NET retrieves the corresponding value by delegating to the
GetSystemInfo native function exposed from kernel32.dll.
This value doesn’t necessarily correlate to the number of physical processors or even
to the number of physical cores in the machine. Rather, it takes into account the
number of hardware threads available. As an example, on a machine with two
sockets, each with four cores, each with two hardware threads (sometimes referred
to as hyperthreads), Environment.ProcessorCount would return 16.
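Querying this value is a one-liner; the printed value is machine-dependent, so no particular output should be expected:

```csharp
using System;

public static class ProcessorCountDemo
{
    public static void Main()
    {
        // The number of logical processors (hardware threads) visible
        // to this process; always at least 1.
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
    }
}
```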
Starting with Windows 7 and Windows Server 2008 R2, the Windows operating
system supports greater than 64 logical processors, and by default (largely for legacy
application reasons), access to these cores is exposed to applications through a new
concept known as “processor groups.” The .NET Framework does not provide
managed access to the processor group APIs, and thus Environment.ProcessorCount
will return a value capped at 64 (the maximum size of a processor group), even if the
machine has a larger number of processors. Additionally, in a 32-bit process,
ProcessorCount will be capped further to 32, in order to map well to the 32-bit mask
used to represent processor affinity (a requirement that a particular thread be
scheduled for execution on only a specific subset of processors).
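With the processor count in hand, a first-cut MyParallelFor can statically partition the iteration range into one contiguous chunk per worker, one worker per logical processor. The following is only a sketch under the one-thread-per-core assumption discussed above; it ignores exception propagation and load imbalance, and it is not the implementation the .NET Framework 4 provides:

```csharp
using System;
using System.Threading;

public static class MyParallel
{
    public static void MyParallelFor(
        int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
    {
        // One worker per logical processor, per the discussion above.
        int numWorkers = Environment.ProcessorCount;
        int size = exclusiveUpperBound - inclusiveLowerBound;

        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            int remaining = numWorkers;
            for (int t = 0; t < numWorkers; t++)
            {
                // Compute this worker's contiguous [start, end) chunk.
                int start = inclusiveLowerBound + (size * t) / numWorkers;
                int end = inclusiveLowerBound + (size * (t + 1)) / numWorkers;
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // Run the statically assigned chunk sequentially.
                    for (int i = start; i < end; i++) body(i);

                    // Last worker to finish signals completion.
                    if (Interlocked.Decrement(ref remaining) == 0) done.Set();
                });
            }
            done.WaitOne();
        }
    }
}
```

Static range partitioning keeps per-iteration overhead minimal, but when iteration costs vary widely, some workers may finish long before others; dynamic partitioning schemes trade extra synchronization for better load balancing.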