Sunday 13 January 2013

Start using System.Threading.Tasks.Parallel now!

If you haven't experienced the power of the System.Threading.Tasks namespace, new in .NET 4, you're missing out. This post is about the Parallel class, which takes most of the complexity out of the seemingly simple task of running multiple functions in parallel.

Before we get into it, it's important to understand how the Action delegate works. An Action is basically a reference to a method that takes no parameters and returns void. The generic variants, such as Action<T>, take parameters but still return void.
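To make the idea of an Action concrete, here is a minimal sketch. The class and method names (ActionDemo, SayHello) are made up for illustration; only Action and Action<int> come from the framework.

```csharp
using System;

public static class ActionDemo
{
    // A hypothetical method with no parameters and no return value.
    public static void SayHello()
    {
        Console.WriteLine("Hello");
    }

    public static void Main()
    {
        // An Action can point at a named method...
        Action fromMethod = SayHello;

        // ...or at a lambda expression.
        Action fromLambda = () => Console.WriteLine("Hello from a lambda");

        // Action<int> takes one int parameter and still returns void.
        Action<int> printSquare = n => Console.WriteLine(n * n);

        fromMethod();    // prints "Hello"
        fromLambda();    // prints "Hello from a lambda"
        printSquare(4);  // prints "16"
    }
}
```

Every lambda passed to the Parallel methods in the rest of this post is converted to one of these delegate types.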

Invoke(Action[] actions)

With the Invoke method you can simply pass in an array of Action objects and the method will return once all the Actions have completed.
Action[] actions = new Action[3];
actions[0] = () => DoSomething();
actions[1] = () => DoSomethingElse();
actions[2] = () => DoSomethingAgain();

Parallel.Invoke(actions);
Another overload, Invoke(ParallelOptions parallelOptions, params Action[] actions), lets you specify some options for the run. Here are the options:
  • CancellationToken
    This allows us to exit the operation early
  • MaxDegreeOfParallelism
    Specifies the maximum number of actions that can run at one time
  • TaskScheduler
    Specifies a custom TaskScheduler
var options = new ParallelOptions()
{ 
    MaxDegreeOfParallelism = 20
};

Parallel.Invoke(options, actions);
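The CancellationToken option can be sketched in the same way. The example below is only an illustration under a simplifying assumption: the token is cancelled before Invoke is called, which makes the outcome predictable. The class and method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InvokeOptionsDemo
{
    // Returns true if Parallel.Invoke reported the cancellation.
    public static bool RunWithCancelledToken()
    {
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions
        {
            CancellationToken = cts.Token,
            MaxDegreeOfParallelism = 2
        };

        // Assumption for this sketch: cancel up front so the result is deterministic.
        cts.Cancel();

        try
        {
            Parallel.Invoke(
                options,
                () => Console.WriteLine("first"),
                () => Console.WriteLine("second"));
            return false;
        }
        catch (OperationCanceledException)
        {
            // Thrown because the token in the options was cancelled.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Cancelled: " + RunWithCancelledToken());
    }
}
```

In real code the token would normally be cancelled from another thread, for example in response to a user pressing a Cancel button, while the actions are still running.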

ParallelLoopResult For(int fromInclusive, int toExclusive, Action<int> body)

The For function is just like a regular for loop. It takes an int to start at (inclusive), an int to run up to (exclusive) and an Action<int> to be run for every int in that range. Unlike a regular for loop, the iterations may run in parallel and in any order.
Parallel.For(1, 4, (i) => Console.Write(i));
 
 
Possible (non-deterministic) output:
123
132
213
231
312
321
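There are several overloads that let you specify options, stop the loop early through a ParallelLoopState, or set up thread-local state. Note that the localInit function in those overloads runs once for each task taking part in the loop, not before each iteration.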
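As a rough sketch of the thread-local overload, the example below sums a range of ints. Each task keeps its own subtotal, and the subtotals are merged once per task at the end. The names ParallelSumDemo and SumRange are invented for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSumDemo
{
    public static long SumRange(int fromInclusive, int toExclusive)
    {
        long total = 0;

        Parallel.For<long>(
            fromInclusive,
            toExclusive,
            // localInit: runs once per task and supplies its starting subtotal.
            () => 0L,
            // body: runs for every i and returns the updated subtotal.
            (i, loopState, subtotal) => subtotal + i,
            // localFinally: runs once per task and merges its subtotal safely.
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumRange(1, 101)); // prints 5050
    }
}
```

Because the shared total is only touched once per task rather than once per iteration, this pattern keeps contention on the shared variable low.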

ParallelLoopResult ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body)

ForEach is much like the foreach loop: a set of data is provided and an Action is run on each item in parallel.
var data = new List<string>() { "a", "b", "c" };

Parallel.ForEach(data, (e) => Console.Write(e));
Possible (non-deterministic) output:
abc
acb
bac
bca
cba
cab
Just like with For, there are a number of overloads that let you set options and supply a function that initializes thread-local state once per task (not before each iteration).
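Here is a small sketch of the ForEach overload that takes a ParallelOptions. It collects results in a ConcurrentBag<T>, which is safe to add to from several threads at once. The class name, method name and the choice of upper-casing strings are all just for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ForEachOptionsDemo
{
    public static List<string> UpperCaseAll(IEnumerable<string> data)
    {
        // Limit the loop to at most two items being processed at a time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        // ConcurrentBag<T> allows Add to be called from several threads at once.
        var results = new ConcurrentBag<string>();

        Parallel.ForEach(data, options, e => results.Add(e.ToUpperInvariant()));

        // The bag has no guaranteed order, so sort before returning.
        return results.OrderBy(s => s, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        var data = new List<string> { "a", "b", "c" };
        Console.WriteLine(string.Join(",", UpperCaseAll(data))); // prints A,B,C
    }
}
```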

Be careful

As with any concurrent programming, you need to be careful when using objects that are shared between the Actions. Any methods or properties that you use on shared objects need to be thread-safe. If you're unsure, a quick Google can usually give you the answer. It's probably best to stay away from concurrent programming until you understand how this can go wrong.

I'll give a quick example of what's called a race condition, which can result in an exception. Say our two Actions both call List<T>.Add(T) on the same List<T> instance. It is possible that Add will throw an IndexOutOfRangeException because of what happens inside the Add method. List<T> is backed by a dynamic array, so let's say that Add executes something like this:
If array.lastIndex >= array.capacity
    double array.capacity
array[array.lastIndex] <- newValue
array.lastIndex <- array.lastIndex + 1
Now our two Actions (A1, A2) could execute in the following order, resulting in an exception:
A1: If array.lastIndex(2) >= array.capacity(3) (false, no resize)
A2: If array.lastIndex(2) >= array.capacity(3) (false, no resize)
A2: array[array.lastIndex(2)](null) <- newValue("action2")
A2: array.lastIndex(2) <- array.lastIndex(2) + 1
A1: array[array.lastIndex(3)](doesn't exist) <- newValue("action1")
In the above, A1 tried to write to a position in the array that doesn't exist yet: both Actions passed the capacity check before either had added its value, so array.capacity was never increased. One way to avoid race conditions like this is to use lock, but that's outside the scope of this post.
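For anyone who wants a starting point anyway, here is a minimal sketch of the lock approach applied to the List<T>.Add scenario above. The names LockedAddDemo, AddInParallel and gate are made up for this example, and this is one possible fix rather than the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedAddDemo
{
    public static int AddInParallel(int count)
    {
        var items = new List<int>();

        // A private object used only as the lock target.
        var gate = new object();

        Parallel.For(0, count, i =>
        {
            // Only one thread at a time may run Add, so the capacity check
            // and the write can no longer interleave between threads.
            lock (gate)
            {
                items.Add(i);
            }
        });

        return items.Count;
    }

    public static void Main()
    {
        Console.WriteLine(AddInParallel(10000)); // prints 10000
    }
}
```

The trade-off is that the locked section runs one thread at a time, so if the loop body does little else, most of the benefit of running in parallel is lost.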