Using Parallelism
Version 4.0 of the Microsoft .NET Framework introduced the Task Parallel Library, which makes it much easier to parallelize code across multiple cores or processors. From version 4.0 onwards, Extreme Numerics.NET takes advantage of the Task Parallel Library to speed up many calculations.
When to parallelize
Parallelizing code can lead to significant improvements in performance. However, it is no silver bullet. There is some overhead that comes with running code in parallel. So when not used appropriately, parallelization can actually degrade performance.
Not all code can be parallelized. Moreover, the ideal situation of an N-fold speed-up using N cores cannot always be achieved. The speed-up depends on the proportion of the code that can be parallelized. The larger that proportion, the closer we can get to the ideal.
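This relationship is commonly formalized as Amdahl's law. The symbols are introduced here for illustration only: p is the fraction of the run time that can be parallelized, N is the number of cores, and S(N) is the resulting speed-up.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, if 90% of the run time can be parallelized (p = 0.9), 8 cores give a speed-up of about 4.7, and no number of cores can push the speed-up beyond a factor of 10.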
For small calculations, parallelization doesn't make much sense. In general, a calculation must involve at least a few thousand instructions before running it on multiple cores produces any gain.
This leads us to the last and most important aspect: what to parallelize. Most calculations are organized like a tree, where some branches can be executed in parallel. The best place to parallelize is where the chunk size is large enough that the overhead is relatively small, but small enough so that dependent tasks are not left waiting for large chunks of work to be finished.
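The effect of chunk size can be tried out with the Task Parallel Library alone. The following is a minimal sketch, not code from Extreme Numerics.NET: the class name ChunkedSum, the method SumOfSquares, and the chunkSize parameter are hypothetical names chosen for illustration. Each worker receives a contiguous range of elements, so the scheduling overhead is paid once per chunk rather than once per element.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Sums the squares of the values. Work is handed out in contiguous
    // ranges of chunkSize elements instead of one element at a time.
    public static double SumOfSquares(double[] values, int chunkSize)
    {
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (values.Length == 0) return 0.0;

        object gate = new object();
        double total = 0.0;

        Parallel.ForEach(
            Partitioner.Create(0, values.Length, chunkSize),
            () => 0.0,                              // per-worker partial sum
            (range, loopState, local) =>
            {
                // A plain serial loop over one chunk.
                for (int i = range.Item1; i < range.Item2; i++)
                    local += values[i] * values[i];
                return local;
            },
            local => { lock (gate) { total += local; } }); // combine partial sums

        return total;
    }
}
```

A very small chunkSize makes the overhead dominate, while a very large one can leave some cores idle as the last chunks finish. A suitable value is best found by measuring.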
It is important to pick a specific level at which to parallelize. It does not make sense to parallelize at a high and at a low level at the same time. From the point of view of a smaller (lower level) task, the division of labor has already been done. The net result would be more overhead and worse performance.
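As an illustration of parallelizing at a single level, the sketch below splits the work only in the outer loop and keeps the inner task serial. The names OneLevel, RowNorms, and AbsoluteSum are hypothetical and are not part of the library.

```csharp
using System;
using System.Threading.Tasks;

public static class OneLevel
{
    // Inner, lower-level task: deliberately serial.
    private static double AbsoluteSum(double[] row)
    {
        double sum = 0.0;
        foreach (double x in row) sum += Math.Abs(x);
        return sum;
    }

    // Outer, higher-level loop: the only place where work is divided.
    public static double[] RowNorms(double[][] rows)
    {
        var result = new double[rows.Length];
        // Each iteration writes to its own element, so no locking is needed.
        Parallel.For(0, rows.Length, i => { result[i] = AbsoluteSum(rows[i]); });
        return result;
    }
}
```

If the inner task were itself a call into a parallelized routine, restricting that routine to a single thread would play the same role as the serial loop here.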
In practical terms, you will have to decide if you want to parallelize your code or use the parallelized versions of algorithms in the numerical libraries. Doing both may make sense in some instances, but may be problematic in others.
For more information on using the Task Parallel Library in .NET 4.0 to parallelize your code, see the Microsoft Parallel Computing Developer Center.
Parallel features
There are many ways you can take advantage of the built-in parallelization features in Extreme Numerics.NET.
Most methods that have been parallelized run in parallel automatically. However, it may be desirable to have more control. For example, you may want to make use of the task cancellation mechanism in the Task Parallel Library. Or, if you are calling the code from multiple threads, you may want to force the library to run single-threaded to avoid overhead.
Parallelism in algorithms
Many of the algorithms in Extreme Numerics.NET have been parallelized. Every type that inherits from ManagedIterativeAlgorithm<T, TError, TReport> supports parallelization, although in some cases it has not been completely implemented. The cancellation mechanism from the Task Parallel Library is fully supported.
Two mechanisms are provided. First, the MaxDegreeOfParallelism property specifies how many threads may be used in the calculation. The default value is -1, which means the number of threads is determined by the concurrency runtime.
Second, many of the implementations have worker methods that are overloaded to take an additional ParallelOptions argument. The options supplied in this argument override any set by other means, including the algorithm's MaxDegreeOfParallelism property. If you want to enable the cancellation mechanism, then you must use one of these overloads.
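The interplay between the two mechanisms can be sketched with a stand-in class. SketchAlgorithm, its Run methods, and CancellationDemo are hypothetical and only mimic the pattern described above. ParallelOptions, CancellationTokenSource, and Parallel.For are the standard Task Parallel Library types. The actual worker method names differ per algorithm and should be taken from the reference documentation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a parallelized algorithm; not a library type.
public sealed class SketchAlgorithm
{
    // -1 lets the runtime decide, which is also the ParallelOptions default.
    public int MaxDegreeOfParallelism { get; set; } = -1;

    // Overload without options: falls back to the property.
    public long Run(int n)
    {
        return Run(n, new ParallelOptions { MaxDegreeOfParallelism = MaxDegreeOfParallelism });
    }

    // Overload with options: the supplied options take precedence over the property.
    public long Run(int n, ParallelOptions options)
    {
        long total = 0;
        Parallel.For(0, n, options,
            () => 0L,                                     // per-worker partial sum
            (i, loopState, local) => local + i,           // the "work"
            local => Interlocked.Add(ref total, local));  // combine partial sums
        return total;
    }
}

public static class CancellationDemo
{
    // Returns true if the run was cancelled through the token in the options.
    public static bool RunWithCancelledToken()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // cancelled up front to keep the sketch deterministic
            var options = new ParallelOptions
            {
                CancellationToken = cts.Token,
                MaxDegreeOfParallelism = 2
            };
            try
            {
                new SketchAlgorithm().Run(1000, options);
                return false;
            }
            catch (OperationCanceledException)
            {
                return true;
            }
        }
    }
}
```

In real code the token would typically be cancelled from another thread, for example in response to a user action or a timeout.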
Parallelism in statistical models
The classes that implement statistical models have a similar mechanism. The MaxDegreeOfParallelism property is present once again. It specifies how many threads may be used in the calculation. The default value is -1, which means the number of threads is determined by the concurrency runtime.
The Compute() method, which must be called to compute the model, is overloaded. It takes an optional ParallelOptions argument. The options supplied in this argument override any set by other means, including the model's MaxDegreeOfParallelism property. If you want to enable the cancellation mechanism, then you must use this overload.
Not all statistical models lend themselves easily to parallelization. The speed-up and scalability vary greatly between models.
Using the parallel native libraries
One of the simplest ways to parallelize your code is to take advantage of the parallelism built into the native libraries for linear algebra and signal processing.
Extreme Numerics.NET includes native libraries with processor-optimized code based on Intel's Math Kernel Library. Each library is available in a serial and in a parallel version. In addition, separate versions exist for 32-bit and 64-bit platforms, and for single and double precision.
The parallel libraries use OpenMP for threading, which is not compatible with the .NET concurrency runtime. For this reason, native parallelism and parallelism in your code should not be mixed. While it is fine to have multiple threads in your application, the parallel native libraries should not be called from multiple threads, or data corruption may result.
To enable the parallel libraries, set the MaxDegreeOfParallelism property of the provider to a value greater than 1, or to -1 to use as many threads as there are cores.
Using the parallel managed libraries
The equivalent managed linear algebra and signal processing libraries have also been parallelized. This is currently only exposed in the provider API. Each provider instance has a fixed degree of parallelism. Instances are available with degrees from 1 to the number of cores in the machine. To select the provider instance with a specific degree of parallelism, use the WithMaxDegreeOfParallelism(Int32) method of any managed provider instance.
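The idea of selecting a fixed-degree instance, rather than changing a setting on an existing one, can be sketched as follows. SketchProvider, its Serial property, and its Scale method are hypothetical stand-ins and do not reflect the library's actual provider types. Only the method name WithMaxDegreeOfParallelism is taken from the text above, and the exception thrown for out-of-range values is an assumption.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a managed provider; not the library's type.
public sealed class SketchProvider
{
    // One immutable instance per degree, from 1 up to the number of cores.
    private static readonly SketchProvider[] Instances = CreateInstances();

    public int MaxDegreeOfParallelism { get; }

    private SketchProvider(int degree)
    {
        MaxDegreeOfParallelism = degree;
    }

    private static SketchProvider[] CreateInstances()
    {
        var instances = new SketchProvider[Environment.ProcessorCount];
        for (int i = 0; i < instances.Length; i++)
            instances[i] = new SketchProvider(i + 1);
        return instances;
    }

    // The single-threaded instance.
    public static SketchProvider Serial => Instances[0];

    // Selects the instance with the requested degree; nothing is mutated.
    public SketchProvider WithMaxDegreeOfParallelism(int degree)
    {
        if (degree < 1 || degree > Instances.Length)
            throw new ArgumentOutOfRangeException(nameof(degree)); // assumed behavior
        return Instances[degree - 1];
    }

    // Example operation: multiplies every element of x by factor in place.
    public void Scale(double[] x, double factor)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = MaxDegreeOfParallelism };
        Parallel.For(0, x.Length, options, i => { x[i] *= factor; });
    }
}
```

Because each instance is immutable, code that holds a reference to one provider is not affected when another part of the application selects a different degree of parallelism.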