Getting the most out of the Numerics.NET Random API
This guide discusses the Random API introduced in Numerics.NET version 10.3 for performance-focused users. It assumes you care about hot loops, predictable overhead, and writing code the JIT can specialize. The theme is simple: Numerics.NET Random is layered so high performance is the default path, not an optional mode you have to “opt into”.
How the stack is built (and why it is fast)
Numerics.NET Random is organized into layers that separate algorithm mechanics from policy and user-facing behavior, while preserving a zero-overhead path for hot loops.
- Layer 1: Engines (Numerics.NET.Random.Engines). Pure algorithm cores as struct state: a transition plus an output mapping. Engines implement specific random number generation algorithms, such as Xoshiro256** or PCG64. They support state persistence and are seeded via a SeedSequence.
- Layer 2: Core generators (Numerics.NET.Random.Generators). Minimal struct state machines that present Layer 1 engine functionality through a canonical primitive interface (NextUInt32, NextUInt64, NextBytes), adding the small bits of machinery needed for efficient consumption by Layer 3 (caching, cursor/indexing for block engines).
- Layer 3: Algorithm classes (Numerics.NET.Random). User-facing RNG types like Pcg64, Xoshiro256StarStar, Philox4x64. These apply policy (SeedProfile, stream separation rules, compatibility behaviors) and expose typed access to the underlying generator so code can extract a reference to the generator once and run specialized instances of loops.
- Layer 4: Adapters (Numerics.NET.Random.Adapters). System.Random interop in both directions, plus recovery of the original Numerics.NET RNG when a System.Random instance is actually a wrapper.
The intended fast path is: construct a Layer 3 RNG, extract the Layer 2 generator once, then run your hot loop on the (reference to the) generator state. That is what allows the JIT to inline state transitions and avoid per-sample interface dispatch.
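The "extract once, then loop" shape can be sketched without the library. The types below (ToyGenerator, ToyRng, FastPathDemo) are hypothetical stand-ins, not Numerics.NET types: a mutable struct plays the role of a Layer 2 generator, and a class that owns it and hands out a reference plays the role of a Layer 3 RNG. How the real library exposes its generator may differ.

```csharp
// Hypothetical stand-in for a Layer 2 generator: a mutable struct state machine.
public struct ToyGenerator
{
    private ulong _state;

    public ToyGenerator(ulong seed) => _state = seed;

    // SplitMix64 step: advance the state, then scramble it into an output.
    public ulong NextUInt64()
    {
        unchecked
        {
            ulong z = (_state += 0x9E3779B97F4A7C15UL);
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }
}

// Hypothetical stand-in for a Layer 3 class: owns the struct and exposes it by reference.
public sealed class ToyRng
{
    private ToyGenerator _generator;

    public ToyRng(ulong seed) => _generator = new ToyGenerator(seed);

    // Ref-returning property: callers operate on the stored state, not on a copy.
    public ref ToyGenerator Generator => ref _generator;
}

public static class FastPathDemo
{
    // Extracts the generator reference once, then runs the loop on the struct.
    public static ulong XorOfDraws(ToyRng rng, int count)
    {
        ref ToyGenerator g = ref rng.Generator; // extracted once, outside the loop
        ulong acc = 0;
        for (int i = 0; i < count; i++)
            acc ^= g.NextUInt64(); // direct call on a known struct type
        return acc;
    }
}
```

Because the loop runs on a reference to the stored struct, the class-level RNG has advanced by `count` draws when the method returns, so later consumers continue the same sequence.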
Seeding is a one-time initialization cost. In any workload that draws large volumes of random numbers, spending a few extra cycles up front to robustly initialize generator state is typically negligible compared to the cost of the simulation itself, and it is often a good trade for statistical safety.
Aim for the specialized fast path
The fastest path is the one where the compiler sees the exact engine type and can inline it
into your code. In practice, that means using Numerics.NET RNG classes directly
in sampling-heavy code, or writing kernels that accept constrained generics and ref generator structs.

    using Numerics.NET.Random;

    var rng = new Pcg64(42);
    var values = new double[1_000_000]; // destination buffer, filled in one call

    // Prefer APIs that can specialize on concrete types
    NormalDistribution.SampleInto(rng, values);

If you start from System.Random or stay on an interface path per sample, you will usually pay extra overhead that adds up quickly.
Prefer bulk generation over per-sample calls
Bulk APIs reduce overhead and give Numerics.NET a chance to keep loops tight and predictable. They are also the shape most likely to benefit from future SIMD optimizations.
Good:

    ContinuousUniformDistribution.SampleInto(rng, values, 0.0, 1.0);
    NormalDistribution.SampleInto(rng, values);

Less good:

    for (int i = 0; i < values.Length; i++)
        values[i] = rng.NextDouble();

Writing your own kernels: always pass generator structs by ref
If you are implementing performance-critical routines, follow the Numerics.NET kernel style. Hot-path algorithms should accept references to Layer 2 generator structs, constrained to the primitive generator interface.

    using System;
    using Numerics.NET.Random.Generators;

    public static void MyKernel<TGenerator>(ref TGenerator generator, Span<ulong> dst)
        where TGenerator : struct, IRandomGenerator
    {
        // The constrained call below is resolved per concrete generator type.
        for (int i = 0; i < dst.Length; i++)
            dst[i] = generator.NextUInt64();
    }

Why this constraint is the “magic”
With the where TGenerator : struct, IRandomGenerator type constraint,
the JIT compiler can monomorphize the generic method.
It generates a specialized version of MyKernel for each concrete generator type you use
(one for the generator behind Pcg64, another for the generator behind Xoshiro256StarStar,
and so on). That is what enables inlining of NextUInt64() and the engine step,
and it removes per-sample interface or virtual dispatch from your inner loop.
This is also why Layer 2 is built from struct generators. If Layer 2 were class-based,
the JIT would not be able to eliminate virtual dispatch in the same way, even with generic constraints.
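To make the specialization concrete, here is a minimal self-contained sketch with two toy struct generators behind one constrained kernel. IToyGenerator, SplitMix64, XorShift64, and Kernels.Fill are illustrative names, not library types; the interface only stands in for the primitive generator interface described above.

```csharp
using System;

// Hypothetical primitive interface, standing in for the library's generator interface.
public interface IToyGenerator
{
    ulong NextUInt64();
}

public struct SplitMix64 : IToyGenerator
{
    private ulong _state;

    public SplitMix64(ulong seed) => _state = seed;

    public ulong NextUInt64()
    {
        unchecked
        {
            ulong z = (_state += 0x9E3779B97F4A7C15UL);
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }
}

public struct XorShift64 : IToyGenerator
{
    private ulong _state;

    // Xorshift state must be nonzero, so a zero seed is replaced by a fixed constant.
    public XorShift64(ulong seed) => _state = seed != 0 ? seed : 0x9E3779B97F4A7C15UL;

    public ulong NextUInt64()
    {
        _state ^= _state << 13;
        _state ^= _state >> 7;
        _state ^= _state << 17;
        return _state;
    }
}

public static class Kernels
{
    // One source method; the runtime compiles a separate body per struct type argument.
    public static void Fill<TGenerator>(ref TGenerator generator, Span<ulong> dst)
        where TGenerator : struct, IToyGenerator
    {
        for (int i = 0; i < dst.Length; i++)
            dst[i] = generator.NextUInt64();
    }
}
```

Calling `Kernels.Fill(ref splitMix, buffer)` and `Kernels.Fill(ref xorShift, buffer)` uses the same source but two specialized method bodies, each with a statically known `NextUInt64` target.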
WARNING: never pass generator structs by value
This is a correctness-critical rule, not just a performance guideline.
Generator structs are mutable state machines. Passing by value copies the RNG state.
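In .NET, people are used to passing class instances by value (which copies a reference). With structs, you copy the entire state: the callee advances its private copy, the caller's generator never moves, and the next call replays the same values. That is a silent failure and it can be extremely hard to debug.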

    // BUG: generator is copied. Callers can silently produce identical sequences.
    public static void Buggy<TGenerator>(TGenerator generator, Span<ulong> dst)
        where TGenerator : struct, IRandomGenerator
    {
        for (int i = 0; i < dst.Length; i++)
            dst[i] = generator.NextUInt64(); // operates on a copy
    }

In parallel code, accidental copies can create forked identical streams and invalidate results.
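The failure is easy to reproduce with a toy struct. ToyGenerator and CopyDemo below are hypothetical names used only for this illustration; the behavior shown is plain C# value-type semantics rather than anything specific to the library.

```csharp
// Hypothetical mutable generator struct (SplitMix64 step).
public struct ToyGenerator
{
    private ulong _state;

    public ToyGenerator(ulong seed) => _state = seed;

    public ulong NextUInt64()
    {
        unchecked
        {
            ulong z = (_state += 0x9E3779B97F4A7C15UL);
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }
}

public static class CopyDemo
{
    // BUG: the parameter is a copy, so the caller's generator never advances.
    public static ulong DrawByValue(ToyGenerator generator) => generator.NextUInt64();

    // Correct: the draw advances the caller's generator.
    public static ulong DrawByRef(ref ToyGenerator generator) => generator.NextUInt64();
}
```

Two consecutive `DrawByValue(g)` calls return the same number, because each call starts from the same untouched state. Two consecutive `DrawByRef(ref g)` calls return different numbers.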
Avoid per-sample interface calls: visualize the call chain
Interface calls are not “bad” in general. They are just expensive when they happen per sample.
Dispatch-heavy path (slow in tight loops)

    hot loop
      -> IRandomSource.NextDouble()      // dispatch
        -> helper code
          -> NextUInt64()                // may dispatch again depending on shape
            -> engine step

Specialized generic path (what you want)

    hot loop
      -> Kernel<TGenerator>(ref generator)   // concrete type known or constrained
        -> generator.NextUInt64()            // direct call, inlinable
          -> engine step                     // inlinable

The goal is simple: extract the generator once, then run the loop on ref TGenerator.
Streams: multiple independent sequences from one seed
Parallel performance requires independent streams. For most workloads, do not invent your own “seed math” and do not try to build stream splitting from ad-hoc skipping.
Simple and fast: bulk stream creation

    using System.Threading.Tasks;
    using Numerics.NET;

    var streams = RandomSources.CreateStreams(count: 64, seed: 42);

    // Each worker gets a stable stream identity.
    Parallel.For(0, streams.Length, i =>
    {
        var local = streams[i];
        // sample-heavy work here
    });

CreateStreams handles the mixing for you. Streams are derived from the base seed
plus a stream identity called the stream address, then expanded into full generator state
using a robust initializer. The practical result is that stream 0 and stream 1
are not “neighbors” in a sequence. They start in statistically isolated territories
without you having to design a scheme.
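As a rough illustration of why mixing beats `seed + i`, the sketch below derives per-stream seeds by hashing the base seed together with a stream address. This is not the library's derivation: ToyStreams and its methods are hypothetical, the mixer is the SplitMix64 finalizer, and a real implementation would go on to expand the result into full generator state.

```csharp
public static class ToyStreams
{
    // SplitMix64-style mixer: every step is invertible, so the map is a bijection.
    private static ulong Mix(ulong z)
    {
        unchecked
        {
            z += 0x9E3779B97F4A7C15UL;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    // Derive a per-stream seed from (base seed, stream address).
    // For a fixed base seed, distinct addresses always give distinct results.
    public static ulong DeriveSeed(ulong baseSeed, ulong streamAddress)
        => Mix(Mix(baseSeed) ^ streamAddress);

    // Derive seeds for stream addresses 0 .. count - 1.
    public static ulong[] DeriveSeeds(ulong baseSeed, int count)
    {
        var seeds = new ulong[count];
        for (int i = 0; i < count; i++)
            seeds[i] = DeriveSeed(baseSeed, (ulong)i);
        return seeds;
    }
}
```

With `seed + i`, streams 0 and 1 start from inputs that differ in a single low bit. After mixing, the derived values for adjacent addresses are scrambled across the whole 64-bit word, and the derivation stays fully deterministic for a given base seed.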
Advanced: stream trees and jump-based partitions
If you need nested parallelism, hierarchical stream derivation, or explicit control over stream identity, use the advanced stream APIs.
- RandomStreamTree<TRandom>: mixing-based derivation of statistically independent streams.
- RandomStreamPartition<TRandom>: partitioning of a random stream by jumping ahead over long distances with mathematically guaranteed independence.
**Warning about arbitrary jumps:** Do not treat “jump ahead” as a general-purpose way to manufacture independent streams. Arbitrary skip distances are easy to get wrong and hard to reason about. If you use jump-based methods, use the library’s partitioning abstractions or the generator’s defined jump stride capability, not a self-invented skip schedule.
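To show what a defined jump capability looks like in the simplest case, here is a sketch for a toy 64-bit linear congruential generator, where a jump of n steps can be computed exactly in O(log n) by composing the affine update with itself. ToyLcg is a hypothetical type for illustration only. It says nothing about how RandomStreamPartition<TRandom> is implemented, and returning raw LCG state is not a quality output function.

```csharp
// Toy LCG: state' = state * Multiplier + Increment (mod 2^64).
public struct ToyLcg
{
    private const ulong Multiplier = 6364136223846793005UL;
    private const ulong Increment = 1442695040888963407UL;

    private ulong _state;

    public ToyLcg(ulong seed) => _state = seed;

    public ulong NextUInt64()
    {
        unchecked
        {
            _state = _state * Multiplier + Increment;
            return _state;
        }
    }

    // Advance the state by `steps` updates without iterating,
    // using repeated squaring of the affine map x -> mul * x + add.
    public void Jump(ulong steps)
    {
        unchecked
        {
            ulong accMul = 1, accAdd = 0;                  // identity map
            ulong curMul = Multiplier, curAdd = Increment; // map for 2^k steps
            while (steps > 0)
            {
                if ((steps & 1) != 0)
                {
                    // Compose the accumulated map with the current power.
                    accMul *= curMul;
                    accAdd = accAdd * curMul + curAdd;
                }
                // Square the current map: 2^k steps become 2^(k+1) steps.
                curAdd = (curMul + 1) * curAdd;
                curMul *= curMul;
                steps >>= 1;
            }
            _state = accMul * _state + accAdd;
        }
    }

    // Worker `index` starts `index * stride` steps into the sequence.
    public static ToyLcg Partition(ulong seed, ulong index, ulong stride)
    {
        var g = new ToyLcg(seed);
        unchecked { g.Jump(index * stride); }
        return g;
    }
}
```

The partitions stay non-overlapping only while each worker draws fewer than `stride` values, which is the kind of bookkeeping a partitioning abstraction exists to enforce.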
Counter-based RNGs: seekable randomness
Counter-based RNGs (CBRNGs) like Philox are designed so you can move around in counter space without iterating. For some workloads, that is the difference between “parallelism is annoying” and “parallelism is deterministic”.
What this enables
- Give each worker a non-overlapping region of a conceptual stream by mapping work to counter coordinates.
- Seek deterministically to a later region of the stream without burning cycles stepping a stateful generator.
Mental model:
| Operation | Traditional stateful RNG | Counter-based RNG |
|---|---|---|
| Next value | mutate state | evaluate (counter, key) -> block |
| Skip far ahead | loop many times | advance counter (block units) |
Example:

    using Numerics.NET.Random;

    var rng = new Philox4x64(42); // Philox variant assumed; seed type may differ

    // Deterministically jump in block space
    rng.AdvanceBlock(1_000_000);
    ulong x = rng.NextUInt64();

The key point is that seeking is defined by the algorithm’s counter and key model, not by choosing arbitrary jump distances.
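The counter and key model can be sketched with a toy generator whose output is a pure function of (counter, key). This is not Philox: ToyCounterRng is a hypothetical type, it uses a SplitMix64-style mixer instead of Philox rounds, and it produces one value per counter rather than a block.

```csharp
public struct ToyCounterRng
{
    private readonly ulong _key;
    private ulong _counter;

    public ToyCounterRng(ulong key)
    {
        _key = key;
        _counter = 0;
    }

    // SplitMix64-style mixer (bijective on 64-bit values).
    private static ulong Mix(ulong z)
    {
        unchecked
        {
            z += 0x9E3779B97F4A7C15UL;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    // Pure function: the value at a given counter position does not depend on history.
    public static ulong Evaluate(ulong counter, ulong key) => Mix(counter ^ Mix(key));

    // Next value: evaluate at the current counter, then increment the counter.
    public ulong NextUInt64()
    {
        unchecked { return Evaluate(_counter++, _key); }
    }

    // Skipping ahead is a single addition on the counter.
    public void Advance(ulong count)
    {
        unchecked { _counter += count; }
    }
}
```

Calling `Advance(1_000_000)` on a fresh instance and then drawing gives the same value as drawing a million times and then drawing once more, at the cost of one addition.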
Interop: treat System.Random as an exchange format, then unwrap
Interop should happen at the boundary, not in hot loops.
Use Numerics.NET RNG as System.Random

    using Numerics.NET.Random;

    System.Random sys = new Pcg64(42).AsRandom();

Use System.Random as a random source

    using Numerics.NET;

    IRandomSource rng = sys.AsRandomSource();

Optional fast path: recover original Numerics.NET RNG type
When System.Random is actually a wrapper around a Numerics.NET RNG, unwrapping lets you bypass the adapter and return to the native path without reseeding or losing state.

    using Numerics.NET.Random;

    public void Process(System.Random sys)
    {
        if (sys.TryUnwrap<Pcg64>(out var pcg))
        {
            RunFast(pcg);
            return;
        }
        RunGeneral(sys.AsRandomSource());
    }

Note: TryUnwrap<T> is not expected to be common in application code. It is most useful in library code that wants to preserve fast paths.
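One way to picture the unwrap step is as a type test on an adapter that derives from System.Random and keeps a reference to the native RNG. The sketch below is an assumption about the general pattern, not the library's adapter: ToyRng, ToyRandomAdapter, and ToyInterop are hypothetical, and the adapter is deliberately incomplete.

```csharp
#nullable enable
using System;

// Hypothetical class-level RNG standing in for a native RNG type.
public sealed class ToyRng
{
    private ulong _state;

    public ToyRng(ulong seed) => _state = seed;

    public ulong NextUInt64()
    {
        unchecked
        {
            ulong z = (_state += 0x9E3779B97F4A7C15UL);
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }
}

// Adapter that presents a ToyRng through the System.Random surface.
// Only the double-producing members are overridden in this sketch; a complete
// adapter would also override Next, NextBytes, and the other virtual members.
public sealed class ToyRandomAdapter : Random
{
    public ToyRandomAdapter(ToyRng inner) => Inner = inner;

    public ToyRng Inner { get; }

    // Top 53 bits scaled into [0, 1).
    protected override double Sample()
        => (Inner.NextUInt64() >> 11) * (1.0 / (1UL << 53));

    public override double NextDouble() => Sample();
}

public static class ToyInterop
{
    // Succeeds only when the System.Random instance is actually our adapter.
    public static bool TryUnwrap(this Random random, out ToyRng? rng)
    {
        rng = (random as ToyRandomAdapter)?.Inner;
        return rng != null;
    }
}
```

Unwrapping returns the same ToyRng instance that was wrapped, so the native path continues from the current state. A plain `new Random()` fails the test and falls through to the general path.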
Note on threading
Numerics.NET RNG instances are optimized for single-thread throughput. For parallel code:
- use one instance per worker,
- prefer stream creation (CreateStreams or advanced stream APIs) over ad-hoc seeding,
- avoid shared RNG instances and synchronization in hot paths.
RandomSources.Shared is appropriate for thread-safe convenience randomness when reproducibility is not required.
Performance checklist
- Prefer concrete RNG types (or ref-based constrained generics over Layer 2 generators).
- Prefer bulk sampling APIs (distributions sampling into spans/arrays) over per-sample calls.
- In hot loops, run on concrete RNG types or constrained generics so the JIT can monomorphize and inline.
- For custom kernels, accept generator structs by ref and never pass generator state by value.
- Avoid per-sample interface dispatch in inner loops; adapt once at the boundary.
- For parallel work, use the built-in deterministic stream creation, not seed + i.
- Use jump-based methods only through defined mechanisms; avoid arbitrary skip schedules.
- Use System.Random interop only at boundaries; unwrap only when it is genuinely beneficial.