Getting the most out of the Numerics.NET Random API

This guide covers the Random API introduced in Numerics.NET version 10.3 and is aimed at performance-focused users. It assumes you care about hot loops, predictable overhead, and writing code the JIT can specialize. The theme is simple: Numerics.NET Random is layered so that high performance is the default path, not an optional mode you have to “opt into”.

How the stack is built (and why it is fast)

Numerics.NET Random is organized into layers that separate algorithm mechanics from policy and user-facing behavior, while preserving a zero-overhead path for hot loops.

  • Layer 1: Engines (Numerics.NET.Random.Engines) Pure algorithm cores as struct state: a state transition plus an output mapping. Engines implement specific generation algorithms, such as Xoshiro256** or PCG64, support state persistence, and are seeded via a SeedSequence.

  • Layer 2: Core generators (Numerics.NET.Random.Generators) Minimal struct state machines that present Layer 1 engine functionality through a canonical primitive interface (NextUInt32, NextUInt64, NextBytes), adding the small bits of machinery needed for efficient consumption by Layer 3 (caching, cursor/indexing for block engines).

  • Layer 3: Algorithm classes (Numerics.NET.Random) User-facing RNG types like Pcg64, Xoshiro256StarStar, Philox4x64. These apply policy (SeedProfile, stream separation rules, compatibility behaviors) and expose typed access to the underlying generator, so code can extract a reference to the generator once and run specialized hot loops on it.

  • Layer 4: Adapters (Numerics.NET.Random.Adapters) System.Random interop in both directions, plus recovery of the original Numerics.NET RNG when a System.Random instance is actually a wrapper.

The intended fast path is: construct a Layer 3 RNG, extract the Layer 2 generator once, then run your hot loop on the (reference to the) generator state. That is what allows the JIT to inline state transitions and avoid per-sample interface dispatch.
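
A minimal sketch of that shape, assuming a hypothetical ref-returning Generator accessor on the Layer 3 class (the real extraction API may be named differently):

C#
using Numerics.NET.Random;

var buffer = new ulong[1024];
var rng = new Pcg64(42);                  // Layer 3: seeding and policy

// Hypothetical accessor: a ref to the Layer 2 generator struct.
ref var generator = ref rng.Generator;

// The hot loop runs directly on the generator state; NextUInt64 can be inlined.
for (int i = 0; i < buffer.Length; i++)
    buffer[i] = generator.NextUInt64();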

Seeding is a one-time initialization cost. In any workload that draws large volumes of random numbers, spending a few extra cycles up front to robustly initialize generator state is typically negligible compared to the cost of the simulation itself, and it is often a good trade for statistical safety.

Aim for the specialized fast path

The fastest path is the one where the compiler sees the exact engine type and can inline it into your code. In practice, that means using Numerics.NET RNG classes directly in sampling-heavy code, or writing kernels that take generator structs by ref through a constrained generic type parameter.

C#
using Numerics.NET.Random;

var rng = new Pcg64(42);
var values = new double[1_000_000];

// Prefer APIs that can specialize on the concrete RNG type
NormalDistribution.SampleInto(rng, values);

If you start from System.Random or stay on an interface path per sample, you will usually pay extra overhead that adds up quickly.

Prefer bulk generation over per-sample calls

Bulk APIs reduce overhead and give Numerics.NET a chance to keep loops tight and predictable. They are also the shape most likely to benefit from future SIMD optimizations.

Good:

C#
ContinuousUniformDistribution.SampleInto(rng, values, 0.0, 1.0);
NormalDistribution.SampleInto(rng, values);

Less good:

C#
for (int i = 0; i < values.Length; i++)
    values[i] = rng.NextDouble();

Writing your own kernels: always pass generator structs by ref

If you are implementing performance-critical routines, follow the Numerics.NET kernel style. Hot-path algorithms should take Layer 2 generator structs by ref, constrained to the primitive generator interface.

C#
using Numerics.NET.Random.Generators;

public static void MyKernel<TGenerator>(ref TGenerator generator, Span<ulong> dst)
    where TGenerator : struct, IRandomGenerator
{
    // ref + struct constraint: the caller's state advances, and the JIT can
    // specialize and inline NextUInt64 for each concrete TGenerator.
    for (int i = 0; i < dst.Length; i++)
        dst[i] = generator.NextUInt64();
}

Why this constraint is the “magic”

With the where TGenerator : struct, IRandomGenerator type constraint, the JIT compiler can monomorphize the generic method. It generates a specialized version of MyKernel for each concrete generator type you use (one for the generator behind Pcg64, another for the generator behind Xoshiro256StarStar, and so on). That is what enables inlining of NextUInt64() and the engine step, and it removes per-sample interface or virtual dispatch from your inner loop.

This is also why Layer 2 is built from struct generators. If Layer 2 were class-based, the JIT would not be able to eliminate virtual dispatch in the same way, even with generic constraints.
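
As a concrete illustration, here is a toy struct generator (a SplitMix64-style mixer, not a library type) that could flow through MyKernel above. The IRandomGenerator member signatures are assumed from the primitives listed earlier, so treat this as a sketch rather than a drop-in implementation:

C#
using System;
using Numerics.NET.Random.Generators;

// Toy generator for illustration only. Member signatures are assumed
// from the Layer 2 primitives (NextUInt32, NextUInt64, NextBytes).
public struct ToyGenerator : IRandomGenerator
{
    private ulong _state;

    public ToyGenerator(ulong seed) => _state = seed;

    // SplitMix64-style step; enough to demonstrate the mechanics.
    public ulong NextUInt64()
    {
        ulong z = _state += 0x9E3779B97F4A7C15UL;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
        return z ^ (z >> 31);
    }

    public uint NextUInt32() => (uint)(NextUInt64() >> 32);

    public void NextBytes(Span<byte> buffer)
    {
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = (byte)NextUInt64();
    }
}

// Usage (inside a method):
//   var toy = new ToyGenerator(42);
//   MyKernel(ref toy, new ulong[1024]);   // JIT emits MyKernel<ToyGenerator>, step inlined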

WARNING: never pass generator structs by value

This is a correctness-critical rule, not just a performance guideline.

Generator structs are mutable state machines. Passing by value copies the RNG state.

In .NET, developers are used to passing class instances around, which only copies a reference. With a struct, the entire state is copied. That copy fails silently, and the resulting bug can be extremely hard to track down.

C#
// BUG: generator is copied. Callers can silently produce identical sequences.
public static void Buggy<TGenerator>(TGenerator generator, Span<ulong> dst)
    where TGenerator : struct, IRandomGenerator
{
    for (int i = 0; i < dst.Length; i++)
        dst[i] = generator.NextUInt64(); // operates on a copy
}

In parallel code, accidental copies can create forked identical streams and invalidate results.

Avoid per-sample interface calls: visualize the call chain

Interface calls are not “bad” in general. They are just expensive when they happen per sample.

Dispatch-heavy path (slow in tight loops)

hot loop
  -> IRandomSource.NextDouble()         // dispatch
       -> helper code
            -> NextUInt64()             // may dispatch again depending on shape
                 -> engine step

Specialized generic path (what you want)

hot loop
  -> Kernel<TGenerator>(ref generator)  // concrete type known or constrained
       -> generator.NextUInt64()        // direct call, inlinable
            -> engine step              // inlinable

The goal is simple: extract the generator once, then run the loop on ref TGenerator.

Streams: multiple independent sequences from one seed

Parallel performance requires independent streams. For most workloads, do not invent your own “seed math” and do not try to build stream splitting from ad-hoc skipping.

Simple and fast: bulk stream creation

C#
using Numerics.NET;
using System.Threading.Tasks;

var streams = RandomSources.CreateStreams(count: 64, seed: 42);

// Each worker gets a stable stream identity.
Parallel.For(0, streams.Length, i =>
{
    var local = streams[i];
    // sample-heavy work here
});

CreateStreams handles the mixing for you. Streams are derived from the base seed plus a stream identity called the stream address, then expanded into full generator state using a robust initializer. The practical result is that stream 0 and stream 1 are not “neighbors” in a sequence. They start in statistically isolated territories without you having to design a scheme.
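
A quick way to see that stream identity is stable (assuming each element returned by CreateStreams exposes the IRandomSource surface, so NextDouble is available): recreating the streams from the same base seed reproduces each stream exactly, independent of how the others are used.

C#
using System;
using Numerics.NET;

// Same base seed => the same streams, in the same order.
var runA = RandomSources.CreateStreams(count: 8, seed: 42);
var runB = RandomSources.CreateStreams(count: 8, seed: 42);

// Stream 3 reproduces its sequence regardless of what streams 0-2 were used for.
Console.WriteLine(runA[3].NextDouble() == runB[3].NextDouble()); // True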

Advanced: stream trees and jump-based partitions

If you need nested parallelism, hierarchical stream derivation, or explicit control over stream identity, use the advanced stream APIs.

Warning about arbitrary jumps: do not treat “jump ahead” as a general-purpose way to manufacture independent streams. Arbitrary skip distances are easy to get wrong and hard to reason about. If you use jump-based methods, use the library’s partitioning abstractions or the generator’s defined jump stride capability, not a self-invented skip schedule.

Counter-based RNGs: seekable randomness

Counter-based RNGs (CBRNGs) like Philox are designed so you can move around in counter space without iterating. For some workloads, that is the difference between “parallelism is annoying” and “parallelism is deterministic”.

What this enables

  • Give each worker a non-overlapping region of a conceptual stream by mapping work to counter coordinates.
  • Seek deterministically to a later region of the stream without burning cycles stepping a stateful generator.

Mental model:

Operation         Traditional stateful RNG    Counter-based RNG
Next value        mutate state                evaluate (counter, key) -> block
Skip far ahead    loop many times             advance counter (block units)

Example:

C#
using Numerics.NET.Random;

var rng = new Philox4x64(42); // Philox variant assumed; seed type may differ

// Deterministically jump in block space
rng.AdvanceBlock(1_000_000);

ulong x = rng.NextUInt64();

The key point is that seeking is defined by the algorithm’s counter and key model, not by choosing arbitrary jump distances.
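
Building on that, one way to hand each worker a non-overlapping region is to seed identical Philox4x64 instances and advance each to a disjoint block range. This sketch reuses the assumed constructor and AdvanceBlock shape from the example above; the block budget per worker is yours to choose.

C#
using Numerics.NET.Random;
using System.Threading.Tasks;

const int workerCount = 8;
const long blocksPerWorker = 1_000_000;   // each worker must draw fewer blocks than this

Parallel.For(0, workerCount, worker =>
{
    // Same seed everywhere; the counter position is what separates workers.
    var rng = new Philox4x64(42);
    rng.AdvanceBlock(worker * blocksPerWorker);   // deterministic, disjoint region

    // sample-heavy work for this worker's region here
    ulong first = rng.NextUInt64();
});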

Interop: treat System.Random as an exchange format, then unwrap

Interop should happen at the boundary, not in hot loops.

Use Numerics.NET RNG as System.Random

C#
using Numerics.NET.Random;

System.Random sys = new Pcg64(42).AsRandom();

Use System.Random as a random source

C#
using Numerics.NET;

// sys is the System.Random instance from the previous snippet.
IRandomSource rng = sys.AsRandomSource();

Optional fast path: recover original Numerics.NET RNG type

When System.Random is actually a wrapper around a Numerics.NET RNG, unwrapping lets you bypass the adapter and return to the native path without reseeding or losing state.

C#
using Numerics.NET.Random;

public void Process(System.Random sys)
{
    // RunFast and RunGeneral stand in for your own sampling routines.
    if (sys.TryUnwrap<Pcg64>(out var pcg))
    {
        RunFast(pcg);                     // native path, adapter bypassed
        return;
    }

    RunGeneral(sys.AsRandomSource());     // general path through the adapter
}

Note: TryUnwrap<T> is not expected to be common in application code. It is most useful in library code that wants to preserve fast paths.

Note on threading

Numerics.NET RNG instances are optimized for single-thread throughput. For parallel code:

  • use one instance per worker,
  • prefer stream creation (CreateStreams or advanced stream APIs) over ad-hoc seeding,
  • avoid shared RNG instances and synchronization in hot paths.

RandomSources.Shared is appropriate for thread-safe convenience randomness when reproducibility is not required.
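
For the convenience case, a single call is enough. This assumes RandomSources.Shared exposes the IRandomSource surface (including NextDouble), consistent with the call chain shown earlier:

C#
using Numerics.NET;

// Thread-safe convenience randomness; not reproducible across runs.
double jitter = RandomSources.Shared.NextDouble();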

Performance checklist

  • Prefer concrete RNG types, or constrained generics that take Layer 2 generator structs by ref.
  • Prefer bulk sampling APIs (distributions sampling into spans/arrays) over per-sample calls.
  • In hot loops, run on concrete RNG types or constrained generics so the JIT can monomorphize and inline.
  • For custom kernels, accept generator structs by ref and never pass generator state by value.
  • Avoid per-sample interface dispatch in inner loops; adapt once at the boundary.
  • For parallel work, use the built-in deterministic stream creation, not seed + i.
  • Use jump-based methods only through defined mechanisms; avoid arbitrary skip schedules.
  • Use System.Random interop only at boundaries; unwrap only when it is genuinely beneficial.
