Variable-Length Data

Variable-length data, also known as ragged data, is a common pattern in data analysis where each observation contains a sequence whose length may vary. Examples include tokenized text (where each document has a different number of words), grouped measurements (where each entity has a different number of observations), segmented time series, and feature lists per sample.

Numerics.NET provides first-class support for variable-length data through the ListVector<T> class, which efficiently stores and manipulates sequences of variable-length lists.

What is Variable-Length Data?

Variable-length data occurs when observations naturally contain sequences of different lengths. Common examples include:

Tokenized text: Documents contain different numbers of words or tokens.
Grouped measurements: Each subject or entity has a different number of measurements.
Segmented time series: Time periods contain different numbers of observations.
Feature lists: Each sample has a variable number of features or attributes.

Unlike fixed-length data, which can be represented in a rectangular matrix, variable-length data requires a more flexible representation that preserves the natural grouping structure.

Relationship to Groupings

A Grouping<TKey> defines how a flat sequence is partitioned into groups. A ListVector<T> materializes this partitioning as data: each group becomes a list, and the collection of lists becomes a vector.

This means that aggregating over a list vector is conceptually equivalent to flattening it first and then aggregating by its grouping:

// A ListVector materializes a grouping
var flat = tokens.Flatten();
var grouping = tokens.Grouping;

// These two operations are equivalent:
var listCounts = tokens.AggregateLists(Aggregators.Count);
var groupedCounts = flat.AggregateBy(grouping, Aggregators.Count);

Visual Basic

' A ListVector materializes a grouping
Dim flat = tokens.Flatten()
Dim grouping = tokens.Grouping

' These two operations are equivalent:
Dim listCounts = tokens.AggregateLists(Aggregators.Count)
Dim groupedCounts = flat.AggregateBy(grouping, Aggregators.Count)

F#

// A ListVector materializes a grouping
let flat = tokens.Flatten()
let grouping = tokens.Grouping

// These two operations are equivalent:
let listCounts = tokens.AggregateLists(Aggregators.Count)
let groupedCounts = flat.AggregateBy(grouping, Aggregators.Count)

A ListVector<T> is thus not merely a container; it is a grouped representation of data that preserves and exposes the grouping structure through its Grouping property.
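
The equivalence described above can be illustrated without the library. The following is a minimal sketch in plain C# using LINQ; the class and method names are hypothetical and are not part of Numerics.NET. It counts elements once per list and once by flattening and tallying each element under the index of the list it came from.

```csharp
using System;
using System.Linq;

// Hypothetical illustration only; not part of Numerics.NET.
public static class GroupingEquivalenceSketch
{
    // Aggregate each list directly.
    public static int[] CountPerList(string[][] lists) =>
        lists.Select(list => list.Length).ToArray();

    // Flatten while remembering each element's group index,
    // then aggregate the flat sequence by that index.
    public static int[] CountByGroupIndex(string[][] lists)
    {
        var flat = lists.SelectMany(
            (list, g) => list.Select(item => (Group: g, Item: item)));

        var counts = new int[lists.Length];
        foreach (var (group, _) in flat)
            counts[group]++;
        return counts;
    }
}
```

For lists of lengths 3, 4, and 3, both methods return the counts 3, 4, 3. An empty list yields a count of 0 in both cases, because the second method sizes its result by the number of lists rather than by the groups that happen to appear in the flat sequence.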

ListVector<T> as the Core Abstraction

The ListVector<T> class represents a vector whose elements are variable-length lists. It has the following characteristics:

Fixed structure: The number of lists and their boundaries are immutable.
Variable list lengths: Each list can contain any number of elements.
Efficient storage: Values are stored in a flattened format similar to compressed sparse row (CSR) storage, with all values in a contiguous array and offsets indicating list boundaries.
Mutable values: Individual elements within lists can be modified when the vector attributes permit it.
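
To make the CSR-like layout concrete, here is a minimal sketch in plain C#. The RaggedSketch<T> class and its members are hypothetical and only illustrate the idea of one contiguous values array plus an offsets array; the actual internal layout of ListVector<T> may differ.

```csharp
using System;

// Hypothetical illustration of a CSR-style layout; not the library's implementation.
public sealed class RaggedSketch<T>
{
    private readonly T[] _values;    // all elements, stored list after list
    private readonly int[] _offsets; // list i occupies [_offsets[i], _offsets[i + 1])

    public RaggedSketch(T[][] lists)
    {
        // Running totals of the list lengths give the list boundaries.
        _offsets = new int[lists.Length + 1];
        for (int i = 0; i < lists.Length; i++)
            _offsets[i + 1] = _offsets[i] + lists[i].Length;

        // Copy every list into its slot in the contiguous array.
        _values = new T[_offsets[lists.Length]];
        for (int i = 0; i < lists.Length; i++)
            Array.Copy(lists[i], 0, _values, _offsets[i], lists[i].Length);
    }

    public int Length => _offsets.Length - 1;     // number of lists
    public int FlattenedLength => _values.Length; // total number of elements

    // Returns a read-only view over list i without copying.
    public ReadOnlySpan<T> GetList(int i) =>
        _values.AsSpan(_offsets[i], _offsets[i + 1] - _offsets[i]);
}
```

In this layout, looking up a list costs two offset reads regardless of how many lists there are, and flattening is free because the values are already contiguous.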

Creating a list vector from nested data is straightforward:

// Create a list vector from nested data
var documents = new[] {
    new[] { "the", "quick", "brown" },
    new[] { "the", "lazy", "dog", "sleeps" },
    new[] { "pack", "my", "box" }
};
var tokens = Vector.CopyFromAsLists(documents);
Console.WriteLine($"List vector length: {tokens.Length}");
Console.WriteLine($"Total tokens: {tokens.FlattenedLength}");

Visual Basic

' Create a list vector from nested data
Dim documents = {
    New String() {"the", "quick", "brown"},
    New String() {"the", "lazy", "dog", "sleeps"},
    New String() {"pack", "my", "box"}
}
Dim tokens = Vector.CopyFromAsLists(documents)
Console.WriteLine($"List vector length: {tokens.Length}")
Console.WriteLine($"Total tokens: {tokens.FlattenedLength}")

F#

// Helper to convert nested sequences to seq<seq<T>>
let asSeqs (xss: seq<#seq<'T>>) : seq<seq<'T>> =
    xss |> Seq.map (fun xs -> xs :> seq<'T>)
// Create a list vector from nested data
let documents = [|
    [| "the"; "quick"; "brown" |]
    [| "the"; "lazy"; "dog"; "sleeps" |]
    [| "pack"; "my"; "box" |]
|]
let tokens = Vector.CopyFromAsLists(documents |> asSeqs)
printfn $"List vector length: {tokens.Length}"
printfn $"Total tokens: {tokens.FlattenedLength}"

Typical Workflows

ListVector<T> supports a variety of common operations on variable-length data:

Aggregating Per List

Use the AggregateLists method to compute statistics for each list:

// Compute statistics per list
var observations = new[] {
    new[] { 1.5, 2.3, 1.8 },
    new[] { 4.1, 3.9, 4.5, 4.2 },
    new[] { 2.1, 2.0 }
};
var data = Vector.CopyFromAsLists(observations);

var means = data.AggregateLists(Aggregators.Mean);
var maxValues = data.AggregateLists(Aggregators.Max);
var counts = data.GetListLengths();

Console.WriteLine($"Means: {means}");
Console.WriteLine($"Max values: {maxValues}");
Console.WriteLine($"Counts: {counts}");

Visual Basic

' Compute statistics per list
Dim observations = {
    New Double() {1.5, 2.3, 1.8},
    New Double() {4.1, 3.9, 4.5, 4.2},
    New Double() {2.1, 2.0}
}
Dim data = Vector.CopyFromAsLists(observations)

Dim means = data.AggregateLists(Aggregators.Mean)
Dim maxValues = data.AggregateLists(Aggregators.Max)
Dim counts = data.GetListLengths()

Console.WriteLine($"Means: {means}")
Console.WriteLine($"Max values: {maxValues}")
Console.WriteLine($"Counts: {counts}")

F#

// Compute statistics per list
let observations = [|
    [| 1.5; 2.3; 1.8 |]
    [| 4.1; 3.9; 4.5; 4.2 |]
    [| 2.1; 2.0 |]
|]
let data = Vector.CopyFromAsLists(observations |> asSeqs)

let means = data.AggregateLists(Aggregators.Mean)
let maxValues = data.AggregateLists(Aggregators.Max)
let counts = data.GetListLengths()

printfn $"Means: {means}"
printfn $"Max values: {maxValues}"
printfn $"Counts: {counts}"

Transforming Elements

Use the Map method to transform elements within each list:

// Transform elements within each list
var normalized = data.Map(x => x / 10.0);

// Transform using per-list scalars: subtract each list's mean from its elements
var centered = data.Map(means, (value, mean) => value - mean);

Visual Basic

' Transform elements within each list
Dim normalized = data.Map(Function(x) x / 10.0)

' Transform using per-list scalars: subtract each list's mean from its elements
Dim centered = data.Map(means, Function(value, mean) value - mean)

F#

// Transform elements within each list
let normalized = data.Map(fun x -> x / 10.0)

// Transform using per-list scalars: subtract each list's mean from its elements
let centered = data.Map(means, fun value mean -> value - mean)
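
The per-list form of Map can be read as a broadcast: every element is combined with the one scalar that belongs to its list. The following plain C# sketch shows that reading on jagged arrays. The class and method names are hypothetical, and this is not how the library implements Map.

```csharp
using System;
using System.Linq;

// Hypothetical illustration only; not part of Numerics.NET.
public static class PerListMapSketch
{
    // Applies f(element, scalars[i]) to every element of list i.
    public static double[][] MapWithScalars(
        double[][] lists, double[] scalars, Func<double, double, double> f)
    {
        if (lists.Length != scalars.Length)
            throw new ArgumentException("Expected one scalar per list.");

        return lists
            .Select((list, i) => list.Select(x => f(x, scalars[i])).ToArray())
            .ToArray();
    }
}
```

Passing each list's mean as the scalar and (value, mean) => value - mean as the function centers every list around zero while leaving the list boundaries unchanged.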

Flattening to a Standard Vector

Use the Flatten() method to convert the list vector back to a regular vector containing all elements in order:

// Flatten back to a regular vector
var allValues = data.Flatten();
var overallMean = allValues.Mean();

Visual Basic

' Flatten back to a regular vector
Dim allValues = data.Flatten()
Dim overallMean = allValues.Mean()

F#

// Flatten back to a regular vector
let allValues = data.Flatten()
let overallMean = allValues.Mean()

Converting to a Fixed-Width Matrix

For downstream algorithms that require rectangular data, use the ToRowMatrix or ToColumnMatrix methods to convert the list vector to a matrix, padding or truncating lists as necessary:

// Convert to matrix for downstream analysis
var matrix = data.ToRowMatrix(paddingValue: 0.0);
Console.WriteLine($"Matrix shape: {matrix.RowCount}x{matrix.ColumnCount}");

Visual Basic

' Convert to matrix for downstream analysis
Dim matrix = data.ToRowMatrix(paddingValue:=0.0)
Console.WriteLine($"Matrix shape: {matrix.RowCount}x{matrix.ColumnCount}")

F#

// Convert to matrix for downstream analysis
let matrix = data.ToRowMatrix(paddingValue = 0.0)
printfn $"Matrix shape: {matrix.RowCount}x{matrix.ColumnCount}"
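
For readers who want to see what padding and truncating amount to, here is a minimal sketch in plain C#. The names are hypothetical, and the sketch takes the target width as an explicit argument; how ToRowMatrix itself chooses the width is not covered here.

```csharp
using System;

// Hypothetical illustration only; not part of Numerics.NET.
public static class PaddingSketch
{
    // Builds a rectangular array with one row per list.
    // Lists shorter than width are padded with paddingValue;
    // lists longer than width are truncated.
    public static double[,] ToPaddedRows(double[][] lists, int width, double paddingValue)
    {
        var result = new double[lists.Length, width];
        for (int i = 0; i < lists.Length; i++)
            for (int j = 0; j < width; j++)
                result[i, j] = j < lists[i].Length ? lists[i][j] : paddingValue;
        return result;
    }
}
```

Choosing the width as the length of the longest list keeps every value and only pads; a smaller width trades lost trailing values for a narrower matrix.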

List-Level Operations

ListVector<T> provides several methods for manipulating entire lists:

// Various list-level operations
var first3 = tokens.HeadLists(3);
var sorted = tokens.SortLists();
var reversed = tokens.ReverseLists();

Visual Basic

' Various list-level operations
Dim first3 = tokens.HeadLists(3)
Dim sorted = tokens.SortLists()
Dim reversed = tokens.ReverseLists()

F#

// Various list-level operations
let first3 = tokens.HeadLists(3)
let sorted = tokens.SortLists()
let reversed = tokens.ReverseLists()

Variable-Length Data in Data Frames

Variable-length lists often appear as columns in data frames: each row represents an observation, and one or more columns contain lists whose lengths may vary from row to row. This is a first-class pattern in Numerics.NET, not an edge case.

ListVector<T> is the underlying representation for such columns:

// Create a data frame with a list-valued column
var ids = Vector.Create(new[] { 101, 102, 103 });
var measurements = Vector.CopyFromAsLists(new[] {
    new[] { 1.5, 2.3, 1.8 },
    new[] { 4.1, 3.9, 4.5, 4.2 },
    new[] { 2.1, 2.0 }
});

var df = DataFrame.FromColumns(
    ("ID", ids),
    ("Measurements", measurements)
);

Console.WriteLine(df);

Visual Basic

' Create a data frame with a list-valued column
Dim ids = Vector.Wrap(New Integer() {101, 102, 103})
Dim measurements = Vector.CopyFromAsLists(New Double()() {
    New Double() {1.5, 2.3, 1.8},
    New Double() {4.1, 3.9, 4.5, 4.2},
    New Double() {2.1, 2.0}
})

Dim df = DataFrame.FromColumns(
    ("ID", ids),
    ("Measurements", measurements)
)

Console.WriteLine(df)

F#

// Create a data frame with a list-valued column
let ids = Vector.Create([| 101; 102; 103 |])
let measurements = Vector.CopyFromAsLists([|
    [| 1.5; 2.3; 1.8 |]
    [| 4.1; 3.9; 4.5; 4.2 |]
    [| 2.1; 2.0 |]
|])

let col (name: string) (v: #IVector) : struct (string * IVector) =
    struct (name, v :> IVector)

let df = DataFrame.FromColumns(
    [|
        col "ID" ids
        col "Measurements" measurements
    |]
)

printfn "%O" df

You can compute statistics on list-valued columns and add them as new columns:

// Compute statistics on list column
var measCol = df["Measurements"].As<IReadOnlyList<double>>();
var listVector = measCol as ListVector<double>;

if (listVector != null)
{
    var means = listVector.AggregateLists(Aggregators.Mean);
    df["Mean"] = means;

    var counts = listVector.GetListLengths();
    df["Count"] = counts;
}

Console.WriteLine(df);

Visual Basic

' Compute statistics on list column
Dim measCol = df("Measurements").As(Of IReadOnlyList(Of Double))()
Dim listVector = TryCast(measCol, ListVector(Of Double))

If listVector IsNot Nothing Then
    Dim means = listVector.AggregateLists(Aggregators.Mean)
    df("Mean") = means

    Dim counts = listVector.GetListLengths()
    df("Count") = counts
End If

Console.WriteLine(df)

F#

// Compute statistics on list column
let measCol = df["Measurements"].As<IReadOnlyList<float>>()

// Proceed only if the column is backed by a ListVector
match box measCol with
| :? ListVector<float> as lv ->
    let means = lv.AggregateLists(Aggregators.Mean)
    df["Mean"] <- means

    let counts = lv.GetListLengths()
    df["Count"] <- counts
| _ -> ()

printfn "%O" df

This pattern is particularly useful when working with grouped data that you want to preserve in a structured format rather than flattening immediately.

When to Use ListVector<T> vs Other Representations

Use ListVector<T> when:

List lengths vary meaningfully and should be preserved.
The grouping structure is important for analysis or aggregation.
You want to perform per-list operations or transformations.
The data naturally arises from a grouping operation.

Use dense matrices when:

Data is naturally rectangular (all rows/columns have the same length).
You have already padded or truncated lists to a fixed length for downstream algorithms (e.g., machine learning or statistical methods that require rectangular input).

In many workflows, you may start with a ListVector<T> to preserve the natural structure of your data, perform per-list analysis, and then convert to a matrix when needed for algorithms that require rectangular data.

