Logistic Regression in Visual Basic QuickStart Sample

Illustrates how to use the LogisticRegressionModel class to create logistic regression models in Visual Basic.

This sample is also available in: C#, F#, IronPython.

Overview

This QuickStart sample demonstrates how to perform binary and multinomial logistic regression analysis using the LogisticRegressionModel class in Numerics.NET.

The sample shows two key examples:

  1. A binary logistic regression model analyzing factors that determine low birth weight, using data from Baystate Medical Center. The example demonstrates:
    • Loading data from fixed-width text files
    • Creating and fitting logistic regression models using both explicit variable lists and formulas
    • Handling categorical predictors
    • Interpreting model parameters and statistics
    • Performing likelihood ratio tests to compare nested models
  2. A multinomial logistic regression example analyzing duration data with three response levels. This example shows:
    • Working with categorical response variables
    • Specifying multinomial models
    • Interpreting results with multiple response levels
    • Testing global model significance

The sample includes detailed output of parameter estimates, standard errors, test statistics, and p-values, demonstrating how to access and interpret the full range of model diagnostics available in Numerics.NET’s logistic regression implementation.
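
As a reading aid for the two examples, the models being fitted can be written out as follows. This is a sketch in standard textbook notation rather than anything taken from the library's documentation: the symbols π, β, x, and J are assumptions introduced here, and the exact parameterization used by LogisticRegressionModel (for instance, which response level serves as the reference) may differ.

```latex
% Binary model: \pi(x) is the probability that LOW = 1 given predictors x_1, ..., x_k
\log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
\pi(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}

% Multinomial (baseline-category) model with J response levels,
% taking level J as the reference level:
\log\frac{P(Y = j \mid x)}{P(Y = J \mid x)} = \beta_{j0} + \beta_{j1} x_1 + \dots + \beta_{jk} x_k,
\qquad j = 1, \dots, J-1
```

With three response levels, as in the duration example, the second form yields two sets of coefficients, one for each non-reference level.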

The code

Option Infer On

Imports Numerics.NET.Data.Text
Imports Numerics.NET.DataAnalysis
Imports Numerics.NET.Statistics
Imports Numerics.NET.Statistics.Tests

' Illustrates building logistic regression models using
' the LogisticRegressionModel class in the
' Numerics.NET.Statistics namespace of Numerics.NET.
Module LogisticRegression

    Sub Main()
        ' The license is verified at runtime. We're using
        ' a 30 day trial key here. For more information, see
        '     https://numerics.net/trial-key
        Numerics.NET.License.Verify("your-trial-key-here")

        ' Logistic regression can be performed using
        ' the LogisticRegressionModel class.
        '
        ' This QuickStart sample uses data from a study of factors
        ' that determine low birth weight at Baystate Medical Center,
        ' from Belsley, Kuh and Welsch. The fields are as follows:
        '   AGE:  Mother's age.
        '   LWT:  Mother's weight.
        '   RACE: 1=white, 2=black, 3=other.
        '   FTV:  Number of physician visits during the 1st trimester.
        '   LOW:  Low birth weight indicator.

        ' First, read the data from a fixed-width text file into a data frame.
        Dim data = FixedWidthTextFile.ReadDataFrame(
                "..\..\..\..\..\Data\lowbwt.txt",
                {4, 11, 18, 25, 33, 42, 49, 55, 61, 68})

        ' We need indicator variables for the race. All we need to do
        ' is mark the variable as categorical:
        data.MakeCategorical("RACE", Index.Create({1, 2, 3}))

        ' Now create the regression model. Parameters are the data frame
        ' containing all variables, the name of the dependent variable,
        ' and a string array containing the names of the independent
        ' variables.

        ' Note that RACE, which is a categorical variable, is automatically
        ' expanded into indicator variables.
        Dim model As LogisticRegressionModel = New LogisticRegressionModel(data, "LOW",
                New String() {"AGE", "LWT", "RACE", "FTV"})

        ' Alternatively, we can use a formula to describe the variables
        ' in the model. The dependent variable goes on the left, the
        ' independent variables on the right of the ~
        model = New LogisticRegressionModel(data, "LOW ~ AGE + LWT + RACE + FTV")

        ' The Fit method performs the actual regression analysis.
        model.Fit()

        ' The Parameters collection contains information about the regression
        ' parameters.
        Console.WriteLine("Variable              Value    Std.Error  t-stat  p-Value")
        For Each parameter In model.Parameters
            ' Parameter objects have the following properties:
            '   Name:          usually the name of the variable.
            '   Value:         the estimated value of the parameter.
            '   StandardError: the standard error of the estimate.
            '   Statistic:     the t statistic for the hypothesis that the parameter is zero.
            '   PValue:        the probability corresponding to the t statistic.
            Console.WriteLine("{0,-20}{1,10:F5}{2,10:F5}{3,8:F2} {4,7:F4}",
                    parameter.Name,
                    parameter.Value,
                    parameter.StandardError,
                    parameter.Statistic,
                    parameter.PValue)
        Next
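
        ' A possible follow-up, added here as an illustrative sketch (it is not
        ' part of the original sample): exponentiating a logistic regression
        ' coefficient gives the odds ratio for a one-unit increase in that
        ' variable. For an indicator variable such as a RACE level, it is the
        ' odds ratio relative to the reference level. The value printed for
        ' the constant term is not an odds ratio. Only the Name and Value
        ' properties used above are assumed.
        Console.WriteLine("Variable            Odds ratio")
        For Each coefficient In model.Parameters
            Console.WriteLine("{0,-20}{1,10:F5}",
                    coefficient.Name,
                    Math.Exp(coefficient.Value))
        Next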

        ' The log-likelihood of the computed solution is also available:
        Console.WriteLine($"Log-likelihood: {model.LogLikelihood:F4}")

        ' We can test the significance of the model by looking at
        ' the results of a likelihood ratio test, which compares
        ' the model to a constant-only model:
        Dim lrt As SimpleHypothesisTest = model.GetLikelihoodRatioTest()
        Console.WriteLine("Likelihood-ratio test: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)

        ' We can compute a model with fewer parameters:
        Dim model2 As LogisticRegressionModel = New LogisticRegressionModel(data, "LOW",
                New String() {"LWT", "RACE"})
        model2.Fit()

        ' Print the results...
        Console.WriteLine("Variable              Value    Std.Error  t-stat  p-Value")
        For Each parameter In model2.Parameters
            Console.WriteLine("{0,-20}{1,10:F5}{2,10:F5}{3,8:F2} {4,7:F4}",
                    parameter.Name, parameter.Value, parameter.StandardError,
                    parameter.Statistic, parameter.PValue)
        Next

        ' ...including the log-likelihood:
        Console.WriteLine($"Log-likelihood: {model2.LogLikelihood:F4}")

        ' We can now compare the original model to this one, once again
        ' using the likelihood ratio test:
        lrt = model.GetLikelihoodRatioTest(model2)
        Console.WriteLine("Likelihood-ratio test: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)
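
        ' As a cross-check, added here as an illustrative sketch (it is not
        ' part of the original sample): for nested models, the usual
        ' likelihood ratio statistic is twice the difference between the
        ' log-likelihood of the larger model and that of the smaller one.
        ' Assuming GetLikelihoodRatioTest uses this standard definition,
        ' the value below would be expected to match lrt.Statistic.
        Dim manualStatistic = 2 * (model.LogLikelihood - model2.LogLikelihood)
        Console.WriteLine($"2 * (logL(full) - logL(reduced)) = {manualStatistic:F4}")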

        '
        ' Multinomial (polytomous) logistic regression
        '

        ' The LogisticRegressionModel class can also be used
        ' for logistic regression with more than 2 responses.
        ' The following example is from "Applied Linear Statistical
        ' Models."

        ' Load the data into a data frame
        Dim columnNames = {"id", "duration", "x2", "x3", "x4",
                "nutritio", "agecat1", "agecat3", "alcohol", "smoking"}
        Dim frame = FixedWidthTextFile.ReadDataFrame(
                "..\..\..\..\..\Data\mlogit.txt",
                New FixedWidthTextOptions(
                    {5, 10, 15, 20, 25, 32, 37, 42, 47},
                    columnHeaders:=False)).WithColumnIndex(columnNames)

        ' For multinomial regression, the response variable must be
        ' categorical:
        frame.MakeCategorical("duration")

        ' The type of model is set through the Method property,
        ' which is of type LogisticRegressionMethod:
        Dim model3 As New LogisticRegressionModel(frame, "duration",
                {"nutritio", "agecat1", "agecat3", "alcohol", "smoking"})
        model3.Method = LogisticRegressionMethod.Nominal

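        ' When using a formula, '.' can serve as a shortcut
        ' for all unused variables in the data frame. Here the
        ' predictors are listed explicitly; the frame also contains
        ' columns (id, x2, x3, x4) that are not part of this model.
        ' Because duration has 3 levels, nominal logistic regression
        ' is automatically inferred.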
        model3 = New LogisticRegressionModel(frame,
                "duration ~ nutritio + agecat1 + agecat3 + alcohol + smoking")

        ' Everything else is the same:
        model3.Fit()

        ' There is a set of parameters for each level of the
        ' response variable. The highest level is the reference
        ' level and has no associated parameters.
        For Each p In model3.Parameters
            Console.WriteLine(p.ToString())
        Next

        Console.WriteLine($"Log likelihood:  {model3.LogLikelihood:F4}")

        ' To test the hypothesis that all the slopes are zero,
        ' use the GetLikelihoodRatioTest method.
        lrt = model3.GetLikelihoodRatioTest()
        Console.WriteLine("Test that all slopes are zero: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)

        Console.WriteLine("Press Enter key to continue.")
        Console.ReadLine()
    End Sub

End Module