Logistic Regression in Visual Basic QuickStart Sample

Illustrates how to use the LogisticRegressionModel class to create logistic regression models in Visual Basic.

Option Infer On

Imports Numerics.NET.Data.Text
Imports Numerics.NET.DataAnalysis
Imports Numerics.NET.Statistics
Imports Numerics.NET.Statistics.Tests

' Illustrates building logistic regression models using 
' the LogisticRegressionModel class in the 
' Numerics.NET.Statistics namespace of Numerics.NET.
Module LogisticRegression

    Sub Main()
        ' The license is verified at runtime. We're using
        ' a 30 day trial key here. For more information, see
        '     https://numerics.net/trial-key
        Numerics.NET.License.Verify("64542-18980-57619-62268")

        ' Logistic regression can be performed using 
        ' the LogisticRegressionModel class.
        '
        ' This QuickStart sample uses data from a study of factors
        ' that determine low birth weight at Baystate Medical Center,
        ' from Belsley, Kuh and Welsch. The fields are as follows:
        '   AGE:  Mother's age.
        '   LWT:  Mother's weight.
        '   RACE: 1=white, 2=black, 3=other.
        '   FTV:  Number of physician visits during the 1st trimester.
        '   LOW:  Low birth weight indicator.

        ' First, read the data from a fixed-width text file
        ' into a data frame.
        Dim data = FixedWidthTextFile.ReadDataFrame(
                "..\..\..\..\Data\lowbwt.txt",
                {4, 11, 18, 25, 33, 42, 49, 55, 61, 68})

        ' We need indicator variables for the race. All we need to do
        ' is mark the variable as categorical:
        data.MakeCategorical("RACE", Index.Create({1, 2, 3}))

        ' Now create the regression model. Parameters are the data frame
        ' containing all variables, the name of the dependent variable,
        ' and a string array containing the names of the independent
        ' variables.

        ' Note that RACE, which is a categorical variable, is automatically
        ' expanded into indicator variables.
        Dim model As LogisticRegressionModel = New LogisticRegressionModel(data, "LOW",
                New String() {"AGE", "LWT", "RACE", "FTV"})

        ' Alternatively, we can use a formula to describe the variables
        ' in the model. The dependent variable goes on the left, the
        ' independent variables on the right of the ~
        model = New LogisticRegressionModel(data, "LOW ~ AGE + LWT + RACE + FTV")

        ' The Fit method performs the actual regression analysis.
        model.Fit()

        ' The Parameters collection contains information about the regression 
        ' parameters.
        Console.WriteLine("Variable              Value    Std.Error  t-stat  p-Value")
        For Each parameter In model.Parameters
            ' Parameter objects have the following properties:
            ' - Name: usually the name of the variable.
            ' - Value: the estimated value of the parameter.
            ' - StandardError: the standard error of the estimate.
            ' - Statistic: the value of the t statistic for the hypothesis that the parameter is zero.
            ' - PValue: the probability corresponding to the t statistic.
            Console.WriteLine("{0,-20}{1,10:F5}{2,10:F5}{3,8:F2} {4,7:F4}",
                    parameter.Name,
                    parameter.Value,
                    parameter.StandardError,
                    parameter.Statistic,
                    parameter.PValue)
        Next
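
        ' A logistic regression coefficient is a change in log-odds, so
        ' exponentiating it gives an odds ratio (for the constant term,
        ' the baseline odds). This is an illustrative sketch that is not
        ' part of the original sample; it uses only the Name and Value
        ' properties shown above and assumes that Value is a Double.
        Console.WriteLine("Variable            exp(Value)")
        For Each parameter In model.Parameters
            Console.WriteLine("{0,-20}{1,10:F5}",
                    parameter.Name,
                    Math.Exp(parameter.Value))
        Next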

        ' The log-likelihood of the computed solution is also available:
        Console.WriteLine($"Log-likelihood: {model.LogLikelihood:F4}")

        ' We can test the significance of the model by looking at the
        ' results of a likelihood-ratio test, which compares the model
        ' to a constant-only model:
        Dim lrt As SimpleHypothesisTest = model.GetLikelihoodRatioTest()
        Console.WriteLine("Likelihood-ratio test: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)

        ' We can compute a model with fewer parameters:
        Dim model2 As LogisticRegressionModel = New LogisticRegressionModel(data, "LOW",
                New String() {"LWT", "RACE"})
        model2.Fit()

        ' Print the results...
        Console.WriteLine("Variable              Value    Std.Error  t-stat  p-Value")
        For Each parameter In model2.Parameters
            Console.WriteLine("{0,-20}{1,10:F5}{2,10:F5}{3,8:F2} {4,7:F4}",
                    parameter.Name, parameter.Value, parameter.StandardError,
                    parameter.Statistic, parameter.PValue)
        Next

        ' ...including the log-likelihood:
        Console.WriteLine($"Log-likelihood: {model2.LogLikelihood:F4}")

        ' We can now compare the original model to this one, once again
        ' using the likelihood ratio test:
        lrt = model.GetLikelihoodRatioTest(model2)
        Console.WriteLine("Likelihood-ratio test: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)
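
        ' For reference, the textbook likelihood-ratio statistic for two
        ' nested models is twice the difference of their log-likelihoods.
        ' This hand computation is a sketch that is not part of the original
        ' sample; it assumes GetLikelihoodRatioTest uses this standard
        ' definition, in which case the value should agree with lrt.Statistic.
        Dim lrStatistic = 2.0 * (model.LogLikelihood - model2.LogLikelihood)
        Console.WriteLine($"Likelihood-ratio statistic by hand: {lrStatistic:F4}")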

        '
        ' Multinomial (polytomous) logistic regression
        ' 

        ' The LogisticRegressionModel class can also be used
        ' for logistic regression with more than 2 responses.
        ' The following example is from "Applied Linear Statistical
        ' Models."

        ' Load the data into a data frame
        Dim columnNames = {"id", "duration", "x2", "x3", "x4",
                "nutritio", "agecat1", "agecat3", "alcohol", "smoking"}
        Dim frame = FixedWidthTextFile.ReadDataFrame(
                "..\..\..\..\Data\mlogit.txt",
                New FixedWidthTextOptions(
                    {5, 10, 15, 20, 25, 32, 37, 42, 47},
                    columnHeaders:=False)).WithColumnIndex(columnNames)

        ' For multinomial regression, the response variable must be
        ' a CategoricalVariable:
        frame.MakeCategorical("duration")

        ' The Method property selects the type of logistic regression.
        ' Here we request nominal (multinomial) regression explicitly:
        Dim model3 As New LogisticRegressionModel(frame, "duration",
                {"nutritio", "agecat1", "agecat3", "alcohol", "smoking"})
        model3.Method = LogisticRegressionMethod.Nominal

        ' Alternatively, we can use a formula. In a formula, '.' can serve
        ' as a shortcut for all unused variables in the data frame; here
        ' we list the variables explicitly. Because duration has 3 levels,
        ' nominal logistic regression is automatically inferred.
        model3 = New LogisticRegressionModel(frame,
                "duration ~ nutritio + agecat1 + agecat3 + alcohol + smoking")

        ' Everything else is the same:
        model3.Fit()

        ' There is a set of parameters for each level of the
        ' response variable. The highest level is the reference 
        ' level and has no associated parameters.
        For Each p In model3.Parameters
            Console.WriteLine(p.ToString())
        Next

        Console.WriteLine($"Log-likelihood: {model3.LogLikelihood:F4}")

        ' To test the hypothesis that all the slopes are zero,
        ' use the GetLikelihoodRatioTest method.
        lrt = model3.GetLikelihoodRatioTest()
        Console.WriteLine("Test that all slopes are zero: chi-squared={0:F4}, p={1:F4}", lrt.Statistic, lrt.PValue)

        Console.WriteLine("Press Enter key to continue.")
        Console.ReadLine()
    End Sub

End Module