Curve Fitting Sample

Numerics.NET makes it very easy to fit data to arbitrary curves.

The mathematics of Curve Fitting

Linear least squares

Curve fitting is the process of finding the curve that best approximates a set of points from within a set of curves. The least squares method does this by minimizing the sum of the squares of the differences between the actual and predicted values. The linear least squares method, which is used here, restricts the set of curves to linear combinations of a set of basis functions.

Linear least squares problems can be solved using standard methods of linear algebra. In mathematical terms, the linear least squares problem corresponds to minimizing the length of a vector

*Ax - b*

where A is a matrix containing the values of the basis functions at the x-coordinates of the data points. b is a vector containing the y-values of the data points. x is a vector containing the unknown coefficients of the basis function in the best fit linear combination.

This problem can be solved in a variety of ways. The simplest is by solving the normal equations:

*ATAx = *AT*b*

This is a system of simultaneous linear equations. It can be shown that the solution to this equation is also the solution to the least squares problem.

Basis functions

Any set of functions can be used as basis functions. In practice, however, certain common functions are used. Often, the basis functions are chosen to reflect properties of the relationship that is being investigated. The unknown parameters of the model describing that relationship correspond to the coefficients of the least squares fit.

The simplest of these is a constant function. The resulting ‘fit’ is simply the mean of the y-values of the data points.

To fit a straight line through a set of points, the basis functions are a constant function and the function f(x) = x. The result is a linear model of the form y = ax + b.

To fit a polynomial, the basis functions are the ‘monomials’ 1, x, x2, x3, and so on, up to a certain degree. Polynomials are often used because they have such a simple form.

Instead of using monomials, Chebyshev polynomials can also be used as basis functions for polynomial fitting. Chebyshev polynomials are a special kind of polynomial with some very desirable properties.

  • They are mutually orthogonal, which means that calculations should be more accurate as round-off error is reduced.

  • They also oscillate very evenly, which usually results in decreasing coefficients as the degree of the polynomial increases. With ordinary polynomial fits, the coefficients often show wild oscillations, further decreasing their accuracy.

Mathematically, curve fitting with ordinary polynomials and with Chebyshev polynomials produce exactly the same result. In practice, however, the Chebyshev method is clearly superior.

How To Use The Program

On startup, the program window shows a blank graph on the left and a tabbed input/output panel on the right.

Initial view of the Curve Fitting sample application.

Clicking anywhere within the graph area selects a new data point, marked by a black dot. You can select up to 20 data points. The Reset button clears all data points.

The input panel lets you select which type of curve you want to fit to the data points. The options are:

Value Description
Constant A horizontal line.
Line A line with arbitrary slope.
Quadratic A quadratic curve of the form y = ax2 + bx + c
Polynomial A polynomial of the specified degree.
ChebyshevSeries A combination of Chebyshev polynomials up to the specified degree.
Custom A set of functions.

When Polynomial or ChebyshevSeries is chosen, you must specify the degree of the approximation. The default is 4. Note that both these options yield essentially the same curve, but in a different representation.

When Custom is selected, a list of combo boxes appears that lets you choose up to five functions from a list:

Options when fitting a custom set of functions.

The functions available are: constant, identity (x), sin x, cos x, exp x (ex), erfc x (the complimentary error function), and J0(x) (the ordinary Bessel function of the first kind of order 0).

Pressing the ‘Fit’ button calculates the fit. The best fit curve is drawn on the graph area as a purple line. The coefficients are available from the Output tab.

We pointed out earlier that polynomials and Chebyshev expansions produce the same mathematical curve but Chebyshev expansions are more desirable numerically. To illustrate this point, the table below lists the coefficients of the 8th degree polynomial and Chebyshev expansions for a set of data points:

Degree Polynomial coefficient Chebyshev coefficient
0 +9.77626290245595 +3.71069905169146
1 +95.7239111394749 +1.92780480809127
2 -221.123267897382 -1.36134951909525
3 +233.06753396019 +1.9297471097274
4 -133.066620592025 -1.1546134638987
5 +44.1694446535232 +1.75613756606959
6 -8.5637324626004 -2.04416637823348
7 +0.902201184865039 +0.982129798914638
8 -0.0399611216231435 -0.476413261410861

As you can see, the polynomial coefficients take on large values, up to 100 times the actual size of the result. Inevitably, two or more digits of precision will be lost in the calculation. By contrast, the coefficients of the Chebyshev series are all of the same order. In addition, it turns out that it is impossible to calculate higher degree polynomials due to round-off error. Trying to fit the same data set to a 12th degree polynomial results in the following error message:

Error message when trying to fit a polynomial of high order.

The Chebyshev expansion poses no such problem up to and including the maximum degree of 19.

The Sample Code

A complete code walkthrough would be beyond the purpose of this document. We will cover the numerical aspects only.

Representation of data points

The data points are kept in two double arrays, x and y. When the user clicks in the chart area, the screen coordinates are transformed to values in the range [0, 5]. The _numberOfDataPoints keeps track of the total number of data points.

Encapsulating Functions

In computer science, the term function usually refers to a subroutine that may take some arguments and returns a value. The concept was derived from the mathematical function, which has the same meaning at its core. The major difference between the two concepts is that in computer science, what matters is to produce the result in the fastest and most economical way possible, whereas in mathematics, the function’s properties and relationships to other functions are more important.

Centuries of study of the properties of mathematical functions have yielded many derived and related properties that have no parallel in computer science. Derivatives and integrals are perhaps the most common examples. Functions or methods in the Common Language Runtime aren’t suited to represent mathematical functions together with their properties. Mathematical functions are more like objects than they are methods of an object or class.

In Numerics.NET, the Curve class and its descendants represent mathematical functions. The name Curve was chosen over the more obvious Function because the latter is a reserved word in Visual Basic .NET and other languages. Curve classes come with methods to evaluate the function, its derivative and integral, and to find the roots or zeroes of a curve.

Curves also have parameters that determine the exact shape of the curve. For instance, a Line has both a slope and a y intercept (the point where it crosses the y-axis). All operations on lines can be expressed in terms of these two parameters, and any set of parameters yields a different line.

The algorithmic aspect of evaluating mathematical functions is encapsulated in a series of delegates with various signatures. For example, a Func<double,double> delegate essentially contains a reference to a method that takes one real (double) argument and returns a real number. When you only need to evaluate a mathematical function, and don’t need the rich functionality that comes with the Curve class, use a Func<double,double> delegate.

Computing the Least Squares fit

Curve fitting is the process of finding the curve that best approximates a set of points from within a set of curves which are linear combinations of a set of basis functions. The FunctionBasis class represents such a set of basis functions.

Computing the least squares fit is done in two steps:

  1. A FunctionBasis of the appropriate type is created to respresent the set of basis functions.
  2. The LeastSquaresFit method is called on the FunctionBasis object. This creates a LinearCombination object, which represents a curve that is a linear combination of the basis functions.

Specialized classes that inherit from FunctionBasis exist for polynomials, Chebyshev expansions, and collections of arbitrary functions.

The PolynomialBasis class represents a set of monomials, which form a basis for polynomials up to a specified degree. Constants, lines, and quadratic curves are special cases of polynomials of degree 0, 1, and 2, respectively. The global variable _degree is set to this value in the SelectedIndexChanged event handler of the cboCurveType combo box. For general polynomials, the degree is fetched from the textbox and validated. This value is then passed on to the constructor for the PolynomialBasis class as follows:

_basis = new PolynomialBasis(_degree);

The ChebyshevBasis class represents a set of Chebyshev polynomials up to a specified degree. Chebyshev polynomials only have their excellent properties over the interval [-1, +1]. The domain of our fit runs from 0 to 5. Fortunately, the ChebyshevBasis class can rescale the Chebyshev polynomials so that they retain these properties over any chosen interval. We must pass the interval to the constructor. The degree is once again fetched from the textbox control.

_basis = new ChebyshevBasis(0, 5, _degree);

For custom basis functions, we first create an array of Func<double, double> delegates containing the basis functions. The elements of the _customFunctions array are set by the SelectedIndexChanged handler of the combo boxes. This handler also keeps track of the _numberOfCustomFunctions variable. To get the array of basis functions, we simply iterate through the _customFunctions array and fill our basis with the elements that are not null. We then pass this array on to the GeneralFunctionBasis constructor.

Func<double, double>[] basisFunctions =
    new Func<double, double>[_numberOfCustomFunctions];
Int32 current = 0;
for(Int32 index = 0; index < 5; index++)
    if (_customFunctions[index] != null)
    basisFunctions[current++] = _customFunctions[index];
_basis = new GeneralFunctionBasis(basisFunctions);

The actual curve fitting is performed by the LeastSquaresFit method of the FunctionBasis object. The coefficients are retrieved through the Parameters method of the resulting Curve.

_fit = _basis.LeastSquaresFit(vx, vy);
listParameters.DataSource = _fit.GetParameters();