Arbitrary Precision Floating-Point Numbers

The BigFloat class represents floating-point numbers of arbitrary precision. The range of numbers that can be represented is from roughly 10^-646,000,000 to 10^646,000,000. Note that this range is somewhat smaller than that of the BigInteger type. The precision can be up to about 20 billion digits.

Accuracy and precision

The accuracy of a number is a measure of how close an approximation is to its actual value. The precision of a number is a measure of the amount of memory used to represent a value.

Floating-point numbers are stored in the form mantissa × 2^exponent, where both the mantissa and the exponent are integers. Most numbers cannot be represented exactly in this format, so computing and storing the exact result of a calculation would generally require infinite time and infinite precision. This is clearly impossible. We need a way to specify the desired accuracy of floating-point calculations.
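
To make the representation concrete, the following sketch uses Python's built-in binary floating-point type and the standard fractions module rather than BigFloat itself. This is an assumption made purely for illustration; the same mantissa × 2^exponent idea applies. It shows that the decimal value 0.1 is stored as an integer divided by a power of two and is therefore only an approximation.

```python
from fractions import Fraction

# A binary floating-point value is an integer mantissa scaled by a power of two.
# as_integer_ratio() exposes the exact stored value of the float 0.1.
numerator, denominator = (0.1).as_integer_ratio()

# The denominator is a power of two: exactly one bit is set.
is_power_of_two = denominator & (denominator - 1) == 0

# The stored value is close to, but not equal to, the exact rational 1/10.
stored = Fraction(numerator, denominator)
error = stored - Fraction(1, 10)

print(f"0.1 is stored as {numerator} / 2^{denominator.bit_length() - 1}")
print(f"power-of-two denominator: {is_power_of_two}")
print(f"representation error: {float(error):.3e}")
```

An arbitrary-precision type reduces this error by using a longer mantissa, but it cannot eliminate it for values such as 1/10 that have no finite binary expansion.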

The AccuracyGoal structure is used to specify the desired accuracy of a calculation. This structure has two special values. InheritAbsolute indicates that the result should be computed with the same number of digits after the decimal point as the arguments. InheritRelative indicates that the result should be computed with the same total number of digits as the arguments. For example, 1.57 has three digits total and two after the decimal point. Computing Tan(1.57) with accuracy goal InheritAbsolute would result in 1255.77. With accuracy goal InheritRelative, the result would be 1.26e+003.
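
The difference between the two goals can be reproduced with a rough sketch in Python's standard decimal module. This is not the AccuracyGoal API. It only mimics the effect on the Tan(1.57) example by rounding a double-precision result to two digits after the decimal point (absolute) or to three significant digits (relative).

```python
import math
from decimal import Decimal, Context, ROUND_HALF_EVEN

# tan(1.57) computed in double precision, then converted exactly to a decimal.
value = Decimal(math.tan(1.57))

# Absolute accuracy: keep two digits after the decimal point, like the argument 1.57.
absolute = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# Relative accuracy: keep three significant digits in total, like the argument 1.57.
relative = Context(prec=3, rounding=ROUND_HALF_EVEN).create_decimal(value)

print(absolute)  # two digits after the decimal point
print(relative)  # three significant digits
```

The two printed values correspond to the 1255.77 and 1.26e+003 results quoted above.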

To compute a result with a specific accuracy, create an AccuracyGoal using one of two static methods. Relative creates an accuracy goal with the relative accuracy specified in decimal digits. Absolute creates an accuracy goal with the absolute accuracy specified in decimal digits. The number of digits need not be an integer.

Some operations, like casting an integer to a BigFloat, do not have operands to inherit the precision from. In such cases, the default accuracy goal is used, available through the DefaultAccuracyGoal property. The default is a relative precision of about 60 decimal digits. This property cannot be set to an inherited accuracy goal.

Rounding

When the precision of a number is reduced, a choice must be made about how the information in the discarded bits will be used. The RoundingMode enumeration lists the possibilities:

Rounding Mode Values

Field	Description
TowardsNearest	Numbers are rounded to the nearest value. In case of a tie, the last bit of the result is made zero. This is the default.
TowardsNegativeInfinity	All numbers are rounded down.
TowardsPositiveInfinity	All numbers are rounded up.
TowardsZero	Positive numbers are rounded down. Negative numbers are rounded up.

When no rounding mode is specified, the DefaultRoundingMode is used.
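
As an illustration of how the four modes differ, the sketch below maps each one to what is assumed to be its closest counterpart in Python's standard decimal module and rounds a few values to integers. The mapping is an assumption for demonstration only. BigFloat rounds binary digits, whereas decimal rounds decimal digits, so the tie rule "last bit is made zero" appears here as "last digit is made even".

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN

# Decimal analogues of the four modes in the table (an illustrative mapping,
# not the library's own enumeration).
MODES = {
    "TowardsNearest": ROUND_HALF_EVEN,
    "TowardsNegativeInfinity": ROUND_FLOOR,
    "TowardsPositiveInfinity": ROUND_CEILING,
    "TowardsZero": ROUND_DOWN,
}

def round_to_integer(text, mode_name):
    """Round the decimal string `text` to an integer using the named mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode_name]))

for name in MODES:
    results = [round_to_integer(t, name) for t in ("2.5", "-2.5", "2.7", "-2.7")]
    print(f"{name:24} {results}")
```

The modes only disagree when discarded digits are non-zero, and TowardsZero differs from TowardsNegativeInfinity only for negative inputs.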

Constructing big floating-point numbers

The BigFloat structure has several constructors that construct a floating-point number with the same value as the argument. You can start from 32-bit and 64-bit integers, single- or double-precision numbers, BigInteger values, and BigRational values.

Most rational numbers cannot be expressed exactly as a floating-point number. For this reason, a second constructor is provided that takes two additional arguments: an AccuracyGoal value that specifies the desired accuracy of the approximation, and a RoundingMode value that specifies how to round the final approximation.

C#

BigFloat a = new BigFloat(123);
BigFloat b = new BigFloat(3.141592);
AccuracyGoal accuracyGoal = AccuracyGoal.Absolute(50);
BigFloat r = new BigFloat(new BigRational(22, 7), accuracyGoal, RoundingMode.TowardsNearest);

Visual Basic

Dim a As BigFloat = New BigFloat(123)
Dim b As BigFloat = New BigFloat(3.141592)
Dim accuracyGoal As AccuracyGoal = AccuracyGoal.Absolute(50)
Dim r As BigFloat = New BigFloat(New BigRational(22, 7), accuracyGoal, RoundingMode.TowardsNearest)

F#

let a = new BigFloat(123)
let b = new BigFloat(3.141592)
let accuracyGoal = AccuracyGoal.Absolute(50.0)
let r = new BigFloat(new BigRational(22, 7), accuracyGoal, RoundingMode.TowardsNearest)

In addition, several static methods are available. For example, Parse and TryParse create big floats from strings.

Floating-point constants

The BigFloat class provides several constants for commonly used and special floating-point numbers. These are listed in the following table:

Floating-point number constants

Field	Description
Zero	The number zero.
One	The number one.
MinusOne	The number minus one.
MaxValue	The largest possible BigFloat.
MinValue	The smallest possible BigFloat.
PositiveInfinity	Positive infinity.
NegativeInfinity	Negative infinity.
NaN	Not-a-Number value.

The last three values in the above list deserve special attention. They correspond to the special values defined in the IEEE-754 standard for floating-point arithmetic, which also defines the behavior of the single-precision Single and double-precision Double types.

As the name implies, PositiveInfinity represents positive infinity. This value is used to represent numbers that are too large to be represented in the number format, as well as the result of certain operations like 1/0. Likewise, NegativeInfinity represents negative infinity and is used to represent negative numbers whose magnitude is too large to be represented in the number format, as well as the result of certain other operations like -1/0.

The NaN field represents Not-a-Number. It is a special value that is returned when the result of an operation is undefined. For example, dividing zero by zero and taking the square root of a negative number both result in NaN. To test whether a number is NaN, use the static IsNaN method.

Working with floating-point numbers

You can work with BigFloat numbers like you would any built-in floating-point type. Like all other arbitrary precision types, big floats are immutable.

One complicating factor is that the precision of a BigFloat value is not a constant but depends on how the value was constructed or computed. The next section goes into this factor in more depth.

Details of big floating-point arithmetic

Most operations compute a result with the same relative precision as their operands. When two or more operands are involved, the precision of the result is the smallest of the precisions of the operands. For example, the result of multiplying two numbers with 50 and 200 digits of precision, respectively, will have a precision of 50 digits. The result is always rounded to the nearest value.

An important exception is addition and subtraction, which are calculated to be accurate to within the smaller absolute accuracy of the operands. Care should be taken when subtracting from integers, which are stored with the default precision. For example, the result of BigFloat.One - x*x will have the default precision regardless of the value of x. To prevent this from happening, use the ExtendPrecision method. Note that this method does not modify the instance it is called on but returns a new value.

To allow for maximum flexibility, every computational method has at least two overloads. One overload uses the default accuracy goal and rounding mode. A second overload has two additional arguments that can be used to specify the rounding mode and accuracy goal of the result.

When the result of an operation cannot be represented as a finite floating-point number, the following rules apply. When the result is too large to be represented, the value PositiveInfinity is returned. When the result is too small (i.e. negative and too large in magnitude), NegativeInfinity is returned. When the result is undefined, NaN is returned. When one of the operands is NaN, the result is also NaN. When one or both of the operands of a relational operator is NaN, the result is false. The one exception is the inequality operator, which returns true if both operands are NaN.
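
These rules mirror the IEEE-754 behavior of ordinary double-precision numbers, so they can be tried out with Python's built-in float type as a stand-in for BigFloat. This is an illustrative assumption, not a statement about the library's internals. Python raises exceptions for 1/0 and for the square root of a negative number, so the sketch produces the special values through overflow and through infinity minus infinity instead.

```python
import math

big = 1e308  # close to the largest finite double

overflow_up = big * 10           # too large: becomes positive infinity
overflow_down = -big * 10        # negative and too large in magnitude: negative infinity
undefined = math.inf - math.inf  # no meaningful value: NaN
propagated = undefined + 1.0     # NaN operands give NaN results

print(overflow_up, overflow_down, undefined, propagated)

# Relational operators involving NaN are false, except inequality.
print(undefined == undefined)  # False
print(undefined < 1.0)         # False
print(undefined != undefined)  # True
print(math.isnan(propagated))  # True
```

Because NaN never compares equal to anything, including itself, a dedicated test such as IsNaN is the reliable way to detect it.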

Arithmetic operations

Numerics.NET provides methods for all basic arithmetic operators on floating-point numbers. Overloaded versions of the arithmetic operators are provided for languages that support them. These use the default values for rounding mode (towards nearest) and accuracy goal (usually inherit relative). For languages that don't support operator overloading, equivalent static (Shared in Visual Basic) methods are supplied.

Floating-point number operators and their static (Shared) method equivalents

Operator	Static method equivalent	Description
+x	(no equivalent)	Returns the floating-point number x.
-x	Negate	Returns the negation of the floating-point number x.
x1 + x2	BigFloat.Add(x1, x2)	Adds the floating-point numbers x1 and x2.
x + a	BigFloat.Add(x, a)	Adds the floating-point number x and the real number a.
a + x	BigFloat.Add(a, x)	Adds the real number a to the floating-point number x.
x++	(no equivalent)	Increments the floating-point number x by one.
x1 - x2	BigFloat.Subtract(x1, x2)	Subtracts the floating-point number x2 from the floating-point number x1.
x - a	BigFloat.Subtract(x, a)	Subtracts the real number a from the floating-point number x.
a - x	BigFloat.Subtract(a, x)	Subtracts the floating-point number x from the real number a.
x--	(no equivalent)	Decrements the floating-point number x by one.
x1 * x2	BigFloat.Multiply(x1, x2)	Multiplies the floating-point numbers x1 and x2.
x * a	BigFloat.Multiply(x, a)	Multiplies the floating-point number x and the real number a.
a * x	BigFloat.Multiply(a, x)	Multiplies the real number a and the floating-point number x.
x1 / x2	BigFloat.Divide(x1, x2)	Divides the floating-point number x1 by x2.
x / a	BigFloat.Divide(x, a)	Divides the floating-point number x by the real number a.
a / x	BigFloat.Divide(a, x)	Divides the real number a by the floating-point number x.

The relational operators are also available. In a language that does not support custom operators, the Equals or CompareTo method can be used instead.

C#

BigFloat d = BigFloat.Exp(1);
BigFloat e = BigFloat.Log(2);
BigFloat f = 2 - 3 * (d + e);

Visual Basic

Dim d As BigFloat = BigFloat.Exp(1)
Dim e As BigFloat = BigFloat.Log(2)
Dim f As BigFloat = 2 - 3 * (d + e)

F#

let d = BigFloat.Exp(BigFloat 1)
let e = BigFloat.Log(BigFloat 2)
let f = 2 - 3 * (d + e)

Functions of floating-point numbers

The BigFloat type defines static methods for the most common mathematical functions of floating-point numbers, including logarithmic, exponential, trigonometric and hyperbolic functions.

The tables below summarize these methods and their meaning. Each of these methods is overloaded: an additional overload takes two parameters that specify the rounding mode and accuracy goal used to compute the result.

Miscellaneous functions of floating-point numbers

Method	Description
Abs	The absolute value of the floating-point number x.
CopySign	The floating-point number x with its sign changed to match y.
Floor	The largest integer less than or equal to the floating-point number x.
Ceiling	The smallest integer greater than or equal to the floating-point number x.
FractionalPart	The fractional part of the floating-point number x. The result is negative if x is negative.
Round	The floating-point number x rounded to the specified number of digits.
ScaleByPowerOfTwo	The floating-point number x multiplied by the specified power of two.
IsPositiveInfinity	Indicates whether the floating-point number x equals positive infinity.
IsNegativeInfinity	Indicates whether the floating-point number x equals negative infinity.
IsNaN	Indicates whether the floating-point number x is Not-a-Number.
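
For readers who want to check the semantics of a few of these operations, the sketch below uses functions from Python's standard math module that are assumed to behave analogously on double-precision values. The BigFloat method names appear only in the comments. The Python calls are stand-ins, not the library's API.

```python
import math

x = -2.75  # exactly representable in binary, so the results below are exact

floor_x = math.floor(x)               # Floor: largest integer <= x
ceiling_x = math.ceil(x)              # Ceiling: smallest integer >= x
fractional_x, whole_x = math.modf(x)  # FractionalPart keeps the sign of x
copied = math.copysign(3.0, x)        # CopySign: magnitude of 3.0, sign of x
scaled = math.ldexp(x, 4)             # ScaleByPowerOfTwo: x * 2**4

print(floor_x, ceiling_x, fractional_x, copied, scaled)
```

Scaling by a power of two only adjusts the exponent, which is why it is exact and cheaper than a general multiplication.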

Logarithmic and exponential functions of floating-point numbers

Method	Description
Exp	The number E raised to the power x.
Inverse	The inverse (reciprocal) of the floating-point number x.
Sqrt	The square root of the floating-point number x.
Root	The nth root of the floating-point number x.
Pow	The floating-point number x1 raised to the floating-point power x2.
Pow	The floating-point number x raised to the integer power n.
Log	Natural logarithm of the floating-point number x.
Log	Base x1 logarithm of the floating-point number x2.

Trigonometric functions of floating-point numbers

Method	Description
GetPi	Gets the number pi to the specified accuracy.
SinCos	Computes the sine and cosine of the floating-point number x.
Sin	Sine of the floating-point number x.
Cos	Cosine of the floating-point number x.
Tan	Tangent of the floating-point number x.
Asin	Inverse sine of the floating-point number x.
Acos	Inverse cosine of the floating-point number x.
Atan	Inverse tangent of the floating-point number x.
Atan2	Inverse tangent of the floating-point number y/x.

Hyperbolic functions of floating-point numbers

Method	Description
Sinh	Hyperbolic sine of the floating-point number x.
Cosh	Hyperbolic cosine of the floating-point number x.
Tanh	Hyperbolic tangent of the floating-point number x.
Asinh	Inverse hyperbolic sine of the floating-point number x.
Acosh	Inverse hyperbolic cosine of the floating-point number x.
Atanh	Inverse hyperbolic tangent of the floating-point number x.

The following, larger example shows how to calculate the number π using the Arithmetic-Geometric Mean (AGM) formula. For details, see for example this paper.

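C#

// Number of decimal digits to compute.
int digits = 100;
AccuracyGoal goal = AccuracyGoal.Absolute(100);
// Start the AGM iteration with x1 = sqrt(2) and x2 = 1.
BigFloat x1 = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest);
BigFloat x2 = BigFloat.One;
// S accumulates the terms 2^(k-1) * c, where c holds x1^2 - x2^2.
BigFloat S = BigFloat.Zero;
BigFloat c = BigFloat.One;
int k = 0;
// Iterate until c is small enough for the requested number of digits.
while (-c.GetDecimalDigits() < digits)
{
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1);
    // Replace x1 and x2 by their arithmetic and geometric means.
    BigFloat aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1);
    BigFloat gMean = BigFloat.Sqrt(x1 * x2);
    x1 = aMean;
    x2 = gMean;
    // (x1 + x2) * (x1 - x2) equals x1^2 - x2^2.
    c = (x1 + x2) * (x1 - x2);
    k++;
}
BigFloat pi = x1 * x1 / (1 - S);
Console.WriteLine("Pi = {0:F100}", pi);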

Visual Basic

Dim digits As Integer = 100
Dim goal As AccuracyGoal = AccuracyGoal.Absolute(100)
Dim x1 As BigFloat = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest)
Dim x2 As BigFloat = BigFloat.One
Dim S As BigFloat = BigFloat.Zero
Dim c As BigFloat = BigFloat.One
Dim k As Integer = 0
While (-c.GetDecimalDigits() < digits)
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1)
    Dim aMean As BigFloat = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1)
    Dim gMean As BigFloat = BigFloat.Sqrt(x1 * x2)
    x1 = aMean
    x2 = gMean
    c = (x1 + x2) * (x1 - x2)
    k = k + 1
End While

Dim pi As BigFloat = x1 * x1 / (1 - S)
Console.WriteLine("Pi = {0:F100}", pi)

F#

let digits = 100
let goal = AccuracyGoal.Absolute(100.0)
let mutable x1 = BigFloat.Sqrt(BigFloat(2), goal, RoundingMode.TowardsNearest)
let mutable x2 = BigFloat.One
let mutable S = BigFloat.Zero
let mutable c = BigFloat.One
let mutable k = 0
while (-c.GetDecimalDigits() < (float digits)) do
    S <- S + BigFloat.ScaleByPowerOfTwo(c, k - 1)
    let aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1)
    let gMean = BigFloat.Sqrt(x1 * x2)
    x1 <- aMean
    x2 <- gMean
    c <- (x1 + x2) * (x1 - x2)
    k <- k + 1

let pi = x1 * x1 / (1 - S)
printfn $"Pi = {pi:F100}"