Module std::num::dec2flt

Unstable (dec2flt): internal routines only exposed for testing

Converting decimal strings into IEEE 754 binary floating point numbers.

Problem statement

We are given a decimal string such as 12.34e56. This string consists of integral (12), fractional (34), and exponent (56) parts. All parts are optional and interpreted as zero when missing.

We seek the IEEE 754 floating point number that is closest to the exact value of the decimal string. It is well-known that many decimal strings do not have terminating representations in base two, so we round to 0.5 units in the last place (in other words, as well as possible). Ties, decimal values exactly half-way between two consecutive floats, are resolved with the half-to-even strategy, also known as banker's rounding.
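
This behavior is observable through the stable `str::parse` API, which is backed by this module. A small demonstration using only public APIs: two decimal strings that fall inside the same rounding interval produce bit-identical floats.

```rust
fn main() {
    // 0.1 has no terminating binary representation; parsing returns the
    // nearest f64. Any decimal string inside the same rounding interval
    // yields the very same bit pattern.
    let a: f64 = "0.1".parse().unwrap();
    let b: f64 = "0.10000000000000000001".parse().unwrap();
    assert_eq!(a.to_bits(), b.to_bits());
}
```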

Needless to say, this is quite hard, both in terms of implementation complexity and in terms of CPU cycles taken.

Implementation

First, we ignore signs. Or rather, we remove the sign at the very beginning of the conversion process and re-apply it at the very end. This is correct in all edge cases since IEEE floats are symmetric around zero: negating a float simply flips the sign bit.
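
As a rough illustration (a sketch using only public APIs, not the module's actual internals), splitting off the sign and re-applying it at the end might look like this; the final assertion confirms that negation touches only the sign bit:

```rust
fn main() {
    let s = "-12.5";
    // Strip an optional sign before conversion.
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let magnitude: f64 = rest.parse().unwrap();
    // Re-apply the sign at the very end.
    let result = if negative { -magnitude } else { magnitude };
    assert_eq!(result, -12.5);
    // Negation flips exactly one bit: the sign bit.
    assert_eq!(magnitude.to_bits() ^ result.to_bits(), 1u64 << 63);
}
```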

Then we remove the decimal point by adjusting the exponent: Conceptually, 12.34e56 turns into 1234e54, which we describe with a positive integer f = 1234 and an integer e = 54. The (f, e) representation is used by almost all code past the parsing stage.
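
A minimal sketch of this decomposition (a hypothetical helper, not the actual parse submodule; it ignores signs, and inputs that overflow a u64 are treated as failure):

```rust
/// Turn a sign-free decimal string into (f, e) with the decimal point
/// removed, so "12.34e56" becomes (1234, 54).
fn decompose(s: &str) -> Option<(u64, i64)> {
    let (mantissa_str, exp_str) = match s.find(|c| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], &s[i + 1..]),
        None => (s, "0"),
    };
    let exp: i64 = exp_str.parse().ok()?;
    let mut f: u64 = 0;
    let mut frac_digits: i64 = 0; // digits seen after the decimal point
    let mut seen_point = false;
    for c in mantissa_str.chars() {
        match c {
            '.' if !seen_point => seen_point = true,
            '0'..='9' => {
                f = f.checked_mul(10)?.checked_add((c as u64) - ('0' as u64))?;
                if seen_point {
                    frac_digits += 1;
                }
            }
            _ => return None,
        }
    }
    // Each fractional digit shifts the decimal point one place left.
    Some((f, exp - frac_digits))
}

fn main() {
    assert_eq!(decompose("12.34e56"), Some((1234, 54)));
    assert_eq!(decompose("7"), Some((7, 0)));
}
```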

We then try a long chain of progressively more general and expensive special cases using machine-sized integers and small, fixed-size floating point numbers (first f32/f64, then a type with a 64 bit significand, Fp). When all these fail, we bite the bullet and resort to a simple but very slow algorithm that involves computing f * 10^e fully and doing an iterative search for the best approximation.
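
The first and cheapest of those special cases is worth sketching (a hypothetical standalone version, not this module's actual code): when f fits into the 53-bit f64 significand and 10^|e| is exactly representable, a single exact multiplication or division is correctly rounded by the hardware.

```rust
/// Fast path sketch: both operands are exact, so one IEEE operation
/// rounds correctly. Returns None when the preconditions fail.
fn fast_path(f: u64, e: i64) -> Option<f64> {
    // 10^22 = 2^22 * 5^22 is the largest power of ten whose value still
    // fits into the 53-bit significand of an f64.
    const MAX_EXACT_POW10: i64 = 22;
    if f >= (1u64 << 53) || e.abs() > MAX_EXACT_POW10 {
        return None;
    }
    let pow10 = 10f64.powi(e.abs() as i32); // exact for |e| <= 22
    Some(if e >= 0 { f as f64 * pow10 } else { f as f64 / pow10 })
}

fn main() {
    assert_eq!(fast_path(1234, 1), Some(12340.0));
    assert_eq!(fast_path(25, -2), Some(0.25));
    assert_eq!(fast_path(1, 300), None); // too large: needs a slower algorithm
}
```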

Primarily, this module and its children implement the algorithms described in: "How to Read Floating Point Numbers Accurately" by William D. Clinger, available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152

In addition, there are numerous helper functions that are used in the paper but not available in Rust (or at least in core). Our version is additionally complicated by the need to handle overflow and underflow and the desire to handle subnormal numbers. Bellerophon and Algorithm R have trouble with overflow, subnormals, and underflow. We conservatively switch to Algorithm M (with the modifications described in section 8 of the paper) well before the inputs get into the critical region.

Another aspect that needs attention is the RawFloat trait by which almost all functions are parametrized. One might think that it's enough to parse to f64 and cast the result to f32. Unfortunately this is not the world we live in, and this has nothing to do with using base two or half-to-even rounding.

Consider, for example, two types d2 and d4 representing a decimal type with two and four decimal digits, respectively, and take "0.01499" as input. Let's use half-up rounding. Going directly to two decimal digits gives 0.01, but if we round to four digits first, we get 0.0150, which is then rounded up to 0.02. The same principle applies to other operations as well: if you want 0.5 ULP accuracy, you need to do everything in full precision and round exactly once, at the end, by considering all truncated bits at once.
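
The double-rounding trap in that example can be reproduced with plain integer arithmetic (a sketch; the scaled-integer representation and helper are made up for illustration):

```rust
/// Half-up rounding of a value stored as an integer scaled by
/// 10^from_digits, reduced to 10^to_digits fractional digits.
fn round_half_up(v: u64, from_digits: u32, to_digits: u32) -> u64 {
    let divisor = 10u64.pow(from_digits - to_digits);
    (v + divisor / 2) / divisor
}

fn main() {
    // "0.01499" as the integer 1499 with 5 fractional digits.
    let v = 1499u64;
    // Direct: 5 digits -> 2 digits gives 0.01 (stored as 1).
    assert_eq!(round_half_up(v, 5, 2), 1);
    // Via 4 digits first: 0.01499 -> 0.0150 (stored as 150) ...
    let intermediate = round_half_up(v, 5, 4);
    assert_eq!(intermediate, 150);
    // ... which then rounds up to 0.02 (stored as 2). Different answer!
    assert_eq!(round_half_up(intermediate, 4, 2), 2);
}
```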

FIXME: Although some code duplication is necessary, perhaps parts of the code could be shuffled around such that less code is duplicated. Large parts of the algorithms are independent of the float type to output, or only need access to a few constants, which could be passed in as parameters.

Other

The conversion should never panic. There are assertions and explicit panics in the code, but they should never be triggered and only serve as internal sanity checks. Any panics should be considered a bug.

There are unit tests but they are woefully inadequate at ensuring correctness; they cover only a small percentage of possible errors. Far more extensive tests are located in the directory src/etc/test-float-parse as a Python script.

A note on integer overflow: Many parts of this file perform arithmetic with the decimal exponent e. Primarily, we shift the decimal point around: Before the first decimal digit, after the last decimal digit, and so on. This could overflow if done carelessly. We rely on the parsing submodule to only hand out sufficiently small exponents, where "sufficient" means "such that the exponent +/- the number of decimal digits fits into a 64 bit integer". Larger exponents are accepted, but we don't do arithmetic with them, they are immediately turned into {positive,negative} {zero,infinity}.
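
A sketch of that early classification (the exact cutoffs here are illustrative, derived loosely from the f64 range; they are not the module's actual thresholds):

```rust
/// `digits` is the number of significant decimal digits of f, so a
/// nonzero input satisfies 10^e <= f * 10^e < 10^(e + digits).
fn classify_extreme(digits: i64, e: i64) -> Option<f64> {
    if e >= 309 {
        // The value is at least 10^309 > f64::MAX: rounds to infinity.
        Some(f64::INFINITY)
    } else if e + digits <= -324 {
        // The value is below 10^-324, less than half the smallest
        // subnormal (~4.9e-324): rounds to zero.
        Some(0.0)
    } else {
        None // in range: do real arithmetic on the exponent
    }
}

fn main() {
    assert_eq!(classify_extreme(1, 1000), Some(f64::INFINITY));
    assert_eq!(classify_extreme(3, -1000), Some(0.0));
    assert_eq!(classify_extreme(4, 54), None);
}
```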

Modules

parse [Unstable]

Validating and decomposing a decimal string of the form:

rawfp [Unstable]

Bit fiddling on positive IEEE 754 floats. Negative numbers are not and need not be handled. Normal floating point numbers have a canonical representation as (frac, exp) such that the value is 2^exp * (1 + sum(frac[N-i] / 2^i)) where N is the number of bits. Subnormals are slightly different and weird, but the same principle applies.
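
A sketch of that decomposition for positive, normal f64 values (using the public `to_bits`; the helper name is made up):

```rust
/// Split a positive, normal f64 into (frac, exp) such that
/// value = 2^exp * (1 + frac / 2^52).
fn decompose_bits(x: f64) -> (u64, i64) {
    let bits = x.to_bits();
    let frac = bits & ((1u64 << 52) - 1); // low 52 bits
    let biased_exp = (bits >> 52) & 0x7ff; // next 11 bits
    (frac, biased_exp as i64 - 1023) // remove the exponent bias
}

fn main() {
    // 1.5 = 2^0 * (1 + 0.5): exponent 0, top fraction bit set.
    assert_eq!(decompose_bits(1.5), (1u64 << 51, 0));
    // 8.0 = 2^3 * (1 + 0).
    assert_eq!(decompose_bits(8.0), (0, 3));
}
```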

Structs

ParseFloatError

An error which can be returned when parsing a float.