Module std::num::flt2dec
[−]
[src]
flt2dec
): internal routines only exposed for testing
Floating-point number to decimal conversion routines.
Problem statement
We are given the floating-point number v = f * 2^e
with an integer f
,
and its bounds minus
and plus
such that any number between v - minus
and
v + plus
will be rounded to v
. For the simplicity we assume that
this range is exclusive. Then we would like to get the unique decimal
representation V = 0.d[0..n-1] * 10^k
such that:
d[0]
is non-zero.It's correctly rounded when parsed back:
v - minus < V < v + plus
. Furthermore it is shortest such one, i.e. there is no representation with less thann
digits that is correctly rounded.It's closest to the original value:
abs(V - v) <= 10^(k-n) / 2
. Note that there might be two representations satisfying this uniqueness requirement, in which case some tie-breaking mechanism is used.
We will call this mode of operation as to the shortest mode. This mode is used
when there is no additional constraint, and can be thought as a "natural" mode
as it matches the ordinary intuition (it at least prints 0.1f32
as "0.1").
We have two more modes of operation closely related to each other. In these modes
we are given either the number of significant digits n
or the last-digit
limitation limit
(which determines the actual n
), and we would like to get
the representation V = 0.d[0..n-1] * 10^k
such that:
d[0]
is non-zero, unlessn
was zero in which case onlyk
is returned.It's closest to the original value:
abs(V - v) <= 10^(k-n) / 2
. Again, there might be some tie-breaking mechanism.
When limit
is given but not n
, we set n
such that k - n = limit
so that the last digit d[n-1]
is scaled by 10^(k-n) = 10^limit
.
If such n
is negative, we clip it to zero so that we will only get k
.
We are also limited by the supplied buffer. This limitation is used to print
the number up to given number of fractional digits without knowing
the correct k
beforehand.
We will call the mode of operation requiring n
as to the exact mode,
and one requiring limit
as to the fixed mode. The exact mode is a subset of
the fixed mode: the sufficiently large last-digit limitation will eventually fill
the supplied buffer and let the algorithm to return.
Implementation overview
It is easy to get the floating point printing correct but slow (Russ Cox has demonstrated how it's easy), or incorrect but fast (naïve division and modulo). But it is surprisingly hard to print floating point numbers correctly and efficiently.
There are two classes of algorithms widely known to be correct.
The "Dragon" family of algorithm is first described by Guy L. Steele Jr. and Jon L. White. They rely on the fixed-size big integer for their correctness. A slight improvement was found later, which is posthumously described by Robert G. Burger and R. Kent Dybvig. David Gay's
dtoa.c
routine is a popular implementation of this strategy.The "Grisu" family of algorithm is first described by Florian Loitsch. They use very cheap integer-only procedure to determine the close-to-correct representation which is at least guaranteed to be shortest. The variant, Grisu3, actively detects if the resulting representation is incorrect.
We implement both algorithms with necessary tweaks to suit our requirements.
In particular, published literatures are short of the actual implementation
difficulties like how to avoid arithmetic overflows. Each implementation,
available in strategy::dragon
and strategy::grisu
respectively,
extensively describes all necessary justifications and many proofs for them.
(It is still difficult to follow though. You have been warned.)
Both implementations expose two public functions:
format_shortest(decoded, buf)
, which always needs at leastMAX_SIG_DIGITS
digits of buffer. Implements the shortest mode.format_exact(decoded, buf, limit)
, which accepts as small as one digit of buffer. Implements exact and fixed modes.
They try to fill the u8
buffer with digits and returns the number of digits
written and the exponent k
. They are total for all finite f32
and f64
inputs (Grisu internally falls back to Dragon if necessary).
The rendered digits are formatted into the actual string form with four functions:
to_shortest_str
prints the shortest representation, which can be padded by zeroes to make at least given number of fractional digits.to_shortest_exp_str
prints the shortest representation, which can be padded by zeroes when its exponent is in the specified ranges, or can be printed in the exponential form such as1.23e45
.to_exact_exp_str
prints the exact representation with given number of digits in the exponential form.to_exact_fixed_str
prints the fixed representation with exactly given number of fractional digits.
They all return a slice of preallocated Part
array, which corresponds to
the individual part of strings: a fixed string, a part of rendered digits,
a number of zeroes or a small (u16
) number. The caller is expected to
provide a large enough buffer and Part
array, and to assemble the final
string from resulting Part
s itself.
All algorithms and formatting functions are accompanied by extensive tests
in coretest::num::flt2dec
module. It also shows how to use individual
functions.
Modules
decoder |
[Unstable] Decodes a floating-point value into individual parts and error ranges. |
estimator |
[Unstable] The exponent estimator. |
strategy |
[Unstable] Digit-generation algorithms. |
Structs
Decoded |
[Unstable] Decoded unsigned finite value, such that: |
Formatted |
[Unstable] Formatted result containing one or more parts. This can be written to the byte buffer or converted to the allocated string. |
Enums
FullDecoded |
[Unstable] Decoded unsigned value. |
Part |
[Unstable] Formatted parts. |
Sign |
[Unstable] Sign formatting options. |
Constants
MAX_SIG_DIGITS |
[Unstable] The minimum size of buffer necessary for the shortest mode. |
Traits
DecodableFloat |
[Unstable] A floating point type which can be |
Functions
decode |
[Unstable] Returns a sign (true when negative) and |
round_up |
[Unstable] When |
to_exact_exp_str |
[Unstable] Formats given floating point number into the exponential form with
exactly given number of significant digits. The result is stored to
the supplied parts array while utilizing given byte buffer as a scratch.
|
to_exact_fixed_str |
[Unstable] Formats given floating point number into the decimal form with exactly
given number of fractional digits. The result is stored to the supplied parts
array while utilizing given byte buffer as a scratch. |
to_shortest_exp_str |
[Unstable] Formats given floating point number into the decimal form or
the exponential form, depending on the resulting exponent. The result is
stored to the supplied parts array while utilizing given byte buffer
as a scratch. |
to_shortest_str |
[Unstable] Formats given floating point number into the decimal form with at least
given number of fractional digits. The result is stored to the supplied parts
array while utilizing given byte buffer as a scratch. |