
do not ignore explicitly given mantissa width #868

Open · wants to merge 2 commits into base: main
Conversation

@mnieper (Contributor) commented Aug 28, 2024

This fixes the issue raised in #866, making mantissa widths meaningful.

@mflatt (Contributor) commented Sep 22, 2024

I haven't yet understood the function completely, but it looks like the rounder function needs to be a little different to defend against very large bit widths. With this patch, 1|100000000000000000000000000000 runs out of memory, but that reads as 1.0 in current Chez Scheme.

@mnieper (Contributor, Author) commented Sep 22, 2024

> I haven't yet understood the function completely, but it looks like the rounder function needs to be a little different to defend against very large bit widths. With this patch, 1|100000000000000000000000000000 runs out of memory, but that reads as 1.0 in current Chez Scheme.

Chez also runs out of memory with #e0.1|10000000000000. That is somewhat unavoidable given the definition of ...|...: that literal denotes the exact rational number that is the best binary floating-point approximation to 0.1 with 100...000 significant binary digits, and that exact number has a huge numerator and denominator. The only way out here would be to raise an &implementation-restriction for large mantissa widths. Maybe that is the best course of action, as mantissa widths in practice are much smaller than these huge numbers.

Your example, on the other hand, is an inexact number, so the final result won't take much space. The question is whether the calculation has to take that much space, especially in the less trivial case 0.1|1000000000000000000000. I would like to get the result of (inexact #e0.1|10000000000000000), which is the best 53-bit approximation to the best 10000000000000-bit approximation to 1/10.

@mflatt (Contributor) commented Sep 22, 2024

> Chez also runs out of memory with #e0.1|10000000000000.

Here's what I'm seeing:

Chez Scheme Version 10.1.0-pre-release.2
Copyright 1984-2024 Cisco Systems, Inc.

> #e0.1|10000000000000
1/10

I don't see why there's inherently a problem here. As long as the number written before the | has a tractable number of digits, any requested additional precision is just zeros, right?

@mnieper (Contributor, Author) commented Sep 23, 2024

(Please excuse the delayed answer; it was night here.)

Have you tried #e0.1|100...000 with the (most recent version of the) patch I provided here? For example, I get

> #e0.1|100
1014120480182583521197362564301/10141204801825835211973625643008

and much larger denominators for higher mantissa widths.

It would be wrong to truncate the precision. The reason is that 1/10 has a period of length 4 in binary representation. In fact,

1/10 = 0.00011001100110011001100...

in binary representation. From this, we can deduce that

#e0.1|1 = 0.001
#e0.1|2 = 0.00011
#e0.1|3 = 0.00011
#e0.1|4 = 0.0001101
#e0.1|5 = 0.0001101
...

in binary representation, where I rounded to even. This means that larger and larger denominators (all powers of two) are needed as the mantissa width grows.
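To make the rounding concrete, here is a minimal sketch (my own illustration, not the code in this patch) of the best p-bit binary approximation to an exact rational, relying on the fact that Scheme's round ties to even on exact numbers:

;; Best p-bit binary approximation to an exact rational q (p >= 1).
;; Scale q by a power of two so that |q|*2^e lies in [2^(p-1), 2^p),
;; round to the nearest integer (ties to even), and scale back.
(define (best-approx q p)
  (if (zero? q)
      0
      (let loop ((e 0))
        (let ((scaled (* (abs q) (expt 2 e))))
          (cond
            ((< scaled (expt 2 (- p 1))) (loop (+ e 1)))
            ((>= scaled (expt 2 p)) (loop (- e 1)))
            (else (/ (round (* q (expt 2 e))) (expt 2 e))))))))

(best-approx 1/10 1)   ; => 1/8, i.e. 0.001 in binary
(best-approx 1/10 4)   ; => 13/128, i.e. 0.0001101 in binary
(best-approx 1/10 100) ; => the fraction with denominator 2^103 shown above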

@mflatt (Contributor) commented Sep 23, 2024

Sorry, I misunderstood what you meant by "Chez also", and I was confused about fractions and binary representations. Thank you for the tutorial! It makes sense that #e0.1|10000000000000 runs out of memory.

It still seems like 0.1|10000000000000 should not run out of memory. Is it a matter of setting a ceiling on precision when working toward an inexact result, or is it more complex than that?

@mflatt (Contributor) commented Sep 23, 2024

Also, the results of #e1|10000000000000 and #e0.25|10000000000000 fit comfortably into memory, so it seems like they should be allowed, too. Is that a matter of detecting a power-of-two denominator, or are the cases when the result is representable more complicated to characterize?

@mnieper (Contributor, Author) commented Sep 23, 2024

> Sorry, I misunderstood what you meant by "Chez also"

Oh, I see. Indeed, what I wrote wasn't very clear.

> It still seems like 0.1|10000000000000 should not run out of memory. Is it a matter of setting a ceiling on precision when working toward an inexact result, or is it more complex than that?

Consider the following number in binary notation (where N* means to repeat the following binary digit N times):

0.1 52*0 1 N*0 1

Let x be its decimal representation.

If N is sufficiently larger than p, and q sufficiently larger than N, then x|p is 0.1 in binary while x|q is 0.1 51*0 1 in binary. In other words, we cannot simply truncate a huge mantissa width like q to some smaller one p without examining the value of x.
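Using the best-approx sketch from above, one can watch this double rounding go wrong even for a small, hypothetical N = 10 (again my illustration, not the patch's code): rounding x directly to 53 bits differs from first rounding it to an intermediate width and then re-rounding to 53 bits.

(define N 10)
;; x = 0.1 52*0 1 N*0 1 in binary
(define x (+ (expt 2 -1) (expt 2 -54) (expt 2 (- (+ 55 N)))))

(best-approx x 53)                   ; => 1/2 + 2^-53: the far-away 1 pushes the rounding up
(best-approx (best-approx x 54) 53)  ; => 1/2: the 54-bit step creates an exact tie,
                                     ;    and ties-to-even then loses the far-away 1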

@mnieper (Contributor, Author) commented Sep 23, 2024

> Also, the results of #e1|10000000000000 and #e0.25|10000000000000 fit comfortably into memory, so it seems like they should be allowed, too. Is that a matter of detecting a power-of-two denominator, or are the cases when the result is representable more complicated to characterize?

I do not have a full characterisation. But a denominator N means that the quotient has a period length of at most N - 1, so one should be able to reduce the case of an arbitrary mantissa width roughly to the case of a mantissa width <= 2*N.
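Concretely, the binary period length of 1/d is the multiplicative order of 2 modulo the odd part of d, which a small sketch can compute (my illustration; period-length is a hypothetical helper, and this brute-force loop is only sensible for small denominators):

;; Binary period of 1/d = multiplicative order of 2 modulo the
;; odd part of d; 0 means the expansion is finite.
(define (period-length d)
  (let ((d* (let loop ((d d)) (if (even? d) (loop (/ d 2)) d))))
    (if (= d* 1)
        0
        (let loop ((k 1) (r (mod 2 d*)))
          (if (= r 1) k (loop (+ k 1) (mod (* 2 r) d*)))))))

(period-length 10) ; => 4, matching the expansion of 1/10 above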

But I wonder whether it makes sense to spend the time getting the details right and writing extra code for huge mantissa widths. In practice, the largest mantissa widths may come from using libraries like GNU MPFR. Do you think someone would use floats that take megabytes of memory?

@mnieper (Contributor, Author) commented Sep 23, 2024

Maybe it is not that complicated to actually implement mantissa truncation.

  1. For exact numbers, this is only possible when the denominator is a power of two, because otherwise the binary representation has an infinite period. When the denominator is a power of two, we can truncate at the log2 of the denominator.
  2. For inexact numbers, we calculate the period length P from the denominator. We then add 53 and the log2 of the numerator, getting some number N. If the mantissa width p is > N + P, we can replace it with p - P.

The estimates may be off by 1 or 2, but one can code conservatively.
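Point 1 can be made concrete with a small sketch (my illustration, not this PR's code; clamp-exact-width is a hypothetical name), using the numerator's bit length, which bounds the needed width exactly:

;; In lowest terms, n/d has a finite binary expansion iff d is a power
;; of two; the numerator n is then odd, so the value is exactly
;; representable with (bitwise-length n) significant bits, and any
;; larger requested width p can be clamped to that.
(define (clamp-exact-width q p)
  (let ((n (abs (numerator q)))
        (d (denominator q)))
    (if (zero? (bitwise-and d (- d 1))) ; d a power of two?
        (min p (max 1 (bitwise-length n)))
        p)))                            ; infinite period: keep p

(clamp-exact-width 1/4 10000000000000) ; => 1, since 1/4 = 0.01 in binary
(clamp-exact-width 1/10 100)           ; => 100, since 1/10 has a period

Point 2 could reuse a period-length computation like the one sketched earlier for P, with 53 plus the numerator's bitwise-length as the bound N.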

@mflatt (Contributor) commented Sep 23, 2024

That sounds really great. As we've established, I'm not clear on the math, but it certainly sounds plausible.

My experience with Scheme numbers is that these details end up being worthwhile, even though it means extra code, and even though the happy cases often end up being complex (e.g., only power-of-two denominators). Unfortunately, my experience with Scheme numbers is also that I have to learn a lot of new things, and then I forget them soon afterward!

@mflatt pushed a commit: "Avoiding running out of memory for a very large precision request when the number with adjusted precision should take about as much memory as the number without an adjustment."
@mflatt (Contributor) commented Sep 29, 2024

Hi @mnieper, I pushed a commit to add precision bounding in (I think) the way you describe. Does it look right? Do I understand correctly that this captures all of the cases where the resulting number can be represented with about the same amount of memory as the number without a precision adjustment?

@mnieper (Contributor, Author) commented Sep 29, 2024

Thanks a lot, Matthew. I am going to take a look at it within the next few days. (I had wanted to come up with some code as well but haven't found the time.)
