Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2023/03/06 17:50:09 UTC

[numbers][GSoC2023] About “Add support for extended precision floating-point numbers” project

FYI: I added the [numbers] prefix to the subject line.

<!--snip -->

> > ・ I read David Bailey's paper on the QD library and understood its
> > algorithms. I also briefly looked over its C++ implementation.

This is where I would start with an API design, e.g. what OO API does
the C++ reference implementation provide?

I do not think we want to replace the functionality in the Sum class.
This is a specialised class for linear combinations. The DD class
would be a more general number type, to be used as you would use a
double or a BigDecimal. I would imagine the API would consist of
methods acting on the current instance and returning a new instance:

DD add(DD)
DD subtract(DD)
DD multiply(DD)
DD divide(DD)
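As a rough illustration of that OO style, here is a minimal sketch of an
immutable class with these four methods. It assumes the usual
double-double representation (value = hi + lo, with |lo| at most half an
ulp of hi), uses the standard two-sum / fast-two-sum and FMA-based
product algorithms, and needs Java 9+ for Math.fma. The class and its
members are hypothetical; this is not the existing Statistics DD code.

```java
/**
 * Hypothetical immutable double-double number: value == hi + lo, with |lo| <= ulp(hi) / 2.
 * Illustrative sketch only (requires Java 9+ for Math.fma).
 */
final class DD {
    final double hi;
    final double lo;

    private DD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /** Creates an exact double-double from a double. */
    static DD of(double x) {
        return new DD(x, 0.0);
    }

    /** Knuth's two-sum: hi = fl(a + b) and hi + lo == a + b exactly, for any a, b. */
    private static DD twoSum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        return new DD(s, (a - (s - bb)) + (b - bb));
    }

    /** Dekker's fast two-sum: as twoSum but requires |a| >= |b|; used to renormalise. */
    private static DD fastTwoSum(double a, double b) {
        double s = a + b;
        return new DD(s, b - (s - a));
    }

    DD add(DD y) {
        DD s = twoSum(hi, y.hi); // exact sum of the high parts
        DD t = twoSum(lo, y.lo); // exact sum of the low parts
        DD u = fastTwoSum(s.hi, s.lo + t.hi);
        return fastTwoSum(u.hi, u.lo + t.lo);
    }

    DD negate() {
        return new DD(-hi, -lo);
    }

    DD subtract(DD y) {
        return add(y.negate());
    }

    DD multiply(DD y) {
        double p = hi * y.hi;
        double e = Math.fma(hi, y.hi, -p); // exact rounding error of hi * y.hi
        e += hi * y.lo + lo * y.hi;        // cross terms; lo * y.lo is below precision
        return fastTwoSum(p, e);
    }

    DD divide(DD y) {
        double q1 = hi / y.hi;               // first approximation
        DD r = subtract(y.multiply(of(q1))); // remainder: this - q1 * y
        double q2 = r.hi / y.hi;             // correction term
        return fastTwoSum(q1, q2);
    }

    double doubleValue() {
        return hi + lo;
    }
}
```

For example, DD.of(1).divide(DD.of(3)) keeps the nearest double to 1/3
in hi and carries the next bits of 1/3 in lo, which a plain double
division discards.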

Overloads could be provided for int/long/double arguments, as these
allow useful simplifications over adding a full double-double number.
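For example, adding or multiplying by a plain double needs only one
two-sum (or one FMA product) on the high part rather than the full
double-double operation. A hypothetical sketch of two such overloads,
under the same representation assumption (value = hi + lo, Java 9+ for
Math.fma):

```java
/**
 * Hypothetical fragment of an immutable double-double class showing double overloads.
 * Assumes value == hi + lo with |lo| <= ulp(hi) / 2 (Java 9+ for Math.fma).
 */
final class DD {
    final double hi;
    final double lo;

    DD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    static DD of(double x) {
        return new DD(x, 0.0);
    }

    /** this + y: a single two-sum on the high part, since y has no low part. */
    DD add(double y) {
        double s = hi + y;
        double bb = s - hi;
        double e = (hi - (s - bb)) + (y - bb); // exact error of hi + y (two-sum)
        e += lo;
        double r = s + e;                      // fast two-sum renormalisation
        return new DD(r, e - (r - s));
    }

    /** this * y: one FMA-based exact product; the rounding error of lo * y is below precision. */
    DD multiply(double y) {
        double p = hi * y;
        double e = Math.fma(hi, y, -p) + lo * y;
        double r = p + e;
        return new DD(r, e - (r - p));
    }
}
```

An int converts exactly to a double, so it can reuse these overloads; a
long above 2^53 in magnitude does not, and would need to be split into
two doubles first.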

Other methods from the current statistics implementation are:

DD pow(int)
DD inverse()

DD ldexp(int)
(int, DD) frexp()  ???

The latter are useful for scaling, as the exponent range of a double
is effectively limited to [-1074, 1023] in base 2; without sub-normal
numbers the lower limit is -1022.
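A minimal sketch of ldexp on a hypothetical hi/lo pair: each part is
scaled with Math.scalb, which is exact unless a part overflows or
underflows into the sub-normal range and loses low-order bits. Because
lo is about 2^53 times smaller than hi, it reaches the sub-normal range
first, so the pair loses precision earlier than a lone double would.

```java
/** Hypothetical double-double value (value == hi + lo); only the scaling method is shown. */
final class DD {
    final double hi;
    final double lo;

    DD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /**
     * Returns this * 2^n. Scaling each part by a power of two is exact unless a part
     * overflows, or underflows into the sub-normal range and loses low-order bits.
     */
    DD ldexp(int n) {
        return new DD(Math.scalb(hi, n), Math.scalb(lo, n));
    }
}
```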

The frexp method is somewhat problematic as there are two return
values: a new normalised fraction in [0.5, 1) and the base 2 scale of
the normalised number. It could be implemented as:

DD frexp(int[] exp)
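One way the int[] output-argument form could look, as a hypothetical
sketch: it reads the exponent of the high part with Math.getExponent
(pre-scaling sub-normal values by 2^54) and scales both parts with
Math.scalb. It normalises the high part only; when the scaled hi is
exactly 0.5 in magnitude and lo has the opposite sign, the full value is
fractionally below 0.5, so a real implementation would need to decide
how to define the fraction in that case.

```java
/** Hypothetical double-double value (value == hi + lo); only frexp is shown. */
final class DD {
    final double hi;
    final double lo;

    DD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /**
     * Returns f with |f.hi| in [0.5, 1) and stores the base-2 exponent in exp[0],
     * so that this == f * 2^exp[0]. Zero, infinity and NaN are returned unchanged
     * with exp[0] = 0.
     */
    DD frexp(int[] exp) {
        if (hi == 0 || !Double.isFinite(hi)) {
            exp[0] = 0;
            return this;
        }
        int e;
        if (Math.getExponent(hi) < Double.MIN_EXPONENT) {
            // Sub-normal high part: scale up by 2^54 so getExponent reports the true exponent.
            e = Math.getExponent(hi * 0x1.0p54) - 54 + 1;
        } else {
            // getExponent gives hi in [1, 2) * 2^k; +1 moves the fraction to [0.5, 1).
            e = Math.getExponent(hi) + 1;
        }
        exp[0] = e;
        return new DD(Math.scalb(hi, -e), Math.scalb(lo, -e));
    }
}
```

The caller allocates the exponent holder, e.g. int[] exp = new int[1],
and can reuse it across calls.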

Other methods for a number type could include, e.g.:

DD negate()

> > ・ Implement QD as well as DD. As briefly mentioned in the David Bailey
> > paper, for many applications, the use of DD or QD is sufficient.
> > Therefore, I do not think implementing arbitrary-length floating-point
> > numbers is necessary.

Perfectly fine. Other variants can be added later if required.

> > And my question is, what specific extensions do you think are needed
> > regarding the existing double-double API?

The key point of the current API is that it requires no memory
allocation within the class. As such, the class has been written to be
mutable: all methods act on primitives and write results to an output
argument. However, this does not fully encapsulate the functionality,
and methods may be called with arguments that are not normalised
double-double numbers, so it is open to incorrect usage. For a public
class this either should not be done, or it should be provided as an
alternative to the friendly OO API for advanced usage.
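To make the primitive, output-argument style concrete, here is a
hypothetical sketch (the class names MutableDD and DDMath are invented
for illustration) of an add method that takes the two parts of each
operand as primitives and writes the result into a caller-supplied
mutable holder. Nothing checks that the inputs are normalised, which is
the misuse risk described above.

```java
/** Hypothetical mutable holder used as the output argument. */
final class MutableDD {
    double hi;
    double lo;
}

/** Hypothetical static, allocation-free double-double arithmetic on primitives. */
final class DDMath {
    private DDMath() {}

    /**
     * Computes (x + xx) + (y + yy) and writes the normalised result to out.
     * Assumes (x, xx) and (y, yy) are already normalised double-double numbers;
     * this is not checked.
     */
    static void add(double x, double xx, double y, double yy, MutableDD out) {
        // Two-sum of the high parts: s + e == x + y exactly.
        double s = x + y;
        double bb = s - x;
        double e = (x - (s - bb)) + (y - bb);
        // Two-sum of the low parts: t + f == xx + yy exactly.
        double t = xx + yy;
        double cc = t - xx;
        double f = (xx - (t - cc)) + (yy - cc);
        // Renormalise twice with fast two-sum.
        e += t;
        double hi = s + e;
        e = e - (hi - s);
        e += f;
        double hi2 = hi + e;
        out.lo = e - (hi2 - hi);
        out.hi = hi2;
    }
}
```

A single MutableDD can be reused for every call in a loop, so no objects
are created per operation.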

I wrote the implementation this way to avoid allocating a new object
for every operation. I do not know whether this actually affects
performance. A first project would be to: copy the entire DD class
into a JMH project; add OO methods that create a new instance for all
operations; and copy/adapt the Kolmogorov-Smirnov p-value computation
from statistics for both the non-OO and OO APIs. For a large p-value
computation the current method takes up to 1 second; an OO version
would create approximately 50 million objects for intermediate
computations (my guesstimate), whereas the present implementation
creates 3 objects. It would be a useful test to determine whether this
object creation and garbage collection affect performance.
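As a possible starting point for that benchmark, the two styles can be
written side by side so their results can be checked against each other
before any timing. This hypothetical sketch uses a double-double product
of many doubles as a stand-in for the Kolmogorov-Smirnov code (Java 9+
for Math.fma); both paths perform identical arithmetic and differ only
in allocation.

```java
/**
 * Hypothetical side-by-side sketch of the two API styles for a double-double product
 * of many doubles (Java 9+ for Math.fma). Not the Statistics Kolmogorov-Smirnov code.
 */
final class AllocationComparison {
    private AllocationComparison() {}

    /** OO style: each multiply allocates a new immutable result. */
    static final class DD {
        final double hi;
        final double lo;

        DD(double hi, double lo) {
            this.hi = hi;
            this.lo = lo;
        }

        DD multiply(double y) {
            double p = hi * y;
            double e = Math.fma(hi, y, -p) + lo * y;
            double s = p + e;
            return new DD(s, e - (s - p));
        }
    }

    /** Primitive style: same arithmetic, result written into a reused output array. */
    static void multiply(double hi, double lo, double y, double[] out) {
        double p = hi * y;
        double e = Math.fma(hi, y, -p) + lo * y;
        double s = p + e;
        out[0] = s;
        out[1] = e - (s - p);
    }

    /** One new object per element. */
    static DD productOO(double[] values) {
        DD acc = new DD(1.0, 0.0);
        for (double v : values) {
            acc = acc.multiply(v);
        }
        return acc;
    }

    /** A single output array for the whole loop. */
    static double[] productPrimitive(double[] values) {
        double[] acc = {1.0, 0.0};
        for (double v : values) {
            multiply(acc[0], acc[1], v, acc);
        }
        return acc;
    }

    /** Deterministic factors close to 1 so a long product neither overflows nor underflows. */
    static double[] sampleValues(int n) {
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = 1.0 + (i % 7 - 3) * 1e-9;
        }
        return values;
    }
}
```

In a JMH project each of productOO and productPrimitive would become the
body of a @Benchmark method (returning the result so it is not
eliminated as dead code), and the allocation rate of the two could be
compared with JMH's GC profiler (-prof gc).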

> > Also, how about my ideas on extending the API to be more
> > user-friendly? Am I on the right way?

Perhaps focus on the API provided by the C++ library. I imagine it is
fairly mature and will provide a good example of an API for
manipulating a custom number implementation.

> Sure; your questions are certainly part of the issues that need
> clarification.
> However, besides the functionality itself, there is the question of
> how it fits within the (math-related) components' "eco-system".
> By this I mean that the code should be moved to "[Numbers]", but
> where?  In the "core" module (where "Sum" is implemented), or in
> a new module of its own (e.g. on the assumption that it may be
> too specific a utility)?
> For one thing, the "Statistics" component will surely depend on
> that utility; hence, porting the existing code to "[Numbers]" might
> be your first pull request[1] (after ensuring locally that the calling
> code in "Statistics" still works as it used to).
>
> Thus, please create a JIRA report[2] to further elaborate on this.[3]

I think a new numbers module for a DD implementation makes sense. We
already have modules for fractions and complex numbers.

Feel free to add some comments on the initial Jira ticket summarising
this direction. We can then create sub-tickets for tasks that you wish
to tackle (e.g. JMH benchmark the current DD class; describe an
initial API for a DD class).

Alex
