Posted to dev@commons.apache.org by Sentaro Onizuka <on...@gmail.com> on 2023/03/06 10:22:10 UTC

[GSoC2023] About “Add support for extended precision floating-point numbers” project

To whom it may concern,

About “Add support for extended precision floating-point numbers”
project of Google Summer of Code 2023.

My name is Sentaro Onizuka.
I emailed several days ago about my interest in this project and
received the details and related resources.
I have read the resources provided and would like to discuss the project.

Here is what I understood from the resources:
・ I understood how the user-friendly API is implemented by looking at
what is already implemented in the Commons project (especially the Sum
class[1], which uses the builder pattern).
・ I read David Bailey's paper on the QD library and understood its
algorithms. I also briefly looked over its C++ implementation.
・ I read the DD class[2] and understood the double-double
implementation and API.

What I would like to discuss is the API for a double-double number, in
particular how it could be extended into a more user-friendly API.
I have two ideas on this.

・ In the DD class API, simplify adding multiple DD values.
     e.g., compute the sum a1 + a2 + a3 (ai = {ai_h, ai_l}, ai == ai_h + ai_l)
  In the current DD class
         DD dd = DD.create(a1[0], a1[1]);
         dd = DD.add(dd.hi(),dd.lo(), a2[0], a2[1], dd);
         dd = DD.add(dd.hi(),dd.lo(), a3[0], a3[1], dd);
         double result = dd.doubleValue();
  In contrast, implementing a varargs factory method would allow
writing the following:
         double[][] a = {a1, a2, a3};
         double result = DD.of(a).doubleValue();
  The implementation is assumed to be as follows, referring to the Sum class:
         public static DD of(double[]... values) {
             return create().add(values);
         }
         public DD add(final double[]... terms) {
             for (final double[] t : terms) {
                 assert t.length == 2 : "each term must be {hi, lo}";
                 add(t[0], t[1]);
             }
             return this;
         }
         public DD add(double x, double xx) {
             // Adds a single term (x, xx) to this DD.
             return this;
         }
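A minimal runnable sketch of the proposal above follows. It assumes a hypothetical mutable class named DDAccumulator (not the existing DD class), keeps the proposed of/add method names, and fills the add(double, double) stub with a two-sum based double-double addition in the style of the QD library. The assert is replaced by an explicit check, since Java assertions are disabled unless the JVM runs with -ea.

```java
import java.util.Arrays;

/**
 * Hypothetical mutable accumulator illustrating the proposed builder-style API.
 * The value is hi + lo, renormalised after every addition.
 */
final class DDAccumulator {
    private double hi;
    private double lo;

    private DDAccumulator() {
    }

    static DDAccumulator create() {
        return new DDAccumulator();
    }

    /** Varargs factory: each term is a {hi, lo} pair. */
    static DDAccumulator of(double[]... values) {
        return create().add(values);
    }

    DDAccumulator add(double[]... terms) {
        for (final double[] t : terms) {
            // Explicit check: an assert would be skipped unless the JVM runs with -ea.
            if (t.length != 2) {
                throw new IllegalArgumentException("Expected {hi, lo} but got " + Arrays.toString(t));
            }
            add(t[0], t[1]);
        }
        return this;
    }

    /** Adds the double-double (y, yy) to this accumulator. */
    DDAccumulator add(double y, double yy) {
        // Two-sum of the high parts: s + e == hi + y exactly.
        double s = hi + y;
        double bb = s - hi;
        double e = (hi - (s - bb)) + (y - bb);
        // Two-sum of the low parts: t + f == lo + yy exactly.
        double t = lo + yy;
        bb = t - lo;
        double f = (lo - (t - bb)) + (yy - bb);
        // Fold in the low-part sum, then renormalise twice with fast two-sum.
        e += t;
        double s1 = s + e;
        e = e - (s1 - s);
        e += f;
        hi = s1 + e;
        lo = e - (hi - s1);
        return this;
    }

    double doubleValue() {
        return hi + lo;
    }
}
```

Because the factory is varargs, both DDAccumulator.of(a1, a2, a3) and DDAccumulator.of(a) with a double[][] compile.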

・ Implement QD as well as DD. As briefly mentioned in the David Bailey
paper, for many applications, the use of DD or QD is sufficient.
Therefore, I do not think implementing arbitrary-length floating-point
numbers is necessary.

And my question is, what specific extensions do you think are needed
regarding the existing double-double API?
Also, what do you think of my ideas for making the API more
user-friendly? Am I on the right track?


Regards,
Sentaro Onizuka

[1] commons-numbers-core/src/main/java/org/apache/commons/numbers/core/Sum.java
[2] commons-statistics-inference/src/main/java/org/apache/commons/statistics/inference/DD.java

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[numbers][GSoC2023] About “Add support for extended precision floating-point numbers” project

Posted by Alex Herbert <al...@gmail.com>.
FYI: I added the [numbers] prefix to the subject line.

<!--snip -->

> > ・ I read David Bailey's paper on the QD library and understood its
> > algorithms. I also briefly looked over its C++ implementation.

This is where I would start with an API design. E.g. what OO API does
the C++ reference implementation provide?

I do not think we want to replace the functionality in the Sum class.
This is a specialised class for linear combinations. The DD class
would be a more general number to be used as you would use a double or
a BigDecimal. I would imagine the API would consist of methods acting
on the current instance and returning a new instance:

DD add(DD)
DD subtract(DD)
DD multiply(DD)
DD divide(DD)

Overrides could be provided for int/long/double arguments as these
have useful simplifications over adding a full double-double number.
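For comparison with the method list above, here is a rough sketch of an immutable OO API. The class name DoubleDouble and the of(double) factory are hypothetical. The algorithms follow the two-sum/two-product approach of the QD library, and Math.fma (Java 9+) is used for the exact product, which is an assumption about the target Java version. The add(double) and multiply(double) overloads show the kind of simplification mentioned above: they skip the work for the low part of a double-double argument.

```java
/**
 * Hypothetical immutable double-double: value = hi + lo with |lo| <= ulp(hi) / 2.
 * Requires Java 9+ for Math.fma.
 */
final class DoubleDouble {
    private final double hi;
    private final double lo;

    private DoubleDouble(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    static DoubleDouble of(double x) {
        return new DoubleDouble(x, 0);
    }

    double hi() { return hi; }
    double lo() { return lo; }
    double doubleValue() { return hi + lo; }

    /** Renormalise (a, b) assuming |a| >= |b| (fast two-sum). */
    private static DoubleDouble fastTwoSum(double a, double b) {
        final double s = a + b;
        return new DoubleDouble(s, b - (s - a));
    }

    /** Overload for a double: one two-sum instead of two. */
    DoubleDouble add(double y) {
        final double s = hi + y;
        final double bb = s - hi;
        final double e = (hi - (s - bb)) + (y - bb);
        return fastTwoSum(s, e + lo);
    }

    DoubleDouble add(DoubleDouble y) {
        final double s = hi + y.hi;
        double bb = s - hi;
        final double e = (hi - (s - bb)) + (y.hi - bb);
        final double t = lo + y.lo;
        bb = t - lo;
        final double f = (lo - (t - bb)) + (y.lo - bb);
        final DoubleDouble r = fastTwoSum(s, e + t);
        return fastTwoSum(r.hi, r.lo + f);
    }

    DoubleDouble negate() { return new DoubleDouble(-hi, -lo); }

    DoubleDouble subtract(DoubleDouble y) { return add(y.negate()); }

    /** Overload for a double: the exact product hi * y comes from a fused multiply-add. */
    DoubleDouble multiply(double y) {
        final double p = hi * y;
        final double e = Math.fma(hi, y, -p);
        return fastTwoSum(p, e + lo * y);
    }

    DoubleDouble multiply(DoubleDouble y) {
        final double p = hi * y.hi;
        final double e = Math.fma(hi, y.hi, -p);
        return fastTwoSum(p, e + (hi * y.lo + lo * y.hi));
    }

    /** Long division with three quotient terms, modelled on QD's accurate division. */
    DoubleDouble divide(DoubleDouble y) {
        final double q1 = hi / y.hi;
        DoubleDouble r = subtract(y.multiply(q1));
        final double q2 = r.hi / y.hi;
        r = r.subtract(y.multiply(q2));
        final double q3 = r.hi / y.hi;
        return fastTwoSum(q1, q2).add(q3);
    }
}
```

Each operation allocates one or two new objects; whether that matters is exactly the performance question raised further down.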

Other methods from the current statistics implementation are:

DD pow(int)
DD inverse()

DD ldexp(int)
(int, DD) frexp()  ???

The latter are useful for scaling, as the exponent range of a double
is effectively limited to [-1074, 1023] in base 2; without sub-normal
numbers, the lower bound is -1022.
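As an illustration of the scaling point, a minimal ldexp can scale both parts with Math.scalb. The holder class ScaledDD is hypothetical. Scaling by a power of two is exact as long as neither part overflows or loses bits in the sub-normal range, so tracking a separate integer exponent lets a computation go beyond the exponent range of a single double.

```java
/** Hypothetical minimal double-double holder used only to illustrate ldexp. */
final class ScaledDD {
    final double hi;
    final double lo;

    ScaledDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /**
     * Returns this * 2^n by scaling both parts. Exact unless a part overflows
     * or falls into the sub-normal range and loses bits.
     */
    ScaledDD ldexp(int n) {
        return new ScaledDD(Math.scalb(hi, n), Math.scalb(lo, n));
    }
}
```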

The frexp method is somewhat problematic as there are two return
values: a new normalised fraction in [0.5, 1) and the base 2 exponent
of the normalised number. It could be implemented as:

DD frexp(int[] exp)
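A possible sketch of the frexp(int[] exp) signature, using a hypothetical holder class FrexpDD: the exponent is written to exp[0] and the returned number hi + lo has magnitude in [0.5, 1). One detail absent from a double-only frexp: when the high part is a power of two and the low part has the opposite sign, the full value sits just below that power of two, so the sketch shifts by one more bit. That handling is an assumption of this sketch, not taken from the existing class.

```java
/** Hypothetical minimal double-double holder used only to illustrate frexp. */
final class FrexpDD {
    final double hi;
    final double lo;

    FrexpDD(double hi, double lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /**
     * Returns f with |f.hi + f.lo| in [0.5, 1) and stores k in exp[0] so that
     * this == f * 2^k. Zero, infinite and NaN values are returned with exp[0] = 0.
     */
    FrexpDD frexp(int[] exp) {
        if (hi == 0 || !Double.isFinite(hi)) {
            exp[0] = 0;
            return this;
        }
        int e = Math.getExponent(hi); // |hi| in [2^e, 2^(e+1)) for normal hi
        if (e == Double.MIN_EXPONENT - 1) {
            // Sub-normal hi: scale exactly into the normal range to find its exponent.
            e = Math.getExponent(hi * 0x1.0p54) - 54;
        }
        int k = e + 1;
        double f = Math.scalb(hi, -k); // |f| in [0.5, 1)
        double g = Math.scalb(lo, -k);
        // hi is +/-0.5 after scaling but lo has the opposite sign: the full value is
        // just below 0.5 in magnitude, so scale up by one more power of two.
        if (Math.abs(f) == 0.5 && g != 0 && (f < 0) != (g < 0)) {
            f *= 2;
            g *= 2;
            k--;
        }
        exp[0] = k;
        return new FrexpDD(f, g);
    }
}
```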

Other methods for a number could be e.g.:

DD negate()

> > ・ Implement QD as well as DD. As briefly mentioned in the David Bailey
> > paper, for many applications, the use of DD or QD is sufficient.
> > Therefore, I do not think implementing arbitrary-length floating-point
> > numbers is necessary.

Perfectly fine. Other variants can be added later if required.

> > And my question is, what specific extensions do you think are needed
> > regarding the existing double-double API?

The key point of the current API is that it requires no memory
allocation within the class. As such, the class has been written to be
mutable: all methods act on primitives and write results to an output
argument. However, this does not fully encapsulate the functionality,
and methods may be called with arguments that are not normalised
double-double numbers, which leaves the class open to incorrect usage.
For a public class, this should either be avoided or be provided as an
alternative to the friendly OO API for advanced usage.

I wrote the implementation this way to avoid memory allocation of a
new object for every operation. I do not know if it actually impacts
the performance. A first project would be to: copy the entire DD class
into a JMH project; add OO methods that create a new instance for all
operations; and copy/adapt the Kolmogorov-Smirnov p-value computation
from statistics for the non-OO and OO API. For a large p-value
computation the current method takes up to 1 second and would create
approximately 50 million objects for intermediate computations (my
guesstimate). The present implementation creates 3 objects. It would
be a useful test to determine if this object creation and garbage
collection affects the performance.
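The trade-off between the two styles discussed above can be sketched in a single hypothetical class, DualApiDD (not the Commons API): a static add that writes into a caller-supplied output (no allocation, but it must trust that its primitive arguments are normalised), alongside an instance add that allocates a new result for every call. This only shows the API shapes; it is not a benchmark.

```java
/**
 * Hypothetical sketch: a zero-allocation static API writing to an output
 * argument, plus an allocating OO API, in one class.
 */
final class DualApiDD {
    private double hi;
    private double lo;

    private DualApiDD() {
    }

    /** OO factory: normalises (x, xx) with an exact two-sum. */
    static DualApiDD of(double x, double xx) {
        final DualApiDD r = new DualApiDD();
        twoSum(x, xx, r);
        return r;
    }

    double hi() { return hi; }
    double lo() { return lo; }

    /** OO API: one new object per call; operands are normalised by construction. */
    DualApiDD add(DualApiDD y) {
        return add(hi, lo, y.hi, y.lo, new DualApiDD());
    }

    /**
     * Advanced API: no allocation. The caller must pass normalised components;
     * the method has no way to check this.
     */
    static DualApiDD add(double x, double xx, double y, double yy, DualApiDD out) {
        twoSum(x, y, out);   // s + e == x + y exactly
        final double s = out.hi;
        final double e = out.lo;
        twoSum(xx, yy, out); // t + f == xx + yy exactly
        final double t = out.hi;
        final double f = out.lo;
        fastTwoSum(s, e + t, out);
        fastTwoSum(out.hi, out.lo + f, out);
        return out;
    }

    /** Exact sum: a + b == out.hi + out.lo (two-sum). */
    private static void twoSum(double a, double b, DualApiDD out) {
        final double s = a + b;
        final double bb = s - a;
        out.lo = (a - (s - bb)) + (b - bb);
        out.hi = s;
    }

    /** Exact sum assuming |a| >= |b| (fast two-sum). */
    private static void fastTwoSum(double a, double b, DualApiDD out) {
        final double s = a + b;
        out.lo = b - (s - a);
        out.hi = s;
    }
}
```

In a JMH comparison, the static path can reuse one output object across the whole Kolmogorov-Smirnov loop, while the OO path creates one object per operation.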

> > Also, what do you think of my ideas for making the API more
> > user-friendly? Am I on the right track?

Perhaps focus on the API of the C++ library. I imagine it is
fairly mature and will provide a good example of an API for
manipulating a custom number implementation.

> Sure; your questions are certainly part of the issues that need
> clarification.
> However, besides the functionality itself, there is the question of
> how it fits within the (math-related) components' "eco-system".
> By this I mean that the code should be moved to "[Numbers]", but
> where?  In the "core" module (where "Sum" is implemented), or in
> a new module of its own (e.g. on the assumption that it may be
> too specific a utility)?
> For one thing, the "Statistics" component will surely depend on
> that utility; hence, porting the existing code to "[Numbers]" might
> be your first pull request[1] (after ensuring locally that the calling
> code in "Statistics" still works as it used to).
>
> Thus, please create a JIRA report[2] to further elaborate on this.[3]

I think a new numbers module for a DD implementation makes sense. We
already have modules for fractions and complex numbers.

Feel free to add some comments on the initial Jira ticket summarising
this direction. We can then create sub-tickets for tasks that you wish
to tackle (e.g. JMH benchmark the current DD class; describe an
initial API for a DD class).

Alex



Re: [GSoC2023] About “Add support for extended precision floating-point numbers” project

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

On Mon, 6 Mar 2023 at 11:22, Sentaro Onizuka <on...@gmail.com> wrote:
>
> To whom it may concern,

Usually, the target audience would recognize itself through an
appropriate prefix in the "Subject:" line of the email.
In this case, instead of "[GSoC2023]", it should probably be
"[Numbers]", referring to the project (a.k.a. "component") to which
the code being discussed belongs (or rather, will belong, since the
current implementation is in "Statistics").

>
> About “Add support for extended precision floating-point numbers”
> project of Google Summer of Code 2023.
>
> My name is Sentaro Onizuka.
> I emailed several days ago about my interest in this project and
> received the details and related resources.
> I have read the resources provided and would like to discuss the project.

Many thanks for your interest and initial research into the subject!

Posting to this list is the right thing to do in order to draw attention
to a contribution proposal.

However, beyond a general discussion (e.g. whether the proposal
is within, or out of, the scope of the project), the detailed plan of action
(including code excerpts) is better maintained through a dedicated
report page on the bug-tracking system (JIRA).

>
> <!--snip -->
>
> And my question is, what specific extensions do you think are needed
> regarding the existing double-double API?
> Also, what do you think of my ideas for making the API more
> user-friendly? Am I on the right track?

Sure; your questions are certainly part of the issues that need
clarification.
However, besides the functionality itself, there is the question of
how it fits within the (math-related) components' "eco-system".
By this I mean that the code should be moved to "[Numbers]", but
where?  In the "core" module (where "Sum" is implemented), or in
a new module of its own (e.g. on the assumption that it may be
too specific a utility)?
For one thing, the "Statistics" component will surely depend on
that utility; hence, porting the existing code to "[Numbers]" might
be your first pull request[1] (after ensuring locally that the calling
code in "Statistics" still works as it used to).

Thus, please create a JIRA report[2] to further elaborate on this.[3]

Anyway, thanks for posting here; it was an opportunity to provide
general information to every new contributor (particularly in view of
potential GSoC applications).

Regards,
Gilles

[1] Perhaps we'll create a dedicated branch in the repository (so
     that we don't mess with "master" until API issues are settled).
[2] https://issues.apache.org/jira/browse/NUMBERS
[3] Within JIRA, it's also possible to create "sub-tasks" (e.g. to
     define "milestones").
