Posted to dev@arrow.apache.org by Fan Liya <li...@gmail.com> on 2019/08/15 02:26:10 UTC

[DISCUSS][Java] Provide an interface for numeric vectors

Dear all,

We want to provide an interface for all vectors with numeric types (small
int, float4, float8, etc.). This interface would make many operations on a
vector, such as average, sum, and variance, more convenient to implement,
and would greatly simplify client code by removing many branches and switch
statements.

The design is similar to BaseIntVector (the interface for all integer
vectors). It provides three methods for setting and getting numeric values:

         setWithPossibleRounding

         setSafeWithPossibleRounding

         getValueAsDouble
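A minimal sketch of what the proposed interface could look like. The three method names come from the proposal; everything else is an assumption for illustration: the signatures (an int index and a double value), the getValueCount() helper, and the toy array-backed classes SmallIntVectorSketch and Float8VectorSketch, which are not Arrow's actual vectors. The sum() helper shows the intended benefit, one code path for every numeric vector type.

```java
import java.util.Arrays;

/** Hypothetical interface with the three proposed methods; not an actual Arrow API. */
interface NumericVector {
  /** Stores value at index, rounding it to the vector's type; capacity must suffice. */
  void setWithPossibleRounding(int index, double value);

  /** Like setWithPossibleRounding, but grows the buffer first if index is out of range. */
  void setSafeWithPossibleRounding(int index, double value);

  /** Reads the value at index, widened to double. */
  double getValueAsDouble(int index);

  /** Assumed helper: number of slots written so far (highest index + 1). */
  int getValueCount();
}

/** Toy stand-in for a 16-bit integer vector, backed by a plain short[]. */
class SmallIntVectorSketch implements NumericVector {
  private short[] data;
  private int valueCount;

  SmallIntVectorSketch(int capacity) { data = new short[capacity]; }

  @Override public void setWithPossibleRounding(int index, double value) {
    // Round to the nearest integer, then narrow to 16 bits (values outside the short range wrap).
    data[index] = (short) Math.round(value);
    valueCount = Math.max(valueCount, index + 1);
  }

  @Override public void setSafeWithPossibleRounding(int index, double value) {
    if (index >= data.length) {
      data = Arrays.copyOf(data, Math.max(index + 1, data.length * 2));
    }
    setWithPossibleRounding(index, value);
  }

  @Override public double getValueAsDouble(int index) { return data[index]; }

  @Override public int getValueCount() { return valueCount; }
}

/** Toy stand-in for a 64-bit floating-point vector; no rounding is needed on set. */
class Float8VectorSketch implements NumericVector {
  private double[] data;
  private int valueCount;

  Float8VectorSketch(int capacity) { data = new double[capacity]; }

  @Override public void setWithPossibleRounding(int index, double value) {
    data[index] = value;
    valueCount = Math.max(valueCount, index + 1);
  }

  @Override public void setSafeWithPossibleRounding(int index, double value) {
    if (index >= data.length) {
      data = Arrays.copyOf(data, Math.max(index + 1, data.length * 2));
    }
    setWithPossibleRounding(index, value);
  }

  @Override public double getValueAsDouble(int index) { return data[index]; }

  @Override public int getValueCount() { return valueCount; }
}

public class NumericVectorDemo {
  /** One code path for every implementation: no switch on the concrete vector type. */
  static double sum(NumericVector vector) {
    double total = 0;
    for (int i = 0; i < vector.getValueCount(); i++) {
      total += vector.getValueAsDouble(i);
    }
    return total;
  }

  public static void main(String[] args) {
    NumericVector smallInts = new SmallIntVectorSketch(2);
    smallInts.setSafeWithPossibleRounding(0, 1.6); // stored as 2
    smallInts.setSafeWithPossibleRounding(1, 3.0); // stored as 3
    smallInts.setSafeWithPossibleRounding(2, 4.4); // grows the buffer; stored as 4

    NumericVector doubles = new Float8VectorSketch(2);
    doubles.setWithPossibleRounding(0, 0.5);
    doubles.setWithPossibleRounding(1, 1.25);

    System.out.println(sum(smallInts)); // 9.0
    System.out.println(sum(doubles));   // 1.75
  }
}
```

The rounding on set is where the edge cases discussed later in this thread live: the integer implementation has to decide what to do with fractions, out-of-range values, and NaN.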

Please give some comments. Thanks a lot.

Best,
Liya Fan

Re: [DISCUSS][Java] Provide an interface for numeric vectors

Posted by Fan Liya <li...@gmail.com>.
As Micah pointed out, a common interface for all numeric types has some
problems:
1. corner cases when converting special floating-point values, like NaN and
infinity, to integers;
2. loss of precision when converting long to double.
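Both problems can be reproduced with plain Java primitive conversions, independent of Arrow. The helper survivesRoundTrip is hypothetical, added only for this illustration.

```java
public class ConversionPitfalls {
  /** True if a long comes back unchanged after a long -> double -> long round trip. */
  static boolean survivesRoundTrip(long value) {
    return value == (long) (double) value;
  }

  public static void main(String[] args) {
    // Point 1: NaN and infinities have no integer equivalent. Java's narrowing
    // cast does not fail; it silently maps them (JLS 5.1.3).
    System.out.println((long) Double.NaN);               // 0
    System.out.println((long) Double.POSITIVE_INFINITY); // 9223372036854775807
    System.out.println((int) Double.NEGATIVE_INFINITY);  // -2147483648

    // Point 2: a double has a 53-bit significand, so longs whose magnitude
    // exceeds 2^53 may be rounded to a nearby representable value.
    long exact = 1L << 53;    // 9007199254740992
    long inexact = exact + 1; // 9007199254740993
    System.out.println(survivesRoundTrip(exact));   // true
    System.out.println(survivesRoundTrip(inexact)); // false
  }
}
```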

Instead, we provide a common interface for float4 and float8 vectors only.
The purpose is similar: to reduce unnecessary branch/switch statements in
the code.
Please take a look at https://issues.apache.org/jira/browse/ARROW-6247
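A sketch of the narrower interface, assuming it keeps the shape of the original proposal (the "safe" setter is omitted for brevity). FloatingPointVectorSketch, Float4Sketch, Float8Sketch, and mean() are hypothetical toy names, not Arrow's actual classes; the real interface is whatever ARROW-6247 settles on. Because float and double both represent NaN and infinity, the corner cases in point 1 do not arise, and widening float to double is exact, so the only loss is the float4 narrowing on set.

```java
/** Hypothetical interface covering only float4 and float8 vectors; not Arrow's actual API. */
interface FloatingPointVectorSketch {
  /** Stores value; a float4 implementation narrows it to float, possibly losing precision. */
  void setWithPossibleRounding(int index, double value);

  /** Widening float to double is exact, so reads lose nothing. */
  double getValueAsDouble(int index);

  int getValueCount();
}

/** Toy float4 vector backed by a float[]. */
class Float4Sketch implements FloatingPointVectorSketch {
  private final float[] data;
  Float4Sketch(int size) { data = new float[size]; }
  @Override public void setWithPossibleRounding(int index, double value) { data[index] = (float) value; }
  @Override public double getValueAsDouble(int index) { return data[index]; }
  @Override public int getValueCount() { return data.length; }
}

/** Toy float8 vector backed by a double[]. */
class Float8Sketch implements FloatingPointVectorSketch {
  private final double[] data;
  Float8Sketch(int size) { data = new double[size]; }
  @Override public void setWithPossibleRounding(int index, double value) { data[index] = value; }
  @Override public double getValueAsDouble(int index) { return data[index]; }
  @Override public int getValueCount() { return data.length; }
}

public class FloatingPointDemo {
  /** Arithmetic mean; NaN and infinities propagate under IEEE 754 rules instead of being clamped. */
  static double mean(FloatingPointVectorSketch vector) {
    double total = 0;
    for (int i = 0; i < vector.getValueCount(); i++) {
      total += vector.getValueAsDouble(i);
    }
    return total / vector.getValueCount();
  }

  public static void main(String[] args) {
    FloatingPointVectorSketch f4 = new Float4Sketch(2);
    f4.setWithPossibleRounding(0, 1.5);
    f4.setWithPossibleRounding(1, Double.POSITIVE_INFINITY); // stays infinite as a float
    System.out.println(mean(f4)); // Infinity

    FloatingPointVectorSketch f8 = new Float8Sketch(2);
    f8.setWithPossibleRounding(0, 1.5);
    f8.setWithPossibleRounding(1, 2.5);
    System.out.println(mean(f8)); // 2.0
  }
}
```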

Best,
Liya Fan

On Thu, Aug 15, 2019 at 1:41 PM Fan Liya <li...@gmail.com> wrote:

Re: [DISCUSS][Java] Provide an interface for numeric vectors

Posted by Fan Liya <li...@gmail.com>.
Hi Micah,

Thanks for the good points.
I agree with you that we should improve the efficiency of algorithms.

This is related to another improvement: reducing the if/switch statements in
the code.

To account for the edge cases, can we remove the set methods and leave
only the get method?
This is because in some scenarios, we have vectors that can be either int
vectors or float vectors.
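A get-only design could look like the sketch below. Here java.util.function.IntToDoubleFunction stands in for a hypothetical getValueAsDouble(int) accessor, and plain int[] and float[] arrays stand in for int and float vectors; variance() is an illustrative helper, not an existing API. Converting int or float to double is exact, so this read path has no rounding concerns (a long source would still lose precision above 2^53).

```java
import java.util.function.IntToDoubleFunction;

public class GetOnlyDemo {
  /** Population variance of count values read through a get-only accessor. */
  static double variance(IntToDoubleFunction getValueAsDouble, int count) {
    double mean = 0;
    for (int i = 0; i < count; i++) {
      mean += getValueAsDouble.applyAsDouble(i);
    }
    mean /= count;

    double sumOfSquares = 0;
    for (int i = 0; i < count; i++) {
      double deviation = getValueAsDouble.applyAsDouble(i) - mean;
      sumOfSquares += deviation * deviation;
    }
    return sumOfSquares / count;
  }

  public static void main(String[] args) {
    int[] intValues = {1, 2, 3, 4};         // stands in for an int vector
    float[] floatValues = {1f, 2f, 3f, 4f}; // stands in for a float4 vector

    // The same code path serves both element types; no if/switch on the type.
    System.out.println(variance(i -> intValues[i], intValues.length));     // 1.25
    System.out.println(variance(i -> floatValues[i], floatValues.length)); // 1.25
  }
}
```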

The common interface for float4 and float8 sounds good. Let's do it in
another issue.

Best,
Liya Fan

On Thu, Aug 15, 2019 at 12:49 PM Micah Kornfield <em...@gmail.com>
wrote:

Re: [DISCUSS][Java] Provide an interface for numeric vectors

Posted by Micah Kornfield <em...@gmail.com>.
Hi Liya Fan,
I'm not sure this is a good idea. First, floating-point operations have
more edge cases than integer arithmetic (e.g. dealing with NaNs). Second,
and I apologize that I've been remiss in thinking this through on reviews,
I think we should be thinking about how to make algorithms/operations
as efficient as possible. In this regard, putting everything behind an
interface prevents the JVM JIT from inlining them effectively.
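The inlining concern can be illustrated with a sketch of two loop shapes. It reflects HotSpot's usual behavior: the JIT profiles the receiver types seen at each call site and can typically inline a call that has seen one or two types, but a site that has seen three or more (a "megamorphic" site) generally falls back to a dispatched call per element. Details vary by JVM version and flags, and this sketch measures nothing; quantifying the cost would need a proper benchmark harness such as JMH. DoubleGetter and its implementations are hypothetical stand-ins for a common vector interface.

```java
public class CallSiteShapes {
  /** Hypothetical get-only accessor standing in for a common vector interface. */
  interface DoubleGetter {
    double get(int index);
  }

  static final class IntGetter implements DoubleGetter {
    private final int[] values;
    IntGetter(int[] values) { this.values = values; }
    @Override public double get(int index) { return values[index]; }
  }

  static final class FloatGetter implements DoubleGetter {
    private final float[] values;
    FloatGetter(float[] values) { this.values = values; }
    @Override public double get(int index) { return values[index]; }
  }

  static final class DoubleArrayGetter implements DoubleGetter {
    private final double[] values;
    DoubleArrayGetter(double[] values) { this.values = values; }
    @Override public double get(int index) { return values[index]; }
  }

  // One call site shared by every implementation. Once it has seen three receiver
  // types, HotSpot typically stops inlining get() here, which can also limit loop
  // optimizations such as unrolling or vectorization.
  static double sumViaInterface(DoubleGetter getter, int count) {
    double total = 0;
    for (int i = 0; i < count; i++) {
      total += getter.get(i);
    }
    return total;
  }

  // Specialized loop: there is no call to inline, so the JIT sees the array access directly.
  static double sumInts(int[] values) {
    double total = 0;
    for (int value : values) {
      total += value;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] ints = {1, 2, 3};
    DoubleGetter[] getters = {
      new IntGetter(ints),
      new FloatGetter(new float[] {1f, 2f, 3f}),
      new DoubleArrayGetter(new double[] {1.0, 2.0, 3.0}),
    };
    for (DoubleGetter getter : getters) {
      System.out.println(sumViaInterface(getter, 3)); // 6.0
    }
    System.out.println(sumInts(ints)); // 6.0
  }
}
```

Both shapes compute the same result; the difference lies only in what the JIT can optimize, which is why a narrower interface (or type-specialized loops) is the cheaper option on hot paths.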

It might not be a bad idea to have a common interface for just the
Float4 and Float8 vectors, but I'd like to get other people's thoughts on
this.

Thanks,
Micah

On Wed, Aug 14, 2019 at 7:26 PM Fan Liya <li...@gmail.com> wrote: