Posted to dev@arrow.apache.org by Fernando Herrera <fe...@gmail.com> on 2021/01/27 11:27:19 UTC

[RUST] Implement value function with Array trait

Hi,

I'm wondering whether it has been considered to move the value function,
which is implemented in all the arrays (StringArray, BooleanArray,
ListArray, etc.), into the Array trait?

This would help when extracting values from generic arrays that implement
dyn Array without having to manually downcast the array all the time to
read a value from the array.

Thanks,
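
For readers following along, the downcast pattern the question refers to can be sketched roughly as below. This is a minimal illustration using stand-in types defined in the snippet itself, not the real arrow crate; the `as_any` plus `downcast_ref` shape mirrors how Arrow's Rust arrays are usually accessed, and `first_string` is a hypothetical helper.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow's array types (not the real arrow crate).
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

struct Int32Array(Vec<i32>);
struct StringArray(Vec<String>);

impl Int32Array {
    // `value` lives on the concrete type, not on the trait.
    fn value(&self, i: usize) -> i32 {
        self.0[i]
    }
}

impl StringArray {
    fn value(&self, i: usize) -> &str {
        &self.0[i]
    }
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

// The caller has to name the concrete type before it can read a value.
fn first_string(column: &dyn Array) -> Option<&str> {
    let strings = column.as_any().downcast_ref::<StringArray>()?;
    if strings.len() == 0 {
        return None;
    }
    Some(strings.value(0))
}
```

Passing a column of any other type to `first_string` yields `None`, which is the friction being described: every access site has to pick a concrete array type.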

Re: [RUST] Implement value function with Array trait

Posted by Fernando Herrera <fe...@gmail.com>.
I see what you mean. I was thinking that the function signature would have
to be something like this:

trait Array<T> {
    fn value(&self) -> T;
}


Where T would have to implement another trait, call it ValueTrait, that
defines how to extract the different value types, e.g. &str, u32, etc.
But as you said, the access to that singular value would be slower
(especially if you want to get multiple values) than downcasting the whole
column to access the values.

I will keep looking at DataFusion to understand how a column is downcast
automatically based on a RecordBatch schema.

Thanks,

On Wed, Jan 27, 2021 at 11:03 PM Andrew Lamb <al...@influxdata.com> wrote:

> I think the idea is enticing, but it comes with some challenges:
>
> 1. Rust is strongly typed so when extracting values we would likely need a
> `Scalar` type enum or multiple different `value_bool`, `value_u64` type
> functions
> 2. Such access would likely be much slower (though possibly more
> convenient) as it would dispatch based on type for each row (whereas the
> downcast_as pattern does that dispatch once per array)
>
> Andrew
>

Re: [RUST] Implement value function with Array trait

Posted by Andrew Lamb <al...@influxdata.com>.
I think the idea is enticing, but it comes with some challenges:

1. Rust is strongly typed so when extracting values we would likely need a
`Scalar` type enum or multiple different `value_bool`, `value_u64` type
functions
2. Such access would likely be much slower (though possibly more
convenient) as it would dispatch based on type for each row (whereas the
downcast_as pattern does that dispatch once per array)
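
The two points above can be sketched side by side. This is only an illustration under assumptions: the `Scalar` enum, the `scalar` accessor and both `sum_*` helpers are hypothetical names, and the array types are stand-ins rather than the real arrow crate.

```rust
use std::any::Any;

// Hypothetical value enum, in the spirit of the `Scalar` type mentioned above.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
    Utf8(String),
}

trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
    // Hypothetical generic accessor: every call goes through dynamic dispatch.
    fn scalar(&self, i: usize) -> Scalar;
}

struct Int64Array(Vec<i64>);
struct StringArray(Vec<String>);

impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
    fn scalar(&self, i: usize) -> Scalar {
        Scalar::Int64(self.0[i])
    }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
    fn scalar(&self, i: usize) -> Scalar {
        Scalar::Utf8(self.0[i].clone())
    }
}

// Per-row dispatch: one virtual call plus one enum match for every element.
fn sum_per_row(column: &dyn Array) -> Option<i64> {
    let mut total = 0;
    for i in 0..column.len() {
        match column.scalar(i) {
            Scalar::Int64(v) => total += v,
            _ => return None,
        }
    }
    Some(total)
}

// Per-array dispatch: downcast once, then loop over the typed values.
fn sum_downcast_once(column: &dyn Array) -> Option<i64> {
    let ints = column.as_any().downcast_ref::<Int64Array>()?;
    Some(ints.0.iter().sum())
}
```

Both helpers return the same sum for an integer column; the difference is where the type decision is made, once per element in the first and once per array in the second.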

Andrew

On Wed, Jan 27, 2021 at 6:27 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi,
>
> I'm wondering if it has been considered to move the value function that is
> implemented in all the arrays (StringArray, BooleanArray, ListArray, etc)
> as part of the Array trait?
>
> This would help when extracting values from generic arrays that implement
> dyn Array without having to manually downcast the array all the time to
> read a value from the array.
>
> Thanks,
>

Re: [RUST] Implement value function with Array trait

Posted by Fernando Herrera <fe...@gmail.com>.
Thanks Andrew and Jorge for the help.

I think the use of the ScalarValue enum is precisely what I want. I was
worried that downcasting the column every time you need to get a value
would be slow but I can see that you are doing that with the ScalarValue
enum (
https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L83).
That's great.


On Thu, Jan 28, 2021 at 12:21 PM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> In the application I'm working on I'm reading a parquet file and creating
> a table to keep the records in memory.
>
> This gist has the idea of it
> https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef
>
> I would like to keep it as pure Arrow because I have found that it is
> super fast to create references to the data and create HashMaps with the
> information read from the parquet. The limitation I have is that I have to
> change the type on the column in the code every time I want to extract data
> from a column that is not a StringArray, either with an iterator or using a
> value method.
>
> I will go through the scalar example you are using in datafusion to
> implement something similar.
>
> Thanks
>
>

Re: [RUST] Implement value function with Array trait

Posted by Fernando Herrera <fe...@gmail.com>.
In the application I'm working on I'm reading a parquet file and creating a
table to keep the records in memory.

This gist has the idea of it
https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef

I would like to keep it as pure Arrow because I have found that it is super
fast to create references to the data and create HashMaps with the
information read from the parquet. The limitation I have is that I have to
change the type on the column in the code every time I want to extract data
from a column that is not a StringArray, either with an iterator or using a
value method.
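
One way to avoid editing the column type by hand, sketched here under assumptions, is to branch on the column's data type once and downcast inside each branch. The `DataType` enum, the array structs and `column_to_strings` below are simplified stand-ins invented for this illustration, not the real arrow API.

```rust
use std::any::Any;

// Simplified stand-in for a schema data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Boolean,
    Utf8,
}

trait Array {
    fn as_any(&self) -> &dyn Any;
    fn data_type(&self) -> DataType;
}

struct Int32Array(Vec<i32>);
struct BooleanArray(Vec<bool>);
struct StringArray(Vec<String>);

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Int32
    }
}

impl Array for BooleanArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Boolean
    }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        DataType::Utf8
    }
}

// One match per column; each branch downcasts once and reads typed values.
fn column_to_strings(column: &dyn Array) -> Vec<String> {
    let any = column.as_any();
    let msg = "data type and concrete type disagree";
    match column.data_type() {
        DataType::Int32 => {
            let a = any.downcast_ref::<Int32Array>().expect(msg);
            a.0.iter().map(|v| v.to_string()).collect()
        }
        DataType::Boolean => {
            let a = any.downcast_ref::<BooleanArray>().expect(msg);
            a.0.iter().map(|v| v.to_string()).collect()
        }
        DataType::Utf8 => {
            let a = any.downcast_ref::<StringArray>().expect(msg);
            a.0.clone()
        }
    }
}
```

The match still has to list every supported type, but it lives in one place instead of at every call site.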

I will go through the scalar example you are using in datafusion to
implement something similar.

Thanks


On Thu, Jan 28, 2021 at 12:06 PM Andrew Lamb <al...@influxdata.com> wrote:

> I think this approach would work (and we have something similar in
> DataFusion (ScalarValue)
>
> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
> -- though it is an enum rather than a Trait, I think the idea is basically
> the same)
>
> I think this API would be reasonable to implement (and I think would be
> worth considering adding to Arrow for usability), but I fear it will be
> quite slow as now the program would have to do some sort of type dispatch
> on each element in an array rather than once for the entire array.
>

Re: [RUST] Implement value function with Array trait

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
I agree with Andrew (as usual) :)

Regardless, it might be easier if you described what you are trying to
accomplish, Fernando. There may be other ways of going about this, and
someone may be able to help with more context.

Best,
Jorge



On Thu, Jan 28, 2021 at 1:06 PM Andrew Lamb <al...@influxdata.com> wrote:

> I think this approach would work (and we have something similar in
> DataFusion (ScalarValue)
>
> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
> -- though it is an enum rather than a Trait, I think the idea is basically
> the same)
>
> I think this API would be reasonable to implement (and I think would be
> worth considering adding to Arrow for usability), but I fear it will be
> quite slow as now the program would have to do some sort of type dispatch
> on each element in an array rather than once for the entire array.
>

Re: [RUST] Implement value function with Array trait

Posted by Andrew Lamb <al...@influxdata.com>.
I think this approach would work, and we have something similar in
DataFusion (ScalarValue):
https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
It is an enum rather than a trait, but I think the idea is basically the
same.

I think this API would be reasonable to implement (and I think would be
worth considering adding to Arrow for usability), but I fear it will be
quite slow as now the program would have to do some sort of type dispatch
on each element in an array rather than once for the entire array.

On Thu, Jan 28, 2021 at 5:50 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi Jorge,
>
> What about making the Array::value return a &dyn ValueTrait. This new
> ValueTrait would have to be implemented for all the possible values that
> can be returned from the arrays
>
> Fernando
>

Re: [RUST] Implement value function with Array trait

Posted by Fernando Herrera <fe...@gmail.com>.
Hi Jorge,

What about making Array::value return a &dyn ValueTrait? This new
ValueTrait would have to be implemented for all the possible values that
can be returned from the arrays.
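
A rough sketch of what this proposal could look like follows. Everything in it is an assumption made for illustration: `ValueTrait`, its `as_any` method, the stand-in array types and the `as_i32` helper are hypothetical and do not exist in arrow.

```rust
use std::any::Any;
use std::fmt::Debug;

// Hypothetical trait for anything an array can hand out as a single value.
trait ValueTrait: Debug {
    fn as_any(&self) -> &dyn Any;
}

impl ValueTrait for i32 {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl ValueTrait for String {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Stand-in for the Array trait with the proposed accessor added.
trait Array {
    fn len(&self) -> usize;
    fn value(&self, i: usize) -> &dyn ValueTrait;
}

struct Int32Array(Vec<i32>);
struct StringArray(Vec<String>);

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value(&self, i: usize) -> &dyn ValueTrait {
        &self.0[i]
    }
}

impl Array for StringArray {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value(&self, i: usize) -> &dyn ValueTrait {
        &self.0[i]
    }
}

// The array no longer needs a downcast, but getting a concrete value back
// out of the trait object still needs one, now once per element.
fn as_i32(column: &dyn Array, i: usize) -> Option<i32> {
    column.value(i).as_any().downcast_ref::<i32>().copied()
}
```

Operations that only need trait methods (printing via `Debug`, for example) work without any downcast; anything that needs the concrete value moves the downcast from the array to each element.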

Fernando

On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão, <jo...@gmail.com>
wrote:

> Hi Fernando,
>
> I tried that some time ago, but I was unable to do so. The reason is that
> Array is a trait that needs to support also being a trait object (i.e.
> support `&dyn Array`).
>
> Let's try here: what type should `Array::value` return? One option is to
> make Array a generic. But if Array is a generic, we can't support `dyn
> Array` without declaring its type (e.g. `dyn Array<i32>`), which goes
> against the requirement that we can use `Array` without knowing its
> compile-time type.
>
> If we make the function `value<T>()` a generic without constraints, then
> all concrete arrays (e.g. PrimitiveArray) will need to implement that,
> which is not possible because e.g. `StringArray` does not know how to yield
> a value of e.g. `f32`.
>
> I also tried a softer version recently: use ListArray<T: Array>, i.e. try
> to change `ListArray` to be a generic over Array and have `values(i)`
> return the concrete type. However, even that does not work because it is
> impossible to tell how nested a ListArray will be until we read the data
> (i.e. after the program was compiled), which means that the compiler will
> be unable to compile all (potentially nested) possible variations of the
> generic.
>
> So, overall, this exercise convinced me that what we have is already the
> simplest (but no simpler) API that we can offer under the requirements we
> have (But I would love to be proven wrong, as I share your concerns)
>
> Best,
> Jorge
>
>

Re: [RUST] Implement value function with Array trait

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Hi Fernando,

I tried that some time ago, but I was unable to do so. The reason is that
Array is a trait that also needs to be usable as a trait object (i.e. to
support `&dyn Array`).

Let's try here: what type should `Array::value` return? One option is to
make Array a generic. But if Array is a generic, we can't support `dyn
Array` without declaring its type (e.g. `dyn Array<i32>`), which goes
against the requirement that we can use `Array` without knowing its
compile-time type.
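
To make the generic-trait problem concrete, here is a small sketch. `TypedArray<T>` is a hypothetical generic variant of the trait written for this illustration; the array structs are stand-ins, not the real arrow types.

```rust
// Hypothetical generic version of the trait; the real Array trait is not generic.
trait TypedArray<T> {
    fn len(&self) -> usize;
    fn value(&self, i: usize) -> T;
}

struct Int32Array(Vec<i32>);
struct StringArray(Vec<String>);

impl TypedArray<i32> for Int32Array {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value(&self, i: usize) -> i32 {
        self.0[i]
    }
}

impl TypedArray<String> for StringArray {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value(&self, i: usize) -> String {
        self.0[i].clone()
    }
}

// A trait object has to name T, so this function only accepts i32 columns.
fn sum(column: &dyn TypedArray<i32>) -> i32 {
    (0..column.len()).map(|i| column.value(i)).sum()
}

// A collection of columns has the same restriction: every element must share
// one T, so an Int32Array and a StringArray cannot sit in the same Vec.
fn total_rows(columns: &[Box<dyn TypedArray<i32>>]) -> usize {
    columns.iter().map(|c| c.len()).sum()
}
```

A record batch with mixed column types would need something like a `Vec` of `Box<dyn TypedArray<_>>` with a different `T` per element, which Rust has no way to express.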

If we make the function `value<T>()` a generic without constraints, then
all concrete arrays (e.g. PrimitiveArray) will need to implement that,
which is not possible because e.g. `StringArray` does not know how to yield
a value of e.g. `f32`.
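
The generic-method variant can be sketched as well. To make it compile at all, this illustration changes the idea slightly: `value` returns an `Option<T>` so that an array can answer `None` for a type it cannot produce, and the method carries a `Self: Sized` bound. Both choices, and the stand-in array types, are assumptions of this sketch.

```rust
use std::any::Any;

trait Array {
    fn len(&self) -> usize;
    // The `Self: Sized` bound keeps `dyn Array` legal, but it also means this
    // method cannot be called through a trait object.
    fn value<T: Clone + 'static>(&self, i: usize) -> Option<T>
    where
        Self: Sized;
}

struct Float32Array(Vec<f32>);
struct StringArray(Vec<String>);

impl Array for Float32Array {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value<T: Clone + 'static>(&self, i: usize) -> Option<T> {
        // Only succeeds when the caller asked for f32.
        (&self.0[i] as &dyn Any).downcast_ref::<T>().cloned()
    }
}

impl Array for StringArray {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn value<T: Clone + 'static>(&self, i: usize) -> Option<T> {
        // Only succeeds when the caller asked for String; f32 yields None.
        (&self.0[i] as &dyn Any).downcast_ref::<T>().cloned()
    }
}
```

With a `let d: &dyn Array = &array;` binding, `d.len()` works but `d.value::<f32>(0)` is rejected by the compiler, so the generic method does not help in exactly the case where the concrete type is unknown.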

I also tried a softer version recently: use ListArray<T: Array>, i.e. try
to change `ListArray` to be a generic over Array and have `values(i)`
return the concrete type. However, even that does not work because it is
impossible to tell how nested a ListArray will be until we read the data
(i.e. after the program was compiled), which means that the compiler will
be unable to compile all (potentially nested) possible variations of the
generic.
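
The nesting problem can be illustrated as follows. The generic `ListArray<T>` and the recursive `DataType` enum are simplified stand-ins written for this sketch (the real arrow types differ), and `list_len` and `nesting_depth` are hypothetical helpers.

```rust
struct Int32Array(Vec<i32>);

// Hypothetical generic list: the nesting depth becomes part of the type.
struct ListArray<T> {
    offsets: Vec<usize>,
    values: T,
}

// Each depth is a distinct type that must be spelled out at compile time.
type List1 = ListArray<Int32Array>;
type List2 = ListArray<ListArray<Int32Array>>;

// Number of list slots, derived from the offsets buffer.
fn list_len<T>(list: &ListArray<T>) -> usize {
    list.offsets.len().saturating_sub(1)
}

// A schema read from a file describes the nesting as runtime data instead.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<DataType>),
}

// The depth is only known once the schema value exists, i.e. at run time.
fn nesting_depth(data_type: &DataType) -> usize {
    match data_type {
        DataType::Int32 => 0,
        DataType::List(inner) => 1 + nesting_depth(inner),
    }
}
```

`nesting_depth` can return any number for a schema loaded at run time, whereas `List1`, `List2` and so on would each have to exist in the source code ahead of time.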

So, overall, this exercise convinced me that what we have is already the
simplest (but no simpler) API that we can offer under the requirements we
have. (But I would love to be proven wrong, as I share your concerns.)

Best,
Jorge


On Wed, Jan 27, 2021 at 12:27 PM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi,
>
> I'm wondering if it has been considered to move the value function that is
> implemented in all the arrays (StringArray, BooleanArray, ListArray, etc)
> as part of the Array trait?
>
> This would help when extracting values from generic arrays that implement
> dyn Array without having to manually downcast the array all the time to
> read a value from the array.
>
> Thanks,
>