Posted to user@arrow.apache.org by Simon Dumke <si...@ipp.mpg.de> on 2019/08/30 09:55:36 UTC

Record-Level Access

Hi all,

I did not find anything (and so no definite answer) in the docs, so I
thought I would ask here:

Does Arrow (at this point my main concern is Arrow for Java) support any
concept that allows "record-level access" (that is, access to a "row") to
data in an Arrow RecordBatch or Table? I would have thought that even in
column-oriented analytics this would be a common last-step access pattern
across many use cases, but I could not find any references to such a
thing.

Thanks and kind regards,
Simon



Re: Record-Level Access

Posted by Micah Kornfield <em...@gmail.com>.
That is a good point. I didn't initially think of these because they are
missing an adapter or documentation (creating a top-level struct) on how to
make them work with a VectorSchemaRoot.

On Saturday, August 31, 2019, Jacques Nadeau <ja...@apache.org> wrote:

> In Arrow Java there is a record-level accessor facade called FieldReader.
> There is also a record-level builder called ComplexWriter. Both allow
> arbitrarily complex data to be worked with through method invocations. This
> won't be as efficient as a columnar algorithm, but it is definitely much
> easier to get started with. We use a heavily overloaded interface pattern
> to basically support dynamic typing.
>
> Example reader use
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonWriter.java
>
> Example builder use
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonReader.java

Re: Record-Level Access

Posted by Jacques Nadeau <ja...@apache.org>.
In Arrow Java there is a record-level accessor facade called FieldReader.
There is also a record-level builder called ComplexWriter. Both allow
arbitrarily complex data to be worked with through method invocations. This
won't be as efficient as a columnar algorithm, but it is definitely much
easier to get started with. We use a heavily overloaded interface pattern
to basically support dynamic typing.

Example reader use
https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonWriter.java

Example builder use
https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonReader.java
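
The "heavily overloaded interface" idea can be illustrated with a small
plain-Java sketch. It does not use the Arrow API: the names Kind,
ValueReader, IntColumnReader, TextColumnReader and ReaderDemo are
hypothetical and invented here for illustration. The assumption is that one
reader interface declares an accessor for every type, each concrete reader
overrides only the accessors its column supports, and the remaining
defaults fail at run time, which is roughly what gives the caller a
dynamically typed, row-at-a-time view.

```java
/** Hypothetical type tag; a real library would have many more kinds. */
enum Kind { INT, TEXT }

/** Hypothetical reader facade (not the Arrow FieldReader): every typed accessor is declared here. */
interface ValueReader {
    Kind kind();

    /** Selects the row that subsequent read calls refer to. */
    void setPosition(int row);

    // Defaults reject the call; concrete readers override only what they support.
    default int readInt() {
        throw new UnsupportedOperationException(kind() + " reader cannot read an int");
    }

    default String readText() {
        throw new UnsupportedOperationException(kind() + " reader cannot read text");
    }
}

final class IntColumnReader implements ValueReader {
    private final int[] values;
    private int row;

    IntColumnReader(int[] values) { this.values = values; }

    @Override public Kind kind() { return Kind.INT; }
    @Override public void setPosition(int row) { this.row = row; }
    @Override public int readInt() { return values[row]; }
}

final class TextColumnReader implements ValueReader {
    private final String[] values;
    private int row;

    TextColumnReader(String[] values) { this.values = values; }

    @Override public Kind kind() { return Kind.TEXT; }
    @Override public void setPosition(int row) { this.row = row; }
    @Override public String readText() { return values[row]; }
}

final class ReaderDemo {
    /** Renders one row by positioning every column reader and dispatching on its kind. */
    static String renderRow(ValueReader[] columns, int row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            ValueReader column = columns[i];
            column.setPosition(row);
            if (i > 0) {
                sb.append(", ");
            }
            switch (column.kind()) {
                case INT:
                    sb.append(column.readInt());
                    break;
                case TEXT:
                    sb.append(column.readText());
                    break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ValueReader[] columns = {
            new IntColumnReader(new int[] {7, 8}),
            new TextColumnReader(new String[] {"x", "y"})
        };
        System.out.println(renderRow(columns, 1)); // prints "8, y"
    }
}
```

The caller never needs the concrete reader class, only the type tag; the
price is an interface call per value instead of a tight loop over one
column.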

On Fri, Aug 30, 2019, 8:14 PM Micah Kornfield <em...@gmail.com> wrote:

> Hi Simon,
> A couple of notes:
> - Scalars are a C++ thing. There are ValueHolders in Java, but I'm not sure
> what you want.
> - Dremio has a JDBC adaptor [1] that might be worth looking at (or maybe
> porting pieces of it into Arrow).
>
> Thanks,
> Micah
>
> [1]
> https://github.com/dremio/dremio-oss/blob/04e0387d474f1408731da0029aef7ecfad5e4d08/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioResultSetImpl.java

Re: Record-Level Access

Posted by Simon Dumke <si...@ipp.mpg.de>.
Hi Micah,

thanks for pointing this out. When first reading Ben's message I misread
"the Scalar classes" as pointing to an Arrow implementation in the Scala
language (and assumed a record construct existed there), not a set of
Java (or rather C++) classes.

About Dremio's JDBC adaptor: this was exactly what I was looking for when
thinking of Arrow in an SQL(-like) context. Thanks a lot!

Regards,
Simon

On 31 August 2019 05:14:42, Micah Kornfield <em...@gmail.com> wrote:
> Hi Simon,
> A couple of notes:
> - Scalars are a C++ thing. There are ValueHolders in Java, but I'm not sure
> what you want.
> - Dremio has a JDBC adaptor [1] that might be worth looking at (or maybe
> porting pieces of it into Arrow).
>
> Thanks,
> Micah
>
> [1] 
> https://github.com/dremio/dremio-oss/blob/04e0387d474f1408731da0029aef7ecfad5e4d08/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioResultSetImpl.java


Re: Record-Level Access

Posted by Micah Kornfield <em...@gmail.com>.
Hi Simon,
A couple of notes:
- Scalars are a C++ thing. There are ValueHolders in Java, but I'm not sure
what you want.
- Dremio has a JDBC adaptor [1] that might be worth looking at (or maybe
porting pieces of it into Arrow).

Thanks,
Micah

[1]
https://github.com/dremio/dremio-oss/blob/04e0387d474f1408731da0029aef7ecfad5e4d08/client/jdbc/src/main/java/com/dremio/jdbc/impl/DremioResultSetImpl.java



On Fri, Aug 30, 2019 at 12:03 PM Simon Dumke <si...@ipp.mpg.de> wrote:

> Hi Ben,
>
> thanks for the suggestion, I'll look into it!
>
> Regards,
> Simon

Re: Record-Level Access

Posted by Simon Dumke <si...@ipp.mpg.de>.
Hi Ben,

thanks for the suggestion, I'll look into it!

Regards,
Simon

On 30 August 2019 20:59:23, Ben Kietzman <be...@rstudio.com> wrote:
> Hi Simon,
>
> If you're interested in adding a record interface, the Scalar classes might
> be a good place to start. They represent a value from an array slot, and it
> should be fairly straightforward to extract a table row as a StructScalar.


Re: Record-Level Access

Posted by Ben Kietzman <be...@rstudio.com>.
Hi Simon,

If you're interested in adding a record interface, the Scalar classes might
be a good place to start. They represent a value from an array slot, and it
should be fairly straightforward to extract a table row as a StructScalar.

On Fri, Aug 30, 2019 at 1:27 PM Wes McKinney <we...@gmail.com> wrote:

> Hi Simon -- I don't think there is any such Row accessor class in Java,
> but you are welcome to contribute one to the project. For
> performance-sensitive applications, using a record interface might not be
> the best idea, but I can understand the convenience for some use cases.
>
> - Wes

Re: Record-Level Access

Posted by Simon Dumke <si...@ipp.mpg.de>.
Hi Wes,

thanks for the feedback.
I actually share your reservations regarding performance. I just think that
the Arrow structure seems ideal for working with tabular data (especially
for effective filtering and selection), and after that a final step would
(I think) often involve traversing the remaining data in a row-oriented
fashion. You probably have a good overview of the ecosystem using Arrow:
aren't there any SQL engines etc. using Arrow that would probably already
have invested some thought in this? Or was your answer really limited to
the specific Java case, and such a concept does exist somewhere else, such
as in the C++ library?

I'll certainly put some thought into this, and if I come up with a
sensible solution, I'd be happy to contribute it.

Kind regards,
Simon

BTW: I've seen quite a few of your talks (on YouTube) and read some of your
articles while looking into Arrow and its surrounding ecosystem, so:
thanks for all you have done and invested for Arrow in particular and for
the open source community in general! I (like probably many others) very
much appreciate that!

On 30 August 2019 19:27:31, Wes McKinney <we...@gmail.com> wrote:

> Hi Simon -- I don't think there is any such Row accessor class in Java,
> but you are welcome to contribute one to the project. For
> performance-sensitive applications, using a record interface might not be
> the best idea, but I can understand the convenience for some use cases.
>
> - Wes




Re: Record-Level Access

Posted by Wes McKinney <we...@gmail.com>.
Hi Simon -- I don't think there is any such Row accessor class in Java,
but you are welcome to contribute one to the project. For
performance-sensitive applications, using a record interface might not be
the best idea, but I can understand the convenience for some use cases.

- Wes
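
As a rough illustration of the convenience-versus-performance trade-off, a
row accessor can be thought of as a cursor that holds a row index and
delegates each getter to the underlying column. The sketch below is plain
Java rather than Arrow code; ColumnBatch, RowCursor and RowCursorDemo are
hypothetical names, and the three fixed columns are an assumption made to
keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical columnar batch: one array per column, all of equal length. */
final class ColumnBatch {
    final int[] ids;
    final String[] names;
    final double[] scores;

    ColumnBatch(int[] ids, String[] names, double[] scores) {
        if (ids.length != names.length || ids.length != scores.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        this.ids = ids;
        this.names = names;
        this.scores = scores;
    }

    int rowCount() { return ids.length; }
}

/** Hypothetical row facade: it stores only an index, so no row object is copied. */
final class RowCursor {
    private final ColumnBatch batch;
    private int row = -1;

    RowCursor(ColumnBatch batch) { this.batch = batch; }

    /** Advances to the next row; returns false once the batch is exhausted. */
    boolean next() { return ++row < batch.rowCount(); }

    int id() { return batch.ids[row]; }
    String name() { return batch.names[row]; }
    double score() { return batch.scores[row]; }
}

final class RowCursorDemo {
    /** Walks the batch row by row and describes every row whose score is at least minScore. */
    static List<String> describeRowsAbove(ColumnBatch batch, double minScore) {
        List<String> out = new ArrayList<>();
        RowCursor cursor = new RowCursor(batch);
        while (cursor.next()) {
            if (cursor.score() >= minScore) {
                out.add(cursor.id() + ":" + cursor.name());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
            new int[] {1, 2, 3},
            new String[] {"a", "b", "c"},
            new double[] {0.5, 0.9, 0.7});
        System.out.println(describeRowsAbove(batch, 0.6)); // prints "[2:b, 3:c]"
    }
}
```

Every getter goes through the cursor, which is why a tight loop over a
single column array would typically be faster than this kind of facade.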

On Fri, Aug 30, 2019 at 4:55 AM Simon Dumke <si...@ipp.mpg.de> wrote:
>
> Hi all,
>
> I did not find anything (and so no definite answer) in the docs, so I
> thought I would ask here:
>
> Does Arrow (at this point my main concern is Arrow for Java) support any
> concept that allows "record-level access" (that is, access to a "row") to
> data in an Arrow RecordBatch or Table? I would have thought that even in
> column-oriented analytics this would be a common last-step access pattern
> across many use cases, but I could not find any references to such a
> thing.
>
> Thanks and kind regards,
> Simon