Posted to dev@arrow.apache.org by Radu Teodorescu <ra...@yahoo.com.INVALID> on 2020/08/05 12:42:34 UTC

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

Wes & crew,
Congratulations and thank you for the successful 1.0 rollout; it is certainly making a huge difference for my day job!
Is now a good time to revive the conversation below (and https://github.com/apache/arrow/pull/7548)?
I have also gone ahead and released a prototype that covers some of the more hand-wavy parts of my interface proposal (namely, ways to compose arrays in a dataframe that control the balance between fragmentation and buffer copying). It is here: https://github.com/raduteo/framespaces/tree/master. It lacks documentation, but the basic data structures are robustly implemented and tested, so if we find merit in the original PR (https://github.com/apache/arrow/pull/7548), there should be a reasonable path for implementing most of it.
 
Thank you
Radu
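
To make the fragmentation-versus-copying balance mentioned above a bit more concrete, here is a toy sketch. It is not taken from framespaces or the PR: `compose` and `min_chunk_len` are hypothetical names, and plain Python lists stand in for Arrow buffers. The assumed policy is simply "merge into the previous chunk (by copying) while that chunk is shorter than a threshold, otherwise keep the new piece as its own fragment".

```python
from typing import List, Tuple


def compose(pieces: List[List[int]], min_chunk_len: int) -> Tuple[List[List[int]], int]:
    """Combine pieces into a chunked column (hypothetical policy, for illustration).

    Returns the resulting chunks and the number of values that had to be copied.
    """
    chunks: List[List[int]] = []
    copied = 0
    for piece in pieces:
        if chunks and len(chunks[-1]) < min_chunk_len:
            # Previous fragment is small: merge by copying to limit fragmentation.
            chunks[-1] = chunks[-1] + piece
            copied += len(chunks[-1])
        else:
            # Keep the piece as a separate fragment: no copy, more fragmentation.
            chunks.append(piece)
    return chunks, copied


pieces = [[1, 2], [3], [4, 5, 6, 7], [8]]

# Threshold 0 never merges: maximum fragmentation, zero copying.
zero_copy_chunks, zero_copy_cost = compose(pieces, min_chunk_len=0)

# Threshold 4 merges small fragments: fewer chunks, some copying.
merged_chunks, merged_cost = compose(pieces, min_chunk_len=4)
```

With a threshold of 0 the four input pieces stay as four fragments and nothing is copied; with a threshold of 4 the column ends up as two chunks at the price of copying ten values. A real implementation would tune this knob against scan performance and memory traffic.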
 

> On Jun 25, 2020, at 3:10 PM, Radu Teodorescu <ra...@yahoo.com.INVALID> wrote:
> 
> Understood and agreed.
> My proposal really addresses a number of mechanisms on layer 2 ("Virtual" tables) in your taxonomy (I can adjust interface names accordingly as part of the review process).
> One additional element I am proposing here is the ability to insert and modify rows in a vectorized fashion. These operations follow the same mechanics as “filter” (which is effectively row removal),
> and I think they are quite important as an efficiently supported construct (for things like data cleanup, data set updates, etc.).
> 
> I’m really looking forward to hearing more of your thoughts (as well as those of anybody else who is interested in this topic).
> Radu 
> 
> 
>> On Jun 25, 2020, at 2:52 PM, Wes McKinney <we...@gmail.com> wrote:
>> 
>> hi Radu,
>> 
>> It's going to be challenging for me to review in detail until after
>> the 1.0.0 release is out, but in general I think there are 3 layers
>> that we need to be talking about:
>> 
>> * Materialized in-memory tables
>> * "Virtual" tables, whose in-memory/not-in-memory semantics are not
>> exposed -- permitting column selection, iteration as for execution of
>> query engine operators (e.g. projection, filter, join, aggregate), and
>> random access
>> * "Data Frame API": a programming interface for expressing analytical
>> operations on virtual tables. A data frame could be exported to
>> materialized tables / record batches e.g. for writing to Parquet or
>> IPC streams
>> 
>> In principle the "Data Frame API" shouldn't need to know much about
>> the first two layers, instead working with high level primitives and
>> leaving the execution of those primitives to the layers below. Does
>> this make sense?
>> 
>> I think we should be pretty strict about separation of concerns
>> between these three layers. I'll dig in in more detail sometime after
>> July 4.
>> 
>> Thanks
>> Wes
>> 
>> 
>> 
>> 
>> On Thu, Jun 25, 2020 at 11:50 AM Radu Teodorescu
>> <ra...@yahoo.com.invalid> wrote:
>>> 
>>> Here it is as a pull request:
>>> https://github.com/apache/arrow/pull/7548
>>> 
>>> I hope this can be a starter for an active conversation diving into specifics, and I look forward to contributing more design and algorithm ideas as well as concrete code.
>>> 
>>>> On Jun 17, 2020, at 6:11 PM, Neal Richardson <ne...@gmail.com> wrote:
>>>> 
>>>> Maybe a draft pull request? If you put "WIP" in the pull request title, CI
>>>> won't run builds on it, so it's suitable for rough outlines and collecting
>>>> feedback.
>>>> 
>>>> Neal
>>>> 
>>>> On Wed, Jun 17, 2020 at 2:57 PM Radu Teodorescu
>>>> <ra...@yahoo.com.invalid> wrote:
>>>> 
>>>>> Thank you Wes!
>>>>> Yes, both proposals fit very nicely in your Data Frames vision, I see them
>>>>> as deep dives on some specifics:
>>>>> - the virtual array doc is more fluffy, and if you agree with the
>>>>> general concept, the next logical move is probably to put out some interfaces indeed
>>>>> - the random access doc goes into more detail, and I am curious what you
>>>>> think about some of the concepts
>>>>> 
>>>>> I will follow up shortly with some interfaces - would you prefer that I reference
>>>>> a repo, inline them in an email, or add them as comments to your doc?
>>>>> 
>>>>> 
>>>>>> On Jun 17, 2020, at 4:26 PM, Wes McKinney <we...@gmail.com> wrote:
>>>>>> 
>>>>>> hi Radu,
>>>>>> 
>>>>>> I'll read the proposals in more detail when I can and make comments,
>>>>>> but this has always been something of interest (see, e.g. [1]). The
>>>>>> intent with the "C++ data frames" project that we've discussed (and I
>>>>>> continue to labor towards, e.g. recent compute engine work is directly
>>>>>> in service of this) has always been to be able to express computations
>>>>>> on non-RAM-resident datasets [2]
>>>>>> 
>>>>>> As one initial high level point of discussion, I think what you're
>>>>>> describing in these documents should probably be _new_ C++ classes and
>>>>>> _new_ virtual interfaces, not an evolution of the current arrow::Table
>>>>>> or arrow::Array/ChunkedArray classes. One practical path forward in
>>>>>> terms of discussing implementation issues would be to draft header
>>>>>> files proposing what these new class interfaces look like.
>>>>>> 
>>>>>> - Wes
>>>>>> 
>>>>>> [1]: https://issues.apache.org/jira/browse/ARROW-1329
>>>>>> [2]: https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h
>>>>>> 
>>>>>> On Wed, Jun 17, 2020 at 2:48 PM Radu Teodorescu
>>>>>> <ra...@yahoo.com.invalid> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> While I’ve been communicating with some members of this group in the
>>>>>>> past, this is my first official post, so please excuse/correct/guide me as
>>>>>>> needed.
>>>>>>> 
>>>>>>> Logistics first:
>>>>>>> I put most of the content of my proposals in Google Docs, but if more
>>>>>>> appropriate, we can keep the conversation going by email.
>>>>>>> Also, the two proposals are pretty independent, so if needed we can
>>>>>>> break this into two separate email threads, but for now I wanted to keep the
>>>>>>> spam low.
>>>>>>> 
>>>>>>> Actual proposals:
>>>>>>> Virtual Array - The idea is to be able to handle arrow Tables where
>>>>>>> some of the column data is not (yet) available in memory. For example, a
>>>>>>> Table can map to a Parquet file, create VirtualArrays for each column chunk,
>>>>>>> and only read the actual content if and when the Array is touched.
>>>>>>> Virtualize arrow Table <https://docs.google.com/document/d/1qXSHSgMZtjNGzWrqDxoBisSoR6gbnRiEztnYihNGLsI/edit?usp=sharing>
>>>>>>> Random Access - I find that “application state” for most large-scale
>>>>>>> systems is compatible with the low-level vectorized arrow representation, and I
>>>>>>> propose a number of API expansions that would enable thread-safe data
>>>>>>> mutation and efficient random access.
>>>>>>> Arrow arrays random access <https://docs.google.com/document/d/1tIsOhN6mfIAy6F8XRxeKRIqPBN0gKbcmrp2QJ4L3hJ8/edit?usp=sharing>
>>>>>>> Please let me know what you think and what the best course of action
>>>>>>> is moving forward.
>>>>>>> Thank you
>>>>>>> Radu
>>>>> 
>>>>> 
>>> 
>
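
The Virtual Array idea quoted above (a Table maps to a file, one virtual array per column chunk, content read only when touched) can be sketched roughly as follows. This is only an illustration under stated assumptions: `VirtualArray`, `is_materialized`, and `make_loader` are hypothetical names rather than Arrow APIs, and the "file read" is simulated with an in-memory list.

```python
from typing import Callable, List, Optional, Sequence


class VirtualArray:
    """Hypothetical column chunk whose values are loaded on first access."""

    def __init__(self, length: int, loader: Callable[[], Sequence[int]]):
        self._length = length    # known up front, e.g. from file metadata
        self._loader = loader    # deferred read, e.g. of one column chunk
        self._data: Optional[Sequence[int]] = None

    def __len__(self) -> int:
        return self._length      # answering this does not trigger a load

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    def __getitem__(self, i: int) -> int:
        if self._data is None:   # first touch: materialize the chunk
            self._data = self._loader()
        return self._data[i]


loads: List[str] = []            # records which chunks were actually read


def make_loader(name: str, values: Sequence[int]) -> Callable[[], Sequence[int]]:
    """Simulate a deferred read of one column chunk (stand-in for file I/O)."""
    def load() -> Sequence[int]:
        loads.append(name)
        return list(values)
    return load


# A "column" made of two virtual chunks.
column = [
    VirtualArray(3, make_loader("chunk0", [1, 2, 3])),
    VirtualArray(2, make_loader("chunk1", [4, 5])),
]

total_rows = sum(len(chunk) for chunk in column)  # metadata only, nothing loaded
first_of_second = column[1][0]                    # loads only the second chunk
```

After this runs, only `chunk1` has been read; `chunk0` stays unloaded until something indexes into it. Thread safety and eviction, which the proposals also touch on, are deliberately left out of the sketch.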

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

Posted by Radu Teodorescu <ra...@yahoo.com.INVALID>.

> I will have a closer look and comment most likely next week.

Thank you!

> 
> Unfortunately, having code developed in external repositories increases the
> complexity of importing that code back into the Apache project. Not sure if
> you’re interested in preemptively following the project’s style guide (file
> naming, C++ code style, etc.), but that would also help.

I understand that challenge. My intent was to prove to myself and anyone else that there is a satisfying implementation that provides the semantics and the performance levels I am referring to in my proposals. It is a reference implementation, but certainly not something that can be dropped in directly in its current form (for example, I am leaning quite heavily on C++14/17 and a bit of C++20). If the vision makes sense, though, I would love to bring that into Arrow.


Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

Posted by Wes McKinney <we...@gmail.com>.

I will have a closer look and comment most likely next week.

Unfortunately, having code developed in external repositories increases the
complexity of importing that code back into the Apache project. Not sure if
you’re interested in preemptively following the project’s style guide (file
naming, C++ code style, etc.), but that would also help.
