Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/07/08 16:09:32 UTC

Re: [DISCUSS] Developing a standard memory layout for in-memory records / "row-oriented" data

I added a line item for this on the project wiki. When I or someone
else has time, we can write a requirements / design sketch document to
help with further discussions.

On Wed, Jun 27, 2018 at 7:09 AM, Wes McKinney <we...@gmail.com> wrote:
>> I'm not sure this makes sense as an external stable api. I definitely think it is useful as an internal representation for use within a particular algorithm. I also think that can be informed by the particular algorithm that you're working on.
>
> I agree that this is definitely needed in certain algorithms, e.g.
> certain types of hashing. And the memory layout that's best for a
> given algorithm might change. Since we have a number of support
> structures already in place for columnar data, like dictionary
> encoding, it would be easier to implement these things for
> row-oriented data.
>
> I think the question is really about open standards. Our original
> focus when we started the project was to develop an open standard for
> columnar data. It seems valuable to have one for row-oriented data.
> Given how many systems have developed their own internal formats, it
> seems like an inevitability. This raises the questions: if it does not
> happen here, then where? And if not now, then when?
>
> That being said, it's hard to say how feasible the project would be
> until we gather more requirements and non-requirements.
>
> On Wed, Jun 27, 2018 at 3:20 AM, Siddharth Teotia <si...@dremio.com> wrote:
>> I am wondering if this can be considered an opportunity to implement
>> support in Arrow for building high-performance in-memory row stores for
>> low-latency, high-throughput key-based queries. In other words, we could
>> design the in-memory record format with efficient RDMA reads as one of
>> the goals too. Consider two data structures in memory -- a hash table and
>> a row store comprising records in the Arrow row format. The hash table
>> points into the row store, and information can be read from both data
>> structures without interrupting the CPU on the server. This
>> client-server code path could also be incorporated into Arrow Flight.
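A minimal sketch of the idea above (all names and the record layout here are hypothetical illustrations, not an Arrow API): a contiguous row-store buffer of fixed-width records, indexed by a hash table mapping keys to byte offsets, so a key lookup resolves to one offset probe plus one fixed-size read at a known position -- the kind of access pattern that maps well onto one-sided RDMA reads.

```python
import struct

# Hypothetical fixed-width record: key (int64) + value (float64) = 16 bytes.
RECORD = struct.Struct("<qd")

def build_row_store(rows):
    """Pack (key, value) pairs into one contiguous buffer and index
    them with a key -> byte-offset hash table."""
    buf = bytearray()
    index = {}
    for key, value in rows:
        index[key] = len(buf)          # byte offset of this record
        buf += RECORD.pack(key, value)
    return bytes(buf), index

def lookup(buf, index, key):
    # One hash probe, then one fixed-size read at a known offset.
    off = index[key]
    return RECORD.unpack_from(buf, off)

buf, index = build_row_store([(1, 0.5), (7, 2.25), (42, -1.0)])
print(lookup(buf, index, 7))
```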
>>
>> On Tue, Jun 26, 2018 at 7:49 PM, Jacques Nadeau <ja...@apache.org> wrote:
>>
>>> I'm not sure this makes sense as an external stable api. I definitely think
>>> it is useful as an internal representation for use within a particular
>>> algorithm. I also think that can be informed by the particular algorithm
>>> that you're working on.
>>>
>>> We definitely had this requirement in Dremio and came up with an internal
>>> representation that we are happy with for the use in hash tables. I'll try
>>> to dig up the design docs we had around this but the actual
>>> pivoting/unpivoting code that we developed can be seen here: [1], [2].
>>>
>>> Our main model is two blocks: a fixed width block and a variable width
>>> block (with the fixed width block also carrying address & length of the
>>> variable data). Fixed width is randomly accessible and variable width is
>>> randomly accessible through fixed width.
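The two-block model described above can be sketched roughly as follows (a simplified illustration under assumed field types, not the actual layout in Dremio's Pivots.java): each fixed-width row carries its fixed-width fields plus the (offset, length) of that row's payload in a separate variable-width block, so the fixed block is randomly accessible by stride and the variable block is reached through it.

```python
import struct

# Simplified sketch: one int32 fixed field per row, plus the offset and
# length of that row's variable-width payload (e.g. a string).
FIXED = struct.Struct("<iii")  # (fixed_field, var_offset, var_length)

def pivot(rows):
    """Split (int, bytes) rows into a fixed-width block and a
    variable-width block."""
    fixed = bytearray()
    variable = bytearray()
    for num, payload in rows:
        fixed += FIXED.pack(num, len(variable), len(payload))
        variable += payload
    return bytes(fixed), bytes(variable)

def unpivot(fixed, variable, i):
    # Random access: row i sits at a fixed stride in the fixed block;
    # its payload is reached through the stored (offset, length).
    num, off, length = FIXED.unpack_from(fixed, i * FIXED.size)
    return num, variable[off:off + length]

fixed, variable = pivot([(1, b"foo"), (2, b"quux")])
print(unpivot(fixed, variable, 1))
```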
>>>
>>> [1]
>>> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java
>>> [2]
>>> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Unpivots.java
>>>
>>> On Tue, Jun 26, 2018 at 10:20 AM, Wes McKinney <we...@gmail.com>
>>> wrote:
>>>
>>> > hi Antoine,
>>> >
>>> > On Sun, Jun 24, 2018 at 1:06 PM, Antoine Pitrou <an...@python.org>
>>> > wrote:
>>> > >
>>> > > Hi Wes,
>>> > >
>>> > > On 24/06/2018 at 08:24, Wes McKinney wrote:
>>> > >>
>>> > >> If this sounds interesting to the community, I could help to kickstart
>>> > >> a design process which would likely take a significant amount of time.
>>> > >> The requirements could be complex (i.e. we might want to support
>>> > >> variable-size record fields while also providing random access
>>> > >> guarantees).
>>> > >
>>> > > What do you call "variable-sized" here? A scheme where the length of a
>>> > > record's field is determined by the value of another field in the same
>>> > > record?
>>> >
>>> > As an example, here is a fixed size record
>>> >
>>> > record foo {
>>> >   a: int32;
>>> >   b: float64;
>>> >   c: uint8;
>>> > }
>>> >
>>> > With padding, suppose this is 16 bytes per record; if we have a
>>> > column of these, then randomly accessing any value in any record is
>>> > simple.
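As a concrete sketch of one plausible padded layout (field order reordered for alignment; this is an illustration, not a layout the thread specifies): placing the float64 first keeps it 8-byte aligned, and 3 trailing pad bytes round the record up to 16 bytes, so field `a` of record `i` lives at a fixed, computable offset.

```python
import struct

# One plausible padded layout for record foo: float64 first for 8-byte
# alignment, then int32, then uint8, then 3 pad bytes -> 16 bytes total.
FOO = struct.Struct("<diB3x")  # b: float64, a: int32, c: uint8, padding

def read_field_a(column, index):
    # Random access: record `index` starts at byte index * 16.
    b, a, c = FOO.unpack_from(column, index * FOO.size)
    return a

# A "column" of three foo records, packed back to back.
column = b"".join(FOO.pack(float(i), i * 10, i) for i in range(3))
print(read_field_a(column, 2))
```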
>>> >
>>> > Here's a variable-length record:
>>> >
>>> > record bar {
>>> >   a: string;
>>> >   b: list<int32>;
>>> > }
>>> >
>>> > One way I've seen this represented in memory is a fixed-size
>>> > record followed by a sidecar containing the variable-length data,
>>> > so the fixed-size portion might look something like
>>> >
>>> > a_offset: int32;
>>> > a_length: int32;
>>> > b_offset: int32;
>>> > b_length: int32;
>>> >
>>> > So from this, you can do random access into the record. If you wanted
>>> > to do random access on a _column_ of such records, it is similar to
>>> > our current variable-length Binary type. So it might be that the
>>> > underlying Arrow memory layout would be FixedSizeBinary for fixed-size
>>> > records and variable Binary for variable-size records.
>>> >
>>> > - Wes
>>> >
>>> > >
>>> > >
>>> > >
>>> > > Regards
>>> > >
>>> > > Antoine.
>>> >
>>>