Posted to dev@arrow.apache.org by Shazz <sh...@metaverse.fr> on 2020/01/21 18:32:40 UTC

new to Arrow / integration with Kudu

Hi,

I'm thinking about an architecture to efficiently store and access tabular
data, and I was told to look at Arrow and Kudu.
I saw a diagram on the front page showing Arrow integrated with Kudu,
but found nothing about it in the documentation. Is there an example
available somewhere?

Thanks!

-- 
shazz@metaverse.fr
GPG public key ID : B517C4C8

Re: new to Arrow / integration with Kudu

Posted by Wes McKinney <we...@gmail.com>.
On Wed, Jan 22, 2020 at 12:28 PM Shazz <sh...@metaverse.fr> wrote:
>
> Thanks Wes,
>
> I'll keep following developments between Arrow and Kudu.
> In the short term, if you had to choose storage for Arrow data with good
> (enough) performance that isn't too costly to operate, what would you
> choose? I saw there is an example of storing Parquet files on Azure
> Blob Storage; would that be OK to start with, or is there a better choice?

Many people are doing that. Note that you'll need to do some tuning
(e.g. read buffering) to obtain acceptable performance against object
stores like ABS (Azure Blob Storage).
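To illustrate why read buffering matters against a store like ABS, here is a minimal, stdlib-only Python sketch. `SimulatedBlob` and `read_all` are hypothetical names: the class is a stand-in for a remote blob (it is not part of the Azure SDK or of Arrow) that counts every raw read as one round trip. Wrapping it in `io.BufferedReader` coalesces many small reads into a few large ones.

```python
import io
from typing import Optional, Tuple


class SimulatedBlob(io.RawIOBase):
    """Hypothetical stand-in for a remote object such as an Azure blob.

    Each readinto() call is counted as one remote round trip, which is
    the cost that read buffering tries to reduce.
    """

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.requests = 0  # number of simulated remote range reads

    def readable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def readinto(self, b) -> int:
        self.requests += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk  # copy into the caller's buffer (bytearray or memoryview)
        self._pos += n
        return n


def read_all(data: bytes, read_size: int,
             buffer_size: Optional[int] = None) -> Tuple[bytes, int]:
    """Read the whole blob in read_size pieces; return (payload, request count)."""
    raw = SimulatedBlob(data)
    # Without a buffer, every small read goes straight to the "remote" store.
    stream = raw if buffer_size is None else io.BufferedReader(raw, buffer_size=buffer_size)
    parts = []
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts), raw.requests


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # 1 MiB of sample data
    print("unbuffered requests:", read_all(blob, 4096)[1])
    print("buffered requests:", read_all(blob, 4096, buffer_size=1 << 20)[1])
```

With 4 KiB reads over a 1 MiB object, the unbuffered path issues one request per read, while the buffered path needs only a couple. Recent pyarrow versions expose related options on `pyarrow.parquet.read_table` (e.g. `pre_buffer` and `buffer_size`); check the documentation for your version before relying on them.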


Re: new to Arrow / integration with Kudu

Posted by Shazz <sh...@metaverse.fr>.
Thanks Wes,

I'll keep following developments between Arrow and Kudu.
In the short term, if you had to choose storage for Arrow data with good
(enough) performance that isn't too costly to operate, what would you
choose? I saw there is an example of storing Parquet files on Azure
Blob Storage; would that be OK to start with, or is there a better choice?

On 21/01/2020 17:54, Wes McKinney wrote:
> I'm interested to see an Arrow adapter for Apache Kudu developed. My
> gut feeling is that this work should be undertaken in Kudu itself,
> potentially having the tablet servers producing Arrow Record Batches
> locally and sending them to the client rather than converting to
> Kudu's own on-the-wire record format and then deserializing into Arrow
> on the receiver side. It might be worth a conversation with the Kudu
> community to see what they think.
> 
> Of course, one can build an Arrow deserializer on top of the current
> Kudu C++ client API and probably get pretty good performance. See also
> ARROW-814:
> 
> https://issues.apache.org/jira/browse/ARROW-814

Re: new to Arrow / integration with Kudu

Posted by Wes McKinney <we...@gmail.com>.
I'm interested to see an Arrow adapter for Apache Kudu developed. My
gut feeling is that this work should be undertaken in Kudu itself,
potentially having the tablet servers producing Arrow Record Batches
locally and sending them to the client rather than converting to
Kudu's own on-the-wire record format and then deserializing into Arrow
on the receiver side. It might be worth a conversation with the Kudu
community to see what they think.

Of course, one can build an Arrow deserializer on top of the current
Kudu C++ client API and probably get pretty good performance. See also
ARROW-814:

https://issues.apache.org/jira/browse/ARROW-814
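The core of such a client-side deserializer is transposing row-oriented scan results into column buffers. The following pure-Python sketch shows only that step, under stated assumptions: `Column` and `rows_to_batches` are hypothetical names, rows are plain tuples standing in for what a scanner returns, and no Arrow or Kudu APIs are used. It mimics Arrow's validity-bitmap convention (one bit per row, least-significant bit first, 1 = non-null) and chunks the output into batches of at most `max_rows` rows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Optional, Sequence


@dataclass
class Column:
    """Hypothetical column buffer loosely modeled on Arrow's layout:
    a list of values plus a validity bitmap (bit i set => row i non-null)."""
    values: List[Any] = field(default_factory=list)
    validity: bytearray = field(default_factory=bytearray)
    null_count: int = 0

    def append(self, value: Optional[Any]) -> None:
        i = len(self.values)
        if i % 8 == 0:
            self.validity.append(0)  # start a new bitmap byte every 8 rows
        if value is None:
            self.null_count += 1  # bit stays 0; the value slot is a placeholder
        else:
            self.validity[i // 8] |= 1 << (i % 8)
        self.values.append(value)

    def is_valid(self, i: int) -> bool:
        return bool((self.validity[i // 8] >> (i % 8)) & 1)


def rows_to_batches(names: Sequence[str], rows: Iterable[Sequence[Any]],
                    max_rows: int) -> Iterator[Dict[str, Column]]:
    """Pivot row tuples into per-column buffers, yielding one batch per max_rows rows."""
    batch = {name: Column() for name in names}
    count = 0
    for row in rows:
        for name, value in zip(names, row):
            batch[name].append(value)
        count += 1
        if count == max_rows:
            yield batch
            batch = {name: Column() for name in names}
            count = 0
    if count:  # emit the final partial batch, if any
        yield batch


if __name__ == "__main__":
    scanned = [(1, "a"), (2, None), (3, "c")]  # stand-in for scanner output
    for b in rows_to_batches(["id", "name"], scanned, max_rows=2):
        print({n: (c.values, c.null_count) for n, c in b.items()})
```

A real adapter would instead pull rows from the Kudu C++ client's scanner (e.g. `KuduScanner`) and append them into Arrow builders (subclasses of `arrow::ArrayBuilder`), which manage the value and validity buffers directly.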