Posted to user@accumulo.apache.org by "Roberts, Geoffry [USA]" <Ro...@bah.com> on 2021/02/17 17:17:37 UTC

Re: [External] Re: Accumulo and Arrow

This is where I saw a reference to HBase: <https://blog.cloudera.com/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/>.


________________________________
From: Emilio Lahr-Vivaz <el...@ccri.com>
Sent: Wednesday, February 17, 2021 11:04 AM
To: user@accumulo.apache.org <us...@accumulo.apache.org>
Subject: [External] Re: Accumulo and Arrow

Hello,

Do you have a link describing the integration between HBase and Arrow? I didn't find anything except some theoretical discussions. My understanding is that Arrow is meant for in-memory representations, and there is no plan to, e.g., replace HFiles or RFiles with Arrow files in HBase/Accumulo.

I'm interested in the intersection of the two, though. I'm a committer on GeoMesa, and we provide a way to export Arrow files out of both Accumulo and HBase using custom iterators/coprocessors. GeoMesa is focused on spatial data though, so it may not fit with your use case.

Thanks,

Emilio

On 2/17/21 8:13 AM, Roberts, Geoffry [USA] wrote:
All,

I have been looking into Apache Arrow. I see that it supports a connector to HBase. I Googled but found nothing with respect to Accumulo. Is there, or is there planned, support for Arrow/Accumulo?

Thanks
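The gap Emilio describes between Accumulo's storage model and Arrow's in-memory format can be made concrete. An Accumulo scan returns sorted key/value cells, one cell per (row, column) pair, whereas an Arrow record batch holds one typed array per field. Below is a minimal, plain-Python sketch of that reshaping step. It assumes a hypothetical table layout in which each feature is a row and each attribute is a column qualifier. It does not use the Arrow library, the Accumulo client API, or GeoMesa's iterators, and the `cells` data and `pivot_to_columns` helper are illustrative only. Emilio mentions that GeoMesa does its Arrow export in custom iterators/coprocessors; this sketch shows only the row-to-column pivot and is not how GeoMesa implements it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical cells shaped like an Accumulo scan result:
# (row, column_family, column_qualifier, value), sorted by key.
# Real Accumulo values are byte arrays; strings are used here for readability.
cells = [
    ("feature1", "attr", "name", "alpha"),
    ("feature1", "attr", "speed", "12"),
    ("feature2", "attr", "name", "beta"),
    ("feature2", "attr", "speed", "7"),
    ("feature3", "attr", "name", "gamma"),
]


def pivot_to_columns(cells, schema):
    """Pivot row-sorted cells into one list per field (a columnar batch).

    `schema` maps a column qualifier to a converter such as str or int.
    A field missing from a row becomes None, loosely like a null slot in
    an Arrow vector. This is plain Python, not the Arrow library.
    """
    columns = {"row": [], **{name: [] for name in schema}}
    # groupby only merges adjacent items, so this relies on row-sorted input.
    for row, group in groupby(cells, key=itemgetter(0)):
        values = {qualifier: value for _row, _family, qualifier, value in group}
        columns["row"].append(row)
        for name, convert in schema.items():
            raw = values.get(name)
            columns[name].append(None if raw is None else convert(raw))
    return columns


batch = pivot_to_columns(cells, {"name": str, "speed": int})
print(batch)
# {'row': ['feature1', 'feature2', 'feature3'], 'name': ['alpha', 'beta', 'gamma'], 'speed': [12, 7, None]}
```

The `groupby` call depends on cells arriving grouped by row, which matches the sorted key order an Accumulo scan returns. A real export would then hand these typed columns to an Arrow library for serialization rather than keeping them as Python lists.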


Re: [External] Re: Accumulo and Arrow

Posted by Emilio Lahr-Vivaz <el...@ccri.com>.
I don't think HBase directly integrates with Parquet either. If you look 
at the HBase documentation, the only reference to Parquet is related to 
spark dataframe compatibility:

 > HBase Dataframe is a standard Spark Dataframe, and is able to interact with any other data sources such as Hive, Orc, Parquet, JSON, etc.

There are a lot of projects that integrate between the two (indeed, 
GeoMesa will ingest and export both Parquet and Arrow through HBase and 
Accumulo), but it's not a native component.

Thanks,

Emilio

On 2/18/21 8:15 AM, Roberts, Geoffry [USA] wrote:
> They are saying that HBase uses Apache Parquet, which, as I gather, is
> compatible with Arrow. I am just now spinning up on all this, so bear
> with me. As I understand it, Arrow is for memory and Parquet is for files.
>
> I have a code base that is built around Accumulo. My code does a lot
> in memory already. I like what Arrow has to offer from a polyglot
> standpoint, but my data sets are, well, what people call "big data,"
> hence Accumulo. If HBase can handle the Arrow/Parquet structure, why
> not Accumulo?
>
> Good to be talking


Re: [External] Re: Accumulo and Arrow

Posted by "Roberts, Geoffry [USA]" <Ro...@bah.com>.
They are saying that HBase uses Apache Parquet, which, as I gather, is compatible with Arrow. I am just now spinning up on all this, so bear with me. As I understand it, Arrow is for memory and Parquet is for files.

I have a code base that is built around Accumulo. My code does a lot in memory already. I like what Arrow has to offer from a polyglot standpoint, but my data sets are, well, what people call "big data," hence Accumulo. If HBase can handle the Arrow/Parquet structure, why not Accumulo?

Good to be talking
________________________________
From: Emilio Lahr-Vivaz <el...@ccri.com>
Sent: Wednesday, February 17, 2021 4:09 PM
To: user@accumulo.apache.org <us...@accumulo.apache.org>
Subject: Re: [External] Re: Accumulo and Arrow

I believe that was theoretical; I don't think there has been any actual integration at this point. But I'd be happy to be proven wrong :)

Thanks,

Emilio



Re: [External] Re: Accumulo and Arrow

Posted by Emilio Lahr-Vivaz <el...@ccri.com>.
I believe that was theoretical; I don't think there has been any
actual integration at this point. But I'd be happy to be proven wrong :)

Thanks,

Emilio

On 2/17/21 12:17 PM, Roberts, Geoffry [USA] wrote:
> This is where I saw a reference to HBase:
> <https://blog.cloudera.com/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/>.