Posted to dev@arrow.apache.org by Hayden Livingston <ha...@gmail.com> on 2021/12/23 04:14:44 UTC

Arrow vs Artus

Has anyone been able to benchmark the Artus file format vs Arrow?

It seems that the Artus file format is gaining traction inside Google,
replacing their current columnar format Capacitor.

Re: Arrow vs Artus

Posted by Benson Muite <be...@emailplus.org>.
The paper [1] is helpful. Compression may also help, though it may be
difficult to standardize.

[1] https://vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf


On 12/25/21 5:37 AM, Micah Kornfield wrote:
> What exactly are you looking for?  To my knowledge neither Capacitor nor
> Artus has been described in enough detail external to Google to allow for
> external benchmarking, so the details would probably only be relevant to
> Google.
> 
> Both formats have more complicated encodings and embedded data structures,
> making them closer to Parquet (which is loosely based on a precursor to
> Capacitor) and ORC than to Arrow.  There are interesting ideas from the
> Procella paper which covers Artus that might be worth thinking about in the
> context of these formats (or a new one).
> 
> Arrow has not spent much focus on optimizing storage size.
> 
> Cheers,
> Micah
> 
> On Wednesday, December 22, 2021, Benson Muite <be...@emailplus.org>
> wrote:
> 
>> On 12/23/21 7:14 AM, Hayden Livingston wrote:
>>
>>> Has anyone been able to benchmark the Artus file format vs Arrow?
>>>
>>> It seems that the Artus file format is gaining traction inside Google,
>>> replacing their current columnar format Capacitor.
>>>
>> Hayden,
>> Do you have a link to a specification or implementation of Artus?
>> Performance may also be related to disk type, network etc.
>>
> 


Arrow vs Artus

Posted by Micah Kornfield <em...@gmail.com>.
What exactly are you looking for?  To my knowledge neither Capacitor nor
Artus has been described in enough detail external to Google to allow for
external benchmarking, so the details would probably only be relevant to
Google.

Both formats have more complicated encodings and embedded data structures,
making them closer to Parquet (which is loosely based on a precursor to
Capacitor) and ORC than to Arrow.  There are interesting ideas from the
Procella paper which covers Artus that might be worth thinking about in the
context of these formats (or a new one).

Arrow has not spent much focus on optimizing storage size.

Cheers,
Micah

On Wednesday, December 22, 2021, Benson Muite <be...@emailplus.org>
wrote:

> On 12/23/21 7:14 AM, Hayden Livingston wrote:
>
>> Has anyone been able to benchmark the Artus file format vs Arrow?
>>
>> It seems that the Artus file format is gaining traction inside Google,
>> replacing their current columnar format Capacitor.
>>
> Hayden,
> Do you have a link to a specification or implementation of Artus?
> Performance may also be related to disk type, network etc.
>

Re: Arrow vs Artus

Posted by Benson Muite <be...@emailplus.org>.
On 12/23/21 7:14 AM, Hayden Livingston wrote:
> Has anyone been able to benchmark the Artus file format vs Arrow?
> 
> It seems that the Artus file format is gaining traction inside Google,
> replacing their current columnar format Capacitor.
> 
Hayden,
Do you have a link to a specification or implementation of Artus? 
Performance may also be related to disk type, network etc.