Posted to dev@parquet.apache.org by Brian Bowman <Br...@sas.com> on 2019/05/09 18:29:54 UTC

Parquet vs. other Open Source Columnar Formats

All,

Is it fair to say that Parquet is fast becoming the dominant open source columnar storage format? How do those of you with long-term Hadoop experience see this? For example, is Parquet overtaking ORC and Avro?

Thanks,

Brian

Re: Parquet vs. other Open Source Columnar Formats

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

Regarding available open-source columnar formats, I have also come across
https://carbondata.apache.org/ but do not really know anything about it
other than that it exists.

Br,

Zoltan

On Thu, May 16, 2019 at 11:27 PM Wes McKinney <we...@gmail.com> wrote:

> hi Brian,
>
> Anecdotal evidence suggests that Parquet has more market share than
> ORC, but I have heard that ORC has been gaining some adoption lately
> due to its ACID support in Hive
> (https://orc.apache.org/docs/acid.html). Parquet and ORC are the only
> two open source columnar storage solutions out there AFAIK. Now that
> Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC
> creators) have merged, it will be interesting to see where engineering
> time is invested going forward.
>
> - Wes
>
> On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > Hello,
> >
> > Be aware that Avro and Protobuf are general serialization formats, not
> > columnar ones such as Parquet or ORC. They are good for RPC or row-wise
> > streaming whereas the latter two are perfect for analytics.
> >
> > Uwe
> >
> > > On 09.05.2019 at 20:33, David Mollitor <da...@gmail.com> wrote:
> > >
> > > I'm sure there are many different opinions on the matter, but in regard to
> > > Avro, I would say it is becoming more and more of a niche player.
> > >
> > > Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> > > for analytic workloads.
> > >
> > > >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <Br...@sas.com> wrote:
> > > >>
> > > >> All,
> > > >>
> > > >> Is it fair to say that Parquet is fast becoming the dominant open source
> > > >> columnar storage format? How do those of you with long-term Hadoop
> > > >> experience see this? For example, is Parquet overtaking ORC and Avro?
> > >>
> > >> Thanks,
> > >>
> > >> Brian
> > >>
> >
>

Re: Parquet vs. other Open Source Columnar Formats

Posted by Wes McKinney <we...@gmail.com>.
hi Brian,

Anecdotal evidence suggests that Parquet has more market share than
ORC, but I have heard that ORC has been gaining some adoption lately
due to its ACID support in Hive
(https://orc.apache.org/docs/acid.html). Parquet and ORC are the only
two open source columnar storage solutions out there AFAIK. Now that
Cloudera (one of the Parquet creators) and Hortonworks (one of the ORC
creators) have merged, it will be interesting to see where engineering
time is invested going forward.

- Wes

On Thu, May 9, 2019 at 2:21 PM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> Hello,
>
> Be aware that Avro and Protobuf are general serialization formats, not columnar ones such as Parquet or ORC. They are good for RPC or row-wise streaming whereas the latter two are perfect for analytics.
>
> Uwe
>
> > On 09.05.2019 at 20:33, David Mollitor <da...@gmail.com> wrote:
> >
> > I'm sure there are many different opinions on the matter, but in regard to
> > Avro, I would say it is becoming more and more of a niche player.
> >
> > Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> > for analytic workloads.
> >
> >> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <Br...@sas.com> wrote:
> >>
> >> All,
> >>
> >> Is it fair to say that Parquet is fast becoming the dominant open source
> >> columnar storage format? How do those of you with long-term Hadoop
> >> experience see this? For example, is Parquet overtaking ORC and Avro?
> >>
> >> Thanks,
> >>
> >> Brian
> >>
>

Re: Parquet vs. other Open Source Columnar Formats

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello,

Be aware that Avro and Protobuf are general serialization formats, not columnar ones such as Parquet or ORC. They are good for RPC or row-wise streaming whereas the latter two are perfect for analytics.
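
As a rough illustration of the difference, here is a toy sketch in plain Python. It does not reproduce the actual encodings of Avro, Protobuf, Parquet, or ORC; the sample records and the helper names (to_row_layout, to_column_layout) are made up for the example. The row-wise layout keeps each record's fields together, which suits sending or appending one record at a time, while the columnar layout keeps each field's values together, which suits scanning or aggregating a single field.

```python
# Toy illustration only: plain Python structures, not real file formats.
# The records and function names below are hypothetical.

records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Pune", "temp_c": 31.5},
]


def to_row_layout(rows):
    """Row-wise: all fields of one record are stored next to each other."""
    return [tuple(row.values()) for row in rows]


def to_column_layout(rows):
    """Columnar: all values of one field are stored next to each other."""
    return {name: [row[name] for row in rows] for name in rows[0]}


row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# Streaming or RPC style access: handle one complete record at a time.
first_record = row_layout[0]

# Analytic style access: aggregate one field without touching the others.
temps = column_layout["temp_c"]
mean_temp = sum(temps) / len(temps)
```

In the columnar layout the average only has to read the temp_c list, whereas the row-wise layout would have to walk every record and skip the fields it does not need.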

Uwe

> On 09.05.2019 at 20:33, David Mollitor <da...@gmail.com> wrote:
> 
> I'm sure there are many different opinions on the matter, but in regard to
> Avro, I would say it is becoming more and more of a niche player.
> 
> Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> for analytic workloads.
> 
>> On Thu, May 9, 2019 at 2:30 PM Brian Bowman <Br...@sas.com> wrote:
>> 
>> All,
>> 
>> Is it fair to say that Parquet is fast becoming the dominant open source
>> columnar storage format? How do those of you with long-term Hadoop
>> experience see this? For example, is Parquet overtaking ORC and Avro?
>> 
>> Thanks,
>> 
>> Brian
>> 


Re: Parquet vs. other Open Source Columnar Formats

Posted by David Mollitor <da...@gmail.com>.
I'm sure there are many different opinions on the matter, but in regard to
Avro, I would say it is becoming more and more of a niche player.

Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
for analytic workloads.

On Thu, May 9, 2019 at 2:30 PM Brian Bowman <Br...@sas.com> wrote:

> All,
>
> Is it fair to say that Parquet is fast becoming the dominant open source
> columnar storage format? How do those of you with long-term Hadoop
> experience see this? For example, is Parquet overtaking ORC and Avro?
>
> Thanks,
>
> Brian
>