Posted to user@drill.apache.org by Saurabh Mahapatra <sa...@gmail.com> on 2018/02/07 19:58:11 UTC

Which Hadoop File Format Should I Use?

Originally shared with me by Kuna Khatua, and a good read:

https://www.jowanza.com/blog/which-hadoop-file-format-should-i-use

The Carbondata project looks quite promising.

Any thoughts on what file format you prefer?

Thanks,
Saurabh

Re: Which Hadoop File Format Should I Use?

Posted by Aman Sinha <am...@apache.org>.

The multi-level indexing feature in Carbondata seems very interesting. It
will allow persisting OLAP cubes and provide efficient access, offering
virtually the same capability that specialized OLAP engines provide. The ORC
format also provides indexing, but it does not seem to be multi-level.

Another promising use is secondary indexing, which would make the file
format competitive with NoSQL systems that support secondary indexes.
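
To make the multi-level idea concrete, here is a minimal sketch of coarse-to-fine
min/max pruning. It is a generic illustration only, not Carbondata's or ORC's
actual on-disk layout; the names Block, FileIndex, and count_matches are
hypothetical, and each block is assumed to hold values already sorted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Fine level: a sorted run of values, so min/max are the two ends."""
    values: List[int]

    @property
    def lo(self) -> int:
        return self.values[0]

    @property
    def hi(self) -> int:
        return self.values[-1]


@dataclass
class FileIndex:
    """Coarse level: min/max over all blocks that belong to one file."""
    blocks: List[Block]

    @property
    def lo(self) -> int:
        return min(b.lo for b in self.blocks)

    @property
    def hi(self) -> int:
        return max(b.hi for b in self.blocks)


def count_matches(files: List[FileIndex], key: int) -> Tuple[int, int]:
    """Return (matches, blocks_scanned) for the filter `value == key`."""
    matches = 0
    blocks_scanned = 0
    for f in files:
        if not (f.lo <= key <= f.hi):
            continue  # coarse level prunes the whole file
        for b in f.blocks:
            if not (b.lo <= key <= b.hi):
                continue  # fine level prunes a single block
            blocks_scanned += 1
            matches += b.values.count(key)
    return matches, blocks_scanned


# Hypothetical sample data: two files with two blocks each.
files = [
    FileIndex([Block([1, 2, 3]), Block([4, 5, 5])]),
    FileIndex([Block([10, 11]), Block([12, 15])]),
]
```

With this sample data, a lookup for key 5 only has to scan one of the four
blocks, because the second file and the first block are skipped by their
min/max ranges alone.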

On Wed, Feb 7, 2018 at 2:08 PM, Ted Dunning <te...@gmail.com> wrote:

> Carbondata does look very cool, but I haven't seen any significant user
> adoption, which means that I haven't heard very many war stories.
>
>
>
> On Wed, Feb 7, 2018 at 11:58 AM, Saurabh Mahapatra <
> saurabhmahapatra94@gmail.com> wrote:
>
> > ...
> > The Carbondata project looks quite promising.
> >
> > Any thoughts on what file format you prefer?
> >
>

Re: Which Hadoop File Format Should I Use?

Posted by Ted Dunning <te...@gmail.com>.

Carbondata does look very cool, but I haven't seen any significant user
adoption, which means that I haven't heard very many war stories.

On Wed, Feb 7, 2018 at 11:58 AM, Saurabh Mahapatra <
saurabhmahapatra94@gmail.com> wrote:

> ...
> The Carbondata project looks quite promising.
>
> Any thoughts on what file format you prefer?
>