Posted to user@drill.apache.org by Manoj Murumkar <ma...@gmail.com> on 2017/04/13 21:19:06 UTC

Support for ORC files

Hi!

I am wondering if someone is actively working on ORC support already.
Appreciate any pointers.

Thanks,

Manoj

Re: Support for ORC files

Posted by Abhishek Girish <ag...@apache.org>.
A good reference would be Paul's Wiki page [1] and his work on the Mock
storage plugin.

[1] https://github.com/paul-rogers/drill/wiki/Storage-Plugin-Model

On Thu, Apr 13, 2017 at 2:57 PM, Manoj Murumkar <ma...@gmail.com>
wrote:

> Thanks. I knew about the hive table format support. I'll look into reading
> directly from orc files on hdfs (a la parquet). Is there some documentation
> around how to develop a new storage plugin?
>
> > On Apr 13, 2017, at 2:51 PM, Abhishek Girish <ag...@apache.org> wrote:
> >
> > Drill does not support ORC as a DFS file format. You are welcome to
> > contribute. As a workaround, Drill supports reading ORC files via the
> > Hive plugin, so you should be able to use that.
> >
> > On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <manoj.murumkar@gmail.com>
> > wrote:
> >
> >> Hi!
> >>
> >> I am wondering if someone is actively working on ORC support already.
> >> Appreciate any pointers.
> >>
> >> Thanks,
> >>
> >> Manoj
> >>
>

Re: Support for ORC files

Posted by Manoj Murumkar <ma...@gmail.com>.
Thanks guys! Appreciate the pointers.

On Thu, Apr 13, 2017 at 3:15 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> What you need is a format plugin. You can take a look at the Text Format
> plugin while reading Paul's documentation, which Abhishek already shared.
> Don't look at Parquet, as it is more complicated. A short summary of what
> you need (maybe too short to be of any use :) ):
>
> 1. A group of classes which make Drill recognize your format plugin.
> 2. An ORC reader. This will be the heart of this project. Essentially, you
> provide a way to read data (columns) from ORC files and then populate
> Drill's value vectors. You can later enhance this by parallelizing the
> reads of individual columns.
> 3. Once you have the format plugin working, you might want to start playing
> with planner rules if you want features like "filter pushdown into the
> scan", etc.
>
> - Rahul
>
> On Apr 13, 2017 2:57 PM, "Manoj Murumkar" <ma...@gmail.com>
> wrote:
>
> Thanks. I knew about the hive table format support. I'll look into reading
> directly from orc files on hdfs (a la parquet). Is there some documentation
> around how to develop a new storage plugin?
>
> > On Apr 13, 2017, at 2:51 PM, Abhishek Girish <ag...@apache.org> wrote:
> >
> > Drill does not support ORC as a DFS file format. You are welcome to
> > contribute. As a workaround, Drill supports reading ORC files via the
> > Hive plugin, so you should be able to use that.
> >
> > On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <manoj.murumkar@gmail.com>
> > wrote:
> >
> >> Hi!
> >>
> >> I am wondering if someone is actively working on ORC support already.
> >> Appreciate any pointers.
> >>
> >> Thanks,
> >>
> >> Manoj
> >>
>

Re: Support for ORC files

Posted by rahul challapalli <ch...@gmail.com>.
What you need is a format plugin. You can take a look at the Text Format
plugin while reading Paul's documentation, which Abhishek already shared.
Don't look at Parquet, as it is more complicated. A short summary of what
you need (maybe too short to be of any use :) ):

1. A group of classes which make Drill recognize your format plugin.
2. An ORC reader. This will be the heart of this project. Essentially, you
provide a way to read data (columns) from ORC files and then populate
Drill's value vectors. You can later enhance this by parallelizing the
reads of individual columns.
3. Once you have the format plugin working, you might want to start playing
with planner rules if you want features like "filter pushdown into the
scan", etc.
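
To make step 2 a bit more concrete, here is a rough, language-neutral sketch
of the "read columns, populate vectors" loop. It is written in Python only for
brevity; a real Drill format plugin is written in Java against Drill's own
classes. Everything named here (ValueVector, ColumnarBatchReader, the sample
data) is hypothetical and is not Drill's or ORC's API. The input dictionary
stands in for column data that an ORC library has already decoded.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class ValueVector:
    """Hypothetical stand-in for a columnar vector: one column, one batch."""
    name: str
    values: List[Any] = field(default_factory=list)

    def append(self, value: Any) -> None:
        self.values.append(value)


class ColumnarBatchReader:
    """Toy reader that copies column data into per-column vectors in batches.

    `columns` stands in for data already decoded from a columnar file;
    no real ORC decoding happens here.
    """

    def __init__(self, columns: Dict[str, List[Any]], batch_size: int = 2) -> None:
        lengths = {len(v) for v in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self.columns = columns
        self.row_count = lengths.pop() if lengths else 0
        self.batch_size = batch_size

    def batches(self) -> Iterator[Dict[str, ValueVector]]:
        for start in range(0, self.row_count, self.batch_size):
            end = min(start + self.batch_size, self.row_count)
            batch = {}
            # Each column is filled independently of the others, which is
            # what would allow per-column parallel reads later on.
            for name, data in self.columns.items():
                vector = ValueVector(name)
                for value in data[start:end]:
                    vector.append(value)
                batch[name] = vector
            yield batch


# Made-up sample data: two columns, three rows.
source = {"id": [1, 2, 3], "city": ["Pune", "Oslo", "Lima"]}
reader = ColumnarBatchReader(source, batch_size=2)
result = [{n: v.values for n, v in b.items()} for b in reader.batches()]
```

The point of the shape is that the reader hands back one vector per column
per batch, rather than row objects, so the scan never has to pivot the data.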

- Rahul

On Apr 13, 2017 2:57 PM, "Manoj Murumkar" <ma...@gmail.com> wrote:

Thanks. I knew about the hive table format support. I'll look into reading
directly from orc files on hdfs (a la parquet). Is there some documentation
around how to develop a new storage plugin?

> On Apr 13, 2017, at 2:51 PM, Abhishek Girish <ag...@apache.org> wrote:
>
> Drill does not support ORC as a DFS file format. You are welcome to
> contribute. As a workaround, Drill supports reading ORC files via the Hive
> plugin, so you should be able to use that.
>
> On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <ma...@gmail.com>
> wrote:
>
>> Hi!
>>
>> I am wondering if someone is actively working on ORC support already.
>> Appreciate any pointers.
>>
>> Thanks,
>>
>> Manoj
>>

Re: Support for ORC files

Posted by Manoj Murumkar <ma...@gmail.com>.
Thanks. I knew about the Hive table format support. I'll look into reading directly from ORC files on HDFS (a la Parquet). Is there any documentation on how to develop a new storage plugin?

> On Apr 13, 2017, at 2:51 PM, Abhishek Girish <ag...@apache.org> wrote:
> 
> Drill does not support ORC as a DFS file format. You are welcome to
> contribute. As a workaround, Drill supports reading ORC files via the Hive
> plugin, so you should be able to use that.
> 
> On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <ma...@gmail.com>
> wrote:
> 
>> Hi!
>> 
>> I am wondering if someone is actively working on ORC support already.
>> Appreciate any pointers.
>> 
>> Thanks,
>> 
>> Manoj
>> 

Re: Support for ORC files

Posted by Abhishek Girish <ag...@apache.org>.
Drill does not support ORC as a DFS file format. You are welcome to
contribute. As a workaround, Drill supports reading ORC files via the Hive
plugin, so you should be able to use that.
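
As a rough illustration of the Hive workaround, the sketch below builds a
request for Drill's REST query endpoint that selects from an ORC-backed Hive
table. The assumptions are mine, not from this thread: the Drill web server is
on localhost:8047, the Hive storage plugin is registered under the name
"hive", and "orders_orc" is a hypothetical table name. The request is built
but not sent.

```python
import json
import urllib.request

# Assumptions: Drill web server on localhost:8047, Hive storage plugin
# registered as "hive", and a hypothetical ORC-backed Hive table "orders_orc".
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_drill_query(sql: str, url: str = DRILL_QUERY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drill_query("SELECT * FROM hive.`orders_orc` LIMIT 10")

# To run it against a live Drillbit:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["rows"]
```

The same SELECT statement can be typed into the Drill shell or web console
instead; the only ORC-specific part is that the table is reached through the
Hive plugin rather than the dfs plugin.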

On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <ma...@gmail.com>
wrote:

> Hi!
>
> I am wondering if someone is actively working on ORC support already.
> Appreciate any pointers.
>
> Thanks,
>
> Manoj
>

Re: Support for ORC files

Posted by rahul challapalli <ch...@gmail.com>.
Drill indirectly supports reading ORC files through the Hive plugin. Apart
from that, I am not aware of any community effort to build a format plugin
for ORC.

Rahul

On Apr 13, 2017 2:19 PM, "Manoj Murumkar" <ma...@gmail.com> wrote:

> Hi!
>
> I am wondering if someone is actively working on ORC support already.
> Appreciate any pointers.
>
> Thanks,
>
> Manoj
>