Posted to dev@pig.apache.org by Andrew Musselman <an...@gmail.com> on 2014/02/10 20:42:06 UTC

Offsets in ORC files

I had a chat with a couple people last week about a feature request for
Pig:  in a "where" or "filter" clause, when loading an ORC file, to skip
directly to the right offset instead of scanning the whole file.

I found this one to add loading and storing of ORC files:
https://issues.apache.org/jira/browse/PIG-3558

But is there a ticket for what I'm asking about?  Seems like a good
improvement, and I'd be happy to take it on if I can.

Best
Andrew
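
A minimal sketch of the idea in the request above, not Pig or ORC reader code: ORC files are split into stripes and keep min/max statistics per column, so a reader can in principle compare a filter value against those statistics and seek only to the stripes that might match. The `Stripe` class, its fields, and `offsetsToRead` are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; this is not the ORC reader API.
final class StripeSkipSketch {

    // One stripe: where it starts in the file and the min/max of one column.
    static final class Stripe {
        final long offset;
        final long min;
        final long max;

        Stripe(long offset, long min, long max) {
            this.offset = offset;
            this.min = min;
            this.max = max;
        }
    }

    // Offsets of the stripes whose [min, max] range could contain `target`.
    // Every other stripe can be skipped without reading its rows.
    static List<Long> offsetsToRead(List<Stripe> stripes, long target) {
        List<Long> offsets = new ArrayList<>();
        for (Stripe s : stripes) {
            if (s.min <= target && target <= s.max) {
                offsets.add(s.offset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = new ArrayList<>();
        stripes.add(new Stripe(0L, 1L, 100L));
        stripes.add(new Stripe(4096L, 101L, 200L));
        stripes.add(new Stripe(8192L, 150L, 300L));
        // Filter "x == 180": only the second and third stripes can match.
        System.out.println(offsetsToRead(stripes, 180L)); // prints [4096, 8192]
    }
}
```

Note that the statistics only rule stripes out; a stripe that is kept may still contain rows that do not match the filter.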

Re: Offsets in ORC files

Posted by Andrew Musselman <an...@gmail.com>.
Cool, I opened a feature ticket here:
https://issues.apache.org/jira/browse/PIG-3760

Thanks
Andrew


On Mon, Feb 10, 2014 at 12:41 PM, Daniel Dai <da...@hortonworks.com> wrote:

> Hi, Andrew,
> Partition pruning for ORC is not addressed in PIG-3558. We will need
> to do partition pruning for both ORC and Parquet in a new ticket.
> Currently there is no interface to deal with this kind of pushdown.
> LoadMetadata.setPartitionFilter pushes the filter to the loader but
> removes the filter statement; for ORC/Parquet the filter is only a
> hint, and we need to apply the filter again in Pig even if it is
> pushed to the loader. We will need to define a new interface for
> that. You are welcome to initiate the work. I know Aniket is also
> interested in doing that, so be sure to talk with him about this work.
>
> Thanks,
> Daniel
>
> On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
> <an...@gmail.com> wrote:
> > I had a chat with a couple people last week about a feature request for
> > Pig:  in a "where" or "filter" clause, when loading an ORC file, to skip
> > directly to the right offset instead of scanning the whole file.
> >
> > I found this one to add loading and storing of ORC files:
> > https://issues.apache.org/jira/browse/PIG-3558
> >
> > But is there a ticket for what I'm asking about?  Seems like a good
> > improvement, and I'd be happy to take it on if I can.
> >
> > Best
> > Andrew
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>

Re: Offsets in ORC files

Posted by Daniel Dai <da...@hortonworks.com>.
Hi, Andrew,
Partition pruning for ORC is not addressed in PIG-3558. We will need
to do partition pruning for both ORC and Parquet in a new ticket.
Currently there is no interface to deal with this kind of pushdown.
LoadMetadata.setPartitionFilter pushes the filter to the loader but
removes the filter statement; for ORC/Parquet the filter is only a
hint, and we need to apply the filter again in Pig even if it is
pushed to the loader. We will need to define a new interface for
that. You are welcome to initiate the work. I know Aniket is also
interested in doing that, so be sure to talk with him about this work.

Thanks,
Daniel
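
The distinction Daniel draws can be sketched as follows, under assumed names: with a hint, the loader may drop whole stripes using statistics but can still return non-matching rows from the stripes it keeps, so the filter has to run again afterwards. Every class and method here is hypothetical; this is not Pig's LoadMetadata interface or any proposed replacement for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical sketch; none of these names exist in Pig or ORC.
final class FilterHintSketch {

    // A stripe holding one long column, with min/max computed from its rows.
    static final class Stripe {
        final long min;
        final long max;
        final long[] rows;

        Stripe(long... rows) {
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (long r : rows) {
                lo = Math.min(lo, r);
                hi = Math.max(hi, r);
            }
            this.min = lo;
            this.max = hi;
            this.rows = rows;
        }
    }

    // Loader side: treats "value == target" as a hint and skips stripes whose
    // statistics rule it out. Rows of the kept stripes come back unfiltered.
    static List<Long> loadWithHint(List<Stripe> stripes, long target) {
        List<Long> out = new ArrayList<>();
        for (Stripe s : stripes) {
            if (target < s.min || target > s.max) {
                continue; // skip the whole stripe
            }
            for (long r : s.rows) {
                out.add(r);
            }
        }
        return out;
    }

    // Pig side: the filter statement stays in the plan and runs again.
    static List<Long> filterAgain(List<Long> rows, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        for (long r : rows) {
            if (filter.test(r)) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = new ArrayList<>();
        stripes.add(new Stripe(1L, 5L, 9L));
        stripes.add(new Stripe(20L, 25L, 30L));
        List<Long> loaded = loadWithHint(stripes, 25L);
        System.out.println(loaded);                             // prints [20, 25, 30]
        System.out.println(filterAgain(loaded, v -> v == 25L)); // prints [25]
    }
}
```

If the filter statement were removed after pushdown, as happens with a partition filter, the rows 20 and 30 in this sketch would leak into the result.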

On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
<an...@gmail.com> wrote:
> I had a chat with a couple people last week about a feature request for
> Pig:  in a "where" or "filter" clause, when loading an ORC file, to skip
> directly to the right offset instead of scanning the whole file.
>
> I found this one to add loading and storing of ORC files:
> https://issues.apache.org/jira/browse/PIG-3558
>
> But is there a ticket for what I'm asking about?  Seems like a good
> improvement, and I'd be happy to take it on if I can.
>
> Best
> Andrew
