Posted to dev@orc.apache.org by Lawan Subba <la...@gmail.com> on 2017/02/16 15:38:27 UTC

Re: ORC Stripe Skip Using Stripe Level Index

I had a look at
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2031

I read through isStripeSatisfyPredicate and pickStripesInternal, and I can
see how the stripe-level indices are being used.
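
To make sure I have understood the idea, here is a minimal sketch of stripe
elimination as I read it. This is not the Hive code: the class
StripeSkipSketch, the MinMax holder, and the mightContain and pickStripes
methods here are hypothetical names of my own, and I am assuming a single
long column with an equality predicate. A stripe is kept only when its
min/max range cannot rule out the value.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSkipSketch {
    /** Hypothetical stand-in for one column's min/max statistics in a single stripe. */
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True if a stripe whose column spans [min, max] could hold a row with column == value. */
    static boolean mightContain(MinMax stats, long value) {
        return stats.min <= value && value <= stats.max;
    }

    /** Returns the indices of the stripes that cannot be ruled out for column == value. */
    static List<Integer> pickStripes(List<MinMax> stripeStats, long value) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < stripeStats.size(); i++) {
            if (mightContain(stripeStats.get(i), value)) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three stripes with made-up, non-overlapping value ranges.
        List<MinMax> stats = List.of(
                new MinMax(0, 99), new MinMax(100, 199), new MinMax(200, 299));
        System.out.println(pickStripes(stats, 150)); // prints [1]
    }
}
```

The same range test applied to a whole file's statistics would, in
principle, let a reader skip the file entirely, which is the part I am
trying to locate.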

But I still cannot find where, or whether, the file-level indices are being
used.

File-level statistics are, however, read in
https://github.com/apache/orc/blob/b10ee7b35386d28b4b6fd2a5e724806d7ceb9db7/java/core/src/java/org/apache/orc/impl/ReaderImpl.java

Lines 362 and 383: private final List<OrcProto.ColumnStatistics> fileStats;

If anybody could point me to the right code file or any documentation, that
would be great.

On Tue, Jan 24, 2017 at 11:50 PM, Owen O'Malley <om...@apache.org> wrote:

> That is my fault. We just haven't ported that part of the functionality
> over yet. Hive's OrcInputFormat has a lot of complexity that most users
> don't need or want. (Its types such as OrcStruct also don't actually work
> as Writables, which causes users outside of Hive problems.) The
> orc-mapreduce's types do work as Writables and thus work better outside of
> Hive. That said, no one has ported the split elimination yet.
>
> .. Owen
>
> On Tue, Jan 24, 2017 at 2:45 PM, Lawan Subba <lawansubba.mailinglist@gmail.com> wrote:
>
> > Hi Gopal,
> >
> > Thank you for the quick reply.
> >
> > I am new to open source projects. Can you also tell me why this
> > functionality is missing from the GitHub repository for Apache ORC?
> >
> > Regards,
> > Lawan Subba
> >
> > On Tue, Jan 24, 2017 at 8:26 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
> >
> > >
> > > >    I can see that row indices are being used to select only rowgroups that
> > > >    satisfy a search predicate in
> > > > …
> > > >   But, I cannot find where and if the stripe level indices are being used?
> > >
> > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2031
> > >
> > > Read through isStripeSatisfyPredicate and pickStripesInternal.
> > >
> > > Cheers,
> > > Gopal