Posted to dev@orc.apache.org by Lawan Subba <la...@gmail.com> on 2017/02/16 15:38:27 UTC
Re: ORC Stripe Skip Using Stripe Level Index
I had a look at
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2031
and read through isStripeSatisfyPredicate and pickStripesInternal, and I can
see how stripe level indices are being used.
But I still cannot find whether, and if so where, file level indices are being used.
File level statistics are however read at
https://github.com/apache/orc/blob/b10ee7b35386d28b4b6fd2a5e724806d7ceb9db7/java/core/src/java/org/apache/orc/impl/ReaderImpl.java
Line: 362,383 private final List<OrcProto.ColumnStatistics> fileStats;
If anybody could point me to the right code file or any documentation, that
would be great.
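
To make the question concrete, here is a minimal, self-contained sketch of
the kind of min/max pruning being discussed. It is not the code in Hive's
OrcInputFormat or in ORC's ReaderImpl. The names LongRange, mayContain and
pickStripes are hypothetical, and the sketch assumes a single long column
with a simple range predicate (lo <= value <= hi). It only illustrates how
file level statistics could rule out a whole file before the per-stripe
statistics are consulted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripeSkipSketch {
    /** Hypothetical min/max statistics for one long column (a stand-in for real ORC column statistics). */
    static final class LongRange {
        final long min;
        final long max;

        LongRange(long min, long max) {
            this.min = min;
            this.max = max;
        }

        /** True if some value in [lo, hi] could fall inside [min, max]. */
        boolean mayContain(long lo, long hi) {
            return max >= lo && min <= hi;
        }
    }

    /** Returns the indices of stripes that may hold rows with lo <= value <= hi. */
    static List<Integer> pickStripes(LongRange fileStats, List<LongRange> stripeStats,
                                     long lo, long hi) {
        List<Integer> selected = new ArrayList<>();
        // File level check: if the whole file cannot match, skip every stripe.
        if (!fileStats.mayContain(lo, hi)) {
            return selected;
        }
        // Stripe level check: keep only stripes whose range overlaps the predicate.
        for (int i = 0; i < stripeStats.size(); i++) {
            if (stripeStats.get(i).mayContain(lo, hi)) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        LongRange file = new LongRange(0, 299);
        List<LongRange> stripes = Arrays.asList(
            new LongRange(0, 99), new LongRange(100, 199), new LongRange(200, 299));
        System.out.println(pickStripes(file, stripes, 150, 210)); // prints [1, 2]
        System.out.println(pickStripes(file, stripes, 500, 600)); // prints []
    }
}
```

In this sketch the file level range only saves the per-stripe loop. Whether
the real readers consult file level statistics this way is exactly the open
question above.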
On Tue, Jan 24, 2017 at 11:50 PM, Owen O'Malley <om...@apache.org> wrote:
> That is my fault. We just haven't ported that part of the functionality
> over yet. Hive's OrcInputFormat has a lot of complexity that most users
> don't need or want. (Its types such as OrcStruct also don't actually work
> as Writables, which causes users outside of Hive problems.) The
> orc-mapreduce's types do work as Writables and thus work better outside of
> Hive. That said, no one has ported the split elimination yet.
>
> .. Owen
>
> On Tue, Jan 24, 2017 at 2:45 PM, Lawan Subba
> <lawansubba.mailinglist@gmail.com> wrote:
>
> > Hi Gopal,
> >
> > Thank you for the quick reply.
> >
> > I am new to open source projects. Can you also tell me why this
> > functionality is missing from the GitHub repository for Apache ORC?
> >
> > Regards,
> > Lawan Subba
> >
> > On Tue, Jan 24, 2017 at 8:26 PM, Gopal Vijayaraghavan <gopalv@apache.org>
> > wrote:
> >
> > >
> > > > I can see that row indices are being used to select only rowgroups
> > > > that satisfy a search predicate in
> > > …
> > > > But I cannot find where and if the stripe level indices are being
> > > > used?
> > >
> > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2031
> > >
> > > Read through isStripeSatisfyPredicate and pickStripesInternal.
> > >
> > > Cheers,
> > > Gopal