
Posted to dev@hive.apache.org by Ablimit Aji <ab...@gmail.com> on 2012/07/30 17:12:41 UTC

Problem with Hive Indexing

I have written a custom index handler and wanted to test it. However, Hive
is not using it.
So I tested with the simple table (pokes (int foo, string bar)) that comes with
the Hive distribution for testing purposes.
Then I created a compact index and set
hive.optimize.index.filter=true;
However, upon checking the log info, it seems Hive is still not using the
index.
So, what is the problem?
The query I issued is as follows:  select foo from pokes WHERE foo=498;

Below is the log info I got after issuing the query.

12/07/26 12:25:17 INFO index.IndexWhereProcessor: Processing predicate for index optimization
12/07/26 12:25:17 INFO index.IndexWhereProcessor: (foo = 498)
12/07/26 12:25:17 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=pokes_idx
12/07/26 12:25:17 INFO hive.log: DDL: struct pokes_idx { i32 foo, string _bucketname, list _offsets}
12/07/26 12:25:17 INFO index.IndexWhereProcessor: checking index staleness...
12/07/26 12:25:17 INFO index.IndexWhereProcessor: 1342465077455
12/07/26 12:25:17 INFO index.IndexWhereProcessor: 1342465077455
12/07/26 12:25:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/26 12:25:17 WARN snappy.LoadSnappy: Snappy native library not loaded

Re: Problem with Hive Indexing

Posted by Ablimit Aji <ab...@gmail.com>.

Thanks, Mahsa!
I didn't know there was such a constraint.

Best,
Ablimit

On Thu, Aug 16, 2012 at 12:32 PM, Mahsa Mofidpoor <mo...@gmail.com> wrote:

> Hi,
>
> The table must be at least 5GB in size for the index to be used for
> filter pushdown. Otherwise you have to comment out the checkQuerySize method.
>
> Cheers,
> Mahsa

Re: Problem with Hive Indexing

Posted by Mahsa Mofidpoor <mo...@gmail.com>.

Hi,

The table must be at least 5GB in size for the index to be used for
filter pushdown. Otherwise you have to comment out the checkQuerySize method.

Cheers,
Mahsa
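
For a small test table such as pokes, it may be possible to relax the size
check from the session instead of editing the source. The sketch below
assumes that the threshold is read from the property
hive.optimize.index.filter.compact.minsize (default believed to be
5368709120 bytes, i.e. 5GB); that property name is an assumption not stated
in this thread, so verify it against your Hive version before relying on it.

```sql
-- Enable index-based filter pushdown (setting taken from the original post).
SET hive.optimize.index.filter=true;

-- Assumption: this property holds the minimum input size for the compact
-- index to be considered; 0 is meant to lift the 5GB floor for small tables.
SET hive.optimize.index.filter.compact.minsize=0;

-- Query from the original post; inspect the plan to see whether the
-- index table is now read.
EXPLAIN SELECT foo FROM pokes WHERE foo = 498;
```

If the plan still scans pokes directly, the size check is probably not the
only condition being missed.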

On Mon, Jul 30, 2012 at 11:12 AM, Ablimit Aji <ab...@gmail.com> wrote:

> I have written a custom index handler and wanted to test it. However hive
> is not using it.
> So I test with simple table (pokes (int foo, string bar)) which comes with
> hive distribution for testing purpose.
> Then I created a compact index and set the set
> hive.optimize.index.filter=true;
> However, upon checking the log info, it seems hive is still not using the
> index.
> So, what is the problem ?
> The query I issued is as follow:  select foo from pokes WHERE foo=498 ;
>
> Below is the log info I got after issuing the query.
>
> 12/07/26 12:25:17 INFO index.IndexWhereProcessor: Processing predicate for
> index optimization
> 12/07/26 12:25:17 INFO index.IndexWhereProcessor: (foo = 498)
> 12/07/26 12:25:17 INFO metastore.HiveMetaStore: 0: get_table : db=default
> tbl=pokes_idx
> 12/07/26 12:25:17 INFO hive.log: DDL: struct pokes_idx { i32 foo, string
> _bucketname, list _offsets}
> 12/07/26 12:25:17 INFO index.IndexWhereProcessor: checking index
> staleness...
> 12/07/26 12:25:17 INFO index.IndexWhereProcessor: 1342465077455
> 12/07/26 12:25:17 INFO index.IndexWhereProcessor: 1342465077455
> 12/07/26 12:25:17 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/07/26 12:25:17 WARN snappy.LoadSnappy: Snappy native library not loaded
>