You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Thilina Gunarathne <cs...@gmail.com> on 2014/02/03 17:08:28 UTC

Index not getting used for the queries

Dear all,
I created a compact index for a table with several hundred million records
as follows. The table is partitioned by the month. The index on A and B was
created successfully, but I can't see it getting used in the queries. It
would be great if one of you experts can shed some light on what  am I
missing. I'm using hive 0.9.

set hive.exec.parallel=false;
CREATE INDEX idx_yyyy
    ON TABLE yyyy(a,b)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD
    COMMENT 'Index for yyyy table. Indexing on A and B';
ALTER INDEX idx_yyyy on yyyy REBUILD;


hive> describe yyyy;
OK
a    bigint
...
b    bigint
....
month int



hive> show index on yyyy;
OK
idx_yyyy          yyyy              a, b
default__yyyy_p_idx_yyyy__    compact                 Index for tm top50
table. Indexing on A and B


hive> explain select a,b from tm_top50_p where a=113231 and
month=201308;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
(TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
(= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        yyyy
          TableScan
            alias: yyyy
            Filter Operator
              predicate:
                  expr: (a = 113231)
                  type: boolean
              Select Operator
                expressions:
                      expr: a
                      type: bigint
                      expr: b
                      type: bigint
                outputColumnNames: _col0, _col1
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1


thanks a lot,
Thilina

-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org

Re: Index not getting used for the queries

Posted by Thilina Gunarathne <cs...@gmail.com>.
Thanks Peter. It helped. That property combined with setting the property
'hive.optimize.index.filter' to 'true' got the index working.

thanks,
Thilina


On Mon, Feb 3, 2014 at 6:12 PM, Peter Marron <
Peter.Marron@trilliumsoftware.com> wrote:

>  Hi,
>
>
>
> Not sure if it is relevant to your problem but I'm just checking
>
> that you know about
>
> hive.optimize.index.filter.compact.minsize
>
> it's set to 5Gbytes by default and if the estimated query size is
>
> less than this then the index won't be used.
>
> HTH.
>
>
>
> Regards
>
>
>
> *Peter Marron*
>
> Senior Developer, Research & Development
>
>
>
> Office: +44 *(0) 118-940-7609*  peter.marron@trilliumsoftware.com
>
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
>    <https://www.facebook.com/pages/Trillium-Software/109184815778307>
>
>  <https://twitter.com/TrilliumSW>
>
>  <http://www.linkedin.com/company/17710>
>
>
>
> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
>
> Be Certain About Your Data. Be Trillium Certain.
>
>
>
> *From:* Thilina Gunarathne [mailto:csethil@gmail.com]
> *Sent:* 03 February 2014 16:08
> *To:* user
> *Subject:* Index not getting used for the queries
>
>
>
> Dear all,
>
> I created a compact index for a table with several hundred million records
> as follows. The table is partitioned by the month. The index on A and B was
> created successfully, but I can't see it getting used in the queries. It
> would be great if one of you experts can shed some light on what  am I
> missing. I'm using hive 0.9.
>
> set hive.exec.parallel=false;
> CREATE INDEX idx_yyyy
>     ON TABLE yyyy(a,b)
>     AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     WITH DEFERRED REBUILD
>     COMMENT 'Index for yyyy table. Indexing on A and B';
> ALTER INDEX idx_yyyy on yyyy REBUILD;
>
>
> hive> describe yyyy;
> OK
> a    bigint
> ...
> b    bigint
> ....
>
> month int
>
>
>
> hive> show index on yyyy;
> OK
> idx_yyyy          yyyy              a, b
> default__yyyy_p_idx_yyyy__    compact                 Index for tm top50
> table. Indexing on A and B
>
>
> hive> explain select a,b from tm_top50_p where a=113231 and
> month=201308;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
> (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
> (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         yyyy
>           TableScan
>             alias: yyyy
>             Filter Operator
>               predicate:
>                   expr: (a = 113231)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: a
>                       type: bigint
>                       expr: b
>                       type: bigint
>                 outputColumnNames: _col0, _col1
>                 File Output Operator
>                   compressed: false
>                   GlobalTableId: 0
>                   table:
>                       input format:
> org.apache.hadoop.mapred.TextInputFormat
>                       output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>
>   thanks a lot,
> Thilina
>
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
>
> http://thilina.gunarathne.org
>



-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org

RE: Index not getting used for the queries

Posted by Peter Marron <Pe...@trilliumsoftware.com>.
Hi,

Not sure if it is relevant to your problem but I'm just checking
that you know about
hive.optimize.index.filter.compact.minsize
it's set to 5Gbytes by default and if the estimated query size is
less than this then the index won't be used.
HTH.

Regards

Peter Marron
Senior Developer, Research & Development

Office: +44 (0) 118-940-7609  peter.marron@trilliumsoftware.com<ma...@trilliumsoftware.com>
Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
[cid:image001.png@01CF2135.7231BFC0]

[cid:image002.png@01CF2135.7231BFC0]<https://www.facebook.com/pages/Trillium-Software/109184815778307>

[cid:image003.png@01CF2135.7231BFC0]<https://twitter.com/TrilliumSW>

[cid:image004.png@01CF2135.7231BFC0]<http://www.linkedin.com/company/17710>


www.trilliumsoftware.com<http://www.trilliumsoftware.com/>

Be Certain About Your Data. Be Trillium Certain.

From: Thilina Gunarathne [mailto:csethil@gmail.com]
Sent: 03 February 2014 16:08
To: user
Subject: Index not getting used for the queries

Dear all,
I created a compact index for a table with several hundred million records as follows. The table is partitioned by the month. The index on A and B was created successfully, but I can't see it getting used in the queries. It would be great if one of you experts can shed some light on what  am I missing. I'm using hive 0.9.

set hive.exec.parallel=false;
CREATE INDEX idx_yyyy
    ON TABLE yyyy(a,b)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD
    COMMENT 'Index for yyyy table. Indexing on A and B';
ALTER INDEX idx_yyyy on yyyy REBUILD;


hive> describe yyyy;
OK
a    bigint
...
b    bigint
....
month int


hive> show index on yyyy;
OK
idx_yyyy          yyyy              a, b                    default__yyyy_p_idx_yyyy__    compact                 Index for tm top50 table. Indexing on A and B


hive> explain select a,b from tm_top50_p where a=113231 and month=201308;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        yyyy
          TableScan
            alias: yyyy
            Filter Operator
              predicate:
                  expr: (a = 113231)
                  type: boolean
              Select Operator
                expressions:
                      expr: a
                      type: bigint
                      expr: b
                      type: bigint
                outputColumnNames: _col0, _col1
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

thanks a lot,
Thilina

--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org