You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Thilina Gunarathne <cs...@gmail.com> on 2014/02/03 17:08:28 UTC
Index not getting used for the queries
Dear all,
I created a compact index for a table with several hundred million records
as follows. The table is partitioned by the month. The index on A and B was
created successfully, but I can't see it getting used in the queries. It
would be great if one of you experts can shed some light on what am I
missing. I'm using hive 0.9.
set hive.exec.parallel=false;
CREATE INDEX idx_yyyy
ON TABLE yyyy(a,b)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
COMMENT 'Index for yyyy table. Indexing on A and B';
ALTER INDEX idx_yyyy on yyyy REBUILD;
hive> describe yyyy;
OK
a bigint
...
b bigint
....
month int
hive> show index on yyyy;
OK
idx_yyyy yyyy a, b
default__yyyy_p_idx_yyyy__ compact Index for tm top50
table. Indexing on A and B
hive> explain select a,b from tm_top50_p where a=113231 and
month=201308;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
(TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
(= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
yyyy
TableScan
alias: yyyy
Filter Operator
predicate:
expr: (a = 113231)
type: boolean
Select Operator
expressions:
expr: a
type: bigint
expr: b
type: bigint
outputColumnNames: _col0, _col1
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
thanks a lot,
Thilina
--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org
Re: Index not getting used for the queries
Posted by Thilina Gunarathne <cs...@gmail.com>.
Thanks Peter. It helped. That property combined with setting the property
'hive.optimize.index.filter' to 'true' got the index working.
thanks,
Thilina
On Mon, Feb 3, 2014 at 6:12 PM, Peter Marron <
Peter.Marron@trilliumsoftware.com> wrote:
> Hi,
>
>
>
> Not sure if it is relevant to your problem but I'm just checking
>
> that you know about
>
> hive.optimize.index.filter.compact.minsize
>
> it's set to 5Gbytes by default and if the estimated query size is
>
> less than this then the index won't be used.
>
> HTH.
>
>
>
> Regards
>
>
>
> *Peter Marron*
>
> Senior Developer, Research & Development
>
>
>
> Office: +44 *(0) 118-940-7609* peter.marron@trilliumsoftware.com
>
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
> <https://www.facebook.com/pages/Trillium-Software/109184815778307>
>
> <https://twitter.com/TrilliumSW>
>
> <http://www.linkedin.com/company/17710>
>
>
>
> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
>
> Be Certain About Your Data. Be Trillium Certain.
>
>
>
> *From:* Thilina Gunarathne [mailto:csethil@gmail.com]
> *Sent:* 03 February 2014 16:08
> *To:* user
> *Subject:* Index not getting used for the queries
>
>
>
> Dear all,
>
> I created a compact index for a table with several hundred million records
> as follows. The table is partitioned by the month. The index on A and B was
> created successfully, but I can't see it getting used in the queries. It
> would be great if one of you experts can shed some light on what am I
> missing. I'm using hive 0.9.
>
> set hive.exec.parallel=false;
> CREATE INDEX idx_yyyy
> ON TABLE yyyy(a,b)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> WITH DEFERRED REBUILD
> COMMENT 'Index for yyyy table. Indexing on A and B';
> ALTER INDEX idx_yyyy on yyyy REBUILD;
>
>
> hive> describe yyyy;
> OK
> a bigint
> ...
> b bigint
> ....
>
> month int
>
>
>
> hive> show index on yyyy;
> OK
> idx_yyyy yyyy a, b
> default__yyyy_p_idx_yyyy__ compact Index for tm top50
> table. Indexing on A and B
>
>
> hive> explain select a,b from tm_top50_p where a=113231 and
> month=201308;
> OK
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT
> (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
> (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
> (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))
>
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
>
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> yyyy
> TableScan
> alias: yyyy
> Filter Operator
> predicate:
> expr: (a = 113231)
> type: boolean
> Select Operator
> expressions:
> expr: a
> type: bigint
> expr: b
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Stage: Stage-0
> Fetch Operator
> limit: -1
>
> thanks a lot,
> Thilina
>
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
>
> http://thilina.gunarathne.org
>
--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org
RE: Index not getting used for the queries
Posted by Peter Marron <Pe...@trilliumsoftware.com>.
Hi,
Not sure if it is relevant to your problem but I'm just checking
that you know about
hive.optimize.index.filter.compact.minsize
it's set to 5Gbytes by default and if the estimated query size is
less than this then the index won't be used.
HTH.
Regards
Peter Marron
Senior Developer, Research & Development
Office: +44 (0) 118-940-7609 peter.marron@trilliumsoftware.com<ma...@trilliumsoftware.com>
Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
[cid:image001.png@01CF2135.7231BFC0]
[cid:image002.png@01CF2135.7231BFC0]<https://www.facebook.com/pages/Trillium-Software/109184815778307>
[cid:image003.png@01CF2135.7231BFC0]<https://twitter.com/TrilliumSW>
[cid:image004.png@01CF2135.7231BFC0]<http://www.linkedin.com/company/17710>
www.trilliumsoftware.com<http://www.trilliumsoftware.com/>
Be Certain About Your Data. Be Trillium Certain.
From: Thilina Gunarathne [mailto:csethil@gmail.com]
Sent: 03 February 2014 16:08
To: user
Subject: Index not getting used for the queries
Dear all,
I created a compact index for a table with several hundred million records as follows. The table is partitioned by the month. The index on A and B was created successfully, but I can't see it getting used in the queries. It would be great if one of you experts can shed some light on what am I missing. I'm using hive 0.9.
set hive.exec.parallel=false;
CREATE INDEX idx_yyyy
ON TABLE yyyy(a,b)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
COMMENT 'Index for yyyy table. Indexing on A and B';
ALTER INDEX idx_yyyy on yyyy REBUILD;
hive> describe yyyy;
OK
a bigint
...
b bigint
....
month int
hive> show index on yyyy;
OK
idx_yyyy yyyy a, b default__yyyy_p_idx_yyyy__ compact Index for tm top50 table. Indexing on A and B
hive> explain select a,b from tm_top50_p where a=113231 and month=201308;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
yyyy
TableScan
alias: yyyy
Filter Operator
predicate:
expr: (a = 113231)
type: boolean
Select Operator
expressions:
expr: a
type: bigint
expr: b
type: bigint
outputColumnNames: _col0, _col1
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
thanks a lot,
Thilina
--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org