Posted to user@hive.apache.org by "Hezhiqiang (Ransom)" <ra...@huawei.com> on 2012/05/11 05:32:10 UTC

how to select without Mapreduce after index build?

I expected that, after creating an index on a table,
executing “select c1,c2 from tab where index_col=1” would not start a MapReduce job.
But a job was started.
So how can I use an index without MapReduce?
I tested both compact and bitmap indexes, and both need MapReduce.

Re: how to select without Mapreduce after index build?

Posted by shrikanth shankar <ss...@qubole.com>.
<I am relatively new to Hive, so please take my answers with a pinch of salt.> For one, your data size is so small that I am not sure indexes would help (the fixed cost of the extra MR job would probably overshadow any benefit from the index). AFAIK the difference between compact and bitmap indexes is how they store the mapping from values to the rows in which each value occurs (CompactIndex seems to store (value, block-id) pairs, while BitmapIndex stores (value, list of rows as a bitmap)). From this it looks as if your index size would depend on the number of distinct values in the indexed columns, whether the table is compressed, and whether the index is compressed too.
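
The distinction described above can be sketched in a few lines of Python. This is only a toy model of the two mappings as the reply describes them, not Hive's actual on-disk index format; `ROWS`, `build_compact`, `build_bitmap`, and `rows_in_bitmap` are hypothetical names invented for the illustration.

```python
from collections import defaultdict

# Hypothetical base table: one (block_id, value) pair per row.
# The row number is simply the position in this list.
ROWS = [(0, "a"), (0, "b"), (0, "a"), (1, "c"), (1, "a"), (2, "b"), (2, "b")]


def build_compact(rows):
    """Compact-style mapping: value -> set of blocks that contain it."""
    index = defaultdict(set)
    for block_id, value in rows:
        index[value].add(block_id)
    return dict(index)


def build_bitmap(rows):
    """Bitmap-style mapping: value -> int whose set bits are the row numbers."""
    index = defaultdict(int)
    for row_number, (_block_id, value) in enumerate(rows):
        index[value] |= 1 << row_number
    return dict(index)


def rows_in_bitmap(bits):
    """Decode a bitmap back into the list of row numbers it marks."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


compact = build_compact(ROWS)
bitmap = build_bitmap(ROWS)
```

Both structures hold one entry per distinct value, which is consistent with the remark that index size depends on the number of distinct values in the indexed columns.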

Shrikanth
On May 11, 2012, at 9:49 PM, ransom.hezhiqiang wrote:

> Thanks Shrikanth
>
> In my test I have 120MB+ of text data with 4 columns. I built a compact index on 2 of the columns, and the index size is 340MB+.
> The first step of the query also scans all of the index data.
> So I think I should choose the right columns to index, so that the index is smaller. Is that right?
> And is the index sorted?
> What is the difference between a bitmap index, a compact index, and an aggregate index?
>
> Best regards
> Ransom.
>


RE: how to select without Mapreduce after index build?

Posted by "ransom.hezhiqiang" <ab...@gmail.com>.
Thanks Shrikanth

In my test I have 120MB+ of text data with 4 columns. I built a compact index on 2 of the columns, and the index size is 340MB+.
The first step of the query also scans all of the index data.
So I think I should choose the right columns to index, so that the index is smaller. Is that right?
And is the index sorted?
What is the difference between a bitmap index, a compact index, and an aggregate index?

Best regards
Ransom.

From: shrikanth shankar [mailto:sshankar@qubole.com] 
Sent: Saturday, May 12, 2012 12:05 PM
To: user@hive.apache.org
Subject: Re: how to select without Mapreduce after index build?

 

My understanding is that the scan of the index is used to remove splits that are known not to contain matching data. If you remove enough splits, the second MR task will run much faster. The index should also be much smaller than the base table, so the MR task that scans it should be much cheaper.

 

Shrikanth



Re: how to select without Mapreduce after index build?

Posted by shrikanth shankar <ss...@qubole.com>.
My understanding is that the scan of the index is used to remove splits that are known not to contain matching data. If you remove enough splits, the second MR task will run much faster. The index should also be much smaller than the base table, so the MR task that scans it should be much cheaper.
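
A minimal sketch of that pruning idea, assuming the index scan yields byte offsets per file and that each split covers a byte range of one file. `Split`, `prune_splits`, and the sample paths are hypothetical and only illustrate the principle; they are not Hive or Hadoop classes.

```python
from typing import NamedTuple


class Split(NamedTuple):
    path: str    # file the split belongs to
    start: int   # first byte covered by the split
    length: int  # number of bytes covered


def prune_splits(splits, matches):
    """Keep only the splits that contain at least one matching offset.

    `matches` maps a file path to the byte offsets reported by the index scan.
    """
    kept = []
    for split in splits:
        offsets = matches.get(split.path, ())
        end = split.start + split.length
        if any(split.start <= offset < end for offset in offsets):
            kept.append(split)
    return kept


# Hypothetical input: three splits over two files, matches only in part-0.
splits = [
    Split("part-0", 0, 100),
    Split("part-0", 100, 100),
    Split("part-1", 0, 100),
]
matches = {"part-0": [130, 170]}
survivors = prune_splits(splits, matches)
```

Here two of the three splits are dropped, so the second job would read only one split. When the matches are spread over every split, nothing is pruned and the index scan is pure overhead.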

Shrikanth
On May 11, 2012, at 8:56 PM, ransom.hezhiqiang wrote:

> Thanks Ashish
>
> After the index is built, the query is split into three steps:
> 1. Query the index table to get the offsets.
> 2. Move the result.
> 3. Get the select result by offset.
> So I think the query will be slower than without an index, because it has more steps and runs two MapReduce tasks.
>
> So why do indexes exist? There is no performance improvement.
>
> Best regards
> Ransom.
>


RE: how to select without Mapreduce after index build?

Posted by "ransom.hezhiqiang" <ab...@gmail.com>.
Thanks Ashish

After the index is built, the query is split into three steps:
1. Query the index table to get the offsets.
2. Move the result.
3. Get the select result by offset.
So I think the query will be slower than without an index, because it has more steps and runs two MapReduce tasks.

So why do indexes exist? There is no performance improvement.

Best regards
Ransom.

From: Ashish Thusoo [mailto:athusoo@qubole.com] 
Sent: Saturday, May 12, 2012 12:18 AM
To: user@hive.apache.org
Cc: Zhaojun (Terry)
Subject: Re: how to select without Mapreduce after index build?

 

Indexing in Hive works through map/reduce. There are no active components in Hive (such as the region servers in HBase), so the index is used by running a map/reduce job on the table that holds the index data to get all the relevant offsets into the main table, and then using those offsets to figure out which blocks to read from the main table. So you will not see map/reduce go away even when you query tables that have indexes on them.

Ashish



Re: how to select without Mapreduce after index build?

Posted by Ashish Thusoo <at...@qubole.com>.
Indexing in Hive works through map/reduce. There are no active components
in Hive (such as the region servers in HBase), so the index is used by
running a map/reduce job on the table that holds the index data to get all
the relevant offsets into the main table, and then using those offsets to
figure out which blocks to read from the main table. So you will not see
map/reduce go away even when you query tables that have indexes on them.
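
As a rough illustration of the two-phase flow described above, here is a toy model in Python. It is a sketch under stated assumptions, not Hive's implementation: `INDEX_TABLE`, `BASE_TABLE`, `scan_index`, and `read_base` are hypothetical, and the second phase jumps straight to rows by offset, whereas the description above talks about choosing which blocks to read.

```python
# Hypothetical index table: (indexed value, file, byte offsets of matching rows).
INDEX_TABLE = [
    (1, "part-0", [0, 64]),
    (2, "part-0", [32]),
    (1, "part-1", [16]),
]

# Hypothetical base table files: byte offset -> row (c1, c2, index_col).
BASE_TABLE = {
    "part-0": {0: ("x", 10, 1), 32: ("y", 20, 2), 64: ("z", 30, 1)},
    "part-1": {0: ("p", 40, 3), 16: ("q", 50, 1)},
}


def scan_index(index_table, wanted):
    """Phase 1 (first job): scan the whole index table, collect offsets per file."""
    hits = {}
    for value, path, offsets in index_table:
        if value == wanted:
            hits.setdefault(path, []).extend(offsets)
    return hits


def read_base(base_table, hits):
    """Phase 2 (second job): read only the rows at the collected offsets."""
    result = []
    for path in sorted(hits):
        for offset in sorted(hits[path]):
            c1, c2, _index_col = base_table[path][offset]
            result.append((c1, c2))
    return result


# Models: select c1,c2 from tab where index_col=1
hits = scan_index(INDEX_TABLE, wanted=1)
rows = read_base(BASE_TABLE, hits)
```

Both phases are plain scans over stored data, which is why each one runs as its own job and why a query on an indexed table still launches map/reduce.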

Ashish

On Thu, May 10, 2012 at 11:32 PM, Hezhiqiang (Ransom) <
ransom.hezhiqiang@huawei.com> wrote:

> I expected that, after creating an index on a table,
> executing “select c1,c2 from tab where index_col=1” would not start a MapReduce job.
> But a job was started.
> So how can I use an index without MapReduce?
> I tested both compact and bitmap indexes, and both need MapReduce.