You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Namit Jain (Created) (JIRA)" <ji...@apache.org> on 2012/03/07 02:31:58 UTC

[jira] [Created] (HIVE-2845) Add support for index joins in Hive

Add support for index joins in Hive
-----------------------------------

                 Key: HIVE-2845
                 URL: https://issues.apache.org/jira/browse/HIVE-2845
             Project: Hive
          Issue Type: New Feature
            Reporter: Namit Jain


Hive supports indexes, which are used for filters currently.

It would be very useful to add support for index-based joins in Hive.
If 2 tables A and B are being joined, and an index exists on the join key of A,
B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Created] (HIVE-2845) Add support for index joins in Hive

Posted by Mahsa Mofidpoor <mo...@gmail.com>.

Hello all,

On Tue, Mar 6, 2012 at 8:31 PM, Namit Jain (Created) (JIRA) <jira@apache.org
> wrote:

> Add support for index joins in Hive
> -----------------------------------
>
>                  Key: HIVE-2845
>                  URL: https://issues.apache.org/jira/browse/HIVE-2845
>              Project: Hive
>           Issue Type: New Feature
>             Reporter: Namit Jain
>
>
> Hive supports indexes, which are used for filters currently.
>
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key
> of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the
> corresponding row in A can be performed.


According to
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins, only
the last table which is streamed could be scanned by an index which is in
this case B. Please correct me if I'm wrong.

This can be very useful for some usecases.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
> The process may be re-writing the original query as an added-stage in
physical optimizer, but would not produce any different MapReduce job like
the ones that HiveSkewJoin does in the physical optimizer. If this
is effective, how would that  query rewriting process be? Should it match
with a "JOIN" rule, like HiveSkewJoin, and then replace the second "TS"?
How?

I am eager to implement this issue an I was wondering if it could be
assigned to me.
I appreciate any hints/clues in advance.

Regards,
Mahsa

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405956#comment-13405956 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

When where clause makes use of indexes, the query is re-written to process the index table, rather than the base one. The predicate (lets' assume a single condition) produces one single value to lookup in the index table.
1-In case of a join, for each row in table B a lookup should be done, does it mean that for each row query should be re-written? This would lead to so many temp files.    
2- Lets's say the lookup is done, how the association between the result from lookup and the original base table is maintained? Finally we have to access the base table for extracting the desired columns.

  
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012, indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Srinivasan Sembakkam Rajivelu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235353#comment-13235353 ] 

Srinivasan Sembakkam Rajivelu commented on HIVE-2845:
-----------------------------------------------------

Could you please explain in which use case this will be useful. Because I feel that there is a space overhead for storing the index is also there. 

And there is already a map join optimization implemented in Hive,
http://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919

So this index creation will be helpful do u think??
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437583#comment-13437583 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

Finally, I replaced the table in the operator tree with the index table. Now the problem is that i have only access to the indexed cols of the replaced base table. With the bucket name and offset it should not be a big deal to reach other columns. 
Can you please give me some clues on the stages in which other columns are extarcted by bucket and offset? 
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Namit Jain (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235371#comment-13235371 ] 

Namit Jain commented on HIVE-2845:
----------------------------------

This may not be very useful in the current hive setup, since the point lookup on the hive index is not very fast.
However, this opens up a very wide variety of applications. 

Consider the scenario when one of the tables is stored in HBase. In that case, a join with other table can be reduced
to a map-only job, with the mapper doing a point lookup for every row. This is very different from map-join where one of
the tables is so small that it fits in memory.
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423688#comment-13423688 ] 

Namit Jain commented on HIVE-2845:
----------------------------------

Mahasa asked me the following:::

In the following query:
 
SELECT col_list FROM A JOIN B ON (A.col1 = B.col1)
 
1. Since all tables are buffered except for the last one which is streamed, it is only table B that can make use of the index, am I right?

>>>>> No, indexes can be used for both the tables if there are filters on any of the tables. It would be after the tablescan for either A or B.

2. In order to do this, in the mapper, the TS should be done on the index table rather than the base table; what about the reduce stage? Don't you need to have access to base table/index table in the reduce phase too? For applying SEL? 

>>>>>  Yes, the TS would be on the index table for either A or B. There would be no change after that -- no change in the reduce phase.

3. As far as I know, filter pushdown and group by use indexes to accelerate the query. Filter pushdown recompiles the re-written query whereas GB only replaces appropriate operators of the operator tree. Which one is more suitable to be inspired to implement HIVE-2845?

>>>>> HIVE-2845 requires new changes. Essentially, one of the tables, say A would be read completely, and the other one, B would be probed for each key of A, 
or vice versa.

4. May I ask to assign this ticket to me?

>>>>> Yes, I dont think anyone is working on it right now.
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409115#comment-13409115 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

what will be the operator rule that matches with this? Is it a join followed by a tablescan("JOIN%.*TS")?

                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012, indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Srinivasan Sembakkam Rajivelu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235355#comment-13235355 ] 

Srinivasan Sembakkam Rajivelu commented on HIVE-2845:
-----------------------------------------------------

Could you please explain in which use case this will be useful. Because I feel that there is a space overhead for storing the index is also there. 

And there is already a map join optimization implemented in Hive,
http://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919

So this index creation will be helpful do u think??
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420673#comment-13420673 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

I've checked the query plan. It does not process joins. 
In this case the join key should be pushed to the mapper, but isn't it more appropriate to replace the TS operator(like HIVE-1694) rather than recompiling the query? 
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2845) Add support for index joins in Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2845:
---------------------------------

    Component/s: Query Processor
                 Indexing
         Labels: gsoc gsoc2012 indexing joins performance  (was: gsoc gsoc2012)
    
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012, indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424559#comment-13424559 ] 

Namit Jain commented on HIVE-2845:
----------------------------------

Say there is a index of table A on 'key'.

For a query of the type:

select .. from A join B on A.key=B.key;

the plan can be as follows:

scan B
for every row of B (or a batch of rows in B), lookup the value using the index in A


The basic infra-structure is needed first. A lot of optimizations can be added later.
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428956#comment-13428956 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

I put my code in optimizer/physical, pretty much the same as how HIVE-1644 is organized.
I removed the TS from topToTable and opParseContext and also the table from topOps. Then I updated the correspondent operator and table(index table) in OpParseContext and ParseContext respectively, but the map logs shows the original table still being scanned.
1.Should I regenerate the operator tree afterwards? probably by using RewriteParseContextGenerator?
2.Is a new RowResolver needed?
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419995#comment-13419995 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

Can anybody tell me how close this JIRA is with HIVE-1644, I have already tried the same approach introduced in HIVE-1644 to perform join using the index but it produces nothing at the end.
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Srinivasan Sembakkam Rajivelu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235354#comment-13235354 ] 

Srinivasan Sembakkam Rajivelu commented on HIVE-2845:
-----------------------------------------------------

Could you please explain in which use case this will be useful. Because I feel that there is a space overhead for storing the index is also there. 

And there is already a map join optimization implemented in Hive,
http://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919

So this index creation will be helpful do u think??
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>              Labels: gsoc, gsoc2012
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "Mahsa Mofidpoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424378#comment-13424378 ] 

Mahsa Mofidpoor commented on HIVE-2845:
---------------------------------------

Can you please clarify on the new changes in more details?
                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2845) Add support for index joins in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420466#comment-13420466 ] 

He Yongqiang commented on HIVE-2845:
------------------------------------

With HIVE-1644, this should be done. Have you looked at the query plan, or looked at the patch of HIVE-1644? Maybe Hive-1644 does not process join cases (but the code is already there.) The filter needs to be pushed down to the mapper to trigger the auto index.

                
> Add support for index joins in Hive
> -----------------------------------
>
>                 Key: HIVE-2845
>                 URL: https://issues.apache.org/jira/browse/HIVE-2845
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>            Reporter: Namit Jain
>              Labels: indexing, joins, performance
>
> Hive supports indexes, which are used for filters currently.
> It would be very useful to add support for index-based joins in Hive.
> If 2 tables A and B are being joined, and an index exists on the join key of A,
> B can be scanned (by the mappers), and for each row in B, a lookup for the corresponding row in A can be performed.
> This can be very useful for some usecases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira