Posted to dev@hive.apache.org by "Marquis Wang (JIRA)" <ji...@apache.org> on 2011/05/19 22:50:48 UTC

[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage

    [ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036449#comment-13036449 ] 

Marquis Wang commented on HIVE-2036:
------------------------------------

Making notes on how to do this:

What makes bitmap indexes tricky, and different from other indexes, is that they only become useful when multiple indexes are combined. So you need a query that joins the various bitmap index tables and returns the blocks that contain the rows we want.

Thus the two parts to writing the automatic use index handler for bitmap indexes are:

1. Figuring out what indexes to use:

As mentioned above, you may need to extend the IndexPredicateAnalyzer to support ORs and possibly to return a tree of predicates (I don't think it already does this).
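
To make the "tree of predicates" idea concrete, here is a minimal sketch in Python. It is not the IndexPredicateAnalyzer API (which is Java); the `Leaf` and `Node` types and the `indexed_columns` helper are hypothetical names invented for illustration. It represents `(L_QUANTITY = 50.0 OR L_DISCOUNT = 0.08) AND L_TAX = 0.01` as a tree and walks it to find which columns have an index available.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """A single comparison such as L_TAX = 0.01 (hypothetical type)."""
    column: str
    op: str
    value: float


@dataclass(frozen=True)
class Node:
    """An AND/OR node combining sub-predicates (hypothetical type)."""
    op: str          # "AND" or "OR"
    children: tuple  # of Leaf or Node


Predicate = Union[Leaf, Node]


def indexed_columns(pred, available):
    """Return the columns used in the tree that have an index, in first-seen order."""
    if isinstance(pred, Leaf):
        return [pred.column] if pred.column in available else []
    seen = []
    for child in pred.children:
        for col in indexed_columns(child, available):
            if col not in seen:
                seen.append(col)
    return seen


# (L_QUANTITY = 50.0 OR L_DISCOUNT = 0.08) AND L_TAX = 0.01
tree = Node("AND", (
    Node("OR", (Leaf("L_QUANTITY", "=", 50.0), Leaf("L_DISCOUNT", "=", 0.08))),
    Leaf("L_TAX", "=", 0.01),
))
```

Note that this helper only lists candidate columns. An OR whose branches are not all indexed cannot be answered from the indexes alone, so a real analyzer would also have to detect that case.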

2. Building a query that accesses the index tables:

This is an example that I know works. The first block is the original query; the second is the query over the index tables that finds the blocks it needs:

{noformat}
SELECT * FROM lineitem WHERE  L_QUANTITY = 50.0 AND L_DISCOUNT = 0.08 AND L_TAX = 0.01;
{noformat}

{noformat}
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) as `_offsets`
FROM (SELECT
        `_bucketname` AS bucketname, `_offset` AS offset
      FROM
        (SELECT ab.`_bucketname`, ab.`_offset`, EWAH_BITMAP_AND(ab.bitmap, c.`_bitmaps`) as bitmap FROM
          (SELECT a.`_bucketname`, b.`_offset`, EWAH_BITMAP_AND(a.`_bitmaps`, b.`_bitmaps`) as bitmap FROM 
            (SELECT * FROM default__lineitem_quantity__ WHERE L_QUANTITY = 50.0) a JOIN 
            (SELECT * FROM default__lineitem_discount__ WHERE L_DISCOUNT = 0.08) b 
                ON a.`_bucketname` = b.`_bucketname` AND a.`_offset` = b.`_offset`) ab JOIN
              (SELECT * FROM default__lineitem_tax__ WHERE L_TAX = 0.01) c
                ON ab.`_bucketname` = c.`_bucketname` AND ab.`_offset` = c.`_offset`) abc 
      WHERE 
        NOT EWAH_BITMAP_EMPTY(abc.bitmap)
) t
GROUP BY bucketname;
{noformat}

This format works for joining any number of AND predicates. I'm pretty sure you can figure out how to extend it to include OR predicates and different groupings of predicates as well. If you make any changes/extensions to the format, be sure to test them to make sure they have the performance characteristics you want.
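
As a rough illustration of how a handler might generate this nesting for any number of AND predicates, here is a sketch in Python. The actual BitmapIndexHandler is Java; the function name `build_and_index_query` and the `l`/`r`/`j` aliases are hypothetical. The table and column names are taken from the example above, and the generated text has not been run against Hive.

```python
def build_and_index_query(predicates):
    """Build the nested index query for a list of ANDed predicates.

    predicates: list of (index_table, where_clause) pairs, at least two.
    """
    if len(predicates) < 2:
        raise ValueError("need at least two index predicates to combine")

    def scan(table, where):
        # One filtered scan of a single bitmap index table.
        return f"(SELECT * FROM {table} WHERE {where})"

    (t0, w0), (t1, w1) = predicates[0], predicates[1]
    # Innermost join: AND the raw `_bitmaps` columns of the first two indexes.
    joined = (
        "SELECT a.`_bucketname`, a.`_offset`, "
        "EWAH_BITMAP_AND(a.`_bitmaps`, b.`_bitmaps`) AS bitmap FROM "
        f"{scan(t0, w0)} a JOIN {scan(t1, w1)} b "
        "ON a.`_bucketname` = b.`_bucketname` AND a.`_offset` = b.`_offset`"
    )
    # Each further predicate wraps the previous result in one more join.
    for table, where in predicates[2:]:
        joined = (
            "SELECT l.`_bucketname`, l.`_offset`, "
            "EWAH_BITMAP_AND(l.bitmap, r.`_bitmaps`) AS bitmap FROM "
            f"({joined}) l JOIN {scan(table, where)} r "
            "ON l.`_bucketname` = r.`_bucketname` AND l.`_offset` = r.`_offset`"
        )
    # Keep only blocks whose combined bitmap is non-empty, grouped per file.
    return (
        "SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets` FROM "
        "(SELECT `_bucketname` AS bucketname, `_offset` AS offset "
        f"FROM ({joined}) j WHERE NOT EWAH_BITMAP_EMPTY(j.bitmap)) t "
        "GROUP BY bucketname"
    )


query = build_and_index_query([
    ("default__lineitem_quantity__", "L_QUANTITY = 50.0"),
    ("default__lineitem_discount__", "L_DISCOUNT = 0.08"),
    ("default__lineitem_tax__", "L_TAX = 0.01"),
])
```

Each extra predicate adds exactly one join and one EWAH_BITMAP_AND call, so N predicates produce N-1 of each.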

> Update bitmap indexes for automatic usage
> -----------------------------------------
>
>                 Key: HIVE-2036
>                 URL: https://issues.apache.org/jira/browse/HIVE-2036
>             Project: Hive
>          Issue Type: Improvement
>          Components: Indexing
>    Affects Versions: 0.8.0
>            Reporter: Russell Melick
>            Assignee: Jeffrey Lym
>
> HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support.  The bitmap code will need to be extended after it is committed to enable automatic use of indexing.  Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query.  There may also be significant work in the IndexPredicateAnalyzer to support predicates with OR's, instead of just AND's as it is currently.
