Posted to dev@hive.apache.org by "John Sichi (JIRA)" <ji...@apache.org> on 2010/11/01 19:16:27 UTC

[jira] Commented: (HIVE-1694) Accelerate query execution using indexes

    [ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927070#action_12927070 ] 

John Sichi commented on HIVE-1694:
----------------------------------

Hey guys, I haven't gone through all the code yet, but after reading through the slides just now, I should point out one problem with using the existing compact indexes for the aggregate rewrite.

Namely, we store only the distinct block offsets, not the distinct row offsets.  So, if the same key appears more than once within the same block, counting index offsets will undercount and you'll get the wrong answer for COUNT.  One way to address this would be to compute the COUNT per index entry at the time we are building the index, and then SUM those counts later for aggregation.  But currently the compact index does not store that, so we would need to add it as a new index type.
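
To make the problem concrete, here is a minimal Python sketch (not Hive code). The row layout, the dates, and the names build_compact_index and build_counting_index are all hypothetical, chosen only to illustrate the point above: an index that keeps distinct block offsets loses duplicates within a block, while one that keeps a per-entry count can be summed to the correct answer.

```python
from collections import Counter, defaultdict

# Hypothetical base table: one (key, block_offset) pair per row.
# The first two rows share both the key and the block.
rows = [
    ("1994-01-01", 0),
    ("1994-01-01", 0),
    ("1994-01-01", 4096),
    ("1994-01-02", 4096),
]

def build_compact_index(rows):
    """Assumed compact-style index: key -> set of distinct block offsets."""
    index = defaultdict(set)
    for key, block in rows:
        index[key].add(block)
    return index

def build_counting_index(rows):
    """Assumed alternative: key -> {block_offset: number of rows}."""
    index = defaultdict(lambda: defaultdict(int))
    for key, block in rows:
        index[key][block] += 1
    return index

compact = build_compact_index(rows)
counting = build_counting_index(rows)

# Counting offsets in the compact index misses duplicates inside a block.
naive_count = {key: len(blocks) for key, blocks in compact.items()}

# Summing the stored per-entry counts recovers the true COUNT.
summed_count = {key: sum(per_block.values()) for key, per_block in counting.items()}

# Ground truth from scanning the base rows directly.
true_count = dict(Counter(key for key, _ in rows))

print(naive_count)
print(summed_count)
print(true_count)
```

In this sketch the per-entry count is the only extra thing stored, which is why it would amount to a different index layout rather than a change to how the existing one is queried.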

One smaller item is that for the DISTINCT rewrite (slide 10), you still need to keep a DISTINCT on the rewritten query since the same l_shipdate may be repeated in the index table if it appears in multiple buckets.
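
The same kind of Python sketch illustrates the DISTINCT point. The index-table layout below (one entry per key per bucket) and the bucket names are assumptions made for illustration only, not the actual Hive index schema.

```python
# Hypothetical index table: one entry per (key, bucket), each with its offsets.
index_rows = [
    ("1994-01-01", "bucket_0", [0]),
    ("1994-01-01", "bucket_1", [0, 4096]),
    ("1994-01-02", "bucket_1", [4096]),
]

# Rewritten query without DISTINCT: the key is returned once per bucket.
without_distinct = [key for key, _bucket, _offsets in index_rows]

# Rewritten query that keeps DISTINCT: each key is returned once.
with_distinct = sorted(set(without_distinct))

print(without_distinct)
print(with_distinct)
```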


> Accelerate query execution using indexes
> ----------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>         Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff
>
>
> The index building patch (HIVE-417) is checked into trunk. This JIRA issue tracks supporting indexes in the Hive compiler & execution engine for SELECT queries.
> This is in reference to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating a separate JIRA issue for tracking index usage in the optimizer & query execution.
> The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. It is expected to hold the information about index based plans & operator implementations for the above-mentioned cases.
