Posted to dev@hive.apache.org by Syed Albiz <s....@gmail.com> on 2011/07/05 21:02:45 UTC
Review Request: HIVE-2128: Automatic Indexing with multiple tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------
Review request for hive and John Sichi.
Summary
-------
Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
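A minimal sketch of the flow described above, not the code in the patch. It assumes java.util.Properties as a stand-in for the Hadoop Configuration object; the class name, the key name, and the "index"/"parent" return values are hypothetical and only illustrate the decision between reading through the index and delegating to the parent input format.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical illustration only; java.util.Properties plays the role of
// the Hadoop Configuration object mentioned in the summary.
final class IndexedPathsSketch {
    // Hypothetical key name, not an actual Hive configuration property.
    static final String INDEXED_PATHS_KEY = "sketch.index.indexed.paths";

    // Optimizer side: remember which table paths have an index result.
    // Assumes paths contain no commas, since a comma is the separator.
    static void recordIndexedPaths(Properties conf, Set<String> tablePaths) {
        conf.setProperty(INDEXED_PATHS_KEY, String.join(",", tablePaths));
    }

    // Input-format side: read the recorded paths back out of the config.
    static Set<String> indexedPaths(Properties conf) {
        String raw = conf.getProperty(INDEXED_PATHS_KEY, "");
        Set<String> paths = new LinkedHashSet<>();
        for (String p : raw.split(",")) {
            if (!p.isEmpty()) {
                paths.add(p);
            }
        }
        return paths;
    }

    // Decide per input path: read via the index, or delegate to the parent format.
    static String chooseReader(Properties conf, String inputPath) {
        return indexedPaths(conf).contains(inputPath) ? "index" : "parent";
    }
}
```

A comma-joined string is the simplest way to carry a set of paths through a string-valued configuration; a real implementation would need to handle escaping and path normalization.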
This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
Diff: https://reviews.apache.org/r/1010/diff
Testing
-------
Added new test case index_auto_mult_tables.q.
Thanks,
Syed
Re: Review Request: HIVE-2128: Automatic Indexing with multiple tables
Posted by John Sichi <js...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/#review1112
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java
<https://reviews.apache.org/r/1010/#comment2271>
Why was this comment truncated?
ql/src/test/queries/clientpositive/index_auto_mult_tables.q
<https://reviews.apache.org/r/1010/#comment2273>
All of these SELECT statements need ORDER BY for determinism.
- John
On 2011-07-19 03:15:17, Syed Albiz wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1010/
> -----------------------------------------------------------
>
> (Updated 2011-07-19 03:15:17)
>
>
> Review request for hive and John Sichi.
>
>
> Summary
> -------
>
> Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
>
>
> This addresses bug HIVE-2128.
> https://issues.apache.org/jira/browse/HIVE-2128
>
>
> Diffs
> -----
>
> ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
> ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
> ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
> ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
> ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
> ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
> ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
> ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
> ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
> ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
>
> Diff: https://reviews.apache.org/r/1010/diff
>
>
> Testing
> -------
>
> added new testcase index_auto_mult_tables.q
>
>
> Thanks,
>
> Syed
>
>
Re: Review Request: HIVE-2128: Automatic Indexing with multiple tables
Posted by Syed Albiz <s....@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------
(Updated 2011-07-21 23:52:23.929900)
Review request for hive and John Sichi.
Changes
-------
Added ORDER BY to the test cases. This revealed an existing bug in IndexWhereTaskDispatcher: for each task in the task tree, we walked the entire operator tree. I changed it to walk only the subset of the operator tree that belongs to the current task.
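To illustrate the idea of a task-scoped walk, here is a small sketch under simplified assumptions. The Op class and the walk method are hypothetical and are not Hive's operator or dispatcher classes; the point is only that the walk starts from the roots owned by one task rather than from every root in the plan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Hive's operator classes.
final class TaskScopedWalkSketch {
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        // Adds a child and returns this operator so calls can be chained.
        Op child(Op c) {
            children.add(c);
            return this;
        }
    }

    // Depth-first walk that visits only operators reachable from the given roots.
    static List<String> walk(List<Op> roots) {
        List<String> seen = new ArrayList<>();
        for (Op root : roots) {
            visit(root, seen);
        }
        return seen;
    }

    private static void visit(Op op, List<String> seen) {
        seen.add(op.name);
        for (Op c : op.children) {
            visit(c, seen);
        }
    }
}
```

Calling walk with the root operators of a single task visits that task's operators once, instead of revisiting the operators of every other task for each task in the tree.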
Summary
-------
Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128
Diffs (updated)
-----
ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 4c9efd1
ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java da084f6
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1010/diff
Testing
-------
Added new test case index_auto_mult_tables.q.
Thanks,
Syed
Re: Review Request: HIVE-2128: Automatic Indexing with multiple tables
Posted by Syed Albiz <s....@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------
(Updated 2011-07-19 03:15:17.006396)
Review request for hive and John Sichi.
Changes
-------
Removed unnecessary imports from the patch.
Summary
-------
Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128
Diffs (updated)
-----
ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
Diff: https://reviews.apache.org/r/1010/diff
Testing
-------
Added new test case index_auto_mult_tables.q.
Thanks,
Syed
Re: Review Request: HIVE-2128: Automatic Indexing with multiple tables
Posted by Syed Albiz <s....@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------
(Updated 2011-07-13 00:29:56.738368)
Review request for hive and John Sichi.
Changes
-------
Revamped approach. We already uniquely assign filenames to each index query result, so instead of throwing those away, keep them in the indexIntermediateFile variable, and take the union of those input paths to generate the next set of input splits.
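The union step can be sketched roughly as follows. This is an assumption-laden illustration, not the patch itself: the class and method names are hypothetical, and plain strings stand in for the per-query intermediate file paths.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; strings stand in for intermediate file paths.
final class IntermediateFileUnionSketch {
    // Union the result files of every index query, dropping duplicates
    // while keeping first-seen order so the outcome is deterministic.
    static Set<String> unionInputPaths(List<List<String>> perQueryFiles) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> files : perQueryFiles) {
            union.addAll(files);
        }
        return union;
    }
}
```

Because each index query result already has a unique file name, the union cannot merge results from different queries by accident; duplicates only collapse when the same file is listed twice.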
Summary
-------
Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128
Diffs (updated)
-----
ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 02ab78c
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1010/diff
Testing
-------
Added new test case index_auto_mult_tables.q.
Thanks,
Syed
Re: Review Request: HIVE-2128: Automatic Indexing with multiple tables
Posted by Syed Albiz <s....@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1010/
-----------------------------------------------------------
(Updated 2011-07-06 00:03:20.513755)
Review request for hive and John Sichi.
Changes
-------
Updated the patch to include the test case.
Summary
-------
Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.
This addresses bug HIVE-2128.
https://issues.apache.org/jira/browse/HIVE-2128
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1010/diff
Testing
-------
Added new test case index_auto_mult_tables.q.
Thanks,
Syed