You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Remus Rusanu <re...@microsoft.com> on 2013/10/03 16:17:56 UTC
Re: Review Request 13059: HIVE-4850 Implement vector mode map join
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------
(Updated Oct. 3, 2013, 2:17 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description
-------
This is not the final iteration, but I thought is easier to discuss it with a review.
This implementation works, handles multiple aliases and multiple values per key. The implementation uses the exiting hash tables saved by the local task for the map join, which are row mode hash tables (have row mode keys and store row mode writable object values). Going forward we should avoid the size-of-big-table conversions of big table keys to row-mode and conversion of small table values to vector data. This would require either converting on-the-fly the hash tables to vector friendly ones (when loaded) or changing the local task tahstable sink to create a vectorization friendly hash. First approach may have memory consumption problems (potentially two hash tables end up in memory, would have to stream the transformation or transform as reading from serialized format... nasty).
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------
Manually run some join queries on alltypes_orc table.
Thanks,
Remus Rusanu
RE: Review Request 13059: HIVE-4850 Implement vector mode map join
Posted by Remus Rusanu <re...@microsoft.com>.
Thanks for the review!
I will probably have to add a JIRA to address them, otherwise I don't have a vehicle for submitting the patch :)
-----Original Message-----
From: Eric Hanson [mailto:noreply@reviews.apache.org] On Behalf Of Eric Hanson
Sent: Monday, November 11, 2013 10:09 PM
To: Jitendra Pandey; Eric Hanson (SQL SERVER)
Cc: Remus Rusanu; hive
Subject: Re: Review Request 13059: HIVE-4850 Implement vector mode map join
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/#review28671
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java
<https://reviews.apache.org/r/13059/#comment55588>
Nice to see good comments
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java
<https://reviews.apache.org/r/13059/#comment55589>
A comment that explains at a high level where and how this interface is used would be helpful.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55593>
should these fields be marked private?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55594>
Please add a comment that explains this method.
Should this be a new method on ByteColumnVector or can you use BytesColumnVector.setVal()?
setVal automatically extends the internal buffer if needed.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55612>
Please add a comment to explain the purpose of this method
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55617>
conventions are to put blanks before and after operators =, < etc.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55616>
The Sun Java coding style conventions that are used for Hive say to use this style:
} else [if (...)] {
...
}
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55618>
Please add a descriptive comment for this method
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55622>
and the what?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55627>
remove blank comment?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55629>
correct spelling of Vectorization
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55632>
supper -> super?
Please explain what out-of-band params are.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55635>
colon should be surounded by blanks
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java
<https://reviews.apache.org/r/13059/#comment55653>
Please add a comment that describes what this method does.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55654>
Please add a comment
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55649>
Please add a comment explaining what's done by this method
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55650>
Please add a descriptive comment for this method
This looks good. I made a bunch of stylistic comments. Could you also add a page or so of design description to the design document for vectorized query execution attached to HIVE-4160? Thanks Remus. -Eric
- Eric Hanson
On Oct. 12, 2013, 9:51 p.m., Remus Rusanu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13059/
> -----------------------------------------------------------
>
> (Updated Oct. 12, 2013, 9:51 p.m.)
>
>
> Review request for hive, Eric Hanson and Jitendra Pandey.
>
>
> Bugs: HIVE-4850
> https://issues.apache.org/jira/browse/HIVE-4850
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This is a working implementation based on current trunk. It is simpler than the .1 patch in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator is literally calling the row-mode implementaiton for each row in the input batch and collects the row-mode forward into the output batch. This is not as bad as it seems because the JOIN operators has to resort to row-mode operations anyway, due to the small tables (hashtables) being row-mode (objects and object-inspectors). By delegating the entire join logic to the row mode we piggyback on the correctness of exiting implementation. I do plan to come up with a full-vectorized mode implementation but that would require changes to the hash table creation-serialization. Note that the filtering and key evaluation of the big table does use vectorized operators. the row mode applies only to the key HT lookup and to the JOIN logic
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
> ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
> ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
> ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 54f2644
> ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 8f10644
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 02c32cb
> ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
> ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION
> ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/13059/diff/
>
>
> Testing
> -------
>
> Manually run some join queries on alltypes_orc table.
>
>
> Thanks,
>
> Remus Rusanu
>
>
Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Posted by Eric Hanson <eh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/#review28671
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java
<https://reviews.apache.org/r/13059/#comment55588>
Nice to see good comments
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java
<https://reviews.apache.org/r/13059/#comment55589>
A comment that explains at a high level where and how this interface is used would be helpful.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55593>
should these fields be marked private?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55594>
Please add a comment that explains this method.
Should this be a new method on ByteColumnVector or can you use BytesColumnVector.setVal()?
setVal automatically extends the internal buffer if needed.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55612>
Please add a comment to explain the purpose of this method
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55617>
conventions are to put blanks before and after operators =, < etc.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55616>
The Sun Java coding style conventions that are used for Hive say to use this style:
} else [if (...)] {
...
}
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55618>
Please add a descriptive comment for this method
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55622>
and the what?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55627>
remove blank comment?
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55629>
correct spelling of Vectorization
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55632>
supper -> super?
Please explain what out-of-band params are.
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55635>
colon should be surounded by blanks
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java
<https://reviews.apache.org/r/13059/#comment55653>
Please add a comment that describes what this method does.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55654>
Please add a comment
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55649>
Please add a comment explaining what's done by this method
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55650>
Please add a descriptive comment for this method
This looks good. I made a bunch of stylistic comments. Could you also add a page or so of design description to the design document for vectorized query execution attached to HIVE-4160? Thanks Remus. -Eric
- Eric Hanson
On Oct. 12, 2013, 9:51 p.m., Remus Rusanu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13059/
> -----------------------------------------------------------
>
> (Updated Oct. 12, 2013, 9:51 p.m.)
>
>
> Review request for hive, Eric Hanson and Jitendra Pandey.
>
>
> Bugs: HIVE-4850
> https://issues.apache.org/jira/browse/HIVE-4850
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This is a working implementation based on current trunk. It is simpler than the .1 patch in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator is literally calling the row-mode implementaiton for each row in the input batch and collects the row-mode forward into the output batch. This is not as bad as it seems because the JOIN operators has to resort to row-mode operations anyway, due to the small tables (hashtables) being row-mode (objects and object-inspectors). By delegating the entire join logic to the row mode we piggyback on the correctness of exiting implementation. I do plan to come up with a full-vectorized mode implementation but that would require changes to the hash table creation-serialization. Note that the filtering and key evaluation of the big table does use vectorized operators. the row mode applies only to the key HT lookup and to the JOIN logic
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
> ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
> ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35
> ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
> ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 54f2644
> ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 8f10644
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 02c32cb
> ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
> ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION
> ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/13059/diff/
>
>
> Testing
> -------
>
> Manually run some join queries on alltypes_orc table.
>
>
> Thanks,
>
> Remus Rusanu
>
>
Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Posted by Remus Rusanu <re...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------
(Updated Oct. 12, 2013, 9:51 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description
-------
This is a working implementation based on current trunk. It is simpler than the .1 patch in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator is literally calling the row-mode implementaiton for each row in the input batch and collects the row-mode forward into the output batch. This is not as bad as it seems because the JOIN operators has to resort to row-mode operations anyway, due to the small tables (hashtables) being row-mode (objects and object-inspectors). By delegating the entire join logic to the row mode we piggyback on the correctness of exiting implementation. I do plan to come up with a full-vectorized mode implementation but that would require changes to the hash table creation-serialization. Note that the filtering and key evaluation of the big table does use vectorized operators. the row mode applies only to the key HT lookup and to the JOIN logic
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 54f2644
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 8f10644
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 02c32cb
ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION
ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------
Manually run some join queries on alltypes_orc table.
Thanks,
Remus Rusanu
Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Posted by Remus Rusanu <re...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------
(Updated Oct. 9, 2013, 1:50 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description
-------
This is a working implementation based on current trunk. It is simpler than the .1 patch in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator is literally calling the row-mode implementaiton for each row in the input batch and collects the row-mode forward into the output batch. This is not as bad as it seems because the JOIN operators has to resort to row-mode operations anyway, due to the small tables (hashtables) being row-mode (objects and object-inspectors). By delegating the entire join logic to the row mode we piggyback on the correctness of exiting implementation. I do plan to come up with a full-vectorized mode implementation but that would require changes to the hash table creation-serialization. Note that the filtering and key evaluation of the big table does use vectorized operators. the row mode applies only to the key HT lookup and to the JOIN logic
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java bd0955e
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION
ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------
Manually run some join queries on alltypes_orc table.
Thanks,
Remus Rusanu
Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Posted by Remus Rusanu <re...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------
(Updated Oct. 3, 2013, 2:20 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description (updated)
-------
This is a working implementation based on current trunk. It is simpler than the .1 patch in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator is literally calling the row-mode implementaiton for each row in the input batch and collects the row-mode forward into the output batch. This is not as bad as it seems because the JOIN operators has to resort to row-mode operations anyway, due to the small tables (hashtables) being row-mode (objects and object-inspectors). By delegating the entire join logic to the row mode we piggyback on the correctness of exiting implementation. I do plan to come up with a full-vectorized mode implementation but that would require changes to the hash table creation-serialization. Note that the filtering and key evaluation of the big table does use vectorized operators. the row mode applies only to the key HT lookup and to the JOIN logic
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b
Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------
Manually run some join queries on alltypes_orc table.
Thanks,
Remus Rusanu