You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2013/12/13 19:58:56 UTC
Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------
Review request for drill.
Repository: drill-git
Description
-------
In this patch, 3 changes are made:
1) In JoinTemplate.java
we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
Another minor change is DrillPrepareImpl.java.
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
Diffs
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c
sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7
Diff: https://reviews.apache.org/r/16248/diff/
Testing
-------
1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
Thanks,
Jinfeng Ni
Re: Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30435
-----------------------------------------------------------
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java
<https://reviews.apache.org/r/16248/#comment58234>
I wonder if you can have at least one unit test for your fix?
- Timothy Chen
On Dec. 13, 2013, 6:58 p.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
>
> (Updated Dec. 13, 2013, 6:58 p.m.)
>
>
> Review request for drill.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> In this patch, 3 changes are made:
>
> 1) In JoinTemplate.java
> we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
> we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
> we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
>
> Another minor change is DrillPrepareImpl.java.
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1
> exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c
> sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7
>
> Diff: https://reviews.apache.org/r/16248/diff/
>
>
> Testing
> -------
>
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
>
>
> Thanks,
>
> Jinfeng Ni
>
>
Re: Review Request 16248: DRILL-326: Fix merge join batch size, and it value
vectors allocation; Add check to ensure a batch size is within a limit.
Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------
(Updated March 11, 2014, 6:25 p.m.)
Review request for drill.
Changes
-------
Rebase on latest master branch.
Summary (updated)
-----------------
DRILL-326: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.
Repository: drill-git
Description
-------
In this patch, 3 changes are made:
1) In JoinTemplate.java
we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
Another minor change is DrillPrepareImpl.java.
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java 7c8a51c
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java bd668e7
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java 8552465
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java b41b733
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 27fae08
exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION
Diff: https://reviews.apache.org/r/16248/diff/
Testing
-------
1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
Thanks,
Jinfeng Ni
Re: Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------
(Updated Feb. 19, 2014, 5:29 p.m.)
Review request for drill.
Changes
-------
Rebase on master branch, plus minor code change.
Repository: drill-git
Description
-------
In this patch, 3 changes are made:
1) In JoinTemplate.java
we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
Another minor change is DrillPrepareImpl.java.
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 298b031
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java 8552465
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java b41b733
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 493fea6
exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION
Diff: https://reviews.apache.org/r/16248/diff/
Testing
-------
1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
Thanks,
Jinfeng Ni
Re: Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30928
-----------------------------------------------------------
Ship it!
Ship It!
- Timothy Chen
On Dec. 28, 2013, 1:04 a.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
>
> (Updated Dec. 28, 2013, 1:04 a.m.)
>
>
> Review request for drill.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> In this patch, 3 changes are made:
>
> 1) In JoinTemplate.java
> we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
> we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
> we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
>
> Another minor change is DrillPrepareImpl.java.
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1
> exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e
> exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION
> sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7
>
> Diff: https://reviews.apache.org/r/16248/diff/
>
>
> Testing
> -------
>
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
>
>
> Thanks,
>
> Jinfeng Ni
>
>
Re: Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30927
-----------------------------------------------------------
Ship it!
Ship It!
- Timothy Chen
On Dec. 28, 2013, 1:04 a.m., Jinfeng Ni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
>
> (Updated Dec. 28, 2013, 1:04 a.m.)
>
>
> Review request for drill.
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> In this patch, 3 changes are made:
>
> 1) In JoinTemplate.java
> we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
> we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
> we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
>
> Another minor change is DrillPrepareImpl.java.
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
>
>
> Diffs
> -----
>
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1
> exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e
> exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION
> sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7
>
> Diff: https://reviews.apache.org/r/16248/diff/
>
>
> Testing
> -------
>
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
>
>
> Thanks,
>
> Jinfeng Ni
>
>
Re: Review Request 16248: Fix merge join batch size, and it value vectors
allocation; Add check to ensure a batch size is within a limit.
Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------
(Updated Dec. 27, 2013, 5:04 p.m.)
Review request for drill.
Changes
-------
Add a physical plan using mock scan to re-create the problem of IOBE, when joining two batches with different size. The physical plan will hit IOBE, if the fix in Drill-326 is not applied.
Repository: drill-git
Description
-------
In this patch, 3 changes are made:
1) In JoinTemplate.java
we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
Another minor change is DrillPrepareImpl.java.
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause. If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
Diffs (updated)
-----
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e
exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION
sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7
Diff: https://reviews.apache.org/r/16248/diff/
Testing
-------
1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit.
Thanks,
Jinfeng Ni