You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Jinfeng Ni <jn...@maprtech.com> on 2013/12/13 19:58:56 UTC

Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------

Review request for drill.


Repository: drill-git


Description
-------

In this patch, 3 changes are made:

1) In JoinTemplate.java
 we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
 we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
 we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.

Another minor change is DrillPrepareImpl.java. 
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1 
  exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7 

Diff: https://reviews.apache.org/r/16248/diff/


Testing
-------

1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 


Thanks,

Jinfeng Ni


Re: Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30435
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java
<https://reviews.apache.org/r/16248/#comment58234>

    I wonder if you can have at least one unit test for your fix?


- Timothy Chen


On Dec. 13, 2013, 6:58 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
> 
> (Updated Dec. 13, 2013, 6:58 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> In this patch, 3 changes are made:
> 
> 1) In JoinTemplate.java
>  we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
>  we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
>  we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
> 
> Another minor change is DrillPrepareImpl.java. 
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c 
>   sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7 
> 
> Diff: https://reviews.apache.org/r/16248/diff/
> 
> 
> Testing
> -------
> 
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 16248: DRILL-326: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------

(Updated March 11, 2014, 6:25 p.m.)


Review request for drill.


Changes
-------

Rebase on latest master branch. 


Summary (updated)
-----------------

DRILL-326: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit. 


Repository: drill-git


Description
-------

In this patch, 3 changes are made:

1) In JoinTemplate.java
 we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
 we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
 we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.

Another minor change is DrillPrepareImpl.java. 
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java 7c8a51c 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java bd668e7 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java 8552465 
  exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java b41b733 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 27fae08 
  exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION 

Diff: https://reviews.apache.org/r/16248/diff/


Testing
-------

1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 


Thanks,

Jinfeng Ni


Re: Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------

(Updated Feb. 19, 2014, 5:29 p.m.)


Review request for drill.


Changes
-------

Rebase on master branch, plus minor code change. 


Repository: drill-git


Description
-------

In this patch, 3 changes are made:

1) In JoinTemplate.java
 we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
 we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
 we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.

Another minor change is DrillPrepareImpl.java. 
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 298b031 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java 8552465 
  exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java b41b733 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 493fea6 
  exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION 

Diff: https://reviews.apache.org/r/16248/diff/


Testing
-------

1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 


Thanks,

Jinfeng Ni


Re: Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30928
-----------------------------------------------------------

Ship it!


Ship It!

- Timothy Chen


On Dec. 28, 2013, 1:04 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
> 
> (Updated Dec. 28, 2013, 1:04 a.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> In this patch, 3 changes are made:
> 
> 1) In JoinTemplate.java
>  we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
>  we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
>  we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
> 
> Another minor change is DrillPrepareImpl.java. 
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e 
>   exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION 
>   sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7 
> 
> Diff: https://reviews.apache.org/r/16248/diff/
> 
> 
> Testing
> -------
> 
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Timothy Chen <tn...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/#review30927
-----------------------------------------------------------

Ship it!


Ship It!

- Timothy Chen


On Dec. 28, 2013, 1:04 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16248/
> -----------------------------------------------------------
> 
> (Updated Dec. 28, 2013, 1:04 a.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> In this patch, 3 changes are made:
> 
> 1) In JoinTemplate.java
>  we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
> 2) In MergeJoinBatch.java
>  we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
> 3) In IteratorValidatorBatchIterator.java
>  we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.
> 
> Another minor change is DrillPrepareImpl.java. 
> The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e 
>   exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION 
>   sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7 
> 
> Diff: https://reviews.apache.org/r/16248/diff/
> 
> 
> Testing
> -------
> 
> 1) The 4 table join query is successful.
> 2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Re: Review Request 16248: Fix merge join batch size, and it value vectors allocation; Add check to ensure a batch size is within a limit.

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16248/
-----------------------------------------------------------

(Updated Dec. 27, 2013, 5:04 p.m.)


Review request for drill.


Changes
-------

Add a physical plan using mock scan to re-create the problem of IOBE, when joining two batches with different size. The physical plan will hit IOBE, if the fix in Drill-326 is not applied. 


Repository: drill-git


Description
-------

In this patch, 3 changes are made:

1) In JoinTemplate.java
 we only advance outputPosition if the operation of copyLeft or copyRight is successful. Previously, the code will advance outputPosition even in case of fails, leading to an IndexOutofBoundary Exception in downstream batch processing.
2) In MergeJoinBatch.java
 we change the row count for the value vectors allocated to hold values from left and right, making sure the row count is same for left and right, and bounded by a limit
3) In IteratorValidatorBatchIterator.java
 we add check to make sure every incoming batch's size is within a limit. Otherwise, stop the execution immediately. This will help debug similar problem in the future, by turn on assert with vm option -ea.

Another minor change is DrillPrepareImpl.java. 
The change is to add PushJoinThroughJoinRule into optimizer, since currently optiq 0.4.11 does not add this rule, and could end up with cartisian join when the join predicates are in WHERE clause.  If we move to optiq 0.4.16, this change is no longer needed. For now, this minor change would help run multiple table join queries.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinStatus.java 762cce7 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinTemplate.java aae1a3c 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java f3a32cd 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java d6f08f1 
  exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java c13838c 
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestMergeJoin.java 0120c7e 
  exec/java-exec/src/test/resources/join/join_batchsize.json PRE-CREATION 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillPrepareImpl.java af7bbc7 

Diff: https://reviews.apache.org/r/16248/diff/


Testing
-------

1) The 4 table join query is successful.
2) Verify that the change in IteratorValidatorBatchIterator works as expected, by 1) reverse the code fix in JoinTemplate.java 2) run the query, and the query stops at the first time when the batch goes beyond the limit. 


Thanks,

Jinfeng Ni