You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Alex Bain <am...@gmail.com> on 2013/10/24 03:33:52 UTC
Review Request 14897: Implement LIMIT in Tez
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/
-----------------------------------------------------------
Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Bugs: PIG-3538
https://issues.apache.org/jira/browse/PIG-3538
Repository: pig-git
Description
-------
Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
Diff: https://reviews.apache.org/r/14897/diff/
Testing
-------
Thanks,
Alex Bain
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27498
-----------------------------------------------------------
Ship it!
Ship It!
- Daniel Dai
On Oct. 24, 2013, 11:47 p.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 11:47 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
> https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
> UPDATED (Oct 24 4:37 PM):
> 1. I added a test to TestTezCompiler.java and a GLD file
> 2. I included Daniel's patch for a new e2e test
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
> test/e2e/pig/tests/tez.conf 5edc093
> test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld PRE-CREATION
> test/org/apache/pig/tez/TestTezCompiler.java ef51876
> test/org/apache/pig/tez/TestTezJobControlCompiler.java 0a23513
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
> UPDATED (Oct 24 4:37 PM):
> 1. ant -Dtestcase=TestTezCompiler test passes
> 2. I ran test-e2e-tez. The new test seems to pass (although something else failed).
>
>
> Thanks,
>
> Alex Bain
>
>
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27494
-----------------------------------------------------------
2. I ran test-e2e-tez. The new test seems to pass (although something else failed).
Yes, Checkin_2 fail for me. Should be a simple fix.
- Daniel Dai
On Oct. 24, 2013, 11:47 p.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 11:47 p.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
> https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
> UPDATED (Oct 24 4:37 PM):
> 1. I added a test to TestTezCompiler.java and a GLD file
> 2. I included Daniel's patch for a new e2e test
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
> test/e2e/pig/tests/tez.conf 5edc093
> test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld PRE-CREATION
> test/org/apache/pig/tez/TestTezCompiler.java ef51876
> test/org/apache/pig/tez/TestTezJobControlCompiler.java 0a23513
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
> UPDATED (Oct 24 4:37 PM):
> 1. ant -Dtestcase=TestTezCompiler test passes
> 2. I ran test-e2e-tez. The new test seems to pass (although something else failed).
>
>
> Thanks,
>
> Alex Bain
>
>
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Alex Bain <am...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/
-----------------------------------------------------------
(Updated Oct. 24, 2013, 4:47 p.m.)
Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Changes
-------
New diff generated with --no-prefix
Bugs: PIG-3538
https://issues.apache.org/jira/browse/PIG-3538
Repository: pig-git
Description
-------
Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
UPDATED (Oct 24 4:37 PM):
1. I added a test to TestTezCompiler.java and a GLD file
2. I included Daniel's patch for a new e2e test
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
test/e2e/pig/tests/tez.conf 5edc093
test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld PRE-CREATION
test/org/apache/pig/tez/TestTezCompiler.java ef51876
test/org/apache/pig/tez/TestTezJobControlCompiler.java 0a23513
Diff: https://reviews.apache.org/r/14897/diff/
Testing
-------
[abain@abain-ld pig]$ cat data/1.dat
1,orange
2,apple
3,strawberry
[abain@abain-ld pig]$ cat test3.pig
a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
b = LIMIT a 2;
STORE b INTO 'foo';
I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
[abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
Found 2 items
-rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
-rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
[abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
1 orange
2 apple
UPDATED (Oct 24 4:37 PM):
1. ant -Dtestcase=TestTezCompiler test passes
2. I ran test-e2e-tez. The new test seems to pass (although something else failed).
Thanks,
Alex Bain
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Alex Bain <am...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/
-----------------------------------------------------------
(Updated Oct. 24, 2013, 4:45 p.m.)
Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Changes
-------
Updated diff based on comments from reviewers. Incorporated Daniel's patch for the e2e test.
Bugs: PIG-3538
https://issues.apache.org/jira/browse/PIG-3538
Repository: pig-git
Description (updated)
-------
Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
UPDATED (Oct 24 4:37 PM):
1. I added a test to TestTezCompiler.java and a GLD file
2. I included Daniel's patch for a new e2e test
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
test/e2e/pig/tests/tez.conf 5edc093
test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld PRE-CREATION
test/org/apache/pig/tez/TestTezCompiler.java ef51876
test/org/apache/pig/tez/TestTezJobControlCompiler.java 0a23513
Diff: https://reviews.apache.org/r/14897/diff/
Testing (updated)
-------
[abain@abain-ld pig]$ cat data/1.dat
1,orange
2,apple
3,strawberry
[abain@abain-ld pig]$ cat test3.pig
a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
b = LIMIT a 2;
STORE b INTO 'foo';
I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
[abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
Found 2 items
-rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
-rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
[abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
1 orange
2 apple
UPDATED (Oct 24 4:37 PM):
1. ant -Dtestcase=TestTezCompiler test passes
2. I ran test-e2e-tez. The new test seems to pass (although something else failed).
Thanks,
Alex Bain
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27470
-----------------------------------------------------------
Thanks Alex. Patch looks good in general. Might consider some optimization in the future:
1. If the previous stage using 1 reduce, no need to add one more vertex
2. If the limitplan is null (ie, not the "limited order by" case), we might not need a shuffle edge, a pass through edge should be enough if possible
3. Similar to PIG-1270, we can push limit to InputHandler
4. We also need to think through the "limited order by" case once "order by" is implemented.
Once Choelsoo's comments are addressed, we are ready to go. I will add a e2e test case for it.
- Daniel Dai
On Oct. 24, 2013, 1:42 a.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 1:42 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
> https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
>
> Thanks,
>
> Alex Bain
>
>
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/#review27443
-----------------------------------------------------------
Looks great! Thanks a lot!
In terms of tests, can you please add a test case to TestTezCompiler? Please look at existing test cases and TEZCx.gld files. With that, I think we can commit this patch. Later we should add more tests using either mini cluster or e2e.
I also made few minor comments below.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53329>
Please remove all the trailing white spaces in the patch.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53333>
Can you mark this comment as TODO?
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53335>
This is not your fault but mine. I realized that blocking(op) has no need to take an argument. Can you please change it to blocking()? It just confuses me when reading the code.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53334>
In the same line.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53330>
Do you mind using Lists.newArrayList() instead of new ArrayList<PhysicalPlan>()? Just for cleaner code.
There are several places.
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/14897/#comment53331>
Why is this method public? Can't we keep it private?
Also please put the method signature and curly brace in a single line.
- Cheolsoo Park
On Oct. 24, 2013, 1:42 a.m., Alex Bain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14897/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2013, 1:42 a.m.)
>
>
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3538
> https://issues.apache.org/jira/browse/PIG-3538
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
>
> Diff: https://reviews.apache.org/r/14897/diff/
>
>
> Testing
> -------
>
> [abain@abain-ld pig]$ cat data/1.dat
> 1,orange
> 2,apple
> 3,strawberry
>
> [abain@abain-ld pig]$ cat test3.pig
> a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
> b = LIMIT a 2;
> STORE b INTO 'foo';
>
> I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
>
> [abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
> Found 2 items
> -rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
> -rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
>
> [abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
> 1 orange
> 2 apple
>
>
> Thanks,
>
> Alex Bain
>
>
Re: Review Request 14897: PIG-3538 Implement LIMIT in Tez
Posted by Alex Bain <am...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14897/
-----------------------------------------------------------
(Updated Oct. 23, 2013, 6:42 p.m.)
Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Summary (updated)
-----------------
PIG-3538 Implement LIMIT in Tez
Bugs: PIG-3538
https://issues.apache.org/jira/browse/PIG-3538
Repository: pig-git
Description
-------
Implement LIMIT in Tez by providing an implementation of visitLimit in TezCompiler.java.
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 0c20214
Diff: https://reviews.apache.org/r/14897/diff/
Testing (updated)
-------
[abain@abain-ld pig]$ cat data/1.dat
1,orange
2,apple
3,strawberry
[abain@abain-ld pig]$ cat test3.pig
a = load './1.dat' using PigStorage(',') as (id:int, fruit:chararray);
b = LIMIT a 2;
STORE b INTO 'foo';
I ran with with "pig -x tez -f test3.pig" and got the following (correct results):
[abain@abain-ld pig]$ hadoop fs -ls /user/abain/foo
Found 2 items
-rw-r--r-- 1 abain supergroup 0 2013-10-23 18:38 /user/abain/foo/_SUCCESS
-rw-r--r-- 1 abain supergroup 17 2013-10-23 18:38 /user/abain/foo/part-r-00000
[abain@abain-ld pig]$ hadoop fs -cat /user/abain/foo/part-r-00000
1 orange
2 apple
Thanks,
Alex Bain