Posted to dev@pig.apache.org by Rohini Palaniswamy <ro...@gmail.com> on 2015/06/16 09:20:00 UTC
Review Request 35491: PIG-4574: Eliminate identity vertex for order
by and skewed join right after LOAD
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35491/
-----------------------------------------------------------
Review request for pig.
Bugs: PIG-4574
https://issues.apache.org/jira/browse/PIG-4574
Repository: pig
Description
-------
Read order-by/skewed-join data from HDFS in the Partitioner vertex, instead of getting it from the sampler vertex.
This jira does not optimize the following case:
A = LOAD 'x' ...;
B = LOAD 'y' ...;
C = UNION A, B;
D = ORDER C BY ..;
Optimizing this case depends on UnionOptimizer being turned on and will need more changes, so I will leave it for another jira.
Diffs
-----
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java 1685498
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POIdentityInOutTez.java 1685498
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Limit-2.gld 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Order-1.gld 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Order-2.gld PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-SkewJoin-1.gld 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-SkewJoin-2.gld PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-16-OPTOFF.gld 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-16.gld 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezAutoParallelism.java 1685498
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java 1685498
Diff: https://reviews.apache.org/r/35491/diff/
Testing
-------
Ran a subset of the e2e tests: SkewedJoin, Union, Order, MultiQuery_Self, MultiQuery_Union.
Ran L9.pig. Counters before the patch:
File System Counters
FILE_BYTES_READ=2028282366911
FILE_BYTES_WRITTEN=4049785379197
HDFS_BYTES_READ=1011533488395
HDFS_BYTES_WRITTEN=1010554380555
Counters after the patch:
File System Counters
FILE_BYTES_READ=1007449863330
FILE_BYTES_WRITTEN=2016036957653
HDFS_BYTES_READ=2023066976790
HDFS_BYTES_WRITTEN=1010554380555
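For anyone comparing the two runs, the counters can be put side by side with a short script. This is only an illustrative sketch: the dictionary and function names are made up here, and the values are copied from the L9.pig counters reported in this message.

```python
# Counter values copied from the L9.pig runs reported in this message.
before = {
    "FILE_BYTES_READ": 2028282366911,
    "FILE_BYTES_WRITTEN": 4049785379197,
    "HDFS_BYTES_READ": 1011533488395,
    "HDFS_BYTES_WRITTEN": 1010554380555,
}
after = {
    "FILE_BYTES_READ": 1007449863330,
    "FILE_BYTES_WRITTEN": 2016036957653,
    "HDFS_BYTES_READ": 2023066976790,
    "HDFS_BYTES_WRITTEN": 1010554380555,
}


def compare(before, after):
    """Return {counter: (delta in bytes, after/before ratio)}."""
    return {
        name: (after[name] - before[name], after[name] / before[name])
        for name in before
    }


changes = compare(before, after)
for name, (delta, ratio) in changes.items():
    print(f"{name:<20} delta={delta:>+18,d} ratio={ratio:.3f}")
```

With these numbers, local file reads and writes drop to roughly half, HDFS reads exactly double, and HDFS writes are unchanged. That would be consistent with the input being read from HDFS a second time by the Partitioner vertex rather than being passed along from the sampler vertex.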
Thanks,
Rohini Palaniswamy
Re: Review Request 35491: PIG-4574: Eliminate identity vertex for
order by and skewed join right after LOAD
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35491/#review89294
-----------------------------------------------------------
Ship it!
- Daniel Dai
Re: Review Request 35491: PIG-4574: Eliminate identity vertex for
order by and skewed join right after LOAD
Posted by Rohini Palaniswamy <ro...@gmail.com>.
> On June 24, 2015, 6:38 p.m., Daniel Dai wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java, line 150
> > <https://reviews.apache.org/r/35491/diff/1/?file=985529#file985529line150>
> >
> > Can you add a comment why we need to wrap key into NullablePartitionWritable for skewed join?
Sure. POPartitionRearrange on the right table creates a NullablePartitionWritable as the key. Since the left side uses LocalRearrange, we have to wrap its key explicitly to match the key type of the right side.
- Rohini
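The key-type point can be illustrated with a toy model. This is a sketch under loud assumptions, not Pig code: `PlainKey`, `PartitionedKey`, and `shuffle` are hypothetical stand-ins for the plain key emitted by LocalRearrange, the NullablePartitionWritable emitted by POPartitionRearrange, and the grouping done on the join inputs. The fixed partition value of -1 is also just a placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainKey:
    """Hypothetical stand-in for the key a plain LocalRearrange emits."""
    value: str


@dataclass(frozen=True)
class PartitionedKey:
    """Hypothetical stand-in for a key wrapped with a partition index."""
    key: PlainKey
    partition: int = -1  # placeholder value, an assumption of this sketch


def shuffle(records):
    """Group (key, row) pairs by key, as a toy model of the join input."""
    groups = defaultdict(list)
    for key, row in records:
        groups[key].append(row)
    return dict(groups)


left = [(PlainKey("k1"), "left-row")]
right = [(PartitionedKey(PlainKey("k1")), "right-row")]

# Without wrapping, the two sides use different key types and never meet.
mismatched = shuffle(left + right)

# Wrapping the left key gives both sides the same key type.
wrapped_left = [(PartitionedKey(key), row) for key, row in left]
matched = shuffle(wrapped_left + right)

print(len(mismatched), len(matched))
```

In the toy model the unwrapped inputs end up in two separate groups, while the wrapped left input lands in the same group as the right one.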
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35491/#review89225
-----------------------------------------------------------
Re: Review Request 35491: PIG-4574: Eliminate identity vertex for
order by and skewed join right after LOAD
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35491/#review89225
-----------------------------------------------------------
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java (line 150)
<https://reviews.apache.org/r/35491/#comment141821>
Can you add a comment why we need to wrap key into NullablePartitionWritable for skewed join?
- Daniel Dai