You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2014/03/25 21:50:28 UTC
Review Request 19633: PIG-3743: Use VertexGroup and Alias vertex for union
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19633/
-----------------------------------------------------------
Review request for pig, Daniel Dai and Rohini Palaniswamy.
Bugs: PIG-3743
https://issues.apache.org/jira/browse/PIG-3743
Repository: pig-git
Description
-------
The patch reimplements union using Tez VertexGroup and GroupInputEdge.
The changes include-
* Implemented POVertexGroupInputTez that takes ConcatenatedMergedKeyValuesInput from VertexGroup.
* TezCompiler inserts an alias vertex for union, and the alias vertex is converted to VertexGroup by TezDagBuilder.
* TezStats JobGraphBuilder removes alias vertices since they're not materialized by Tez, and thus, there is no status for them.
Note that-
* Further optimization is possible for the case where union is only followed by store. In that case, we could directly attach a MROutput to VertexGroup instead of adding another vertex that runs the MROutput. I'll follow up with this soon.
* POLocalRearrangeTez is added to each union source because ConcatenatedMergedKeyValuesInput expected ShuffledMergedInputs that requires sorting.
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e496ca8
src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java e69de29
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 245cade
src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java bce8963
src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java b3aa020
src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java f00946c
src/org/apache/pig/tools/pigstats/tez/TezStats.java feac11d
test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld e69de29
test/org/apache/pig/tez/TestTezCompiler.java e71d838
Diff: https://reviews.apache.org/r/19633/diff/
Testing
-------
ant test-tez passes except TestTezCompiler (known issue).
tez e2e tests all pass.
Thanks,
Cheolsoo Park
Re: Review Request 19633: PIG-3743: Use VertexGroup and Alias vertex for
union
Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19633/
-----------------------------------------------------------
(Updated March 25, 2014, 10 p.m.)
Review request for pig, Daniel Dai and Rohini Palaniswamy.
Changes
-------
Incorporated Rohini's comments.
Bugs: PIG-3743
https://issues.apache.org/jira/browse/PIG-3743
Repository: pig-git
Description
-------
The patch reimplements union using Tez VertexGroup and GroupInputEdge.
The changes include-
* Implemented POVertexGroupInputTez that takes ConcatenatedMergedKeyValuesInput from VertexGroup.
* TezCompiler inserts an alias vertex for union, and the alias vertex is converted to VertexGroup by TezDagBuilder.
* TezStats JobGraphBuilder removes alias vertices since they're not materialized by Tez, and thus, there is no status for them.
Note that-
* Further optimization is possible for the case where union is only followed by store. In that case, we could directly attach a MROutput to VertexGroup instead of adding another vertex that runs the MROutput. I'll follow up with this soon.
* POLocalRearrangeTez is added to each union source because ConcatenatedMergedKeyValuesInput expected ShuffledMergedInputs that requires sorting.
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e496ca8
src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java e69de29
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 245cade
src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java bce8963
src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java b3aa020
src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java f00946c
src/org/apache/pig/tools/pigstats/tez/TezStats.java feac11d
test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld e69de29
test/org/apache/pig/tez/TestTezCompiler.java e71d838
Diff: https://reviews.apache.org/r/19633/diff/
Testing
-------
ant test-tez passes except TestTezCompiler (known issue).
tez e2e tests all pass.
Thanks,
Cheolsoo Park
Re: Review Request 19633: PIG-3743: Use VertexGroup and Alias vertex for
union
Posted by Cheolsoo Park <pi...@gmail.com>.
> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > This will not handle union followed by store right? Can we create a separate jira for that?
Correct. Like I said in the description, MROutput should be added directly to VertexGroup.
I will file a jira.
> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java, lines 82-83
> > <https://reviews.apache.org/r/19633/diff/1/?file=535690#file535690line82>
> >
> > Don't we have to iterate through all the values? Won't there be more than one if shuffle edge is used and input grouped?
No, I don't think so. It should be handled by ConcatenatedMergedKeyValuesInput. If I am understanding it correctly, calling reader.next() updates the underlying reader. So the following reader.getCurrentValues().iterator() returns the new iterator if the previous reader is done.
In fact, I can confirm the result by testing union with multiple sources, e.g. 3+. It seems correct.
> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 1979
> > <https://reviews.apache.org/r/19633/diff/1/?file=535691#file535691line1979>
> >
> > Can you please add a TODO to use POValueOutputTez instead of POLocalRearrange and unsorted shuffle with TEZ-661 and PIG-3775
Sure.
> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java, line 54
> > <https://reviews.apache.org/r/19633/diff/1/?file=535694#file535694line54>
> >
> > hasVertexGroup()
I 100% agree hasVertexGroup() reads better, but since explain doesn't go through TezDagBuilder, VertexGroup of tezOper is not set when TezPrinter traverses the plan.
Maybe I consolidate hasVertexGroup() and isUnion() into one, and then use it everywhere.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19633/#review38532
-----------------------------------------------------------
On March 25, 2014, 8:50 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19633/
> -----------------------------------------------------------
>
> (Updated March 25, 2014, 8:50 p.m.)
>
>
> Review request for pig, Daniel Dai and Rohini Palaniswamy.
>
>
> Bugs: PIG-3743
> https://issues.apache.org/jira/browse/PIG-3743
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> The patch reimplements union using Tez VertexGroup and GroupInputEdge.
>
> The changes include-
> * Implemented POVertexGroupInputTez that takes ConcatenatedMergedKeyValuesInput from VertexGroup.
> * TezCompiler inserts an alias vertex for union, and the alias vertex is converted to VertexGroup by TezDagBuilder.
> * TezStats JobGraphBuilder removes alias vertices since they're not materialized by Tez, and thus, there is no status for them.
>
> Note that-
> * Further optimization is possible for the case where union is only followed by store. In that case, we could directly attach a MROutput to VertexGroup instead of adding another vertex that runs the MROutput. I'll follow up with this soon.
> * POLocalRearrangeTez is added to each union source because ConcatenatedMergedKeyValuesInput expected ShuffledMergedInputs that requires sorting.
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e496ca8
> src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java e69de29
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 245cade
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java bce8963
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java b3aa020
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java f00946c
> src/org/apache/pig/tools/pigstats/tez/TezStats.java feac11d
> test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld e69de29
> test/org/apache/pig/tez/TestTezCompiler.java e71d838
>
> Diff: https://reviews.apache.org/r/19633/diff/
>
>
> Testing
> -------
>
> ant test-tez passes except TestTezCompiler (known issue).
> tez e2e tests all pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request 19633: PIG-3743: Use VertexGroup and Alias vertex for
union
Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19633/#review38532
-----------------------------------------------------------
This will not handle union followed by store right? Can we create a separate jira for that?
src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java
<https://reviews.apache.org/r/19633/#comment70769>
Don't we have to iterate through all the values? Won't there be more than one if shuffle edge is used and input grouped?
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/19633/#comment70770>
Can you please add a TODO to use POValueOutputTez instead of POLocalRearrange and unsorted shuffle with TEZ-661 and PIG-3775
src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java
<https://reviews.apache.org/r/19633/#comment70772>
hasVertexGroup()
- Rohini Palaniswamy
On March 25, 2014, 8:50 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19633/
> -----------------------------------------------------------
>
> (Updated March 25, 2014, 8:50 p.m.)
>
>
> Review request for pig, Daniel Dai and Rohini Palaniswamy.
>
>
> Bugs: PIG-3743
> https://issues.apache.org/jira/browse/PIG-3743
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> The patch reimplements union using Tez VertexGroup and GroupInputEdge.
>
> The changes include-
> * Implemented POVertexGroupInputTez that takes ConcatenatedMergedKeyValuesInput from VertexGroup.
> * TezCompiler inserts an alias vertex for union, and the alias vertex is converted to VertexGroup by TezDagBuilder.
> * TezStats JobGraphBuilder removes alias vertices since they're not materialized by Tez, and thus, there is no status for them.
>
> Note that-
> * Further optimization is possible for the case where union is only followed by store. In that case, we could directly attach a MROutput to VertexGroup instead of adding another vertex that runs the MROutput. I'll follow up with this soon.
> * POLocalRearrangeTez is added to each union source because ConcatenatedMergedKeyValuesInput expected ShuffledMergedInputs that requires sorting.
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e496ca8
> src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java e69de29
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 245cade
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java bce8963
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java b3aa020
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java f00946c
> src/org/apache/pig/tools/pigstats/tez/TezStats.java feac11d
> test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld e69de29
> test/org/apache/pig/tez/TestTezCompiler.java e71d838
>
> Diff: https://reviews.apache.org/r/19633/diff/
>
>
> Testing
> -------
>
> ant test-tez passes except TestTezCompiler (known issue).
> tez e2e tests all pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>