Posted to dev@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2013/10/20 10:42:11 UTC

Review Request 14776: PIG-3514 Initial implementation of TezStats

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14776/
-----------------------------------------------------------

Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Bugs: PIG-3514
    https://issues.apache.org/jira/browse/PIG-3514


Repository: pig-git


Description
-------

This is an initial implementation of TezStats. For now, it collects the number of succeeded/failed Tez vertices and prints the "success/failed" message at the end.

In summary, I implemented the following classes:
* TezStats extends PigStats
* TezVertexStats extends JobStats

Note that TezVertexStats captures a Tez vertex, not a Tez job.

In addition, I moved several fields and methods that can be commonly used by both SimplePigStats and TezStats to PigStats.
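
To illustrate the behavior described above (count the succeeded and failed vertices, then print a final verdict), here is a minimal, self-contained sketch. It does not use the real Pig or Tez classes: VertexOutcomeSketch, VertexState, count, and summarize are hypothetical names, and the rule that a single failed vertex fails the whole run is an assumption made for the example, not something taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not the TezStats/TezVertexStats classes from the patch.
final class VertexOutcomeSketch {

    // Assumed terminal states of a vertex in this sketch.
    enum VertexState { SUCCEEDED, FAILED }

    // Counts how many vertices ended in the given state.
    static int count(List<VertexState> vertices, VertexState wanted) {
        int n = 0;
        for (VertexState state : vertices) {
            if (state == wanted) {
                n++;
            }
        }
        return n;
    }

    // Assumption: the run is successful only if no vertex failed.
    static String summarize(List<VertexState> vertices) {
        return count(vertices, VertexState.FAILED) == 0 ? "Success!" : "Failed!";
    }

    public static void main(String[] args) {
        List<VertexState> run = Arrays.asList(
                VertexState.SUCCEEDED, VertexState.SUCCEEDED, VertexState.FAILED);
        System.out.println("succeeded=" + count(run, VertexState.SUCCEEDED)
                + " failed=" + count(run, VertexState.FAILED));
        System.out.println(summarize(run));
    }
}
```

Keeping the counting separate from the verdict makes it easy to report both numbers before printing the final message.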


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java d88eb6b 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java d7577c1 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezStats.java ef8733a 
  src/org/apache/pig/tools/pigstats/PigStats.java 6d2e58e 
  src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java bbfd5a9 
  src/org/apache/pig/tools/pigstats/tez/TezStats.java e69de29 
  src/org/apache/pig/tools/pigstats/tez/TezVertexStats.java e69de29 
  test/org/apache/pig/tez/TestTezLauncher.java 0a3bc73 

Diff: https://reviews.apache.org/r/14776/diff/


Testing
-------

Ran an MRR job on a single-node Tez cluster and confirmed that the job status is printed correctly. Here are examples:

-----
2013-10-20 00:29:13,218 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:28:44	2013-10-20 00:29:13	GROUP_BY,FILTER

Success!
-----
2013-10-20 00:30:10,970 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:29:30	2013-10-20 00:30:10	GROUP_BY,FILTER

Failed!
-----

More unit tests will be added after a Tez mini cluster is added.


Thanks,

Cheolsoo Park


Re: Review Request 14776: PIG-3514 Initial implementation of TezStats

Posted by Cheolsoo Park <pi...@gmail.com>.

> On Oct. 22, 2013, 12:19 a.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java, line 50
> > <https://reviews.apache.org/r/14776/diff/2/?file=368517#file368517line50>
> >
> >     We will soon need to launch multiple jobs from Pig, and collect and aggregate multiple TezStats.

Thank you Daniel for the review! I can look into handling multiple DAGs. It will require changes in TezJobControlCompiler as well.
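
One possible shape for that aggregation, purely as a sketch: the names below (DagAggregationSketch, DagSummary, aggregate) are hypothetical and are not part of the patch or of TezJobControlCompiler, and treating the script as successful only when no DAG has a failed vertex is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of combining per-DAG results; not code from the patch.
final class DagAggregationSketch {

    // Immutable per-DAG summary holding vertex counts.
    static final class DagSummary {
        final int succeededVertices;
        final int failedVertices;

        DagSummary(int succeededVertices, int failedVertices) {
            this.succeededVertices = succeededVertices;
            this.failedVertices = failedVertices;
        }

        // Returns a new summary with the counts of both operands added up.
        DagSummary plus(DagSummary other) {
            return new DagSummary(
                    succeededVertices + other.succeededVertices,
                    failedVertices + other.failedVertices);
        }

        // Assumption: success means no failed vertex at all.
        boolean isSuccessful() {
            return failedVertices == 0;
        }
    }

    // Folds all per-DAG summaries into one script-level summary.
    static DagSummary aggregate(List<DagSummary> dags) {
        DagSummary total = new DagSummary(0, 0);
        for (DagSummary dag : dags) {
            total = total.plus(dag);
        }
        return total;
    }

    public static void main(String[] args) {
        DagSummary total = aggregate(Arrays.asList(
                new DagSummary(3, 0), new DagSummary(2, 1)));
        System.out.println("succeeded=" + total.succeededVertices
                + " failed=" + total.failedVertices);
        System.out.println(total.isSuccessful() ? "Success!" : "Failed!");
    }
}
```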


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14776/#review27266
-----------------------------------------------------------


On Oct. 20, 2013, 11:06 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14776/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2013, 11:06 p.m.)
> 
> 
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3514
>     https://issues.apache.org/jira/browse/PIG-3514
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> This is an initial implementation of TezStats. For now, it collects the number of succeeded/failed Tez vertices and prints the "success/failed" message at the end.
> 
> In summary, I implemented the following classes:
> * TezStats extends PigStats
> * TezVertexStats extends JobStats
> 
> Note that TezVertexStats captures a Tez vertex, not a Tez job.
> 
> In addition, I moved several fields and methods that can be commonly used by both SimplePigStats and TezStats to PigStats.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecType.java c726923 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecutionEngine.java 6e748a8 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java d88eb6b 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java d7577c1 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezScriptState.java eb4eefb 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezStats.java ef8733a 
>   src/org/apache/pig/tools/pigstats/PigStats.java 6d2e58e 
>   src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java bbfd5a9 
>   src/org/apache/pig/tools/pigstats/tez/TezScriptState.java e69de29 
>   src/org/apache/pig/tools/pigstats/tez/TezStats.java e69de29 
>   src/org/apache/pig/tools/pigstats/tez/TezTaskStats.java e69de29 
>   test/org/apache/pig/tez/TestTezLauncher.java 0a3bc73 
> 
> Diff: https://reviews.apache.org/r/14776/diff/
> 
> 
> Testing
> -------
> 
> Ran an MRR job on a single-node Tez cluster and confirmed that the job status is printed correctly. Here are examples:
> 
> -----
> 2013-10-20 00:29:13,218 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 
> 
> HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
> 2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:28:44	2013-10-20 00:29:13	GROUP_BY,FILTER
> 
> Success!
> -----
> 2013-10-20 00:30:10,970 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 
> 
> HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
> 2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:29:30	2013-10-20 00:30:10	GROUP_BY,FILTER
> 
> Failed!
> -----
> 
> More unit tests will be added after a Tez mini cluster is added.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request 14776: PIG-3514 Initial implementation of TezStats

Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14776/#review27266
-----------------------------------------------------------

Ship it!



src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java
<https://reviews.apache.org/r/14776/#comment53114>

    We will soon need to launch multiple jobs from Pig, and collect and aggregate multiple TezStats. 


Looks good overall. With the patch, some e2e tests run successfully. Let's commit it and go from there.

- Daniel Dai


Re: Review Request 14776: PIG-3514 Initial implementation of TezStats

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14776/
-----------------------------------------------------------

(Updated Oct. 20, 2013, 11:06 p.m.)


Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Changes
-------

Minor updates:
* Moved TezScriptState to the pigstats.tez package, where TezStats and TezTaskStats also live.
* Renamed TezVertexStats to TezTaskStats.
* Cleaned up the TezStats format.
* Added Tez version to TezStats.

Here is the new TezStats output:

2013-10-20 15:52:39,219 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 

       HadoopVersion: 2.2.0
          TezVersion: 0.2.0-SNAPSHOT
          PigVersion: 0.13.0-SNAPSHOT
              UserId: cheolsoop
           StartedAt: 2013-10-20 15:52:10
          FinishedAt: 2013-10-20 15:52:39
            Features: GROUP_BY,FILTER

Success!
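
For reference, the aligned layout above can be produced with a fixed-width format string. The sketch below is only an illustration: StatsFormatSketch and format are hypothetical names, and the column width of 20 is inferred from the sample output rather than taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical formatter; not the TezStats code from the patch.
final class StatsFormatSketch {

    // Right-aligns each label in a 20-character column, followed by ": value".
    static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append(String.format("%20s: %s", field.getKey(), field.getValue()));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("HadoopVersion", "2.2.0");
        fields.put("TezVersion", "0.2.0-SNAPSHOT");
        fields.put("PigVersion", "0.13.0-SNAPSHOT");
        fields.put("UserId", "cheolsoop");
        System.out.print(format(fields));
    }
}
```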


Bugs: PIG-3514
    https://issues.apache.org/jira/browse/PIG-3514


Repository: pig-git


Description
-------

This is an initial implementation of TezStats. For now, it collects the number of succeeded/failed Tez vertices and prints the "success/failed" message at the end.

In summary, I implemented the following classes:
* TezStats extends PigStats
* TezVertexStats extends JobStats

Note that TezVertexStats captures a Tez vertex, not a Tez job.

In addition, I moved several fields and methods that can be commonly used by both SimplePigStats and TezStats to PigStats.


Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecType.java c726923 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezExecutionEngine.java 6e748a8 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java d88eb6b 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java d7577c1 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezScriptState.java eb4eefb 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezStats.java ef8733a 
  src/org/apache/pig/tools/pigstats/PigStats.java 6d2e58e 
  src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java bbfd5a9 
  src/org/apache/pig/tools/pigstats/tez/TezScriptState.java e69de29 
  src/org/apache/pig/tools/pigstats/tez/TezStats.java e69de29 
  src/org/apache/pig/tools/pigstats/tez/TezTaskStats.java e69de29 
  test/org/apache/pig/tez/TestTezLauncher.java 0a3bc73 

Diff: https://reviews.apache.org/r/14776/diff/


Testing
-------

Ran an MRR job on a single-node Tez cluster and confirmed that the job status is printed correctly. Here are examples:

-----
2013-10-20 00:29:13,218 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:28:44	2013-10-20 00:29:13	GROUP_BY,FILTER

Success!
-----
2013-10-20 00:30:10,970 [main] INFO  org.apache.pig.tools.pigstats.tez.TezStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.2.0	0.13.0-SNAPSHOT	cheolsoop	2013-10-20 00:29:30	2013-10-20 00:30:10	GROUP_BY,FILTER

Failed!
-----

More unit tests will be added after a Tez mini cluster is added.


Thanks,

Cheolsoo Park