Posted to commits@pig.apache.org by da...@apache.org on 2017/01/18 18:02:23 UTC

svn commit: r1779362 - in /pig/trunk: ./ contrib/piggybank/java/ contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/ contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/ src/docs/src/documentation/content/xdocs/

Author: daijy
Date: Wed Jan 18 18:02:23 2017
New Revision: 1779362

URL: http://svn.apache.org/viewvc?rev=1779362&view=rev
Log:
PIG-5109: Remove HadoopJobHistoryLoader

Removed:
    pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.java
    pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestHadoopJobHistoryLoader.java
Modified:
    pig/trunk/CHANGES.txt
    pig/trunk/contrib/piggybank/java/build.xml
    pig/trunk/src/docs/src/documentation/content/xdocs/pig-index.xml
    pig/trunk/src/docs/src/documentation/content/xdocs/test.xml

Modified: pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?rev=1779362&r1=1779361&r2=1779362&view=diff
==============================================================================
--- pig/trunk/CHANGES.txt (original)
+++ pig/trunk/CHANGES.txt Wed Jan 18 18:02:23 2017
@@ -28,6 +28,8 @@ PIG-4897: Scope of param substitution fo
 
 PIG-4923: Drop Hadoop 1.x support in Pig 0.17 (szita via rohini)
 
+PIG-5109: Remove HadoopJobHistoryLoader (szita via daijy)
+
 PIG-5067: Revisit union on numeric type and chararray to bytearray (knoguchi)
  
 IMPROVEMENTS

Modified: pig/trunk/contrib/piggybank/java/build.xml
URL: http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/build.xml?rev=1779362&r1=1779361&r2=1779362&view=diff
==============================================================================
--- pig/trunk/contrib/piggybank/java/build.xml (original)
+++ pig/trunk/contrib/piggybank/java/build.xml Wed Jan 18 18:02:23 2017
@@ -59,14 +59,6 @@
     </if>
     <property name="hadoopversion" value="2" />
 
-    <!-- JobHistoryLoader currently does not support 2 -->
-    <condition property="build.classes.excludes" value="**/HadoopJobHistoryLoader.java" else="">
-        <equals arg1="${hadoopversion}" arg2="2"/>
-    </condition>
-    <condition property="test.classes.excludes" value="**/TestHadoopJobHistoryLoader.java" else="">
-        <equals arg1="${hadoopversion}" arg2="2"/>
-    </condition>
-
     <condition property="hadoopsuffix" value="2" else="">
         <equals arg1="${hadoopversion}" arg2="2"/>
     </condition>

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/pig-index.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/pig-index.xml?rev=1779362&r1=1779361&r2=1779362&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/pig-index.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/pig-index.xml Wed Jan 18 18:02:23 2017
@@ -404,7 +404,6 @@
 <p>Hadoop
 <br></br>&nbsp;&nbsp;&nbsp; <a href="cmds.html#fs">FsShell commands</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#load-glob">Hadoop globbing</a>
-<br></br>&nbsp;&nbsp;&nbsp; <a href="test.html#hadoop-job-history-loader">HadoopJobHistoryLoader</a>
 <br></br>&nbsp;&nbsp;&nbsp; hadoop partitioner. <em>See</em> PARTITION BY
 <br></br>&nbsp;&nbsp;&nbsp; <a href="start.html#hadoop-properties">Hadoop properties</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="start.html#req">versions supported</a>

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/test.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/test.xml?rev=1779362&r1=1779361&r2=1779362&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/test.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/test.xml Wed Jan 18 18:02:23 2017
@@ -540,7 +540,7 @@ job_201004271216_12714 1 1 3 3 3 12 12 1
 
 <p>Pig Statistics is a framework for collecting and storing script-level statistics for Pig Latin. Characteristics of Pig Latin scripts and the resulting MapReduce jobs are collected while the script is executed. These statistics are then available for Pig users and tools using Pig (such as Oozie) to retrieve after the job is done.</p>
 
-<p>The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file (and job xml file). Piggybank has a HadoopJobHistoryLoader which acts as an example of using Pig itself to query these statistics (the loader can be used as a reference implementation but is NOT supported for production use).</p>
+<p>The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file (and job xml file).</p>
 
 <!-- +++++++++++++++++++++++++++++++++++++++ -->
 <section>
@@ -708,93 +708,6 @@ public interface PigProgressNotification
 </tr>
 </table>
 </section>
-
-
-<!-- +++++++++++++++++++++++++++++++++++++++ -->
-<section id="hadoop-job-history-loader">
-<title>Hadoop Job History Loader</title>
-<p>The HadoopJobHistoryLoader in Piggybank loads Hadoop job history files and job xml files from the file system. For each MapReduce job, the loader produces a tuple with schema (j:map[], m:map[], r:map[]). The first map in the schema contains job-related entries. Here are some of the important key names in the map: </p>
-
-<table>
-<tr>
-<td>
-<p>PIG_SCRIPT_ID</p>
-<p>CLUSTER </p>
-<p>QUEUE_NAME</p>
-<p>JOBID</p>
-<p>JOBNAME</p>
-<p>STATUS</p>
-</td>
-<td>
-<p>USER </p>
-<p>HADOOP_VERSION  </p>
-<p>PIG_VERSION</p>
-<p>PIG_JOB_FEATURE</p>
-<p>PIG_JOB_ALIAS </p>
-<p>PIG_JOB_PARENTS</p>
-</td>
-<td>
-<p>SUBMIT_TIME</p>
-<p>LAUNCH_TIME</p>
-<p>FINISH_TIME</p>
-<p>TOTAL_MAPS</p>
-<p>TOTAL_REDUCES</p>
-</td>
-</tr>
-</table>
-<p></p>
-<p>Examples that use the loader to query Pig statistics are shown below.</p>
-</section>
-
-<!-- +++++++++++++++++++++++++++++++++++++++ -->
-<section>
-<title>Examples</title>
-<p>Find scripts that generate more than three MapReduce jobs:</p>
-<source>
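-/* each record is one MapReduce job: j holds job-level entries, m map-task entries, r reduce-task entries */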
-a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
-b = group a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
-c = foreach b generate group.$1, group.$2, COUNT(a);
-d = filter c by $2 > 3;
-dump d;
-</source>
-
-<p>Find the running time of each script (in seconds): </p>
-<source>
-a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
-b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, 
-         (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
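-/* running time = latest job finish minus earliest job submit; timestamps are in milliseconds, so divide by 1000 for seconds */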
-c = group b by (id, user, script_name);
-d = foreach c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start)) / 1000;
-dump d;
-</source>
-
-<p>Find the number of scripts run by user and queue on a cluster: </p>
-<source>
-a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
-b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
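-/* each distinct (script run, user, queue) triple counts as one script execution */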
-c = group b by (id, user, queue) parallel 10;
-d = foreach c generate group.user, group.queue, COUNT(b);
-dump d;
-</source>
-
-<p>Find scripts that have failed jobs: </p>
-<source>
-a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
-b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
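-/* keep only jobs whose final status is not SUCCESS */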
-c = filter b by status != 'SUCCESS';
-dump c;
-</source>
-
-<p>Find scripts that use only the default parallelism: </p>
-<source>
-a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
-b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
-c = group b by (id, user, script_name) parallel 10;
-d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
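-/* a script whose jobs never request more than one reducer ran with the default parallelism */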
-e = filter d by max_reduces == 1;
-dump e;
-</source>
-</section>
 </section>   
 
 <!-- =========================================================================== -->