Posted to commits@pig.apache.org by da...@apache.org on 2014/11/13 04:48:03 UTC

svn commit: r1639242 - in /pig/branches/branch-0.14: ./ src/docs/src/documentation/content/xdocs/

Author: daijy
Date: Thu Nov 13 03:48:02 2014
New Revision: 1639242

URL: http://svn.apache.org/r1639242
Log:
PIG-4321: Documentation for 0.14

Modified:
    pig/branches/branch-0.14/CHANGES.txt
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/cont.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/func.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/perf.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/start.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/tabs.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/test.xml
    pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.14/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/CHANGES.txt?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/CHANGES.txt (original)
+++ pig/branches/branch-0.14/CHANGES.txt Thu Nov 13 03:48:02 2014
@@ -24,6 +24,8 @@ INCOMPATIBLE CHANGES
  
 IMPROVEMENTS
 
+PIG-4321: Documentation for 0.14 (daijy)
+
 PIG-4328: Upgrade Hive to 0.14 (daijy)
 
 PIG-4318: Make PigConfiguration naming consistent (rohini)

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/cont.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/cont.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/cont.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/cont.xml Thu Nov 13 03:48:02 2014
@@ -39,8 +39,6 @@
  <p><strong>Python</strong></p>
  <source>
  $ pig myembedded.py
-OR
-$ java -cp &lt;jython jars&gt;:&lt;pig jars&gt;; [--embedded python] /tmp/myembedded.py
  </source>
  <p></p>
  <p>Pig will look for the <code>#!/usr/bin/python</code> line in the script.</p>
@@ -69,8 +67,6 @@ else :
 <p><strong>JavaScript</strong></p>
 <source>
 $ pig myembedded.js
-OR
-$ java -cp &lt;rhino jars&gt;:&lt;pig jars&gt;; [--embedded javascript] /tmp/myembedded.js
 </source>
 <p></p>
 <p>Pig will look for the *.js extension in the script.</p>
@@ -98,8 +94,6 @@ function main() {
 <p><strong>Groovy</strong></p>
 <source>
 $ pig myembedded.groovy
-OR
-$ java -cp &lt;groovy-all jar&gt;:&lt;pig jars&gt;; [--embedded groovy] /tmp/myembedded.groovy
 </source>
 <p></p>
 <p>Pig will look for the *.groovy extension in the script.</p>
@@ -836,8 +830,8 @@ $ javac -cp pig.jar idlocal.java
 <p> </p>
 <p>From your current working directory, run the program. To view the results, check the output file, id.out.</p>
 <source>
-Unix:   $ java -cp pig.jar:. idlocal
-Cygwin: $ java –cp '.;pig.jar' idlocal
+Unix:    $ java -cp pig.jar:. idlocal
+Windows: $ java -cp .;pig.jar idlocal
 </source>
 
 <p>idlocal.java - The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file. 

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/func.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/func.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/func.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/func.xml Thu Nov 13 03:48:02 2014
@@ -2408,9 +2408,90 @@ STORE A INTO 'accumulo://flights?instanc
                 because the first element in the Tuple is used as the row in Accumulo.
             </p>
         </section>
-   </section>
-</section>
+    </section>
 
+    <section id="OrcStorage">
+        <title>OrcStorage</title>
+        <p>Loads data from or stores data to an ORC file.</p>
+        <section>
+        <title>Syntax</title>
+        <table>
+        <tr>
+            <td>
+                <p>OrcStorage(['options'])</p>
+            </td>
+        </tr>
+        </table>
+        </section>
+        <section>
+        <title>Options</title>
+        <table>
+        <tr>
+            <td>
+               <p>A string that contains space-separated options (&lsquo;-optionA valueA -optionB valueB -optionC&rsquo;). The current options are only applicable to the STORE operation, not to LOAD.</p>
+               <p>Currently supported options are:</p>
+               <ul>
+               <li>--stripeSize or -s Sets the stripe size for the file. Default is 268435456 (256 MB).</li>
+               <li>--rowIndexStride or -r Sets the distance between entries in the row index. Default is 10000.</li>
+               <li>--bufferSize or -b Sets the size of the memory buffers used for compressing and storing the stripe in memory. Default is 262144 (256 KB).</li>
+               <li>--blockPadding or -p Sets whether the HDFS blocks are padded to prevent stripes from straddling blocks. Default is true.</li>
+               <li>--compress or -c Sets the generic compression that is used to compress the data. Valid codecs are: NONE, ZLIB, SNAPPY, LZO. Default is ZLIB.</li>
+               <li>--version or -v Sets the version of the file that will be written.</li>
+               </ul>
+            </td>
+        </tr>
+        </table>
+        </section>
+        <section>
+        <title>Example</title>
+        <p>OrcStorage as a StoreFunc.</p>
+<source>
+A = LOAD 'student.txt' as (name:chararray, age:int, gpa:double);
+store A into 'student.orc' using OrcStorage('-c SNAPPY'); -- store the data into student.orc with SNAPPY compression
+</source>
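+        <p>Multiple options can be combined in a single option string. A minimal sketch (the stripe size value is illustrative, not a recommendation):</p>
+<source>
+store A into 'student.orc' using OrcStorage('-c SNAPPY -s 67108864'); -- SNAPPY compression with a 64 MB stripe size
+</source>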
+        <p>OrcStorage as a LoadFunc.</p>
+<source>
+A = LOAD 'student.orc' USING OrcStorage();
+describe A; -- See the schema of student.orc
+B = filter A by age &gt; 25 and gpa &lt; 3; -- the filter condition will be pushed to the loader
+dump B; -- dump the content of student.orc
+</source>
+        </section>
+        <section>
+        <title>Data types</title>
+        <p>Most ORC data types have a one-to-one mapping to Pig data types. The exceptions are:</p>
+        <p>Loader side:</p>
+        <ul>
+        <li>Orc STRING/CHAR/VARCHAR all map to Pig chararray</li>
+        <li>Orc BYTE/BINARY both map to Pig bytearray</li>
+        <li>Orc TIMESTAMP/DATE both map to Pig datetime</li>
+        <li>Orc DECIMAL maps to Pig bigdecimal</li>
+        </ul>
+        <p>Storer side:</p>
+        <ul>
+        <li>Pig chararray maps to Orc STRING</li>
+        <li>Pig datetime maps to Orc TIMESTAMP</li>
+        <li>Pig bigdecimal/biginteger both map to Orc DECIMAL</li>
+        <li>Pig bytearray maps to Orc BINARY</li>
+        </ul>
+        </section>
+        <section>
+        <title>Predicate pushdown</title>
+        <p>If there is a filter statement right after OrcStorage, Pig will push the filter condition to the loader.
+           OrcStorage will entirely prune any file/stripe/row group that does not satisfy the condition. For a file/stripe/row group that contains
+           data satisfying the filter condition, OrcStorage will load the file/stripe/row group and Pig will evaluate the filter condition
+           again to remove any remaining records that do not satisfy it.</p>
+        <p>OrcStorage predicate pushdown currently supports all primitive data types but none of the complex data types. For example, a condition on a map field
+           cannot be pushed into OrcStorage:</p>
+<source>
+A = LOAD 'student.orc' USING OrcStorage();
+B = filter A by info#'age' &gt; 25; -- the map condition cannot be pushed to OrcStorage
+dump B;
+</source>
+        <p>Currently, the following expressions in the filter condition are supported in OrcStorage predicate pushdown: &gt;, &gt;=, &lt;, &lt;=, ==, !=, between, in, and, or, not. The expressions not yet supported are: is null, is not null, matches.</p>
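+        <p>By contrast, a comparison on a primitive field can be pushed down. A sketch, assuming the student schema from the earlier example:</p>
+<source>
+A = LOAD 'student.orc' USING OrcStorage();
+B = filter A by age &gt;= 20 and age &lt;= 30; -- both conditions can be pushed to OrcStorage
+dump B;
+</source>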
+        </section>
+    </section>
+</section>
 
 <!-- ======================================================== -->  
 <!-- ======================================================== -->  
@@ -4593,6 +4674,16 @@ Use the UPPER function to convert all ch
    </section>
 </section>
  
+ <section id="uniqueid">
+   <title>UniqueID</title>
+   <p>Returns a unique id string for each record in the alias. </p>
+   <section>
+     <title>Usage</title>
+     <p>
+       UniqueID generates a unique id for each record. The id takes the form "taskindex-sequence".
+     </p>
+   </section>
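+   <section>
+     <title>Example</title>
+     <p>A minimal sketch (the input file and field name are illustrative):</p>
+<source>
+A = LOAD 'student.txt' AS (name:chararray);
+B = FOREACH A GENERATE name, UniqueID(); -- each output record gets a "taskindex-sequence" id
+</source>
+   </section>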
+ </section>
 </section>
 <!-- End String Functions -->
 

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/perf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/perf.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/perf.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/perf.xml Thu Nov 13 03:48:02 2014
@@ -22,6 +22,56 @@
   </header>
   <body> 
   
+<section id="tez-mode">
+  <title>Tez mode</title>
+  <p><a href="http://tez.apache.org">Apache Tez</a> provides an alternative execution engine to MapReduce that focuses on performance. By using an optimized job flow, edge semantics and container reuse, we see a consistent performance boost for both large and small jobs. </p>
+  <section id="enable-tez">
+    <title>How to enable Tez</title>
+    <p>To run Pig in Tez mode, simply add "-x tez" to the Pig command line. Alternatively, you can add "exectype=tez" to conf/pig.properties to change the default exectype to Tez. Setting the Java system property "-Dexectype=tez" also triggers Tez mode.</p>
+    <p>Prerequisite: Tez requires the Tez tarball to be available in HDFS when running a job on the cluster, and a tez-site.xml whose tez.lib.uris setting points to that HDFS location must be on the classpath. If Pig on Tez fails with "tez.lib.uris is not defined", copy the Tez tarball to HDFS and add the Tez conf directory ($TEZ_HOME/conf) containing tez-site.xml to the environment variable "PIG_CLASSPATH". This is required by the Apache Pig distribution; a setup sketch follows the property example below.</p>
+<source>
+  &lt;property&gt;
+    &lt;name&gt;tez.lib.uris&lt;/name&gt;
+    &lt;value&gt;${fs.default.name}/apps/tez/tez-0.5.2.tar.gz&lt;/value&gt;
+  &lt;/property&gt;
+</source>
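+<p>A minimal setup sketch, assuming the tarball location and Tez version shown above (paths are illustrative):</p>
+<source>
+$ hadoop fs -mkdir -p /apps/tez
+$ hadoop fs -copyFromLocal tez-0.5.2.tar.gz /apps/tez/tez-0.5.2.tar.gz
+$ export PIG_CLASSPATH=$TEZ_HOME/conf:$PIG_CLASSPATH
+</source>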
+  </section>
+  <section id="tez-dag">
+    <title>Tez DAG generation</title>
+    <p>Every Pig script is compiled into one or more Tez DAGs (typically one). Each Tez DAG consists of a number of vertices and edges connecting those vertices. For example, a simple join involves one DAG which consists of 3 vertices: load left input, load right input and join. Running an <a href="test.html#explain">explain</a> in Tez mode will show you the DAG the Pig script compiled into, as in the sketch below.</p>
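+    <p>A sketch of invoking explain from the Grunt shell (the script name is illustrative):</p>
+<source>
+$ pig -x tez
+grunt&gt; explain -script myscript.pig
+</source>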
+  </section>
+  <section id="container-reuse">
+    <title>Tez session/container reuse</title>
+    <p>One downside of MapReduce is that the startup cost for a job is very high, which hurts performance, especially for small jobs. Tez alleviates the problem through session and container reuse, so it is not necessary to start an application master for every job, or to start a JVM for every task. By default, session/container reuse is on, and you usually should not turn it off. JVM reuse might cause side effects if static variables are used, since a static variable might live across different jobs. So if a static variable is used in an EvalFunc/LoadFunc/StoreFunc, be sure to implement a cleanup function and register it with <a href="http://pig.apache.org/docs/r0.14.0/api/org/apache/pig/JVMReuseManager.html">JVMReuseManager</a>.</p>
+  </section>
+  <section id="auto-parallelism">
+    <title>Automatic parallelism</title>
+    <p>Just like MapReduce, if the user specifies "parallel" in a Pig statement, or defines default_parallel in Tez mode, Pig will honor it (the only exception is if the user specifies a parallelism that is clearly too low, in which case Pig will override it). </p>
+    <p>If the user specifies neither "parallel" nor "default_parallel", Pig will use automatic parallelism. In MapReduce, Pig submits one MapReduce job at a time, and before submitting a job, Pig has a chance to automatically set the reduce parallelism based on the size of the input file. In contrast, Tez submits a DAG as a unit, and automatic parallelism is managed in two parts:</p>
+    <ul>
+    <li>Before submitting a DAG, Pig statically estimates the parallelism of each vertex based on the input file size of the DAG and the complexity of the pipeline of each vertex</li>
+    <li>At runtime, Tez adjusts vertex parallelism dynamically based on the input data volume of the vertex. Note that currently Tez can only decrease the parallelism dynamically, not increase it. So in step 1, Pig overestimates the parallelism</li>
+    </ul>
+    <p>The following parameters control the behavior of automatic parallelism in Tez (shared with MapReduce):</p>
+<source>
+pig.exec.reducers.bytes.per.reducer
+pig.exec.reducers.max
+</source>
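+    <p>These can also be tuned per script with the set command. A sketch (the values are illustrative, not recommendations):</p>
+<source>
+set pig.exec.reducers.bytes.per.reducer 536870912; -- aim for roughly 512 MB of input per reducer
+set pig.exec.reducers.max 500; -- cap the estimated reduce parallelism
+</source>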
+  </section>
+  <section id="api-change">
+    <title>API change</title>
+    <p>If you invoke Pig from Java using PigRunner.run(), there are changes in PigStats and PigProgressNotificationListener; see <a href="test.html#pig-statistics">Pig Statistics</a> and <a href="test.html#ppnl">Pig Progress Notification Listener</a>.</p>
+  </section>
+  <section id="known-issue">
+    <title>Known issues</title>
+    <p>Currently known issues in Tez mode include:</p>
+    <ul>
+    <li>Tez local mode is not stable; we see jobs hang in some cases</li>
+    <li>A Tez-specific GUI is not available yet, so there is no GUI to track task progress. However, log messages are available in the GUI</li>
+    </ul>
+  </section>
+</section>
+
 <section id="profiling">
   <title>Timing your UDFs</title>
   <p>The first step to improving performance and efficiency is measuring where the time is going. Pig provides a light-weight method for approximately measuring how much time is spent in different user-defined functions (UDFs) and Loaders. Simply set the pig.udf.profile property to true. This will cause new counters to be tracked for all Map-Reduce jobs generated by your script: approx_microsecs measures the approximate amount of time spent in a UDF, and approx_invocations measures the approximate number of times the UDF was invoked. In addition, the frequency of profiling can be configured via the pig.udf.profile.frequency (by default, every 100th invocation). Note that this may produce a large number of counters (two per UDF). Excessive amounts of counters can lead to poor JobTracker performance, so use this feature carefully, and preferably on a test cluster.</p>
@@ -188,7 +238,7 @@ reducers. The maximum number of reducers
 <p>
 The default reducer estimation algorithm described above can be overridden by setting the
 pig.exec.reducer.estimator parameter to the fully qualified class name of an implementation of
-<a href="http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigReducerEstimator.java">org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigReducerEstimator</a>.
+<a href="http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigReducerEstimator.java">org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigReducerEstimator</a>(MapReduce) or <a href="http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezOperDependencyParallelismEstimator.java">org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.TezOperDependencyParallelismEstimator</a>(Tez).
 The class must exist on the classpath of the process submitting the Pig job. If the
 pig.exec.reducer.estimator.arg parameter is set, the value will be passed to a constructor
 of the implementing class that takes a single String.
@@ -513,9 +563,24 @@ A = LOAD 'input' as (dt, state, event) u
 </section>
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section id="FilterLogicExpressionSimplifier">
-<title>FilterLogicExpressionSimplifier</title>
-<p>This rule simplifies the expression in filter statement.</p>
+<section id="PredicatePushdownOptimizer">
+<title>PredicatePushdownOptimizer</title>
+<p>Push the filter condition to the loader. Unlike PartitionFilterOptimizer, the filter condition will still be evaluated in Pig. In other words, the filter condition pushed to the loader is only a hint; the loader might still load records that do not satisfy the filter condition.</p>
+<source>
+A = LOAD 'input' using OrcStorage();
+B = FILTER A BY dt=='201310' AND state=='CA';
+</source>
+<p>The filter condition will be pushed to the loader if the loader supports predicate pushdown:</p>
+<source>
+A = LOAD 'input' using OrcStorage();  -- Filter condition pushed to the loader
+B = FILTER A BY dt=='201310' AND state=='CA';  -- Filter evaluated in Pig again
+</source>
+</section>
+
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="ConstantCalculator">
+<title>ConstantCalculator</title>
+<p>This rule evaluates constant expressions at compile time.</p>
 <source>
 1) Constant pre-calculation 
 
@@ -523,39 +588,12 @@ B = FILTER A BY a0 &gt; 5+7; 
 is simplified to 
 B = FILTER A BY a0 &gt; 12; 
 
-2) Elimination of negations 
-
-B = FILTER A BY NOT (NOT(a0 &gt; 5) OR a &gt; 10); 
-is simplified to 
-B = FILTER A BY a0 &gt; 5 AND a &lt;= 10; 
-
-3) Elimination of logical implied expression in AND 
-
-B = FILTER A BY (a0 &gt; 5 AND a0 &gt; 7); 
-is simplified to 
-B = FILTER A BY a0 &gt; 7; 
-
-4) Elimination of logical implied expression in OR 
-
-B = FILTER A BY ((a0 &gt; 5) OR (a0 &gt; 6 AND a1 &gt; 15); 
-is simplified to 
-B = FILTER C BY a0 &gt; 5; 
-
-5) Equivalence elimination 
+2) Evaluate UDF
 
-B = FILTER A BY (a0 v 5 AND a0 &gt; 5); 
+B = FOREACH A generate UPPER(CONCAT('a', 'b'));
 is simplified to 
-B = FILTER A BY a0 &gt; 5; 
-
-6) Elimination of complementary expressions in OR 
-
-B = FILTER A BY (a0 &gt; 5 OR a0 &lt;= 5); 
-is simplified to non-filtering 
-
-7) Elimination of naive TRUE expression 
+B = FOREACH A generate 'AB';
 
-B = FILTER A BY 1==1; 
-is simplified to non-filtering 
 </source>
 </section>
 

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/start.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/start.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/start.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/start.xml Thu Nov 13 03:48:02 2014
@@ -34,21 +34,14 @@
  <p><strong>Mandatory</strong></p>
       <p>Unix and Windows users need the following:</p>
 		<ul>
-		  <li> <strong>Hadoop 0.20.2, 020.203, 020.204,  0.20.205, 1.0.0, 1.0.1, or 0.23.0, 0.23.1</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a> (You can run Pig with different versions of Hadoop by setting HADOOP_HOME to point to the directory where you have installed Hadoop. If you do not set HADOOP_HOME, by default Pig will run with the embedded version, currently Hadoop 1.0.0.)</li>
-		  <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> (set JAVA_HOME to the root of your Java installation)</li>	
+		  <li> <strong>Hadoop 0.23.X, 1.X or 2.X</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a> (You can run Pig with different versions of Hadoop by setting HADOOP_HOME to point to the directory where you have installed Hadoop. If you do not set HADOOP_HOME, by default Pig will run with the embedded version, currently Hadoop 1.0.4.)</li>
+		  <li> <strong>Java 1.7</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> (set JAVA_HOME to the root of your Java installation)</li>	
 		</ul>
 		<p></p>
-	<p>Windows users also need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a></p>
-
-<p></p>
  <p><strong>Optional</strong></p>
  		<ul>
-          <li> <strong>Python 2.5</strong> - <a href="http://jython.org/downloads.html">http://jython.org/downloads.html</a> (when using Python UDFs or embedding Pig in Python) </li>
-          <li> <strong>JavaScript 1.7</strong> - <a href="https://developer.mozilla.org/en/Rhino_downloads_archive">https://developer.mozilla.org/en/Rhino_downloads_archive</a> and <a href="http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/">http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/</a>  (when using JavaScript UDFs or embedding Pig in JavaScript) </li>		  
-          <li> <strong>JRuby 1.6.7</strong> - <a href="http://www.jruby.org/download">http://www.jruby.org/download</a> (when using JRuby UDFs) </li>
-          <li> <strong>Groovy (<em>groovy-all</em>) 1.8.6</strong> - <a href="http://groovy.codehaus.org/Download">http://groovy.codehaus.org/Download</a> or directly on a maven repo <a href="http://mirrors.ibiblio.org/pub/mirrors/maven2/org/codehaus/groovy/groovy-all/1.8.6/">http://mirrors.ibiblio.org/pub/mirrors/maven2/org/codehaus/groovy/groovy-all/1.8.6/</a> (when using Groovy UDFs or embedding Pig in Groovy) </li>
-		  <li> <strong>Ant 1.7</strong> - <a href="http://ant.apache.org/">http://ant.apache.org/</a> (for builds) </li>
-		  <li> <strong>JUnit 4.5</strong> - <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a> (for unit tests) </li>
+          <li> <strong>Python 2.7</strong> - <a href="https://www.python.org">https://www.python.org</a> (when using Streaming Python UDFs) </li>
+          <li> <strong>Ant 1.8</strong> - <a href="http://ant.apache.org/">http://ant.apache.org/</a> (for builds) </li>
 		</ul>
  
   </section>         
@@ -89,6 +82,7 @@ Test the Pig installation with this simp
 	  <li> Build the code from the top directory: <code>ant</code> <br></br>
 	  If the build is successful, you should see the pig.jar file created in that directory. </li>	
 	  <li> Validate the pig.jar  by running a unit test: <code>ant test</code></li>
+	  <li> If you are using Hadoop 0.23.X or 2.X, please add -Dhadoopversion=23 to your ant command line in the previous steps, as sketched below</li>
      </ol>
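+	  <p>For example, a sketch of the build command with that flag (assuming the standard Pig build targets):</p>
+<source>
+$ ant -Dhadoopversion=23 jar
+</source>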
  </section>
 </section>
@@ -103,16 +97,22 @@ Test the Pig installation with this simp
 	<tr>
 	<td></td>
     <td><strong>Local Mode</strong></td>
+    <td><strong>Tez Local Mode</strong></td>
     <td><strong>Mapreduce Mode</strong></td>
+    <td><strong>Tez Mode</strong></td>
 	</tr>
 	<tr>
 	<td><strong>Interactive Mode </strong></td>
     <td>yes</td>
+    <td>experimental</td>
+    <td>yes</td>
     <td>yes</td>
 	</tr>
 	<tr>
 	<td><strong>Batch Mode</strong> </td>
     <td>yes</td>
+    <td>experimental</td>
+    <td>yes</td>
     <td>yes</td>
 	</tr>
 	</table>
@@ -122,10 +122,15 @@ Test the Pig installation with this simp
 	<title>Execution Modes</title> 
-<p>Pig has two execution modes or exectypes: </p>
+<p>Pig has four execution modes or exectypes: </p>
 <ul>
-<li><strong>Local Mode</strong> - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). Note that local mode does not support parallel mapper execution with Hadoop 0.20.x and 1.0.0. This is because the LocalJobRunner of these Hadoop versions is not thread-safe.
+<li><strong>Local Mode</strong> - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
+</li>
+<li><strong>Tez Local Mode</strong> - To run Pig in Tez local mode, you need access to a single machine. It is similar to local mode, except that internally Pig will invoke the Tez runtime engine. Specify Tez local mode using the -x flag (pig -x tez_local).
+<p><strong>Note:</strong> Tez local mode is experimental. Some queries simply error out on bigger data in local mode.</p>
 </li>
 <li><strong>Mapreduce Mode</strong> - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, <em>but don't need to</em>, specify it using the -x flag (pig OR pig -x mapreduce).
 </li>
+<li><strong>Tez Mode</strong> - To run Pig in Tez mode, you need access to a Hadoop cluster and HDFS installation. Specify Tez mode using the -x flag (-x tez).
+</li>
 </ul>
 <p></p>
 
@@ -141,23 +146,16 @@ Test the Pig installation with this simp
 /* local mode */
 $ pig -x local ...
  
+/* Tez local mode */
+$ pig -x tez_local ...
  
 /* mapreduce mode */
 $ pig ...
 or
 $ pig -x mapreduce ...
-</source>
-
-<p>This example shows how to run Pig in local and mapreduce mode using the java command.</p>
-<source>
-/* local mode */
-$ java -cp pig.jar org.apache.pig.Main -x local ...
-
 
-/* mapreduce mode */
-$ java -cp pig.jar org.apache.pig.Main ...
-or
-$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...
+/* Tez mode */
+$ pig -x tez ...
 </source>
 
 </section>
@@ -185,6 +183,13 @@ $ pig -x local
 grunt> 
 </source>
 
+<p><strong>Tez Local Mode</strong></p>
+<source>
+$ pig -x tez_local
+... - Connecting to ...
+grunt> 
+</source>
+
 <p><strong>Mapreduce Mode</strong> </p>
 <source>
 $ pig -x mapreduce
@@ -197,6 +202,13 @@ $ pig 
 ... - Connecting to ...
 grunt> 
 </source>
+
+<p><strong>Tez Mode</strong> </p>
+<source>
+$ pig -x tez
+... - Connecting to ...
+grunt> 
+</source>
 </section>
 </section>
 
@@ -222,12 +234,20 @@ store B into ‘id.out’;  -- wri
 <source>
 $ pig -x local id.pig
 </source>
+<p><strong>Tez Local Mode</strong></p>
+<source>
+$ pig -x tez_local id.pig
+</source>
 <p><strong>Mapreduce Mode</strong> </p>
 <source>
 $ pig id.pig
 or
 $ pig -x mapreduce id.pig
 </source>
+<p><strong>Tez Mode</strong> </p>
+<source>
+$ pig -x tez id.pig
+</source>
 </section>
 
   <!-- ==================================================================== -->
@@ -424,7 +444,7 @@ However, in a production environment you
 <p id="pig-properties">To specify Pig properties use one of these mechanisms:</p>
 <ul>
 	<li>The pig.properties file (add the directory that contains the pig.properties file to the classpath)</li>
-	<li>The -D command line option and a Pig property (pig -Dpig.tmpfilecompression=true)</li>
+	<li>The -D option and a Pig property in the PIG_OPTS environment variable (export PIG_OPTS=-Dpig.tmpfilecompression=true)</li>
 	<li>The -P command line option and a properties file (pig -P mypig.properties)</li>
 	<li>The <a href="cmds.html#set">set</a> command (set pig.exec.nocombiner true)</li>
 </ul>
@@ -434,7 +454,7 @@ However, in a production environment you
 <p id="hadoop-properties">To specify Hadoop properties you can use the same mechanisms:</p>
 <ul>
 	<li>Hadoop configuration files (include pig-cluster-hadoop-site.xml)</li>
-	<li>The -D command line option and a Hadoop property (pig –Dmapreduce.task.profile=true) </li>
+	<li>The -D option and a Hadoop property in the PIG_OPTS environment variable (export PIG_OPTS=-Dmapreduce.task.profile=true) </li>
 	<li>The -P command line option and a property file (pig -P property_file)</li>
 	<li>The <a href="cmds.html#set">set</a> command (set mapred.map.tasks.speculative.execution false)</li>
 </ul>
@@ -450,7 +470,7 @@ However, in a production environment you
   <section id="tutorial">
 <title>Pig Tutorial </title>
 
-<p>The Pig tutorial shows you how to run Pig scripts using Pig's local mode and mapreduce mode (see <a href="#execution-modes">Execution Modes</a>).</p>
+<p>The Pig tutorial shows you how to run Pig scripts using Pig's local mode, mapreduce mode and Tez mode (see <a href="#execution-modes">Execution Modes</a>).</p>
 
 <p>To get started, do the following preliminary tasks:</p>
 
@@ -458,22 +478,16 @@ However, in a production environment you
 <li>Make sure the JAVA_HOME environment variable is set the root of your Java installation.</li>
 <li>Make sure your PATH includes bin/pig (this enables you to run the tutorials using the "pig" command). 
 <source>
-$ export PATH=/&lt;my-path-to-pig&gt;/pig-0.9.0/bin:$PATH 
+$ export PATH=/&lt;my-path-to-pig&gt;/pig-0.14.0/bin:$PATH 
 </source>
 </li>
 <li>Set the PIG_HOME environment variable:
 <source>
-$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-0.9.0 
+$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-0.14.0 
 </source></li>
 <li>Create the pigtutorial.tar.gz file:
 <ul>
-    <li>Move to the Pig tutorial directory (.../pig-0.9.0/tutorial).</li>
-	<li>Edit the build.xml file in the tutorial directory. 
-<source>
-Change this:   &lt;property name="pigjar" value="../pig.jar" /&gt;
-To this:       &lt;property name="pigjar" value="../pig-0.9.0-core.jar" /&gt;
-</source>
-	</li>
+    <li>Move to the Pig tutorial directory (.../pig-0.14.0/tutorial).</li>
 	<li>Run the "ant" command from the tutorial directory. This will create the pigtutorial.tar.gz file.
 	</li>
 </ul>
@@ -503,8 +517,12 @@ $ tar -xzf pigtutorial.tar.gz
 <source>
 $ pig -x local script1-local.pig
 </source>
+Or if you are using Tez local mode:
+<source>
+$ pig -x tez_local script1-local.pig
+</source>
 </li>
-<li>Review the result files, located in the part-r-00000 directory.
+<li>Review the result files, located in the script1-local-results.txt directory.
 <p>The output may contain a few Hadoop warnings which can be ignored:</p>
 <source>
 2010-04-08 12:55:33,642 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
@@ -516,7 +534,7 @@ $ pig -x local script1-local.pig
 
  <!-- ++++++++++++++++++++++++++++++++++ --> 
 <section>
-<title> Running the Pig Scripts in Mapreduce Mode</title>
+<title> Running the Pig Scripts in Mapreduce Mode or Tez Mode</title>
 
 <p>To run the Pig scripts in mapreduce mode, do the following: </p>
 <ol>
@@ -531,6 +549,10 @@ $ hadoop fs –copyFromLocal excite.l
 <source>
 export PIG_CLASSPATH=/mycluster/conf
 </source>
+<p>If you are using Tez, you will also need to add the Tez configuration directory (the directory that contains tez-site.xml) to PIG_CLASSPATH:</p>
+<source>
+export PIG_CLASSPATH=/mycluster/conf:/tez/conf
+</source>
 <p><strong>Note:</strong> The PIG_CLASSPATH can also be used to add any other 3rd party dependencies or resource files a pig script may require. If there is also a need to make the added entries take the highest precedence in the Pig JVM's classpath order, one may also set the env-var PIG_USER_CLASSPATH_FIRST to any value, such as 'true' (and unset the env-var to disable).</p></li>
 <li>Set the HADOOP_CONF_DIR environment variable to the location of the cluster configuration directory:
 <source>
@@ -541,6 +563,10 @@ export HADOOP_CONF_DIR=/mycluster/conf
 <source>
 $ pig script1-hadoop.pig
 </source>
+Or if you are using Tez:
+<source>
+$ pig -x tez script1-hadoop.pig
+</source>
 </li>
 
 <li>Review the result files, located in the script1-hadoop-results or script2-hadoop-results HDFS directory:

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/tabs.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/tabs.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/tabs.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/tabs.xml Thu Nov 13 03:48:02 2014
@@ -32,6 +32,6 @@
   -->
   <tab label="Project" href="http://hadoop.apache.org/pig/" type="visible" /> 
   <tab label="Wiki" href="http://wiki.apache.org/pig/" type="visible" /> 
-  <tab label="Pig 0.12.0 Documentation" dir="" type="visible" /> 
+  <tab label="Pig 0.14.0 Documentation" dir="" type="visible" /> 
 
 </tabs>

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/test.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/test.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/test.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/test.xml Thu Nov 13 03:48:02 2014
@@ -335,6 +335,21 @@ Local Rearrange[tuple]{chararray}(false)
 |   Project[chararray][0] - xxx-Fri Dec 05 19:42:29 UTC 2008-35
  <em>etc ... </em> 
 
+If you are running in Tez mode, the Map Reduce Plan will be replaced with the Tez Plan:
+
+#--------------------------------------------------
+# There are 1 DAGs in the session
+#--------------------------------------------------
+#--------------------------------------------------
+# TEZ DAG plan: PigLatin:185.pig-0_scope-0
+#--------------------------------------------------
+Tez vertex scope-21	->	Tez vertex scope-22,
+Tez vertex scope-22
+
+Tez vertex scope-21
+# Plan on vertex
+B: Local Rearrange[tuple]{chararray}(false) - scope-35	->	 scope-22
+ <em>etc ... </em> 
 </source> 
  </section></section>
   
@@ -505,7 +520,7 @@ grunt> illustrate -script visits.pig
 <!-- =========================================================================== -->
 <!-- DIAGNOSTIC OPERATORS -->    
 <section id="mapreduce-job-ids">
-<title>Pig Scripts and MapReduce Job IDs</title>
+<title>Pig Scripts and MapReduce Job IDs (MapReduce mode only)</title>
    <p>Complex Pig scripts often generate many MapReduce jobs. To help you debug a script, Pig prints a summary of the execution that shows which relations (aliases) are mapped to each MapReduce job. </p>
 <source>
 JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime 
@@ -533,12 +548,17 @@ job_201004271216_12714 1 1 3 3 3 12 12 1
 
 <p>Several new public classes make it easier for external tools such as Oozie to integrate with Pig statistics. </p>
 
-<p>The Pig statistics are available here: <a href="http://pig.apache.org/docs/r0.9.0/api/">http://pig.apache.org/docs/r0.9.0/api/</a></p>
+<p>The Pig statistics are available here: <a href="http://pig.apache.org/docs/r0.14.0/api/">http://pig.apache.org/docs/r0.14.0/api/</a></p>
 
 <p id="stats-classes">The stats classes are in the package: org.apache.pig.tools.pigstats</p>
 <ul>
 <li>PigStats</li>
+<li>SimplePigStats</li>
+<li>EmbeddedPigStats</li>
 <li>JobStats</li>
+<li>TezPigScriptStats</li>
+<li>TezDAGStats</li>
+<li>TezVertexStats</li>
 <li>OutputStats</li>
 <li>InputStats</li>
 </ul>
@@ -572,6 +592,8 @@ public interface PigProgressNotification
     public void launchCompletedNotification(int numJobsSucceeded);
 }
 </source>
+<p>Depending on the type of the Pig script, PigRunner.run() returns a particular subclass of PigStats: SimplePigStats (MapReduce/local mode), TezPigScriptStats (Tez/Tez local mode) or EmbeddedPigStats (embedded script). SimplePigStats contains a map of JobStats which capture the stats for each MapReduce job of the Pig script. TezPigScriptStats contains a map of TezDAGStats which capture the stats for each Tez DAG of the Pig script, and TezDAGStats contains a map of TezVertexStats which capture the stats for each vertex within the Tez DAG. Depending on the execution type, EmbeddedPigStats contains a map of SimplePigStats or TezPigScriptStats, which captures the Pig jobs launched in the embedded script. </p>
+<p>If you run Pig in Tez mode (or in both Tez and MapReduce modes), you should pass a PigTezProgressNotificationListener, which extends PigProgressNotificationListener, to PigRunner.run() to make sure you receive notifications in both Tez and MapReduce modes. </p>
 </section>
 
 <!-- +++++++++++++++++++++++++++++++++++++++ -->
@@ -828,7 +850,7 @@ $pig_trunk ant pigunit-jar   
 
 <!-- +++++++++++++++++++++++++++++++++++++++ -->
     <section>
-      <title>Mapreduce Mode</title>
+      <title>Other Modes</title>
       <p>PigUnit also runs in Pig's mapreduce/tez/tez_local mode. Mapreduce/Tez mode requires you to use a Hadoop cluster and HDFS installation.
         It is enabled when the Java system property pigunit.exectype is set to specific values (mr/tez/tez_local): e.g. -Dpigunit.exectype=mr or System.getProperties().setProperty("pigunit.exectype", "mr"), which means PigUnit will run in mr mode. The cluster you select to run mr/tez test must be specified in the CLASSPATH (similar to the HADOOP_CONF_DIR variable). 
       </p>

Modified: pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/udf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/udf.xml?rev=1639242&r1=1639241&r2=1639242&view=diff
==============================================================================
--- pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/udf.xml (original)
+++ pig/branches/branch-0.14/src/docs/src/documentation/content/xdocs/udf.xml Thu Nov 13 03:48:02 2014
@@ -71,10 +71,10 @@ DUMP B;
 </source>
 
 <p>The command below can be used to run the script. Note that all examples in this document run in local mode for simplicity 
-but the examples can also run in Hadoop mode. For more information on how to run Pig, please see the PigTutorial. </p>
+but the examples can also run in Tez local, Mapreduce, or Tez mode. For more information on how to run Pig, please see the PigTutorial. </p>
 
 <source>
-java -cp pig.jar org.apache.pig.Main -x local myscript.pig
+pig -x local myscript.pig
 </source>
 
 <p>The first line of the script provides the location of the <code>jar&nbsp;file</code> that contains the UDF. 
@@ -441,6 +441,38 @@ Java Class
 </tr>
 <tr>
 <td>
+<p> boolean </p>
+</td>
+<td>
+<p> Boolean </p>
+</td>
+</tr>
+<tr>
+<td>
+<p> datetime </p>
+</td>
+<td>
+<p> DateTime </p>
+</td>
+</tr>
+<tr>
+<td>
+<p> bigdecimal </p>
+</td>
+<td>
+<p> BigDecimal </p>
+</td>
+</tr>
+<tr>
+<td>
+<p> biginteger </p>
+</td>
+<td>
+<p> BigInteger </p>
+</td>
+</tr>
+<tr>
+<td>
 <p> tuple </p>
 </td>
 <td>
@@ -583,8 +615,11 @@ public class DataType {
     public static final byte LONG      =  15;
     public static final byte FLOAT     =  20;
     public static final byte DOUBLE    =  25;
+    public static final byte DATETIME  =  30;
     public static final byte BYTEARRAY =  50;
     public static final byte CHARARRAY =  55;
+    public static final byte BIGINTEGER =  65;
+    public static final byte BIGDECIMAL =  70;
     public static final byte MAP       = 100;
     public static final byte TUPLE     = 110;
     public static final byte BAG       = 120;
@@ -820,6 +855,50 @@ public SchemaType getSchemaType() {
 <p>For an example see <a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/builtin/CONCAT.java?view=markup">CONCAT</a>.</p>
 </section>
 
+<section id="counters">
+<title>Using Counters</title>
+<p>Hadoop counters are easily accessible within EvalFunc by using PigStatusReporter object. Here is one example:</p>
+<source>
+public class UPPER extends EvalFunc&lt;String&gt;
+{
+        public String exec(Tuple input) throws IOException {
+                if (input == null || input.size() == 0) {
+                    PigStatusReporter reporter = PigStatusReporter.getInstance();
+                    if (reporter != null) {
+                       reporter.incrCounter(PigWarning.UDF_WARNING_1, 1);
+                    }
+                    return null;
+                }
+                try{
+                        String str = (String)input.get(0);
+                        return str.toUpperCase();
+                }catch(Exception e){
+                    throw new IOException("Caught exception processing input row ", e);
+                }
+        }
+}
+</source>
+</section>
+<section id="access-input-schema">
+        <title>Access input schema inside EvalFunc</title>
+        <p>The input schema is accessible not only inside outputSchema at compile time, but also in exec at runtime. For example:</p>
+<source>
+public class AddSchema extends EvalFunc&lt;String&gt;
+{
+        public String exec(Tuple input) throws IOException {
+                if (input == null || input.size() == 0)
+                    return null;
+                String result = "";
+                for (int i=0;i&lt;input.size();i++) {
+                    result += getInputSchema().getFields().get(i).alias;
+                    result += ":";
+                    result += input.get(i);
+                }
+                return result;
+        }
+}
+</source>
+</section>
 <!-- +++++++++++++++++++++++++++++++++++++++++++++++++ -->
 <section id="reporting-progress">
 <title>Reporting Progress</title>
@@ -849,21 +928,32 @@ public class UPPER extends EvalFunc&lt;S
 <!-- +++++++++++++++++++++++++++++++++++++++++++++++++ -->
 <section id="distributed-cache">
 	<title>Using Distributed Cache</title>
-	<p>Use getCacheFiles, an EvalFunc method, to return a list of HDFS files that need to be shipped to distributed cache. Inside exec method, you can assume that these files already exist in distributed cache. For example:</p>
+	<p>Use getCacheFiles to return a list of HDFS files, or getShipFiles to return a list of local files, that need to be shipped to the distributed cache. Inside the exec method, you can assume that these files already exist in the distributed cache. For example:</p>
 <source>
 public class Udfcachetest extends EvalFunc&lt;String&gt; { 
 
     public String exec(Tuple input) throws IOException { 
-        FileReader fr = new FileReader("./smallfile"); 
-        BufferedReader d = new BufferedReader(fr); 
-        return d.readLine(); 
+        String concatResult = "";
+        FileReader fr = new FileReader("./smallfile1"); 
+        BufferedReader d = new BufferedReader(fr);
+        concatResult +=d.readLine();
+        fr = new FileReader("./smallfile2");
+        d = new BufferedReader(fr);
+        concatResult +=d.readLine();
+        return concatResult;
     } 
 
     public List&lt;String&gt; getCacheFiles() { 
         List&lt;String&gt; list = new ArrayList&lt;String&gt;(1); 
-        list.add("/user/pig/tests/data/small#smallfile"); 
+        list.add("/user/pig/tests/data/small#smallfile1");  // This is an HDFS file
         return list; 
     } 
+
+    public List&lt;String&gt; getShipFiles() {
+        List&lt;String&gt; list = new ArrayList&lt;String&gt;(1);
+        list.add("/home/hadoop/pig/smallfile2");  // This is a local file
+        return list;
+    }
 } 
 
 a = load '1.txt'; 
@@ -871,7 +961,45 @@ b = foreach a generate Udfcachetest(*); 
 dump b;
 </source>
 </section>
-
+<section id="compile-time-eval">
+        <title>Compile time evaluation</title>
+        <p>If the parameters of an EvalFunc are all constants, Pig can evaluate the result at compile time. The benefits of evaluating at compile time are performance optimization and enabling certain other optimizations at the front end (such as partition pruning, which only allows constants, not UDFs, in the filter condition). By default, compile time evaluation is disabled in EvalFunc to prevent potential side effects. To enable it, override allowCompileTimeCalculation. For example:</p>
+<source>
+public class CurrentTime extends EvalFunc&lt;DateTime&gt; {
+    public DateTime exec(Tuple input) throws IOException {
+        return new DateTime();
+    }
+    @Override
+    public boolean allowCompileTimeCalculation() {
+        return true;
+    }
+}
+</source>
+</section>
+<section id="tez-jvm-reuse">
+        <title>Clean up static variable in Tez</title>
+        <p>In Tez, the JVM can be reused by other tasks. It is important to clean up static variables to make sure there are no side effects. Here is one example:</p>
+<source>
+public class UPPER extends EvalFunc&lt;String&gt;
+{
+        static boolean initialized = false;
+        static {
+            JVMReuseManager.getInstance().registerForStaticDataCleanup(UPPER.class);
+        }
+        public String exec(Tuple input) throws IOException {
+            if (!initialized) {
+                init();
+                initialized = true;
+            }
+            ......
+        }
+        @StaticDataCleanup
+        public static void staticDataCleanup() {
+            initialized = false;
+        }
+}
+</source>
+</section>
 </section>
 
 <!-- =============================================================== -->
@@ -907,6 +1035,9 @@ has methods to push operations from Pig 
 
 <li id="loadcaster"><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadCaster.java?view=markup">LoadCaster</a> 
 has methods to convert byte arrays to specific types. A loader implementation should implement this if casts (implicit or explicit) from DataByteArray fields to other types need to be supported. </li>
+
+<li id="loadpredicatepushdown"><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadPredicatePushdown.java?view=markup">LoadPredicatePushdown</a> 
+ has methods to push predicates to the loader. It differs from LoadMetadata.setPartitionFilter in that the loader may still load records that do not satisfy the predicates. In other words, the predicates are only a hint. Note this interface is still in development and might change in the next version. Currently only OrcStorage implements this interface.</li>
 </ul>
 
  <p>The LoadFunc abstract class is the main class to extend for implementing a loader. The methods which need to be overridden are explained below:</p>
@@ -927,6 +1058,10 @@ has methods to convert byte arrays to sp
  <li id="setUdfContextSignature">setUdfContextSignature(): This method will be called by Pig both in the front end and back end to pass a unique signature to the Loader. The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end. A use case is to store RequiredFieldList passed to it in LoadPushDown.pushProjection(RequiredFieldList) for use in the back end before returning tuples in getNext(). The default implementation in LoadFunc has an empty body. This method will be called before other methods. </li>
  
  <li id="relativeToAbsolutePath">relativeToAbsolutePath(): Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in LoadFunc handles this for FileSystem locations. If the load source is something else, loader implementation may choose to override this.</li>
+
+ <li id="getCacheFiles">getCacheFiles(): Return a list of HDFS files to ship to the distributed cache.</li>
 
+ <li id="getShipFiles">getShipFiles(): Return a list of local files to ship to the distributed cache.</li>
  </ul>
 
 <p><strong>Example Implementation</strong></p>
@@ -1055,6 +1190,8 @@ abstract class has the main methods for 
 <ul>
 <li id="storemetadata"><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/StoreMetadata.java?view=markup">StoreMetadata:</a> 
 This interface has methods to interact with metadata systems to store schema and store statistics. This interface is optional and should only be implemented if metadata needs to stored. </li>
+<li id="storeresources"><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/StoreResources.java?view=markup">StoreResources:</a> 
+This interface has methods to put HDFS files or local files into the distributed cache. </li>
 </ul>
 
 <p id="storefunc-override">The methods which need to be overridden in StoreFunc are explained below: </p>