Posted to commits@pig.apache.org by ol...@apache.org on 2011/03/26 01:44:08 UTC

svn commit: r1085617 - in /pig/branches/branch-0.8: CHANGES.txt src/docs/src/documentation/content/xdocs/tutorial.xml src/docs/src/documentation/content/xdocs/udf.xml

Author: olga
Date: Sat Mar 26 00:44:08 2011
New Revision: 1085617

URL: http://svn.apache.org/viewvc?rev=1085617&view=rev
Log:
pig-1936: documentation update (chandec via olgan)

Modified:
    pig/branches/branch-0.8/CHANGES.txt
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/tutorial.xml
    pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/CHANGES.txt?rev=1085617&r1=1085616&r2=1085617&view=diff
==============================================================================
--- pig/branches/branch-0.8/CHANGES.txt (original)
+++ pig/branches/branch-0.8/CHANGES.txt Sat Mar 26 00:44:08 2011
@@ -22,6 +22,8 @@ Unreleased 0.8.1 (Unreleased)
 
 INCOMPATIBLE CHANGES
 
+pig-1936: documentation update (chandec via olgan)
+
 PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy)
 
 IMPROVEMENTS

Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/tutorial.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/tutorial.xml?rev=1085617&r1=1085616&r2=1085617&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/tutorial.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/tutorial.xml Sat Mar 26 00:44:08 2011
@@ -30,13 +30,9 @@
 <p>The Pig tutorial shows you how to run two Pig scripts in local mode and mapreduce mode.   </p>
 
 <ul>
-<li><p> <strong>Local Mode</strong>: To run the scripts in local mode, no Hadoop or HDFS installation is required. All files are installed and run from your local host and file system. </p>
-</li>
-<li><p> <strong>Mapreduce Mode</strong>: To run the scripts in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. </p>
-</li>
+<li><strong>Local Mode</strong>: To run the scripts in local mode, no Hadoop or HDFS installation is required. All files are installed and run from your local host and file system.</li>
+<li><strong>Mapreduce Mode</strong>: To run the scripts in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.</li>
 </ul>
-<p>The Pig tutorial file (tutorial/pigtutorial.tar.gz file in the pig distribution) includes the Pig JAR file (pig.jar) and the tutorial files (tutorial.jar, Pigs scripts, log files). 
-These files work with Hadoop 0.20.2 and include everything you need to run the Pig scripts.</p>
 </section>
 
 <section>
@@ -46,24 +42,38 @@ These files work with Hadoop 0.20.2 and 
 
 <ol>
 <li>Make sure the JAVA_HOME environment variable is set to the root of your Java installation.</li>
-<li>Make sure that bin/pig is in your PATH (this enables you to run the tutorials using the "pig" command).
+<li>Make sure your PATH includes bin/pig (this enables you to run the tutorials using the "pig" command).
 <source>
-$ export PATH=/&lt;my-path-to-pig&gt;/pig-n.n.n/bin:$PATH 
+$ export PATH=/&lt;my-path-to-pig&gt;/pig-0.8.0/bin:$PATH 
 </source>
 </li>
 <li>Set the PIG_HOME environment variable:
 <source>
-$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-n.n.n 
+$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-0.8.0 
 </source></li>
-<li>Copy the pigtutorial.tar.gz file from the tutorial directory of your Pig installation to your local directory. </li>
-<li>Unzip the Pig tutorial file (the files are stored in a newly created directory, pigtmp). 
+<li>Create the pigtutorial.tar.gz file:
+<ul>
+    <li>Move to the Pig tutorial directory (.../pig-0.8.0/tutorial).</li>
+	<li>Edit the build.xml file in the tutorial directory. 
+<source>
+Change this:   &lt;property name="pigjar" value="../pig.jar" /&gt;
+To this:       &lt;property name="pigjar" value="../pig-0.8.0-core.jar" /&gt;
+</source>
+	</li>
+	<li>Run the "ant" command from the tutorial directory. This will create the pigtutorial.tar.gz file.
+	</li>
+</ul>
+
+</li>
+<li>Copy the pigtutorial.tar.gz file from the Pig tutorial directory to your local directory. </li>
+<li>Unzip the pigtutorial.tar.gz file.
 <source>
 $ tar -xzf pigtutorial.tar.gz
 </source>
 </li>
-<li>Review Pig Script 1 and Pig Script 2.</li>
+<li>A new directory named pigtmp is created. This directory contains the Pig tutorial files. 
+These files work with Hadoop 0.20.2 and include everything you need to run the Pig scripts.</li>
 </ol>
-
 </section>
 
 
@@ -74,18 +84,17 @@ $ tar -xzf pigtutorial.tar.gz
 <ol>
 
 <li>Move to the pigtmp directory.</li>
-<li>Execute the following command (using either script1-local.pig or script2-local.pig). 
+<li>Execute the following command using script1-local.pig (or script2-local.pig). 
 <source>
 $ pig -x local script1-local.pig
 </source>
-</li>
-<li>Review the result files, located in the part-r-00000 directory.
 <p>The output may contain a few Hadoop warnings which can be ignored:</p>
 <source>
 2010-04-08 12:55:33,642 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
 </source>
 </li>
+<li>A directory named script1-local-results.txt (or script2-local-results.txt) is created. This directory contains the results file, part-r-00000.</li>
 </ol>
 </section>
 
@@ -127,7 +136,7 @@ $ hadoop fs -cat 'script1-hadoop-results
 </section>
 
 <section>
-<title> Pig Tutorial File</title>
+<title> Pig Tutorial Files</title>
 
 <p>The contents of the Pig tutorial file (pigtutorial.tar.gz) are described here. </p>
 

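The package-and-unpack steps in the tutorial above can be sketched as a self-contained Python script. The stand-in tarball built here (and the placeholder script content) are illustrative only; the real pigtutorial.tar.gz is produced by the ant build described in the doc.

```python
import os
import tarfile
import tempfile

# Build a stand-in pigtutorial.tar.gz so the example runs anywhere;
# the real archive comes from running "ant" in the tutorial directory.
workdir = tempfile.mkdtemp()
os.chdir(workdir)
os.makedirs("pigtmp")
with open("pigtmp/script1-local.pig", "w") as f:
    f.write("-- placeholder tutorial script\n")
with tarfile.open("pigtutorial.tar.gz", "w:gz") as tar:
    tar.add("pigtmp")

# Equivalent of "$ tar -xzf pigtutorial.tar.gz": extraction recreates
# the pigtmp directory with the tutorial files inside it.
os.rename("pigtmp", "pigtmp.orig")
with tarfile.open("pigtutorial.tar.gz", "r:gz") as tar:
    tar.extractall()

print(sorted(os.listdir("pigtmp")))  # ['script1-local.pig']
```

This mirrors step 6 of the setup: after extraction, a new pigtmp directory appears in the current directory, holding the tutorial files.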
Modified: pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/udf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/udf.xml?rev=1085617&r1=1085616&r2=1085617&view=diff
==============================================================================
--- pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/udf.xml (original)
+++ pig/branches/branch-0.8/src/docs/src/documentation/content/xdocs/udf.xml Sat Mar 26 00:44:08 2011
@@ -749,14 +749,14 @@ This enables Pig users/developers to cre
 
 <section>
 <title> Load Functions</title>
-<p><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup">LoadFunc</a> 
+<p><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/LoadFunc.java?view=markup">LoadFunc</a> 
 abstract class has the main methods for loading data, and for most use cases it suffices to extend it. There are three other optional interfaces which can be implemented to achieve extended functionality: </p>
 
 <ul>
-<li><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadMetadata.java?view=markup">LoadMetadata</a> 
+<li><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/LoadMetadata.java?view=markup">LoadMetadata</a> 
 has methods to deal with metadata. Most implementations of loaders don't need to implement this unless they interact with some metadata system. The getSchema() method in this interface provides a way for loader implementations to communicate the schema of the data back to Pig. If a loader implementation returns data comprised of fields of real types (rather than DataByteArray fields), it should provide the schema describing the data returned through the getSchema() method. The other methods are concerned with other types of metadata like partition keys and statistics. Implementations can return null from these methods if they are not applicable for that implementation.</li>
 
-<li><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadPushDown.java?view=markup">LoadPushDown</a> 
+<li><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/LoadPushDown.java?view=markup">LoadPushDown</a> 
 has methods to push operations from Pig runtime into loader implementations. Currently only the pushProjection() method is called by Pig to communicate to the loader the exact fields that are required in the Pig script. The loader implementation can choose to honor the request (return only those fields required by Pig script) or not honor the request (return all fields in the data). If the loader implementation can efficiently honor the request, it should implement LoadPushDown to improve query performance. (Irrespective of whether the implementation can or cannot honor the request, if the implementation also implements getSchema(), the schema returned in getSchema() should describe the entire tuple of data.)
 <ul>
 	<li>pushProjection(): This method tells LoadFunc which fields are required in the Pig script, thus enabling LoadFunc to optimize performance by loading only those fields that are needed. pushProjection() takes a RequiredFieldList. RequiredFieldList includes a list of RequiredField: each RequiredField indicates a field required by the Pig script; each RequiredField includes index, alias, type (which is reserved for future use), and subFields. Pig will use the column index RequiredField.index to communicate with the LoadFunc about the fields required by the Pig script. If the required field is a map, Pig will optionally pass RequiredField.subFields which contains a list of keys that the Pig script needs for the map. For example, if the Pig script needs two keys for the map, "key1" and "key2", the subFields for that map will contain two RequiredField; the alias field for the first RequiredField will be "key1" and the alias for the second RequiredField will be "key2". LoadFunc 
 will use RequiredFieldResponse.requiredFieldRequestHonored to indicate whether the pushProjection() request is honored.
@@ -764,7 +764,7 @@ has methods to push operations from Pig 
 </ul>
 </li>
 
-<li><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/LoadCaster.java?view=markup">LoadCaster</a> 
+<li><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/LoadCaster.java?view=markup">LoadCaster</a> 
 has methods to convert byte arrays to specific types. A loader implementation should implement this if casts (implicit or explicit) from DataByteArray fields to other types need to be supported. </li>
 </ul>
 
@@ -906,10 +906,10 @@ public class SimpleTextLoader extends Lo
 <section>
 <title> Store Functions</title>
 
-<p><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/StoreFunc.java?view=markup">StoreFunc</a> 
+<p><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/StoreFunc.java?view=markup">StoreFunc</a> 
 abstract class has the main methods for storing data, and for most use cases it suffices to extend it. There is an optional interface which can be implemented to achieve extended functionality: </p>
 <ul>
-<li><a href="http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/StoreMetadata.java?view=markup">StoreMetadata:</a> 
+<li><a href="http://svn.apache.org/viewvc/pig/branches/branch-0.8/src/org/apache/pig/StoreMetadata.java?view=markup">StoreMetadata:</a> 
 This interface has methods to interact with metadata systems to store schema and store statistics. This interface is truly optional and should only be implemented if metadata needs to be stored. </li>
 </ul>
 
@@ -1409,7 +1409,7 @@ def commaFormat(num):
   return '{:,}'.format(num)
 
 #concatMultiple- concat multiple words
-@outputSchema(“onestring:chararray")
+@outputSchema("onestring:chararray")
 def concatMult4(word1, word2, word3, word4):
   return word1+word2+word3+word4
 
@@ -1418,7 +1418,7 @@ def concatMult4(word1, word2, word3, wor
 #######################
 #collectBag- collect elements of a bag into other bag
 #This is useful UDF after group operation
-@outputSchema(“y:bag{t:tuple(len:int,word:chararray)}”) 
+@outputSchema("y:bag{t:tuple(len:int,word:chararray)}")
 def collectBag(bag):
   outBag = []
   for word in bag:
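Outside of Pig, Jython UDFs like the ones above can be unit-tested as plain Python by stubbing the outputSchema decorator. The stub below is a testing convenience and an assumption, not part of Pig's API; inside Pig, the decorator is supplied by the Pig runtime.

```python
# Stub for Pig's @outputSchema decorator so the UDF runs as plain Python.
# It simply records the declared schema string on the function.
def outputSchema(schema):
    def decorator(func):
        func.outputSchema = schema
        return func
    return decorator

@outputSchema("onestring:chararray")
def concatMult4(word1, word2, word3, word4):
    return word1 + word2 + word3 + word4

print(concatMult4("a", "b", "c", "d"))  # -> abcd
print(concatMult4.outputSchema)         # -> onestring:chararray
```

This lets the UDF logic be verified with ordinary asserts before registering the script with Pig.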