Posted to commits@pig.apache.org by da...@apache.org on 2012/04/05 20:40:32 UTC

svn commit: r1310002 - in /pig/branches/branch-0.10: ./ src/docs/src/documentation/content/xdocs/

Author: daijy
Date: Thu Apr  5 18:40:31 2012
New Revision: 1310002

URL: http://svn.apache.org/viewvc?rev=1310002&view=rev
Log:
PIG-2601: Additional document for 0.10

Modified:
    pig/branches/branch-0.10/CHANGES.txt
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cont.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.10/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/CHANGES.txt?rev=1310002&r1=1310001&r2=1310002&view=diff
==============================================================================
--- pig/branches/branch-0.10/CHANGES.txt (original)
+++ pig/branches/branch-0.10/CHANGES.txt Thu Apr  5 18:40:31 2012
@@ -24,6 +24,8 @@ INCOMPATIBLE CHANGES
 
 IMPROVEMENTS
 
+PIG-2601: Additional document for 0.10 (daijy)
+
 PIG-2317: Ruby/Jruby UDFs (jcoveney via daijy)
 
 PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false (andy lindeman via jcoveney)

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml?rev=1310002&r1=1310001&r2=1310002&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml Thu Apr  5 18:40:31 2012
@@ -342,6 +342,13 @@ DUMP A;
 (Bill,20,3.9F)
 (Joe,18,3.8F)
 </source>
+
+  <p>You can assign an alias to another alias. The new alias can be used in place of the original alias to refer to the original relation. </p>
+  <source>
+  A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
+  B = A;
+  DUMP B;
+  </source>
 </section>
    
    

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cont.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cont.xml?rev=1310002&r1=1310001&r2=1310002&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cont.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cont.xml Thu Apr  5 18:40:31 2012
@@ -264,6 +264,42 @@ Pig.compile(...).bind(...).runSingle(pro
 <p>As the case with runSingle, a set of Java Properties or a property file can be passed to the call.</p>
 </section>
 
+<section>
+<title>Passing Parameters to a Script</title>
+<p>Inside your script, you can define parameters and then pass values for them from the command line.  There are two ways to pass parameters to your script:</p>
+<p>
+1. -param
+Similar to regular Pig parameter substitution, you can define parameters using -param/-param_file on Pig's command line.  Each parameter will be treated as one of the binding variables when binding the Pig Latin script. For example, you can invoke the Python script below using: pig -param loadfile=student.txt script.py. </p>
+<source>
+#!/usr/bin/python
+from org.apache.pig.scripting import Pig
+
+P = Pig.compile("""A = load '$loadfile' as (name, age, gpa);
+store A into 'output';""")
+
+Q = P.bind()
+
+result = Q.runSingle()
+</source>
+
+<p>
+2. Command line arguments
+Currently this feature is only available in Python. You can pass command line arguments (the arguments after the script file name) to Python. These will become sys.argv in Python.  For example: pig script.py student.txt. The corresponding script is:
+</p>
+<source>
+#!/usr/bin/python
+import sys
+from org.apache.pig.scripting import Pig
+
+P = Pig.compile("A = load '" + sys.argv[1] + "' as (name, age, gpa);" +
+"store A into 'output';")
+
+Q = P.bind()
+
+result = Q.runSingle()
+</source>
+</section>
+
 </section> 
 
 <section id="pigrunner-api">
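
Note on the command line arguments example in the cont.xml hunk above: that script reads sys.argv[1] directly, so it fails with an IndexError when no argument is given. The following is a minimal sketch of one way to guard that step, written as plain Python so it can be tried outside Pig. The helper name build_load_script and its default_input parameter are hypothetical and are not part of the Pig scripting API.

```python
def build_load_script(argv, default_input=None):
    """Return Pig Latin text that loads the file named by the first argument.

    argv mirrors sys.argv: argv[0] is the script name and argv[1], if
    present, is the input path. default_input is a hypothetical fallback.
    """
    if len(argv) > 1:
        input_path = argv[1]
    elif default_input is not None:
        input_path = default_input
    else:
        raise ValueError("usage: pig script.py <input file>")
    # A single quote would end the Pig Latin string literal early, so reject it.
    if "'" in input_path:
        raise ValueError("input path must not contain a single quote")
    return ("A = load '%s' as (name, age, gpa);\n"
            "store A into 'output';" % input_path)
```

Inside an embedded script, the returned text would be passed to Pig.compile in place of the hand-built string, for example Pig.compile(build_load_script(sys.argv)).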

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml?rev=1310002&r1=1310001&r2=1310002&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml Thu Apr  5 18:40:31 2012
@@ -34,7 +34,7 @@
  <p><strong>Mandatory</strong></p>
       <p>Unix and Windows users need the following:</p>
 		<ul>
-		  <li> <strong>Hadoop 0.20.2, 020.203, 020.204,  0.20.205, or 1.0.0</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a> (You can run Pig with different versions of Hadoop by setting HADOOP_HOME to point to the directory where you have installed Hadoop. If you do not set HADOOP_HOME, by default Pig will run with the embedded version, currently Hadoop 0.20.2.)</li>
+		  <li> <strong>Hadoop 0.20.2, 0.20.203, 0.20.204, 0.20.205, 1.0.0, 1.0.1, 0.23.0, or 0.23.1</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a> (You can run Pig with different versions of Hadoop by setting HADOOP_HOME to point to the directory where you have installed Hadoop. If you do not set HADOOP_HOME, by default Pig will run with the embedded version, currently Hadoop 1.0.0.)</li>
 		  <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> (set JAVA_HOME to the root of your Java installation)</li>	
 		</ul>
 		<p></p>
@@ -45,7 +45,7 @@
  		<ul>
           <li> <strong>Python 2.5</strong> - <a href="http://jython.org/downloads.html">http://jython.org/downloads.html</a> (when using Python UDFs or embedding Pig in Python) </li>
           <li> <strong>JavaScript 1.7</strong> - <a href="https://developer.mozilla.org/en/Rhino_downloads_archive">https://developer.mozilla.org/en/Rhino_downloads_archive</a> and <a href="http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/">http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/</a>  (when using JavaScript UDFs or embedding Pig in JavaScript) </li>		  
-		  
+          <li> <strong>JRuby 1.6.7</strong> - <a href="http://www.jruby.org/download">http://www.jruby.org/download</a> (when using JRuby UDFs) </li>
 		  <li> <strong>Ant 1.7</strong> - <a href="http://ant.apache.org/">http://ant.apache.org/</a> (for builds) </li>
 		  <li> <strong>JUnit 4.5</strong> - <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a> (for unit tests) </li>
 		</ul>

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml?rev=1310002&r1=1310001&r2=1310002&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml Thu Apr  5 18:40:31 2012
@@ -28,7 +28,7 @@
 <section id="udfs">
 <title>Introduction</title>
 <p>
-Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript. 
+Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in four languages: Java, Python, JavaScript, and Ruby. 
 </p>
 
 <p>
@@ -36,7 +36,7 @@ The most extensive support is provided f
 </p>
 
 <p>
-Limited support is provided for Python and JavaScript functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython or Rhino, to the backend.</p>
+Limited support is provided for Python, JavaScript and Ruby functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript and Ruby are provided as experimental features because they did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython, Rhino or JRuby, to the backend.</p>
 
 <p></p>
 
@@ -372,6 +372,10 @@ public class IsEmpty extends FilterFunc 
 </source>
 </section>
 
+<section id="udf_simulation">
+<title>Implement UDF by Simulation</title>
+<p>When implementing more advanced types of EvalFuncs, the simpler implementations can be automatically provided by Pig.  Thus if your UDF implements <a href="#Algebraic-Interface">Algebraic</a> then you will get the Accumulator interface and the basic EvalFunc exec method for free.  Similarly, if your UDF implements <a href="#Accumulator-Interface">Accumulator Interface</a> you will get the basic EvalFunc exec method for free.  You will not get the Algebraic implementation.  Note that these free implementations are based on simulation, which might not be the most efficient.  If you wish to ensure the efficiency of your Accumulator or EvalFunc exec method, you may still implement them yourself and your implementations will be used.</p>
+</section>
 <!-- +++++++++++++++++++++++++++++++++++++++++++++++++ -->
 <section id="pig-types">
 <title> Pig Types and Native Java Types</title>
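
Note on the "Implement UDF by Simulation" paragraph in the hunk above: the sketch below is a conceptual model of how the simpler interfaces could be derived from an Algebraic implementation. It is plain Python for illustration only. Pig's actual simulation is part of its Java runtime and may work differently, and every class and method name here (AlgebraicCount, SimulatedAccumulator, accumulate, get_value, exec_all) is hypothetical.

```python
class AlgebraicCount(object):
    """Toy algebraic COUNT with the three stages an Algebraic UDF defines."""

    def initial(self, item):
        # Map side: one input item becomes a partial count (None counts as 0).
        return 0 if item is None else 1

    def intermed(self, partials):
        # Combiner: merge several partial counts into one partial count.
        return sum(partials)

    def final(self, partials):
        # Reducer: merge the remaining partial counts into the result.
        return sum(partials)


class SimulatedAccumulator(object):
    """Hypothetical adapter that derives accumulator-style and exec-style
    calls from the three algebraic stages."""

    def __init__(self, algebraic):
        self.algebraic = algebraic
        self.partial = None

    def accumulate(self, batch):
        # Run initial on each item, then fold in the partial kept so far.
        partials = [self.algebraic.initial(item) for item in batch]
        if self.partial is not None:
            partials.append(self.partial)
        self.partial = self.algebraic.intermed(partials)

    def get_value(self):
        partials = [] if self.partial is None else [self.partial]
        return self.algebraic.final(partials)

    def exec_all(self, items):
        # Basic exec-style call: treat the whole input as a single batch.
        self.partial = None
        self.accumulate(items)
        return self.get_value()
```

The adapter does extra work compared with a hand-written accumulator (it builds a list of partial results for every batch), which is the kind of overhead the paragraph refers to when it says the free implementations might not be the most efficient.
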
@@ -1619,7 +1623,121 @@ function main() {
  </source>
  
  </section>
- 
+</section>
+
+ <!-- =============================================================== -->
+ <section id="jruby-udfs">
+ <title>Writing Ruby UDFs</title>
+
+ <p><strong>Note:</strong> <em>Ruby UDFs are an experimental feature.</em></p>
+
+<!-- ++++++++++++++++++++++++++++++++++++++++++++++++ -->
+ <section id="write-jruby">
+ <title>Writing a Ruby UDF</title>
+ <p>You must extend PigUdf and define your Ruby UDFs in the class. </p>
+ <source>
+ require 'pigudf'
+ class Myudfs &lt; PigUdf
+     def square num
+         return nil if num.nil?
+         num**2
+     end
+ end
+ </source>
+ </section>
+ <section id="jruby-schema-return-types">
+ <title>Return Types and Schemas</title>
+ <p>You have two ways to define the return schema:</p>
+ <p>outputSchema - Defines the schema for a UDF in a format that Pig understands. </p>
+ <source>
+ outputSchema "word:chararray"
+ </source>
+ <source>
+ outputSchema "t:(m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)})"
+ </source>
+ <p>Schema function</p>
+ <source>
+ outputSchemaFunction :squareSchema
+ def squareSchema input
+     input
+ end
+ </source>
+ <p>You need to put the outputSchema or outputSchemaFunction statement right before your UDF. The schema function itself can be defined anywhere inside the class.</p>
+ </section>
+ <section id="register-jruby">
+ <title>Registering the UDF</title>
+ <p>You can register a Ruby UDF as shown here. </p>
+ <source>
+ register 'test.rb' using jruby as myfuncs;
+ </source>
+ <p> This is a shortcut to the complete syntax:</p>
+ <source>
+ register 'test.rb' using org.apache.pig.scripting.jruby.JrubyScriptEngine as myfuncs;
+ </source>
+ <p>The <code>register</code> statement above registers the Ruby functions defined in test.rb in Pig’s runtime within the defined namespace (myfuncs in this example). They can then be referred to later in the Pig Latin script as <code>myfuncs.square()</code>. An example usage is:</p>
+ <source>
+ b = foreach a generate myfuncs.concat($0, $1);
+ </source>
+ </section>
+ <section id="jruby-example">
+ <title>Example Scripts</title>
+ <p>Here are two complete Ruby UDF samples.</p>
+ <source>
+ require 'pigudf'
+ class Myudfs &lt; PigUdf
+ outputSchema "word:chararray"
+     def concat *input
+         input.inject(:+)
+     end
+ end
+ </source>
+ <source>
+ require 'pigudf'
+ class Myudfs &lt; PigUdf
+ outputSchemaFunction :squareSchema
+     def square num
+         return nil if num.nil?
+         num**2
+     end
+     def squareSchema input
+         input
+     end
+ end
+ </source>
+ </section>
+
+ <section id="jruby-advanced">
+ <title>Advanced Topics</title>
+ <p> You can also write Algebraic and Accumulator UDFs using Ruby. You need to extend your class from <code>AlgebraicPigUdf</code> and <code>AccumulatorPigUdf</code> respectively. For an Algebraic UDF, define <code>initial</code>, <code>intermed</code>, and <code>final</code> methods in the class. For an Accumulator UDF, define <code>exec</code> and <code>get</code> methods in the class. Below are examples for each type of UDF: </p>
+ <source>
+ class Count &lt; AlgebraicPigUdf
+     output_schema Schema.long
+     def initial t
+          t.nil? ? 0 : 1
+     end
+     def intermed t
+          return 0 if t.nil?
+          t.flatten.inject(:+)
+     end
+     def final t
+         intermed(t)
+     end
+ end
+ </source>
+ <source>
+ class Sum &lt; AccumulatorPigUdf
+     output_schema { |i| i.in.in[0] }
+     def exec items
+         @sum ||= 0
+         @sum += items.flatten.inject(:+)
+     end
+     def get
+         @sum
+     end
+ end
+ </source>
+ </section>
+
 </section> 
 
 <!-- ================================================================== -->