Posted to commits@pig.apache.org by da...@apache.org on 2011/11/19 01:50:26 UTC
svn commit: r1203929 - in /pig/branches/branch-0.10: ./ src/docs/src/documentation/content/xdocs/

Author: daijy
Date: Sat Nov 19 00:50:26 2011
New Revision: 1203929

URL: http://svn.apache.org/viewvc?rev=1203929&view=rev
Log:
PIG-2300: Pig Docs - release 0.10.0

Modified:
    pig/branches/branch-0.10/CHANGES.txt
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cmds.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/func.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/perf.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/pig-index.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/site.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/tabs.xml
    pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.10/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/CHANGES.txt?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/CHANGES.txt (original)
+++ pig/branches/branch-0.10/CHANGES.txt Sat Nov 19 00:50:26 2011
@@ -24,6 +24,8 @@ INCOMPATIBLE CHANGES
 
 IMPROVEMENTS
 
+PIG-2300: Pig Docs - release 0.10.0 (and 0.9.1) (chandec via daijy)
+
 PIG-2230: Improved error message for invalid parameter format (xuitingz via olgan)
 
 PIG-2332: JsonLoader/JsonStorage (daijy)

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/basic.xml Sat Nov 19 00:50:26 2011
@@ -503,7 +503,7 @@ DUMP X;
    <table>
       <tr>
             <td>
-               <p><strong>Simple Data Types</strong></p>
+               <p><strong>Simple Types</strong></p>
             </td>
             <td>
                <p>Description</p>
@@ -514,17 +514,6 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>Scalars</p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-         </tr>
-         <tr>
-            <td>
                <p>int</p>
             </td>
             <td>
@@ -572,40 +561,40 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>Arrays</p>
+               <p>chararray</p>
             </td>
             <td>
-               <p></p>
+               <p>Character array (string) in Unicode UTF-8 format</p>
             </td>
             <td>
-               <p></p>
+               <p>hello world</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>chararray</p>
+               <p>bytearray</p>
             </td>
             <td>
-               <p>Character array (string) in Unicode UTF-8 format</p>
+               <p>Byte array (blob)</p>
             </td>
             <td>
-               <p>hello world</p>
+               <p></p>
             </td>
          </tr>
-         <tr>
+                  <tr>
             <td>
-               <p>bytearray</p>
+               <p>boolean</p>
             </td>
             <td>
-               <p>Byte array (blob)</p>
+               <p>boolean</p>
             </td>
             <td>
-               <p></p>
+               <p>true/false (case insensitive)</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p><strong>Complex Data Types</strong></p>
+               <p><strong>Complex Types</strong></p>
             </td>
             <td>
                <p></p>
@@ -840,7 +829,7 @@ DUMP X;
    
    <section id="map">
    <title>Map</title>
-   <p>A map is a set of key value pairs.</p>
+   <p>A map is a set of key/value pairs.</p>
    
    <section>
    <title>Syntax (&lt;&gt; denotes optional)</title>
@@ -1209,26 +1198,15 @@ dump X;
                <p></p>
             </td>
             <td>
-               <p>Constant Example</p>
-            </td>
-            <td>
-               <p>Notes</p>
-            </td>
-         </tr>
-         <tr>
-            <td>
-               <p>Simple Data Types</p>
+               <p><strong>Constant Example</strong></p>
             </td>
             <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
+               <p><strong>Notes</strong></p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>Scalars</p>
+               <p><strong>Simple Data Types</strong></p>
             </td>
             <td>
                <p></p>
@@ -1283,10 +1261,10 @@ dump X;
          </tr>
          <tr>
             <td>
-               <p>Arrays </p>
+               <p>chararray</p>
             </td>
             <td>
-               <p></p>
+               <p>'hello world'</p>
             </td>
             <td>
                <p></p>
@@ -1294,29 +1272,29 @@ dump X;
          </tr>
          <tr>
             <td>
-               <p>chararray</p>
+               <p>bytearray</p>
             </td>
             <td>
-               <p>'hello world'</p>
+               <p></p>
             </td>
             <td>
-               <p></p>
+               <p>Not applicable.</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>bytearray</p>
+               <p>boolean</p>
             </td>
             <td>
-               <p></p>
+               <p>true/false</p>
             </td>
             <td>
-               <p>Not applicable.</p>
+               <p>Case insensitive.</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>Complex Data Types</p>
+               <p><strong>Complex Data Types</strong></p>
             </td>
             <td>
                <p></p>
@@ -1646,7 +1624,7 @@ X = FOREACH A GENERATE f1 as user, f2 as
      <!-- ++++++++++++++++++++++++++++++++++ -->  
    <section  id="schema-simple">
    <title>Schemas for Simple Data Types</title>
-   <p>Simple data types include int, long, float, double, chararray, and bytearray.</p>
+   <p>Simple data types include int, long, float, double, chararray, bytearray, and boolean.</p>
    
    <section>
    <title>Syntax</title>
@@ -2904,7 +2882,7 @@ DUMP X;
             </td>
          </tr>
    </table>
-   <p>Pig does not support a boolean data type. However, the result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false).</p>
+   <p>The result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false).</p>
    
    <section>
    <title>Example</title>
@@ -2922,41 +2900,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
    <title>Description</title>
    <p>Pig Latin supports casts as shown in this table. </p>
    <table>
-         <tr>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p>to </p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-         </tr>
-         <tr>
+       <tr>
             <td>
-               <p>from </p>
+               <p><strong>from / to</strong></p>
             </td>
             <td>
                <p>bag </p>
@@ -2985,6 +2931,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>bytearray </p>
             </td>
+            <td>
+               <p>boolean</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3017,6 +2966,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3049,6 +3001,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3081,6 +3036,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3113,6 +3071,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+           <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3145,6 +3106,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+           <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3177,6 +3141,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3209,6 +3176,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3241,6 +3211,9 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p>error </p>
             </td>
+           <td>
+               <p>yes</p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3273,6 +3246,44 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 >
             <td>
                <p></p>
             </td>
+            <td>
+               <p>yes</p>
+            </td>
+         </tr>
+                  <tr>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error</p>
+            </td>
+            <td>
+               <p>yes</p>
+            </td>
+            <td>
+               <p>error</p>
+            </td>
+            <td>
+               <p></p>
+            </td>
          </tr>
    </table>
 
@@ -3603,31 +3614,30 @@ dump E; 
          </tr>
    </table>
    <p>Use the comparison operators with numeric and string data.</p>
-   
+    </section>
+    
    <section>
-   <title>Example: numeric</title>
+   <title>Examples</title>
+
+<p><strong>Numeric Example</strong></p>
 <source>
 X = FILTER A BY (f1 == 8);
 </source>
-   </section>
-   
-   <section>
-   <title>Example: string</title>
+
+<p><strong>String Example</strong></p>   
 <source>
 X = FILTER A BY (f2 == 'apache');
 </source>
-   </section>
-   
-   <section>
-   <title>Example: matches</title>
+
+ <p><strong>Matches Example</strong></p>    
 <source>
 X = FILTER A BY (f1 matches '.*apache.*');
 </source>
    </section>
    
    <section id="types-table-equal">
-   <title>Types Table: equal (==) and not equal (!=) operators</title>
-   <p>* bytearray cast as this data type</p>
+   <title>Types Table: equal (==) operator</title>
+   <p></p>
    <table>
          <tr>
             <td>
@@ -3660,6 +3670,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>bytearray </p>
             </td>
+            <td>
+               <p>boolean </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3692,6 +3705,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3701,7 +3717,11 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p></p>
             </td>
             <td>
-               <p>boolean (see Note 1) </p>
+               <p>boolean </p>
+               <p>(see Note 1) </p>
+            </td>
+            <td>
+               <p>error </p>
             </td>
             <td>
                <p>error </p>
@@ -3757,6 +3777,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3789,6 +3812,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>cast as boolean </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3821,6 +3847,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>cast as boolean </p>
             </td>
+             <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3853,6 +3882,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>cast as boolean </p>
             </td>
+             <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3885,6 +3917,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>cast as boolean </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3917,6 +3952,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>cast as boolean </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -3949,14 +3987,53 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
+         </tr>
+                  <tr>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
          </tr>
    </table>
    <p>Note 1: boolean (Tuple A is equal to tuple B if they have the same size s, and for all 0 &lt;= i &lt; s A[i] = = B[i])</p>
    <p>Note 2: boolean (Map A is equal to map B if A and B have the same number of entries, and for every key k1 in A with a value of v1, there is a key k2 in B with a value of v2, such that k1 = = k2 and v1 = = v2)</p>
-   </section>
-   
-   <section>
-   <title></title>
+</section>
+
+   <section id="types-table-not-equal">
+   <title>Types Table: not equal (!=) operator</title>
+     <p></p>
    <table>
          <tr>
             <td>
@@ -3989,6 +4066,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>bytearray </p>
             </td>
+            <td>
+               <p>boolean </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4021,6 +4101,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>error </p>
             </td>
+          <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4053,6 +4136,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4085,6 +4171,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4117,6 +4206,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean (bytearray cast as int) </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4149,6 +4241,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean (bytearray cast as long) </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4181,6 +4276,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean (bytearray cast as float) </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4213,6 +4311,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean (bytearray cast as double) </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4245,6 +4346,9 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean (bytearray cast as chararray) </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
@@ -4277,6 +4381,44 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>error</p>
+            </td>
          </tr>
    </table>
    </section>
@@ -4319,8 +4461,143 @@ X = FILTER A BY (f1 matches '.*apache.*'
             </td>
          </tr>
    </table>
-   </section></section></section>
+   </section>
+   </section>
+
+   
+
+  <!-- =================================================================== -->    
+   <section id="type-construction">
+   <title>Type Construction Operators</title>
+   
+   <section>
+   <title>Description</title>
+   <table>
+      <tr>
+            <td>
+               <p>Operator</p>
+            </td>
+            <td>
+               <p>Symbol</p>
+            </td>
+            <td>
+               <p>Notes</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>tuple constructor </p>
+            </td>
+            <td>
+               <p> ( ) </p>
+            </td>
+            <td>
+               <p>Use to construct a tuple from the specified elements. Equivalent to <a href="func.html#totuple">TOTUPLE</a>.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>bag constructor</p>
+            </td>
+            <td>
+               <p> { }</p>
+            </td>
+            <td>
+               <p>Use to construct a bag from the specified elements. Equivalent to <a href="func.html#tobag">TOBAG</a>.</p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>map constructor</p>
+            </td>
+            <td>
+               <p> [ ]</p>
+            </td>
+            <td>
+               <p>Use to construct a map from the specified elements. Equivalent to <a href="func.html#tomap">TOMAP</a>.</p>
+            </td>
+         </tr>
+   </table>
+
+<p></p>   
+<p>Note the following:</p>  
+<ul>
+<li>These operators can be used anywhere an expression of the corresponding type is acceptable, including FOREACH GENERATE, FILTER, etc.</li>
+<li>A single element enclosed in parentheses, such as (5), is not considered a tuple; the parentheses are treated as an arithmetic operator.</li>
+<li>For bags, every element is put in the bag; if an element is not a tuple, Pig creates a tuple for it:
+<ul>
+<li> Given {$1, $2}, Pig creates {($1), ($2)}, a bag with two tuples
+<p>... neither $1 nor $2 is a tuple, so Pig creates a tuple around each item</p> <p>&nbsp;</p></li>
+
+<li> Given {($1), $2}, Pig creates {($1), ($2)}, a bag with two tuples
+<p>... since ($1) is treated as $1 (a single-element tuple cannot be created with this syntax), {($1), $2} becomes {$1, $2}, and Pig creates a tuple around each item</p><p>&nbsp;</p></li>
+
+<li> Given {($1, $2)}, Pig creates {($1, $2)}, a bag with a single tuple
+<p>... Pig creates the tuple ($1, $2) and then puts this tuple into the bag</p><p>&nbsp;</p></li>
+
+</ul> 
+</li>
+</ul>
+</section>
+   
+<!-- ++++++++++++++++++++++++++++++++++ --> 
+   <section>
+   <title>Examples</title>
+<p><strong>Tuple Construction</strong></p>
+<source>
+A = load 'students' as (name:chararray, age:int, gpa:float);
+B = foreach A generate (name, age);
+store B into 'results';
+
+Input (students):
+joe smith  20  3.5
+amy chen   22  3.2
+leo allen  18  2.1
+
+Output (results):
+(joe smith,20)
+(amy chen,22)
+(leo allen,18)
+</source>   
    
+<!-- ++++++++++++++++++++++++++++++++++ --> 
+<p><strong>Bag Construction</strong></p>
+<source>
+A = load 'students' as (name:chararray, age:int, gpa:float);
+B = foreach A generate {(name, age)}, {name, age};
+store B into 'results';
+
+Input (students):
+joe smith  20  3.5
+amy chen   22  3.2
+leo allen  18  2.1
+
+Output (results):
+{(joe smith,20)}   {(joe smith),(20)}
+{(amy chen,22)}    {(amy chen),(22)}
+{(leo allen,18)}   {(leo allen),(18)}
+</source>   
+
+<!-- ++++++++++++++++++++++++++++++++++ -->    
+<p><strong>Map Construction</strong></p>
+<source>
+A = load 'students' as (name:chararray, age:int, gpa:float);
+B = foreach A generate [name, gpa];
+store B into 'results';
+
+Input (students):
+joe smith  20  3.5
+amy chen   22  3.2
+leo allen  18  2.1
+
+Output (results):
+[joe smith#3.5]
+[amy chen#3.2]
+[leo allen#2.1]
+</source>
+</section>
+</section>
+
 
   <!-- =================================================================== -->    
    <section id="deref">
@@ -4374,11 +4651,14 @@ X = FILTER A BY (f1 matches '.*apache.*'
             </td>
          </tr>
    </table>
+   </section>
    
-   <section>
-   <title>Example: Tuple</title>
-   <p>Suppose we have relation A.</p>
-
+<!-- ++++++++++++++++++++++++++++++++++++ -->    
+<section>
+<title>Examples</title> 
+  
+<p><strong>Tuple Example</strong></p>   
+<p>Suppose we have relation A.</p>
 <source>
 LOAD 'data' as (f1:int, f2:tuple(t1:int,t2:int,t3:int));
 
@@ -4401,11 +4681,12 @@ DUMP X;
 (1,7)
 (2,8)
 </source>
-   </section>
+
    
-   <section>
-   <title>Example: Bag</title>
-   <p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
+<!-- ++++++++++++++++++++++++++++++++++++ --> 
+<p><strong>Bag Example</strong></p>   
+   
+<p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
 <source>
 A = LOAD 'data' AS (f1:int, f2:int,f3:int);
 
@@ -4443,13 +4724,12 @@ DUMP X;
 ({(8),(8)})
 </source>
    
-   </section>
-   
-   <section>
-   <title>Example: Tuple and Bag</title>
-    <p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
 
-   
+<!-- ++++++++++++++++++++++++++++++++++++ --> 
+<p><strong>Tuple/Bag Example</strong></p>
+
+<p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
+
 <source>
 A = LOAD 'data' AS (f1:int, f2:int, f3:int);
 
@@ -4492,11 +4772,10 @@ DUMP X;
 (8,{(8)})
 (8,{(8)})
 </source>
-   </section>
-   
-   <section>
-   <title>Example: Map</title>
-   <p>Suppose we have relation A. </p>
+
+<!-- ++++++++++++++++++++++++++++++++++++ -->     
+<p><strong>Map Example</strong></p>
+ <p>Suppose we have relation A. </p>
 
 <source>
 A = LOAD 'data' AS (f1:int, f2:map[]);
@@ -4519,7 +4798,8 @@ DUMP X;
 ()
 ()
 </source>
-  </section></section></section>
+</section>
+</section>
   
   <!-- =================================================================== -->    
 <section id="disambiguate">
@@ -4611,13 +4891,17 @@ grunt>
    </table>
    <p></p>
    <p>For a detailed discussion of nulls see <a href="#nulls">Nulls and Pig Latin</a>.</p>
+      </section>
+      
    <section>
-   <title>Example</title>
+   <title>Examples</title>
+   
+   <p>In this example, only the tuples in which f1 is not null are kept.</p>
 <source>
 X = FILTER A BY f1 is not null;
 </source>
-   </section></section>
-   
+   </section>
+
    <section id="types-table-nulls">
    <title>Types Table</title>
    <p>The null operators can be applied to all data types (see <a  href="#nulls">Nulls and Pig Latin</a>). </p>
@@ -4665,14 +4949,17 @@ X = FILTER A BY f1 is not null;
             </td>
          </tr>
    </table>
+   </section>
    
    <section>
-   <title>Example</title>
+   <title>Examples</title>
+<p>In this example, the negation operator is applied to the "x" values.</p>   
 <source>
 A = LOAD 'data' as (x, y, z);
 
 B = FOREACH A GENERATE -x, y;
 </source>
+</section>
    
    </section>
    
@@ -4752,7 +5039,7 @@ B = FOREACH A GENERATE -x, y;
             </td>
          </tr>
    </table>
-   </section></section></section>
+   </section>
   
 </section>   
 
@@ -5128,9 +5415,10 @@ DUMP X;
                <p>nested_op</p>
             </td>
             <td>
-               <p>Allowed operations are DISTINCT, FILTER, LIMIT, and ORDER BY. </p>
-               <p>The FOREACH…GENERATE operation itself is not allowed since this could lead to an arbitrary number of nesting levels.</p>
-               <p>You can also perform projections (see <a href="#nestedblock">Example: Nested Block</a>).</p>
+               <p>Allowed operations are CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY. </p>
+               <p>Note: FOREACH statements can be nested to two levels only. FOREACH statements that are nested to three or more levels will result in a grammar error.</p>
+               <p>You can also perform projections within the nested block.</p>
+               <p>For examples, see <a href="#nestedblock">Example: Nested Block</a>.</p>
             </td>
          </tr>
          <tr>
@@ -5223,6 +5511,8 @@ DUMP X;
    
    <section>
    <title>Example: Nested Projection</title>
+
+
    <p>In this example, if one of the fields in the input relation is a tuple, bag, or map, we can perform a projection on that field (using a dereference operator).</p>
 <source>
 X = FOREACH C GENERATE group, B.b2;
@@ -5339,6 +5629,37 @@ E = GROUP D BY A::x;
    
    <section id="nestedblock">
    <title>Example: Nested Block</title>
+      <p>In this example a CROSS is performed within the nested block.</p>
+<source>
+user = load 'user' as (uid, age, gender, region);
+session = load 'session' as (uid, region);
+C = cogroup user by uid, session by uid;
+D = foreach C {
+    crossed = cross user, session;
+    generate crossed;
+}</source>
+<p>In this example FOREACH is nested to the second level.</p>
+<source>
+a = load '1.txt' as (a0, a1:chararray, a2:chararray); 
+b = group a by a0; 
+c = foreach b { 
+    c0 = foreach a generate TOMAP(a1,a2); 
+    generate c0; 
+} 
+dump c; 
+</source>
+<p>This example shows a CROSS and FOREACH nested to the second level.</p>
+<source>
+a = load '1.txt' as (a0, a1, a2); 
+b = load '2.txt' as (b0, b1); 
+c = cogroup a by a0, b by b0; 
+d = foreach c { 
+    d0 = cross a, b; 
+    d1 = foreach d0 generate a1+b1; 
+    generate d1; 
+} 
+dump d;
+</source>
    <p>Suppose we have relations A and B. Note that relation B contains an inner bag.</p>
 <source>
 A = LOAD 'data' AS (url:chararray,outlink:chararray);
@@ -5468,7 +5789,6 @@ DUMP X;
                <p></p>
                <p>If your data and loaders satisfy these conditions, use the 'collected' clause to perform an optimized version of GROUP; 
             the operation will execute on the map side and avoid running the reduce phase.</p>
-               <p>Note that the Zebra loader satisfies the conditions (see <a href="zebra_pig.html">Zebra and Pig</a>).</p>
                <p></p>
             </td>
          </tr>    
@@ -5504,7 +5824,6 @@ DUMP X;
                <p></p>
                <p>If your data and loaders satisfy these conditions, use the 'merge' clause to perform an optimized version of COGROUP; 
                the operation will execute on the map side and avoid running the reduce phase.</p>
-               <p>Note that the Zebra loader satisfies the conditions (see <a href="zebra_pig.html">Zebra and Pig</a>).</p>
             </td>
             
          </tr>     
@@ -5704,29 +6023,7 @@ DUMP X;
  B = GROUP A BY (tcid, tpid); 
 </source>
     </section>   
-  
-   <section>
-   <title>Example</title>
-<p>This example shows how to group using the collected keyword.</p>   
-<source>
- register zebra.jar;
- A = LOAD 'studentsortedtab' USING org.apache.hadoop.zebra.pig.TableLoader('name, age, gpa', 'sorted');
- B = GROUP A BY name USING 'collected';
- C = FOREACH b GENERATE group, MAX(a.age), COUNT_STAR(a);
-</source>
-    </section>
-    
-       <section>
-   <title>Example</title>
-<p>This example shows how to use COGROUP with the merge keyword.</p>   
-<source>
- register zebra.jar;
- A = LOAD 'data1' USING org.apahce.hadoop.zebra.pig.TableLoader('id:int', 'sorted');
- B = LOAD 'data2' USING org.apahce.hadoop.zebra.pig.TableLoader('id:int', 'sorted');'
- C = COGROUP A BY id, B BY id USING 'merge';
-</source>
-    </section>
-    
+     
    <section id="partitionby">
    <title>Example: PARTITION BY</title>
 <p>To use the Hadoop Partitioner add PARTITION BY clause to the appropriate operator: </p>
@@ -5771,7 +6068,7 @@ public class SimpleCustomPartitioner ext
    <table>
       <tr> 
             <td>
-               <p>alias = JOIN alias BY {expression|'('expression [, expression …]')'} (, alias BY {expression|'('expression [, expression …]')'} …) [USING 'replicated' | 'skewed' | 'merge'] [PARTITION BY partitioner] [PARALLEL n]; </p>
+               <p>alias = JOIN alias BY {expression|'('expression [, expression …]')'} (, alias BY {expression|'('expression [, expression …]')'} …) [USING 'replicated' | 'skewed' | 'merge' | 'merge-sparse'] [PARTITION BY partitioner] [PARALLEL n]; </p>
             </td>
          </tr> 
    </table></section>
@@ -5839,6 +6136,15 @@ public class SimpleCustomPartitioner ext
             </td>
          </tr>
          
+          <tr>
+            <td>
+               <p>'merge-sparse'</p>
+            </td>
+            <td>
+               <p>Use to perform merge-sparse joins (see <a href="perf.html#merge-sparse-joins">Merge-Sparse Joins</a>).</p>
+            </td>
+         </tr>         
+         
               <tr>      
          <td>
                <p id="partition-by-join-inner">PARTITION BY partitioner</p>
@@ -6057,6 +6363,7 @@ DUMP X;
             </td>
          </tr>
          
+         
       <tr>      
          <td>
                <p id="partition-by-join-outer">PARTITION BY partitioner</p>
@@ -6172,7 +6479,14 @@ C = JOIN A BY name FULL, B BY name USING
                <p>n</p>
             </td>
             <td>
-               <p>The number of output tuples (a constant).</p>
+               <p>The number of output tuples, either:</p>
+               <ul>
+					<li>a constant (for example, 3)</li>
+					<li>a scalar used in an expression (for example, c.sum/100)</li>
+				</ul>
+				<p></p>
+				<p>Note: The expression can consist of constants or scalars; it cannot contain any columns from the input relation.</p>
+				<p>Note: Using a scalar instead of a constant in LIMIT automatically disables most optimizations (only push-before-foreach is performed).</p>
             </td>
          </tr> 
    </table></section>
@@ -6188,6 +6502,15 @@ C = JOIN A BY name FULL, B BY name USING
    
    <section>
    <title>Examples</title>
+   <p>In this example the limit is expressed as a scalar.</p>
+ <source>
+a = load 'a.txt';
+b = group a all;
+c = foreach b generate COUNT(a) as sum;
+d = order a by $0;
+e = limit d c.sum/100;
+</source>
+<p></p>
    <p>Suppose we have relation A.</p>
 <source>
 A = LOAD 'data' AS (a1:int,a2:int,a3:int);
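The following Python sketch (not Pig code) is for readers who want to check the arithmetic in the scalar LIMIT example. It mirrors the three steps of that example: count the input, order it by the first field, and keep the first COUNT/100 rows. The function name `limit_one_percent` is hypothetical. The floor division reflects an assumption: that dividing two integers in Pig (here the long COUNT(a) by the int constant 100) truncates the result, as it does in Java.

```python
def limit_one_percent(records):
    """Sketch of: c = COUNT(a); d = ORDER a BY $0; e = LIMIT d c.sum/100."""
    total = len(records)                               # plays the role of c.sum
    n = total // 100                                   # integer division: 250 rows -> 2
    ordered = sorted(records, key=lambda rec: rec[0])  # ORDER a BY $0
    return ordered[:n]                                 # keep the first n tuples


rows = [(i,) for i in range(250, 0, -1)]
print(limit_one_percent(rows))  # [(1,), (2,)]
```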
@@ -6620,7 +6943,13 @@ DUMP X;
                <p>size</p>
             </td>
             <td>
-               <p>Sample size, range 0 to 1 (for example, enter 0.1 for 10%).</p>
+               <p>Sample size, either:</p>
+               <ul>
+               <li>a constant, range 0 to 1 (for example, enter 0.1 for 10%)</li>
+                <li>a scalar used in an expression</li>
+               </ul>
+               <p></p>
+               <p>Note: The expression can consist of constants or scalars; it cannot contain any columns from the input relation.</p>
             </td>
          </tr> 
    </table></section>
@@ -6640,6 +6969,13 @@ A = LOAD 'data' AS (f1:int,f2:int,f3:int
 
 X = SAMPLE A 0.01;
 </source>
+<p>In this example, a scalar expression is used (it will sample approximately 1000 records from the input).</p>
+<source>
+a = load 'a.txt';
+b = group a all;
+c = foreach b generate COUNT(a) as num_rows;
+e = sample a 1000.0/c.num_rows;
+</source>
    </section></section>  
    
    <section>
@@ -6651,7 +6987,7 @@ X = SAMPLE A 0.01;
    <table>
       <tr> 
             <td>
-               <p>SPLIT alias INTO alias IF expression, alias IF expression [, alias IF expression …];</p>
+               <p>SPLIT alias INTO alias IF expression, alias IF expression [, alias IF expression …] [, alias OTHERWISE];</p>
             </td>
          </tr> 
    </table></section>
@@ -6691,6 +7027,14 @@ X = SAMPLE A 0.01;
                <p>An expression.</p>
             </td>
          </tr> 
+           <tr>
+            <td>
+               <p>OTHERWISE</p>
+            </td>
+            <td>
+               <p>Optional keyword. Designates a default relation, which receives the tuples for which none of the IF expressions is true.</p>
+            </td>
+         </tr>
    </table></section>
    
    <section>
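The following Python sketch (not Pig code) makes the OTHERWISE keyword concrete. It assumes two things: a tuple is assigned to every alias whose IF expression is true, so one tuple may land in several relations; and the OTHERWISE relation receives only the tuples for which no IF expression is true. The function `split_with_otherwise` and the aliases X, Y and Z are illustrative.

```python
def split_with_otherwise(records, conditions, otherwise_alias):
    """conditions: list of (alias, predicate) pairs, in SPLIT order."""
    result = {alias: [] for alias, _ in conditions}
    result[otherwise_alias] = []
    for rec in records:
        matched = False
        for alias, predicate in conditions:
            if predicate(rec):
                result[alias].append(rec)        # a tuple can land in several relations
                matched = True
        if not matched:
            result[otherwise_alias].append(rec)  # default relation
    return result


# Roughly: SPLIT A INTO X IF f1 <= 3, Y IF f1 <= 6, Z OTHERWISE;
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]
parts = split_with_otherwise(
    A, [("X", lambda t: t[0] <= 3), ("Y", lambda t: t[0] <= 6)], "Z")
print(parts["Z"])  # [(8, 3, 4), (7, 2, 5), (8, 4, 3)]
```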

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cmds.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cmds.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cmds.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/cmds.xml Sat Nov 19 00:50:26 2011
@@ -341,7 +341,9 @@ The following properties are supported:
    
    <section>
    <title>Usage</title>
-   <p>The kill command enables you to kill a job based on a job id.</p></section>
+   <p>Use the kill command to kill a Pig job based on the job id.</p>
+   <p>The kill command will attempt to kill any MapReduce jobs associated with the Pig job. Under certain conditions, however, this may fail; for example, when a Pig job is killed and does not have a chance to call its shutdown procedures.</p>
+   </section>
    
    <section>
    <title>Example</title>

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/func.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/func.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/func.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/func.xml Sat Nov 19 00:50:26 2011
@@ -1011,7 +1011,7 @@ DUMP X;
    <table>
        <tr>
             <td>
-               <p>TOKENIZE(expression)</p>
+               <p>TOKENIZE(expression [, 'field_delimiter'])</p>
             </td>
          </tr> 
    </table></section>
@@ -1027,11 +1027,20 @@ DUMP X;
                <p>An expression with data type chararray.</p>
             </td>
          </tr> 
+          <tr>
+            <td>
+               <p>'field_delimiter'</p>
+            </td>
+            <td>
+               <p>An optional field delimiter (in single quotes).</p>
+               <p>If field_delimiter is null or not passed, the following characters are used as delimiters: space [ ], double quote [ " ], comma [ , ], parentheses [ () ], star [ * ].</p>
+            </td>
+         </tr>
    </table></section>
    
    <section>
    <title>Usage</title>
-   <p>Use the TOKENIZE function to split a string of words (all words in a single tuple) into a bag of words (each word in a single tuple). The following characters are considered to be word separators: space, double quote("), coma(,) parenthesis(()), star(*).</p>
+   <p>Use the TOKENIZE function to split a string of words (all words in a single tuple) into a bag of words (each word in a single tuple). </p>
    </section>
    
    <section>
@@ -1052,7 +1061,14 @@ DUMP X;
 ({(Here),(is),(the),(second),(string.)})
 ({(Here),(is),(the),(third),(string.)})
 </source>
-   
+   <p>In this example a field delimiter is specified.</p>
+<source>
+A = LOAD 'data' AS (f1:chararray);
+B = FOREACH A GENERATE TOKENIZE(f1,'||');
+DUMP B;
+</source>
    </section></section></section>
    
    <!-- ======================================================================== -->
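The following Python sketch (not Pig's implementation) illustrates how TOKENIZE splits a chararray. It assumes the same rules as Java's java.util.StringTokenizer: each character in the delimiter string is a separate delimiter, and empty tokens are dropped. Under that assumption, '||' behaves the same as '|'. The default delimiter set is the one listed in the Terms table: space, double quote, comma, parentheses and star.

```python
DEFAULT_DELIMITERS = ' ",()*'


def tokenize(text, delimiters=None):
    """Split text on any single delimiter character, dropping empty tokens."""
    if delimiters is None:            # null or omitted field_delimiter
        delimiters = DEFAULT_DELIMITERS
    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:               # end of a token; empty ones are skipped
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("Here is the first string."))  # ['Here', 'is', 'the', 'first', 'string.']
print(tokenize("a||b|c", "||"))               # ['a', 'b', 'c']
```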
@@ -1214,10 +1230,6 @@ dump X; 
 ()
 </source>
 </section>
-
-<section>
-<title>More Examples</title>
-</section>
 </section>
  
  
@@ -3081,8 +3093,70 @@ Use the UPPER function to convert all ch
 <!-- ======================================================== -->
 <!-- Other Functions -->
 <section id="bag-tuple-functions">
-<title>Bag and Tuple Functions</title>
+<title>Tuple, Bag, Map Functions</title>
+
+<!-- ======================================================== -->  
+ <section id="totuple">
+   <title>TOTUPLE</title>
+   <p>Converts one or more expressions to type tuple. </p>
+
+
+<section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>TOTUPLE(expression [, expression ...])</p>
+            </td>
+         </tr> 
+        
+   </table>
+ </section>
+
+<section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>expression</p>
+            </td>
+            <td>
+               <p>An expression of any datatype.</p>
+            </td>
+         </tr> 
+   </table>
+</section>
 
+<section>
+     <title>Usage</title>
+     <p>Use the TOTUPLE function to convert one or more expressions to a tuple.</p>
+        <p>See also: <a href="basic.html#tuple">Tuple</a> data type and <a href="basic.html#type-construction">Type Construction Operators</a></p>
+ </section>
+ 
+ <section>
+     <title>Example</title>
+     <p>
+In this example, fields f1, f2 and f3 are converted to a tuple.
+     </p>
+ <source>
+a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
+DUMP a;
+
+(John,18,4.0)
+(Mary,19,3.8)
+(Bill,20,3.9)
+(Joe,18,3.8)
+
+b = FOREACH a GENERATE TOTUPLE(f1,f2,f3);
+DUMP b;
+
+((John,18,4.0))
+((Mary,19,3.8))
+((Bill,20,3.9))
+((Joe,18,3.8))
+</source>
+ </section>
+</section>
 
 <!-- ======================================================== -->
  <section id="tobag">
@@ -3117,9 +3191,8 @@ Use the UPPER function to convert all ch
 
 <section>
      <title>Usage</title>
-     <p>
-Use the TOBAG function to convert one or more expressions to individual tuples which are then placed in a bag.
-     </p>
+     <p>Use the TOBAG function to convert one or more expressions to individual tuples which are then placed in a bag.</p>
+     <p>See also: <a href="basic.html#bag">Bag</a> data type and <a href="basic.html#type-construction">Type Construction Operators</a></p>
  </section>
  
  <section>
@@ -3149,17 +3222,18 @@ DUMP b;
 
 </section>
 
- <!-- ======================================================== -->  
- <section id="topx">
-   <title>TOP</title>
-   <p>Returns the top-n tuples from a bag of tuples.</p>
+
+<!-- ======================================================== -->
+ <section id="tomap">
+   <title>TOMAP</title>
+   <p>Converts key/value expression pairs into a map. </p>
 
 <section>
    <title>Syntax</title>
    <table>
        <tr>
             <td>
-               <p>TOP(topN,column,relation)</p>
+               <p>TOMAP(key-expression, value-expression [, key-expression, value-expression ...])</p>
             </td>
          </tr> 
         
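TOMAP requires an even number of arguments, with chararray keys in the odd positions. The Python sketch below illustrates these pairing rules and is not Pig's implementation. Python `str` stands in for chararray and a `dict` for the Pig map; the function name is hypothetical.

```python
def tomap(*args):
    """Build a map from alternating key, value arguments."""
    if len(args) % 2 != 0:
        raise ValueError("TOMAP requires an even number of arguments")
    result = {}
    # 1st, 3rd, 5th ... arguments are keys; 2nd, 4th, 6th ... are values
    for key, value in zip(args[0::2], args[1::2]):
        if not isinstance(key, str):
            raise TypeError("map keys must be chararray")
        result[key] = value
    return result


print(tomap("joe smith", 3.5))  # {'joe smith': 3.5}
```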
@@ -3171,68 +3245,77 @@ DUMP b;
    <table>
        <tr>
             <td>
-               <p>topN</p>
+               <p>key-expression</p>
             </td>
             <td>
-               <p>The number of top tuples to return (type integer).</p>
+               <p>An expression of type chararray.</p>
             </td>
          </tr> 
                 <tr>
             <td>
-               <p>column</p>
+               <p>value-expression</p>
             </td>
             <td>
-               <p>The tuple column whose values are being compared.</p>
+               <p>An expression of any type supported by a map.</p>
             </td>
-         </tr> 
-                <tr>
-            <td>
-               <p>relation</p>
-            </td>
-            <td>
-               <p>The relation (bag of tuples) containing the tuple column.</p>
-            </td>
-         </tr> 
+         </tr>
    </table>
 </section>
 
 <section>
      <title>Usage</title>
-     <p>
-TOP function returns a bag containing top N tuples from the input bag where N is controlled by the first parameter to the function. The tuple comparison is performed based on a single column from the tuple. The column position is determined by the second parameter to the function. The function assumes that all tuples in the bag contain an element of the same type in the compared column
-     </p>
-</section>
+     <p>Use the TOMAP function to convert pairs of expressions into a map. Note the following:</p>
+     <ul>
+<li>You must supply an even number of expressions as parameters</li>
+<li>The elements must comply with map type rules:
+<ul>
+<li>Every odd element (key-expression) must be a chararray since only chararrays can be keys into the map</li>
+<li>Every even element (value-expression) can be of any type supported by a map. </li>
+</ul>
+</li>
+     </ul>
+ <p></p>    
+ <p>See also: <a href="basic.html#map">Map</a> data type and <a href="basic.html#type-construction">Type Construction Operators</a></p>
+ </section>
  
  <section>
      <title>Example</title>
      <p>
-In this example the top 10 occurrences are returned.
+In this example, student names (type chararray) and student GPAs (type float) are used to create three maps.
      </p>
  <source>
-A = LOAD 'data' as (first: chararray, second: chararray);
-B = GROUP A BY (first, second);
-C = FOREACH B generate FLATTEN(group), COUNT(*) as count;
-D = GROUP C BY first; // again group by first
-topResults = FOREACH D {
-    result = TOP(10, 2, C); // and retain top 10 occurrences of 'second' in first
-    GENERATE FLATTEN(result);
-}
+A = load 'students' as (name:chararray, age:int, gpa:float);
+B = foreach A generate TOMAP(name, gpa);
+store B into 'results';
+
+Input (students)
+joe smith 20 3.5
+amy chen 22 3.2
+leo allen 18 2.1
+
+Output (results)
+[joe smith#3.5]
+[amy chen#3.2]
+[leo allen#2.1]
 </source>
      
  </section>
 
 </section>
-<!-- ======================================================== -->  
- <section id="totuple">
-   <title>TOTUPLE</title>
-   <p>Converts one or more expressions to type tuple. </p>
+
+
+
+ <!-- ======================================================== -->  
+ <section id="topx">
+   <title>TOP</title>
+   <p>Returns the top-n tuples from a bag of tuples.</p>
 
 <section>
    <title>Syntax</title>
    <table>
        <tr>
             <td>
-               <p>TOTUPLE(expression [, expression ...])</p>
+               <p>TOP(topN,column,relation)</p>
             </td>
          </tr> 
         
@@ -3244,10 +3327,26 @@ topResults = FOREACH D {
    <table>
        <tr>
             <td>
-               <p>expression</p>
+               <p>topN</p>
             </td>
             <td>
-               <p>An expression of any datatype.</p>
+               <p>The number of top tuples to return (type integer).</p>
+            </td>
+         </tr> 
+                <tr>
+            <td>
+               <p>column</p>
+            </td>
+            <td>
+               <p>The tuple column whose values are being compared.</p>
+            </td>
+         </tr> 
+                <tr>
+            <td>
+               <p>relation</p>
+            </td>
+            <td>
+               <p>The relation (bag of tuples) containing the tuple column.</p>
             </td>
          </tr> 
    </table>
@@ -3256,35 +3355,28 @@ topResults = FOREACH D {
 <section>
      <title>Usage</title>
      <p>
-Use the TOTUPLE function to convert one or more expressions to a tuple.
+The TOP function returns a bag containing the top N tuples from the input bag, where N is controlled by the first parameter to the function. The tuple comparison is performed based on a single column from the tuple. The column position is determined by the second parameter to the function. The function assumes that all tuples in the bag contain an element of the same type in the compared column.
      </p>
- </section>
+</section>
  
  <section>
      <title>Example</title>
      <p>
-In this example, fields f1, f2 and f3 are converted to a tuple.
+In this example the top 10 occurrences are returned.
      </p>
  <source>
-a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
-DUMP a;
-
-(John,18,4.0)
-(Mary,19,3.8)
-(Bill,20,3.9)
-(Joe,18,3.8)
-
-b = FOREACH a GENERATE TOTUPLE(f1,f2,f3);
-DUMP b;
-
-((John,18,4.0))
-((Mary,19,3.8))
-((Bill,20,3.9))
-((Joe,18,3.8))
+A = LOAD 'data' as (first: chararray, second: chararray);
+B = GROUP A BY (first, second);
+C = FOREACH B generate FLATTEN(group), COUNT(A) as count;
+D = GROUP C BY first; -- again group by first
+topResults = FOREACH D {
+    result = TOP(10, 2, C); -- and retain top 10 occurrences of 'second' in first
+    GENERATE FLATTEN(result);
+}
 </source>
+     
  </section>
 </section>
-
 </section>
 <!-- End Other Functions -->
 
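The following Python sketch (not Pig's implementation) shows the selection that TOP performs. It assumes that "top" means the largest values in the compared column, and that the column argument is a 0-based position. In the example above, 2 therefore selects the count field, the third field after the two flattened group fields. Pig bags are unordered, so the descending order here comes from heapq.nlargest and is not a property of TOP's output.

```python
import heapq


def top(n, column, bag):
    """Return the n tuples with the largest values in position `column`."""
    return heapq.nlargest(n, bag, key=lambda t: t[column])


# Tuples shaped like C in the example: (first, second, count)
C = [("a", "x", 5), ("a", "y", 9), ("a", "z", 1), ("a", "w", 7)]
print(top(2, 2, C))  # [('a', 'y', 9), ('a', 'w', 7)]
```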

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/perf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/perf.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/perf.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/perf.xml Sat Nov 19 00:50:26 2011
@@ -1118,7 +1118,7 @@ foreach statement where the foreach stat
 <li>Type information must be provided for the join key in the schema.</li>
 </ul>
 <p></p>
-<p>The Zebra and PigStorage loaders satisfy all of these conditions.</p>
+<p>The PigStorage loader satisfies all of these conditions.</p>
 <p></p>
 
 <p><strong>Condition B</strong></p>
@@ -1131,17 +1131,43 @@ foreach statement where the foreach stat
 <li>Type information must be provided for the join key in the schema.</li>
 </ul>
 <p></p>
-<p>The Zebra loader satisfies all of these conditions.</p>
+<p>Pig does not provide a loader that supports outer merge joins. You will need to build your own loader to take advantage of this feature.</p>
+</section>
+</section>
+<!-- END MERGE JOIN -->
+
+
+
+<!-- +++++++++++++++++++++++++++++++ -->
+<!-- MERGE SPARSE JOIN-->
+<section id="merge-sparse-joins">
+<title>Merge-Sparse Joins</title>
+<p>Merge-sparse join is a specialization of merge join. Merge-sparse join is intended for use when one of the tables is very sparse, meaning you expect only a small number of records to be matched during the join. In tests, this join performed well for cases where less than 1% of the data was matched in the join.</p>
+
 
-<p>An example of a left outer merge join using the Zebra loader:</p>
+<section>
+<title>Usage</title>
+<p>Perform a merge-sparse join with the USING clause (see <a href="basic.html#join-inner">JOIN (inner)</a>). </p>
 <source>
-A = load 'data1' using org.apache.hadoop.zebra.pig.TableLoader('id:int', 'sorted'); 
-B = load 'data2' using org.apache.hadoop.zebra.pig.TableLoader('id:int', 'sorted'); 
-C = join A by id left, B by id using 'merge'; 
+a = load 'sorted_input1' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0');
+b = load 'sorted_input2' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0');
+c = join a by $0, b by $0 using 'merge-sparse';
+store c into 'results';
 </source>
 </section>
+
+<section>
+<title>Conditions</title>
+<p>Merge-sparse join only works for inner joins and is not currently implemented for outer joins.</p>
+
+<p>For inner joins, the preconditions are the same as for merge join, with the exception of constraints on the right-side loader: for merge-sparse joins, the loader must implement IndexedLoadFunc or the join will fail.</p>
+
+<p>Piggybank now contains a load function called org.apache.pig.piggybank.storage.IndexedStorage that is a derivation of PigStorage and implements IndexedLoadFunc. This is the only loader included in the standard Pig distribution that can be used for merge-sparse join.</p>
+
 </section>
-<!-- END MERGE JOIN -->
+
+</section>
+<!-- END MERGE-SPARSE JOIN -->
 
 <!-- +++++++++++++++++++++++++++++++ -->
 <section id="specialized-joins-performance">
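The following Python sketch (not Pig's implementation) illustrates why merge-sparse join wants an indexed right-side loader. Both inputs are assumed sorted on the join key. Each left key seeks into the right side instead of streaming through the whole right input. A binary search over a sorted key list stands in for the index an IndexedLoadFunc provides. When few keys match, most of the right input can be skipped. The function name is hypothetical.

```python
from bisect import bisect_left


def merge_sparse_join(left, right):
    """Inner join of two lists of tuples, each sorted on field 0."""
    right_keys = [rec[0] for rec in right]  # stand-in for the right-side index
    joined = []
    for lrec in left:
        i = bisect_left(right_keys, lrec[0])  # seek to the first possible match
        while i < len(right) and right_keys[i] == lrec[0]:
            joined.append(lrec + right[i])
            i += 1
    return joined


a = [(1, "a"), (5, "b"), (9, "c")]
b = [(2, "x"), (5, "y"), (5, "z"), (7, "w")]
print(merge_sparse_join(a, b))  # [(5, 'b', 5, 'y'), (5, 'b', 5, 'z')]
```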

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/pig-index.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/pig-index.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/pig-index.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/pig-index.xml Sat Nov 19 00:50:26 2011
@@ -83,7 +83,9 @@
 <p>&nbsp; + &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#sign">sign operator (positive)</a></p>
 <p>&nbsp; *  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#sexp">star expression</a></p>
 <p>&nbsp; -  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#arithmetic">subtraction operator</a></p>
-
+<p>&nbsp; ( ) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#type-construction">type construction operator (tuple)</a></p>
+<p>&nbsp; { } &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#type-construction">type construction operator (bag)</a></p>
+<p>&nbsp; [ ] &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="basic.html#type-construction">type construction operator (map)</a></p>
 
 <!-- ==== A ================================================================== -->
 <p></p>
@@ -130,6 +132,8 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="perf.html#memory-bags">and memory allocation</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#relations">and relations</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#bag-schema">and schemas</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="func.html#tobag">and TOBAG function</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#type-construction">and type construction operators</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#schema-multi">schemas for multiple types</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#bag">syntax</a>
 </p>
@@ -229,7 +233,7 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="start.html#data-work-with">working with</a>
 </p>
 
-<p><a href="basic.html#data-types">data types</a> (simple and complex)</p>
+<p>data types. <em>See</em> types</p>
 
 <p>debugging
 <br></br>&nbsp;&nbsp;&nbsp; <a href="test.html#diagnostic-ops">diagnostic operators</a>
@@ -389,6 +393,7 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="test.html#hadoop-job-history-loader">HadoopJobHistoryLoader</a>
 <br></br>&nbsp;&nbsp;&nbsp; hadoop partitioner. <em>See</em> PARTITION BY
 <br></br>&nbsp;&nbsp;&nbsp; <a href="start.html#hadoop-properties">Hadoop properties</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="start.html#req">versions supported</a>
 </p>
 
 <p><a href="start.html#dfs">HDFS</a></p>
@@ -472,6 +477,7 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#join-inner">inner joins</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="perf.html#join-optimizations">join optimizations</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="perf.html#merge-joins">merge joins</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="perf.html#merge-sparse-joins">merge-sparse joins</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#join-outer">outer joins </a> 
 <br></br>&nbsp;&nbsp;&nbsp; <a href="perf.html#replicated-joins">replicated joins</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#self-joins">self joins</a>
@@ -566,6 +572,8 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#map-schema">and schemas</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#schema-multi">schemas for multiple types</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#map">syntax</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="func.html#tomap">and TOMAP function</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#type-construction">and type construction operators</a>
 </p>
 
 <p>matches. <em>See</em> pattern matching</p>
@@ -582,6 +590,8 @@
 
 <p><a href="perf.html#merge-joins">merge joins</a></p>
 
+<p><a href="perf.html#merge-sparse-joins">merge-sparse joins</a></p>
+
 <p><a href="func.html#min">MIN</a> function</p>
 
 <p><a href="basic.html#arithmetic">modulo operator</a> ( % )</p>
@@ -947,6 +957,8 @@
 
 <p><a href="func.html#tokenize">TOKENIZE</a> function</p>
 
+<p><a href="func.html#tomap">TOMAP</a> function</p>
+
 <p><a href="func.html#topx">TOP</a> function</p>
 
 <p><a href="func.html#totuple">TOTUPLE</a> function</p>
@@ -961,12 +973,16 @@
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#relations">and relations</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#tuple-schema">and schemas</a>
 <br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#tuple">syntax</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="func.html#totuple">and TOTUPLE function</a>
+<br></br>&nbsp;&nbsp;&nbsp; <a href="basic.html#type-construction">and type construction operators</a>
 </p>
 
-
+<p><a href="basic.html#type-construction">type construction operators</a> (tuple, bag, map)</p>
 
 <p>type conversions. <em>See</em> casting types, types tables</p>
 
+<p><a href="basic.html#data-types">types</a> (simple and complex)</p>
+
 <p><a href="perf.html#types">types and performance</a></p>
 
 <p>types tables
@@ -979,8 +995,6 @@
 <br></br>&nbsp;&nbsp;&nbsp; <em>See also</em> casting types
 </p>
 
-
-
 <p><a href="start.html#tutorial">tutorial</a> (for Pig)</p>
 
 <!-- ==== U ================================================================== -->

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/site.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/site.xml Sat Nov 19 00:50:26 2011
@@ -51,15 +51,6 @@ See http://forrest.apache.org/docs/linki
     <index label="Index" href="pig-index.html" />
     </docs>  
       
-    <docs label="Zebra"> 
-     <zover label="Zebra Overview "	href="zebra_overview.html" />
-     <zusers label="Zebra Users "	href="zebra_users.html" />
-     <zref label="Zebra Reference "	href="zebra_reference.html" />
-     <zmr label="Zebra MapReduce "	href="zebra_mapreduce.html" />
-     <zpig label="Zebra Pig "	            href="zebra_pig.html" />
-     <zstream label="Zebra Streaming "	href="zebra_stream.html" />
-     </docs>
-
      <docs label="Miscellaneous"> 
      <api	label="API Docs" href="api/"/>
      <jdiff label="API Changes" href="ext:jdiff/changes"/>

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/start.xml Sat Nov 19 00:50:26 2011
@@ -31,18 +31,25 @@
 <!-- ++++++++++++++++++++++++++++++++++ -->
  <section id="req">
  <title>Requirements</title>
+ <p><strong>Mandatory</strong></p>
       <p>Unix and Windows users need the following:</p>
 		<ul>
-		  <li> <strong>Hadoop 0.20.2</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a></li>
-		  <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> (set JAVA_HOME to the root of your Java installation)</li>
-          <li> <strong>Python 2.5</strong> - <a href="http://jython.org/downloads.html">http://jython.org/downloads.html</a> (optional, for embedded Pig) </li>
-          <li> <strong>JavaScript 1.7</strong> - <a href="https://developer.mozilla.org/en/Rhino_downloads_archive">https://developer.mozilla.org/en/Rhino_downloads_archive</a> and <a href="http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/">http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/</a>  (optional, for embedded Pig) </li>		  
-		  
-		  <li> <strong>Ant 1.7</strong> - <a href="http://ant.apache.org/">http://ant.apache.org/</a> (optional, for builds) </li>
-		  <li> <strong>JUnit 4.5</strong> - <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a> (optional, for unit tests) </li>
+		  <li> <strong>Hadoop 0.20.2, 0.20.203, or 0.20.204</strong> - <a href="http://hadoop.apache.org/common/releases.html">http://hadoop.apache.org/common/releases.html</a> (You can run Pig with different versions of Hadoop by setting HADOOP_HOME to point to the directory where you have installed Hadoop. If you do not set HADOOP_HOME, by default Pig will run with the embedded version, currently Hadoop 0.20.2.)</li>
+		  <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> (set JAVA_HOME to the root of your Java installation)</li>	
 		</ul>
 		<p></p>
 	<p>Windows users also need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a></p>
+
+<p></p>
+ <p><strong>Optional</strong></p>
+ 		<ul>
+          <li> <strong>Python 2.5</strong> - <a href="http://jython.org/downloads.html">http://jython.org/downloads.html</a> (when using Python UDFs or embedding Pig in Python) </li>
+          <li> <strong>JavaScript 1.7</strong> - <a href="https://developer.mozilla.org/en/Rhino_downloads_archive">https://developer.mozilla.org/en/Rhino_downloads_archive</a> and <a href="http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/">http://mirrors.ibiblio.org/pub/mirrors/maven2/rhino/js/</a>  (when using JavaScript UDFs or embedding Pig in JavaScript) </li>		  
+		  
+		  <li> <strong>Ant 1.7</strong> - <a href="http://ant.apache.org/">http://ant.apache.org/</a> (for builds) </li>
+		  <li> <strong>JUnit 4.5</strong> - <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a> (for unit tests) </li>
+		</ul>
+ 
   </section>         
    
 <!-- ++++++++++++++++++++++++++++++++++ -->        

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/tabs.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/tabs.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/tabs.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/tabs.xml Sat Nov 19 00:50:26 2011
@@ -32,6 +32,6 @@
   -->
   <tab label="Project" href="http://hadoop.apache.org/pig/" type="visible" /> 
   <tab label="Wiki" href="http://wiki.apache.org/pig/" type="visible" /> 
-  <tab label="Pig 0.9.0 Documentation" dir="" type="visible" /> 
+  <tab label="Pig 0.10.0 Documentation" dir="" type="visible" /> 
 
 </tabs>

Modified: pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml?rev=1203929&r1=1203928&r2=1203929&view=diff
==============================================================================
--- pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml (original)
+++ pig/branches/branch-0.10/src/docs/src/documentation/content/xdocs/udf.xml Sat Nov 19 00:50:26 2011
@@ -1492,11 +1492,20 @@ b = foreach a generate myfuncs.helloworl
  </source>
  </section>
  
-    <!-- ++++++++++++++++++++++++++++++++++++++++++++++++ -->
+ <!-- ++++++++++++++++++++++++++++++++++++++++++++++++ -->
  <section id="python-advanced">
  <title>Advanced Topics</title> 
-  <p>UDFs and Pig scripts are generally stored in separate files. For testing purposes you can combine the code in a single file - a "combined" script. 
- Note, however, if you then decide to embed this "combined" script in a host language, the language of the UDF must match the host language.</p>
+  <section id="import-modules">
+  
+<!-- ++++++++++++++++++++ -->
+<title>Importing Modules</title>
+<p>You can import Python modules in your Python script. Pig resolves Python dependencies recursively, which means Pig will automatically ship all dependent Python modules to the backend. Python modules must be in the Jython search path: JYTHON_HOME, JYTHON_PATH, or the current directory.</p>
+</section>
+
+<!-- ++++++++++++++++++++ -->
+<section id="combined-scripts">
+<title>Combined Scripts</title>
+  <p>UDFs and Pig scripts are generally stored in separate files. For testing purposes you can combine the code in a single file - a "combined" script. Note, however, if you then decide to embed this "combined" script in a host language, the language of the UDF must match the host language.</p>
  
  <p>This example combines Python and Pig. This "combined" script can only be embedded in Python.</p>
  <p>With Python you MUST use the <code>if __name__ == '__main__': </code> construct to separate UDFs and control flow. Otherwise the script will result in an error.</p>
@@ -1515,6 +1524,8 @@ if __name__ == '__main__':
 
 result = P.bind().runSingle();
  </source>
+   </section>
+
   </section>
 </section>