Posted to commits@pig.apache.org by ol...@apache.org on 2011/05/27 23:54:25 UTC

svn commit: r1128478 [1/3] - in /pig/branches/branch-0.9: ./ src/docs/src/documentation/content/xdocs/

Author: olga
Date: Fri May 27 21:54:25 2011
New Revision: 1128478

URL: http://svn.apache.org/viewvc?rev=1128478&view=rev
Log:
PIG-1772: Pig 090 Documentation (chandec via olgan)

Modified:
    pig/branches/branch-0.9/CHANGES.txt
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/basic.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/cmds.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/cont.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/func.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/index.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/perf.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/pig-index.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/start.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/test.xml
    pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/udf.xml

Modified: pig/branches/branch-0.9/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.9/CHANGES.txt?rev=1128478&r1=1128477&r2=1128478&view=diff
==============================================================================
--- pig/branches/branch-0.9/CHANGES.txt (original)
+++ pig/branches/branch-0.9/CHANGES.txt Fri May 27 21:54:25 2011
@@ -36,6 +36,8 @@ IMPROVEMENTS
 
 PIG-1772: Pig 090 Documentation (chandec via olgan)
 
+PIG-1772: Pig 090 Documentation (chandec via olgan)
+
 PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu)
 
 PIG-2062: Script silently ended (xuefu)

Modified: pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/basic.xml
URL: http://svn.apache.org/viewvc/pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/basic.xml?rev=1128478&r1=1128477&r2=1128478&view=diff
==============================================================================
--- pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/basic.xml (original)
+++ pig/branches/branch-0.9/src/docs/src/documentation/content/xdocs/basic.xml Fri May 27 21:54:25 2011
@@ -354,7 +354,7 @@ DUMP A;
          <p>Positional notation is generated by the system. Positional notation is indicated with the dollar sign ($) and begins with zero (0); for example, $0, $1, $2. </p>
       </li>
       <li>
-         <p>Names are assigned by you using schemas (or, in the case of the GROUP operator and some functions, by the system). You can use any name that is not a Pig keyword; for example, f1, f2, f3 or a, b, c or name, age, gpa.</p>
+         <p>Names are assigned by you using schemas (or, in the case of the GROUP operator and some functions, by the system). You can use any name that is not a Pig keyword (see <a href="#identifiers">Identifiers</a> for valid name examples).</p>
       </li>
    </ul>
    <p>Given relation A above, the three fields are separated out in this table. </p>
@@ -1071,6 +1071,7 @@ B = FOREACH A GENERATE a + (int)null;
 </source>
    </section>
    
+      <!-- ++++++++++++++++++++++++++++++++++ -->  
    <section id="nulls-ops-produce">
    <title>Operations That Produce Nulls</title>
    <p>As noted, nulls can be the result of an operation. These operations can produce null values: </p>
@@ -1118,6 +1119,8 @@ DUMP B;
    
    </section></section>
    
+   
+      <!-- ++++++++++++++++++++++++++++++++++ -->  
    <section id="nulls-load">
    <title>Nulls and Load Functions</title>
    <p>As noted, nulls can occur naturally in the data. If nulls are part of the data, it is the responsibility of the load function to handle them correctly. Keep in mind that what is considered a null value is loader-specific; however, the load function should always communicate null values to Pig by producing Java nulls.</p>
@@ -1165,6 +1168,7 @@ dump X;
 </source>
    </section>
    
+      <!-- ++++++++++++++++++++++++++++++++++ -->  
    <section id="nulls_join">
    <title>Nulls and JOIN Operator</title>
    <p>The JOIN operator - when performing inner joins - adheres to the SQL standard and disregards (filters out) null values. 
@@ -1429,13 +1433,15 @@ X = FILTER A BY (f1==8) OR (NOT (f2+f3 &
       </li>
    </ul>
 
+   <!-- ++++++++++++++++++++++++++++++++++ --> 
       <section id="fexp">
-          <title>Field expressions</title>
+          <title>Field Expressions</title>
           <p>Field expressions represent a field or a <a href="#deref">dereference operator</a> applied to a field.</p>
       </section>
 
+   <!-- ++++++++++++++++++++++++++++++++++ --> 
       <section id="sexp">
-          <title>Star expression</title>
+          <title>Star Expressions</title>
           <p>Star expressions ( * ) can be used to represent all the fields of a tuple. It is equivalent to writing out the fields explicitly. In the following example the definition of B and C are exactly the same, and MyUDF will be invoked with exactly the same arguments in both cases.</p>
           <source>
 A = LOAD 'data' USING MyStorage() AS (name:chararray, age: int);
@@ -1455,8 +1461,9 @@ C = FOREACH G GENERATE COUNT(*)
 </ul>
       </section>
 
+   <!-- ++++++++++++++++++++++++++++++++++ --> 
 <section id="prexp">
-<title>Project-Range expressions</title>
+<title>Project-Range Expressions</title>
 <p>Project-range ( .. ) expressions can be used to project a range of columns from input. For example:</p>
 <ul>
 <li>.. $x : projects columns $0 through $x, inclusive </li>
@@ -1516,29 +1523,31 @@ SORT = order IN by $2 .. $3, $6 ..; 
 </ul>
 </section>
       
-      
+      <!-- ++++++++++++++++++++++++++++++++++ -->    
       <section id="bexp">
-          <title>Boolean expressions</title>
+          <title>Boolean Expressions</title>
           <p>Boolean expressions can be made up of UDFs that return a boolean value or boolean operators 
           (see <a href="#boolops">Boolean Operators</a>). 
           </p>
       </section>
-           
+
+   <!-- ++++++++++++++++++++++++++++++++++ -->            
       <section id="texp">
-          <title>Tuple expressions</title>
+          <title>Tuple Expressions</title>
           <p>Tuple expressions form subexpressions into tuples. The tuple expression has the form (expression [, expression …]), where expression is a general expression. The simplest tuple expression is the star expression, which represents all fields.
           </p>
       </section>
 
+   <!-- ++++++++++++++++++++++++++++++++++ --> 
     <section id="gexp">
-          <title>General expressions</title>
+          <title>General Expressions</title>
           <p>General expressions can be made up of UDFs and almost any operator. Since Pig does not consider boolean a base type, the result of a general expression cannot be a boolean. Field expressions are the simpliest general expressions.
           </p>
       </section>
    </section>
    
    
-    <!-- ++++++++++++++++++++++++++++++++++ --> 
+    <!-- ================================================== --> 
    <section id="Schemas">
    <title>Schemas</title>
 
@@ -1597,19 +1606,20 @@ B = foreach A generate x+y;
  <p>If the schema of a relationship can’t be inferred, Pig will just use the runtime data as is and propagate it through the pipeline.</p>
 
 
-    
+   <!-- ++++++++++++++++++++++++++++++++++ -->     
    <section id="schema-load">
-   <title>Schemas with LOAD and STREAM Statements</title>
-   <p>With LOAD and STREAM statements, the schema following the AS keyword must be enclosed in parentheses.</p>
+   <title>Schemas with LOAD and STREAM </title>
+   <p>With LOAD and STREAM operators, the schema following the AS keyword must be enclosed in parentheses.</p>
    <p>In this example the LOAD statement includes a schema definition for simple data types.</p>
 <source>
 A = LOAD 'data' AS (f1:int, f2:int);
 </source>   
    </section>
-   
+ 
+    <!-- ++++++++++++++++++++++++++++++++++ -->   
    <section id="schemaforeach">
-   <title>Schemas with FOREACH Statements</title>
-   <p>With FOREACH statements, the schema following the AS keyword must be enclosed in parentheses when the FLATTEN operator is used. Otherwise, the schema should not be enclosed in parentheses.</p>
+   <title>Schemas with FOREACH </title>
+   <p>With FOREACH operators, the schema following the AS keyword must be enclosed in parentheses when the FLATTEN operator is used. Otherwise, the schema should not be enclosed in parentheses.</p>
    <p>In this example the FOREACH statement includes FLATTEN and a schema for simple data types.</p>
 <source>
 X = FOREACH C GENERATE FLATTEN(B) AS (f1:int, f2:int, f3:int), group;
@@ -1623,7 +1633,8 @@ X = FOREACH A GENERATE f1+f2 AS x1:int;
 X = FOREACH A GENERATE f1 as user, f2 as age, f3 as gpa;
 </source> 
    </section>
-   
+  
+     <!-- ++++++++++++++++++++++++++++++++++ -->  
    <section  id="schema-simple">
    <title>Schemas for Simple Data Types</title>
    <p>Simple data types include int, long, float, double, chararray, and bytearray.</p>
@@ -1712,11 +1723,13 @@ DUMP A;
 </source>
    
    </section></section>
-   
+ 
+    <!-- ++++++++++++++++++++++++++++++++++ -->   
    <section id="schema-complex">
    <title>Schemas for Complex Data Types</title>
    <p>Complex data types include tuples, bags, and maps.</p></section>
    
+      <!-- ++++++++++++++++++++++++++++++++++ --> 
    <section id="tuple-schema">
    <title>Tuple Schema</title>
    <p>A tuple is an ordered set of fields.</p>
@@ -1818,7 +1831,8 @@ DUMP A;
 ((2,5,8),(joe,18))
 </source>
    </section></section>
-   
+
+   <!-- ++++++++++++++++++++++++++++++++++ -->    
    <section id="bag-schema">
    <title>Bag Schema</title>
    <p>A bag is a collection of tuples.</p>
@@ -1894,6 +1908,7 @@ DUMP A;
 </source>
    </section></section>
    
+      <!-- ++++++++++++++++++++++++++++++++++ --> 
    <section id="map-schema">
    <title>Map Schema</title>
    <p>A map is a set of key value pairs.</p>
@@ -1969,6 +1984,7 @@ b: {int}
 </source>
  </section></section>
    
+      <!-- ++++++++++++++++++++++++++++++++++ --> 
    <section id="schema-multi">
    <title>Schemas for Multiple Types</title>
    <p>You can define schemas for data that includes multiple types.</p>
@@ -2827,11 +2843,12 @@ DUMP X;
    </table>
    </section></section></section>
    
-   <!-- =================================================================== --> 
-   <section id="comparison">
-   <title>Comparison Operators</title>
+<!-- =================================================================== --> 
+   <section id="boolops">
+   <title>Boolean Operators</title>
       
-    <section><title>Description</title>
+      <section>
+      <title>Description</title>
    <table>
       <tr>
             <td>
@@ -2846,10 +2863,10 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>equal  </p>
+               <p>AND       </p>
             </td>
             <td>
-               <p>==</p>
+               <p>and</p>
             </td>
             <td>
                <p></p>
@@ -2857,10 +2874,10 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>not equal </p>
+               <p>OR  </p>
             </td>
             <td>
-               <p>!=</p>
+               <p>or</p>
             </td>
             <td>
                <p></p>
@@ -2868,93 +2885,69 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>less than  </p>
+               <p>NOT</p>
             </td>
             <td>
-               <p>&lt;</p>
+               <p>not</p>
             </td>
             <td>
                <p></p>
             </td>
          </tr>
+   </table>
+   <p>Pig does not support a boolean data type. However, the result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false).</p>
+   
+   <section>
+   <title>Example</title>
+<source>
+X = FILTER A BY (f1==8) OR (NOT (f2+f3 > f1));
+</source>
+   
+   </section></section></section>   
+   
+   <!-- =================================================================== -->
+   <section id="cast">
+   <title>Cast Operators</title>
+   
+   <section>
+   <title>Description</title>
+   <p>Pig Latin supports casts as shown in this table. </p>
+   <table>
          <tr>
             <td>
-               <p>greater than </p>
-            </td>
-            <td>
-               <p>&gt;</p>
-            </td>
-            <td>
                <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>less than or equal to  </p>
+               <p>to </p>
             </td>
             <td>
-               <p>&lt;=</p>
+               <p></p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>greater than or equal to</p>
+               <p></p>
             </td>
             <td>
-               <p>&gt;=</p>
+               <p></p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>pattern matching  </p>
+               <p></p>
             </td>
             <td>
-               <p>matches</p>
+               <p></p>
             </td>
             <td>
-            <p>Takes an expression on the left and a string constant on the right.</p>
-            <p><em>expression</em> matches <em>string-constant</em></p>
-            <p>Use the Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">format</a> for regular expressions.</p>
-
+               <p></p>
             </td>
          </tr>
-   </table>
-   <p>Use the comparison operators with numeric and string data.</p>
-   
-   <section>
-   <title>Example: numeric</title>
-<source>
-X = FILTER A BY (f1 == 8);
-</source>
-   </section>
-   
-   <section>
-   <title>Example: string</title>
-<source>
-X = FILTER A BY (f2 == 'apache');
-</source>
-   </section>
-   
-   <section>
-   <title>Example: matches</title>
-<source>
-X = FILTER A BY (f1 matches '.*apache.*');
-</source>
-   </section>
-   
-   <section id="types-table-equal">
-   <title>Types Table: equal (==) and not equal (!=) operators</title>
-   <p>* bytearray cast as this data type</p>
-   <table>
          <tr>
             <td>
-               <p></p>
+               <p>from </p>
             </td>
             <td>
                <p>bag </p>
@@ -2989,7 +2982,7 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>bag </p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
             <td>
                <p>error </p>
@@ -3021,10 +3014,10 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>tuple </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean (see Note 1) </p>
+               <p></p>
             </td>
             <td>
                <p>error </p>
@@ -3053,14 +3046,13 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>map </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
-               <p>(see Note 2)</p>
+               <p></p>
             </td>
             <td>
                <p>error </p>
@@ -3086,31 +3078,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>int </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p></p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>error </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>cast as boolean </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
@@ -3118,31 +3110,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>long </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p></p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>error </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>cast as boolean </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
@@ -3150,31 +3142,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>float </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p></p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>error </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>cast as boolean  </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
@@ -3182,31 +3174,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>double </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p></p>
             </td>
             <td>
-               <p>error </p>
+               <p>yes </p>
             </td>
             <td>
-               <p>cast as boolean  </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
@@ -3214,31 +3206,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>chararray </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>yes</p>
             </td>
             <td>
-               <p></p>
+               <p>yes</p>
             </td>
             <td>
-               <p></p>
+               <p>yes</p>
             </td>
             <td>
-               <p></p>
+               <p>yes</p>
             </td>
             <td>
-               <p>boolean </p>
+               <p></p>
             </td>
             <td>
-               <p>cast as boolean </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
@@ -3246,236 +3238,487 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>bytearray </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
             </td>
             <td>
-               <p></p>
+               <p>yes </p>
+            </td>
+            <td>
+               <p>yes </p>
             </td>
             <td>
                <p></p>
             </td>
+         </tr>
+   </table>
+
+   <section>
+   <title>Syntax  </title>
+   <table>
+      <tr>
             <td>
-               <p>boolean </p>
+               <p>{(data_type) |  (tuple(data_type))  | (bag{tuple(data_type)}) | (map[]) } field</p>
             </td>
          </tr>
    </table>
-   <p>Note 1: boolean (Tuple A is equal to tuple B if they have the same size s, and for all 0 &lt;= i &lt; s A[i] = = B[i])</p>
-   <p>Note 2: boolean (Map A is equal to map B if A and B have the same number of entries, and for every key k1 in A with a value of v1, there is a key k2 in B with a value of v2, such that k1 = = k2 and v1 = = v2)</p>
    </section>
    
    <section>
-   <title></title>
+   <title>Terms</title>
    <table>
-         <tr>
+      <tr>
             <td>
-               <p></p>
+               <p>(data_type)</p>
             </td>
             <td>
-               <p>bag </p>
+               <p>The data type you want to cast to, enclosed in parentheses. You can cast to any data type except bytearray (see the table above).</p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>tuple </p>
+               <p>field</p>
             </td>
             <td>
-               <p>map </p>
+               <p>The field whose type you want to change. </p>
+               <p>The field can be represented by positional notation or by name (alias). For example, if f1 is the first field and type int, you can cast to type long using (long)$0 or (long)f1.</p>
             </td>
+         </tr>
+   </table>
+   </section>
+   
+   <section>
+   <title>Usage</title>
+   <p>Cast operators enable you to cast or convert data from one type to another, as long as conversion is supported (see the table above). For example, suppose you have an integer field, myint, which you want to convert to a string. You can cast this field from int to chararray using (chararray)myint.</p>
+   <p>Please note the following:</p>
+   <ul>
+      <li>
+         <p>A field can be explicitly cast. Once cast, the field remains that type (it is not automatically cast back). In this example $0 is explicitly cast to int.</p>
+<source>
+B = FOREACH A GENERATE (int)$0 + 1;
+</source>
+      </li>
+   </ul>
+   <p></p>
+   <ul>
+      <li>
+         <p>Where possible, Pig performs implicit casts. In this example $0 is cast to int (regardless of underlying data) and $1 is cast to double.</p>
+<source>
+B = FOREACH A GENERATE $0 + 1, $1 + 1.0;
+</source>
+      </li>
+   </ul>
+   <ul>
+      <li>
+         <p>When two bytearrays are used in arithmetic expressions or with built-in aggregate functions (such as SUM), they are implicitly cast to double. If the underlying data is really int or long, you’ll get better performance by declaring the type or explicitly casting the data.</p>
+      </li>
+      <li>
+         <p>Downcasts may cause loss of data. For example, casting from long to int may drop bits.</p>
+      </li>
+   </ul>
+   </section>
+    </section>
+   
+   <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section>
+   <title>Examples</title>
+   <p>In this example an int is cast to type chararray (see relation X).</p>
+<source>
+A = LOAD 'data' AS (f1:int,f2:int,f3:int);
+
+DUMP A;
+(1,2,3)
+(4,2,1)
+(8,3,4)
+(4,3,3)
+(7,2,5)
+(8,4,3)
+
+B = GROUP A BY f1;
+
+DUMP B;
+(1,{(1,2,3)})
+(4,{(4,2,1),(4,3,3)})
+(7,{(7,2,5)})
+(8,{(8,3,4),(8,4,3)})
+
+DESCRIBE B;
+B: {group: int,A: {f1: int,f2: int,f3: int}}
+
+X = FOREACH B GENERATE group, (chararray)COUNT(A) AS total;
+(1,1)
+(4,2)
+(7,1)
+(8,2)
+
+DESCRIBE X;
+X: {group: int,total: chararray}
+</source>
+   
+   
+<p>In this example a bytearray (fld in relation A) is cast to type tuple.</p>
+<source>
+cat data;
+(1,2,3)
+(4,2,1)
+(8,3,4)
+
+A = LOAD 'data' AS fld:bytearray;
+
+DESCRIBE A;
+A: {fld: bytearray}
+
+DUMP A;
+((1,2,3))
+((4,2,1))
+((8,3,4))
+
+B = FOREACH A GENERATE (tuple(int,int,float))fld;
+
+DESCRIBE B;
+B: {(int,int,float)}
+
+DUMP B;
+((1,2,3))
+((4,2,1))
+((8,3,4))
+</source>
+   
+   <p>In this example a bytearray (fld in relation A) is cast to type bag.</p>
+<source>
+cat data;
+{(4829090493980522200L)}
+{(4893298569862837493L)}
+{(1297789302897398783L)}
+
+A = LOAD 'data' AS fld:bytearray;
+
+DESCRIBE A;
+A: {fld: bytearray}
+
+DUMP A;
+({(4829090493980522200L)})
+({(4893298569862837493L)})
+({(1297789302897398783L)})
+
+B = FOREACH A GENERATE (bag{tuple(long)})fld; 
+
+DESCRIBE B;
+B: {{(long)}}
+
+DUMP B;
+({(4829090493980522200L)})
+({(4893298569862837493L)})
+({(1297789302897398783L)})
+</source>
+
+
+   <p>In this example a bytearray (fld in relation A) is cast to type map.</p>
+<source>
+cat data;
+[open#apache]
+[apache#hadoop]
+[hadoop#pig]
+[pig#grunt]
+
+A = LOAD 'data' AS fld:bytearray;
+
+DESCRIBE A;
+A: {fld: bytearray}
+
+DUMP A;
+([open#apache])
+([apache#hadoop])
+([hadoop#pig])
+([pig#grunt])
+
+B = FOREACH A GENERATE (map[])fld;
+
+DESCRIBE B;
+B: {map[ ]}
+
+DUMP B;
+([open#apache])
+([apache#hadoop])
+([hadoop#pig])
+([pig#grunt])
+</source>
+   
+</section>
+
+<!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="cast-relations">
+   <title>Casting Relations to Scalars</title>
+<p>Pig allows you to cast the elements of a single-tuple relation into a scalar value. 
+The tuple can be a single-field or multi-field tuple. 
+If the relation contains more than one tuple, however, a runtime error is generated: "Scalar has more than one row in the output". 
+</p>
+
+<p>The cast relation can be used in any place where an expression of the type would make sense, including FOREACH, FILTER, and SPLIT. Note that if an explicit cast is not used, an implicit cast will be inserted according to Pig rules. Also, when the schema can't be inferred, bytearray is used.</p>  
+ 
+<p>The primary use case for casting relations to scalars is the ability to use the values of global aggregates in follow-up computations. </p> 
+ 
+<p>In this example the percentage of clicks belonging to a particular user is computed. For the FOREACH statement, an explicit cast is used. If the SUM is not given a name, a position can be used as well (userid, clicks/(double)C.$0). </p>
+
+<source>
+A = load 'mydata' as (userid, clicks); 
+B = group A all; 
+C = foreach B generate SUM(A.clicks) as total; 
+D = foreach A generate userid, clicks/(double)C.total; 
+dump D;
+</source>
+   
+<p>In this example a multi-field tuple is used. For the FILTER statement, Pig performs an implicit cast. For the FOREACH statement, 
+an explicit cast is used.</p>
+<source>
+A = load 'mydata' as (userid, clicks); 
+B = group A all; 
+C = foreach B generate SUM(A.clicks) as total, COUNT(A) as cnt; 
+D = FILTER A by clicks > C.total/3; 
+E = foreach D generate userid, clicks/(double)C.total, C.cnt; 
+dump E; 
+</source>
+</section>
+</section>
+   
+   <!-- =================================================================== --> 
+   <section id="comparison">
+   <title>Comparison Operators</title>
+      
+    <section><title>Description</title>
+   <table>
+      <tr>
             <td>
-               <p>int </p>
+               <p>Operator</p>
             </td>
             <td>
-               <p>long </p>
+               <p>Symbol</p>
             </td>
             <td>
-               <p>float </p>
+               <p> Notes</p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>double </p>
+               <p>equal  </p>
             </td>
             <td>
-               <p>chararray </p>
+               <p>==</p>
             </td>
             <td>
-               <p>bytearray </p>
+               <p></p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>bag </p>
-            </td>
-            <td>
-               <p>error </p>
+               <p>not equal </p>
             </td>
             <td>
-               <p>error </p>
+               <p>!=</p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>less than  </p>
             </td>
             <td>
-               <p>error </p>
+               <p>&lt;</p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>greater than </p>
             </td>
             <td>
-               <p>error </p>
+               <p>&gt;</p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>tuple </p>
-            </td>
-            <td>
-               <p></p>
+               <p>less than or equal to  </p>
             </td>
             <td>
-               <p>error </p>
+               <p>&lt;=</p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>greater than or equal to</p>
             </td>
             <td>
-               <p>error </p>
+               <p>&gt;=</p>
             </td>
             <td>
-               <p>error </p>
+               <p></p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>pattern matching  </p>
             </td>
             <td>
-               <p>error </p>
+               <p>matches</p>
             </td>
             <td>
-               <p>error </p>
+            <p>Takes an expression on the left and a string constant on the right.</p>
+            <p><em>expression</em> matches <em>string-constant</em></p>
+            <p>Use the Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">format</a> for regular expressions.</p>
+
             </td>
          </tr>
+   </table>
+   <p>Use the comparison operators with numeric and string data.</p>
+   
+   <section>
+   <title>Example: numeric</title>
+<source>
+X = FILTER A BY (f1 == 8);
+</source>
+   </section>
+   
+   <section>
+   <title>Example: string</title>
+<source>
+X = FILTER A BY (f2 == 'apache');
+</source>
+   </section>
+   
+   <section>
+   <title>Example: matches</title>
+<source>
+X = FILTER A BY (f1 matches '.*apache.*');
+</source>
+   </section>
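+   
+   <section>
+   <title>Example: matches with a full-string pattern</title>
+   <p>A minimal sketch, not one of the original examples. It assumes a hypothetical input file 'data' with a chararray field f1, and it assumes the pattern is applied to the whole value (as with Java's String.matches), which is why the example above wraps 'apache' in '.*'.</p>
+<source>
+-- 'data' and the field names are assumptions made for illustration
+A = LOAD 'data' AS (f1:chararray, f2:int);
+
+-- keeps rows whose f1 contains 'apache' anywhere in the value
+X = FILTER A BY (f1 matches '.*apache.*');
+
+-- keeps only rows whose f1 starts with 'apache'
+Y = FILTER A BY (f1 matches 'apache.*');
+</source>
+   </section>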
+   
+   <section id="types-table-equal">
+   <title>Types Table: equal (==) and not equal (!=) operators</title>
+   <p>* bytearray cast as this data type</p>
+   <table>
          <tr>
             <td>
-               <p>map </p>
+               <p></p>
             </td>
             <td>
-               <p></p>
+               <p>bag </p>
             </td>
             <td>
-               <p></p>
+               <p>tuple </p>
             </td>
             <td>
-               <p>error </p>
+               <p>map </p>
             </td>
             <td>
-               <p>error </p>
+               <p>int </p>
             </td>
             <td>
-               <p>error </p>
+               <p>long </p>
             </td>
             <td>
-               <p>error </p>
+               <p>float </p>
             </td>
             <td>
-               <p>error </p>
+               <p>double </p>
             </td>
             <td>
-               <p>error </p>
+               <p>chararray </p>
             </td>
             <td>
-               <p>error </p>
+               <p>bytearray </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>int </p>
+               <p>bag </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
                <p>error </p>
             </td>
             <td>
-               <p>boolean (bytearray cast as int) </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>long </p>
+               <p>tuple </p>
             </td>
             <td>
                <p></p>
             </td>
             <td>
-               <p></p>
+               <p>boolean (see Note 1) </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
                <p>error </p>
             </td>
             <td>
-               <p>boolean (bytearray cast as long) </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>float </p>
+               <p>map </p>
             </td>
             <td>
                <p></p>
@@ -3484,30 +3727,31 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p></p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
+               <p>(see Note 2)</p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
                <p>error </p>
             </td>
             <td>
-               <p>boolean (bytearray cast as float) </p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>double </p>
+               <p>int </p>
             </td>
             <td>
                <p></p>
@@ -3519,13 +3763,13 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p></p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
                <p>boolean </p>
@@ -3534,12 +3778,12 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p>error </p>
             </td>
             <td>
-               <p>boolean (bytearray cast as double) </p>
+               <p>cast as boolean </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>chararray </p>
+               <p>long </p>
             </td>
             <td>
                <p></p>
@@ -3554,33 +3798,24 @@ X = FILTER A BY (f1 matches '.*apache.*'
                <p></p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
             </td>
             <td>
-               <p>boolean </p>
+               <p>error </p>
             </td>
             <td>
-               <p>boolean (bytearray cast as chararray) </p>
+               <p>cast as boolean </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>bytearray </p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p></p>
+               <p>float </p>
             </td>
             <td>
                <p></p>
@@ -3600,435 +3835,213 @@ X = FILTER A BY (f1 matches '.*apache.*'
             <td>
                <p>boolean </p>
             </td>
-         </tr>
-   </table>
-   </section>
-   
-   <section id="types-table-matches">
-   <title>Types Table: matches operator</title>
-   <p>*Cast as chararray (the second argument must be chararray)</p>
-   <table>
-         <tr>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p>chararray </p>
-            </td>
-            <td>
-               <p>bytearray* </p>
-            </td>
-         </tr>
-         <tr>
-            <td>
-               <p>chararray </p>
-            </td>
-            <td>
-               <p>boolean </p>
-            </td>
-            <td>
-               <p>boolean  </p>
-            </td>
-         </tr>
-         <tr>
-            <td>
-               <p>bytearray </p>
-            </td>
-            <td>
-               <p>boolean</p>
-            </td>
             <td>
                <p>boolean </p>
             </td>
-         </tr>
-   </table>
-   </section></section></section>
-   
-   <section id="null_operators">
-   <title>Null Operators</title>
-     
-   <section>
-   <title>Description</title>
-   <table>
-      <tr>
-            <td>
-               <p>Operator</p>
-            </td>
             <td>
-               <p>Symbol</p>
+               <p>error </p>
             </td>
             <td>
-               <p> Notes</p>
+               <p>cast as boolean  </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>is null </p>
-            </td>
-            <td>
-               <p>is null</p>
+               <p>double </p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>is not null  </p>
-            </td>
-            <td>
-               <p>is not null  </p>
+               <p></p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-   </table>
-   <p></p>
-   <p>For a detailed discussion of nulls see <a href="#nulls">Nulls and Pig Latin</a>.</p>
-   <section>
-   <title>Example</title>
-<source>
-X = FILTER A BY f1 is not null;
-</source>
-   </section></section>
-   
-   <section id="types-table-nulls">
-   <title>Types Table</title>
-   <p>The null operators can be applied to all data types (see <a  href="#nulls">Nulls and Pig Latin</a>). </p>
-   </section>
-   </section>
-   
-  <!-- =================================================================== --> 
-   <section id="boolops">
-   <title>Boolean Operators</title>
-      
-      <section>
-      <title>Description</title>
-   <table>
-      <tr>
             <td>
-               <p>Operator</p>
+               <p></p>
             </td>
             <td>
-               <p>Symbol</p>
+               <p></p>
             </td>
             <td>
-               <p> Notes</p>
+               <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>AND       </p>
+               <p>boolean </p>
             </td>
             <td>
-               <p>and</p>
+               <p>error </p>
             </td>
             <td>
-               <p></p>
+               <p>cast as boolean  </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>OR  </p>
+               <p>chararray </p>
             </td>
             <td>
-               <p>or</p>
+               <p></p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>NOT</p>
+               <p></p>
             </td>
             <td>
-               <p>not</p>
+               <p></p>
             </td>
             <td>
                <p></p>
             </td>
-         </tr>
-   </table>
-   <p>Pig does not support a boolean data type. However, the result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false).</p>
-   
-   <section>
-   <title>Example</title>
-<source>
-X = FILTER A BY (f1==8) OR (NOT (f2+f3 > f1));
-</source>
-   
-   </section></section></section>
-
-
-  <!-- =================================================================== -->    
-   <section id="deref">
-   <title>Dereference Operators</title>
-   
-   <section>
-   <title>Description</title>
-   <table>
-      <tr>
             <td>
-               <p>Operator</p>
+               <p></p>
             </td>
             <td>
-               <p>Symbol</p>
+               <p></p>
             </td>
             <td>
-               <p> Notes</p>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>cast as boolean </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>tuple dereference      </p>
+               <p>bytearray </p>
             </td>
             <td>
-               <p>tuple.id or tuple.(id,…)</p>
+               <p></p>
             </td>
             <td>
-               <p>Tuple dereferencing can be done by name (tuple.field_name) or position (mytuple.$0). If a set of fields are dereferenced (tuple.(name1, name2) or tuple.($0, $1)), the expression represents a tuple composed of the specified fields. Note that if the dot operator is applied to a bytearray, the bytearray will be assumed to be a tuple.</p>
+               <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>bag dereference</p>
+               <p></p>
             </td>
             <td>
-               <p>bag.id or bag.(id,…)</p>
+               <p></p>
             </td>
             <td>
-               <p>Bag dereferencing can be done by name (bag.field_name) or position (bag.$0). If a set of fields are dereferenced (bag.(name1, name2) or bag.($0, $1)), the expression represents a bag composed of the specified fields.</p>
+               <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>map dereference</p>
+               <p></p>
             </td>
             <td>
-               <p>map#'key'</p>
+               <p></p>
             </td>
             <td>
-               <p>Map dereferencing must be done by key (field_name#key or $0#key). If the pound operator is applied to a bytearray, the bytearray is assumed to be a map. If the key does not exist, the empty string is returned.</p>
+               <p></p>
             </td>
-         </tr>
-   </table>
-   
-   <section>
-   <title>Example: Tuple</title>
-   <p>Suppose we have relation A.</p>
-
-<source>
-LOAD 'data' as (f1:int, f2:tuple(t1:int,t2:int,t3:int));
-
-DUMP A;
-(1,(1,2,3))
-(2,(4,5,6))
-(3,(7,8,9))
-(4,(1,4,7))
-(5,(2,5,8))
-</source>
-   
-   <p>In this example dereferencing is used to retrieve two fields from tuple f2.</p>
-<source>
-X = FOREACH A GENERATE f2.t1,f2.t3;
-
-DUMP X;
-(1,3)
-(4,6)
-(7,9)
-(1,7)
-(2,8)
-</source>
-   </section>
-   
-   <section>
-   <title>Example: Bag</title>
-   <p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
-<source>
-A = LOAD 'data' AS (f1:int, f2:int,f3:int);
-
-DUMP A;
-(1,2,3)
-(4,2,1)
-(8,3,4)
-(4,3,3)
-(7,2,5)
-(8,4,3)
-
-B = GROUP A BY f1;
-
-DUMP B;
-(1,{(1,2,3)})
-(4,{(4,2,1),(4,3,3)})
-(7,{(7,2,5)})
-(8,{(8,3,4),(8,4,3)})
-
-ILLUSTRATE B;
-<em>etc …</em>
-----------------------------------------------------------
-| b   | group: int | a: bag({f1: int,f2: int,f3: int}) |
-----------------------------------------------------------
-</source>
-   
-<p>In this example dereferencing is used with relation X to project the first field (f1) of each tuple in the bag (a).</p>
-<source>
-X = FOREACH B GENERATE a.f1;
-
-DUMP X;
-({(1)})
-({(4),(4)})
-({(7)})
-({(8),(8)})
-</source>
-   
-   </section>
-   
-   <section>
-   <title>Example: Tuple and Bag</title>
-    <p>Suppose we have relation B, formed by grouping relation A  (see the GROUP operator for information about the field names in relation B).</p>
-
-   
-<source>
-A = LOAD 'data' AS (f1:int, f2:int, f3:int);
-
-DUMP A;
-(1,2,3)
-(4,2,1)
-(8,3,4)
-(4,3,3)
-(7,2,5)
-(8,4,3)
-
-B = GROUP A BY (f1,f2);
-
-DUMP B;
-((1,2),{(1,2,3)})
-((4,2),{(4,2,1)})
-((4,3),{(4,3,3)})
-((7,2),{(7,2,5)})
-((8,3),{(8,3,4)})
-((8,4),{(8,4,3)})
-
-ILLUSTRATE B;
-<em>etc …</em>
--------------------------------------------------------------------------------
-| b     | group: tuple({f1: int,f2: int}) | a: bag({f1: int,f2: int,f3: int}) |
--------------------------------------------------------------------------------
-|       | (8, 3)                                | {(8, 3, 4), (8, 3, 4)} |
--------------------------------------------------------------------------------
-</source>   
-   
-   <p>In this example dereferencing is used to project a field (f1) from a tuple (group) and a field (f1) from a bag (a).</p>
-<source>
-X = FOREACH B GENERATE group.f1, a.f1;
-
-DUMP X;
-(1,{(1)})
-(4,{(4)})
-(4,{(4)})
-(7,{(7)})
-(8,{(8)})
-(8,{(8)})
-</source>
-   </section>
-   
-   <section>
-   <title>Example: Map</title>
-   <p>Suppose we have relation A. </p>
-
-<source>
-A = LOAD 'data' AS (f1:int, f2:map[]);
-
-DUMP A;
-(1,[open#apache])
-(2,[apache#hadoop])
-(3,[hadoop#pig])
-(4,[pig#grunt])
-</source>
-
-   <p>In this example dereferencing is used to look up the value of key 'open'.</p>
-
-<source>
-X = FOREACH A GENERATE f2#'open';
-
-DUMP X;
-(apache)
-()
-()
-()
-</source>
-  </section></section></section>
-   
-  <!-- =================================================================== -->    
-   <section id="sign">
-   <title>Sign Operators</title>
+            <td>
+               <p>boolean </p>
+            </td>
+         </tr>
+   </table>
+   <p>Note 1: boolean (Tuple A is equal to tuple B if they have the same size s, and for all 0 &lt;= i &lt; s, A[i] == B[i])</p>
+   <p>Note 2: boolean (Map A is equal to map B if A and B have the same number of entries, and for every key k1 in A with a value of v1, there is a key k2 in B with a value of v2, such that k1 == k2 and v1 == v2)</p>
+   </section>
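+   
+   <section>
+   <title>Example: tuple and map equality</title>
+   <p>A sketch of how Notes 1 and 2 might be exercised, assuming a hypothetical file 'data' whose rows hold two tuples and two maps. The file name, schema, and field names are illustrative only and do not come from the original examples.</p>
+<source>
+-- the schema below is an assumption made for this illustration
+A = LOAD 'data' AS (t1:tuple(a:int,b:int), t2:tuple(a:int,b:int), m1:map[], m2:map[]);
+
+-- Note 1: keeps rows where t1 and t2 have the same size and equal fields at every position
+X = FILTER A BY (t1 == t2);
+
+-- Note 2: keeps rows where m1 and m2 have the same number of entries and the same key/value pairs
+Y = FILTER A BY (m1 == m2);
+</source>
+   </section>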
    
    <section>
-   <title>Description</title>
+   <title></title>
    <table>
-      <tr>
+         <tr>
             <td>
-               <p>Operator</p>
+               <p></p>
             </td>
             <td>
-               <p>Symbol</p>
+               <p>bag </p>
             </td>
             <td>
-               <p> Notes</p>
+               <p>tuple </p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>positive       </p>
+               <p>map </p>
             </td>
             <td>
-               <p>+</p>
+               <p>int </p>
             </td>
             <td>
-               <p> Has no effect.</p>
+               <p>long </p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>negative (negation)</p>
+               <p>float </p>
             </td>
             <td>
-               <p> -</p>
+               <p>double </p>
             </td>
             <td>
-               <p> Changes the sign of a positive or negative number.</p>
+               <p>chararray </p>
+            </td>
+            <td>
+               <p>bytearray </p>
             </td>
          </tr>
-   </table>
-   
-   <section>
-   <title>Example</title>
-<source>
-A = LOAD 'data' as (x, y, z);
-
-B = FOREACH A GENERATE -x, y;
-</source>
-   
-   </section>
-   
-   <section id="types-table-negative">
-   <title>Types Table: negative ( - ) operator</title>
-   <table>
-      <tr>
+         <tr>
             <td>
                <p>bag </p>
             </td>
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
          </tr>
          <tr>
             <td>
                <p>tuple </p>
             </td>
             <td>
+               <p></p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
                <p>error </p>
             </td>
          </tr>
@@ -4037,6 +4050,30 @@ B = FOREACH A GENERATE -x, y;
                <p>map </p>
             </td>
             <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
                <p>error </p>
             </td>
          </tr>
@@ -4045,7 +4082,31 @@ B = FOREACH A GENERATE -x, y;
                <p>int </p>
             </td>
             <td>
-               <p>int </p>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>boolean (bytearray cast as int) </p>
             </td>
          </tr>
          <tr>
@@ -4053,7 +4114,31 @@ B = FOREACH A GENERATE -x, y;
                <p>long </p>
             </td>
             <td>
-               <p>long </p>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>boolean (bytearray cast as long) </p>
             </td>
          </tr>
          <tr>
@@ -4061,85 +4146,68 @@ B = FOREACH A GENERATE -x, y;
                <p>float </p>
             </td>
             <td>
-               <p>float </p>
+               <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>double </p>
+               <p></p>
             </td>
             <td>
-               <p>double </p>
+               <p></p>
             </td>
-         </tr>
-         <tr>
             <td>
-               <p>chararray </p>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean </p>
             </td>
             <td>
                <p>error </p>
             </td>
+            <td>
+               <p>boolean (bytearray cast as float) </p>
+            </td>
          </tr>
          <tr>
             <td>
-               <p>bytearray </p>
+               <p>double </p>
             </td>
             <td>
-               <p>double (as double) </p>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
+            </td>
+            <td>
+               <p></p>
             </td>
-         </tr>
-   </table>
-   </section></section></section>
-  
-    <!-- =================================================================== -->  
-   <section  id="flatten">
-   <title>Flatten Operator</title>
-   <p>The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples 
-   and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and 
-   result is different for each type of structure.</p>
-
-   <p>For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple 
-   of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).</p>
-
-   <p>For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a 
-   relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two 
-   tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. 
-   For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. 
-   If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).</p>
-   
-   <p>Also note that the flatten of empty bag will result in that row being discarded; no output is generated. 
-   (See also <a href="perf.html#nulls">Drop Nulls Before a Join</a>.) </p>
-   
-   <source>
-grunt> cat empty.bag
-{}      1
-grunt> A = LOAD 'empty.bag' AS (b : bag{}, i : int);
-grunt> B = FOREACH A GENERATE flatten(b), i;
-grunt> DUMP B;
-grunt>
-</source>
-   
-   <p>For examples using the FLATTEN operator, see <a href="#FOREACH">FOREACH</a>.</p>
-   </section>
-
-
-   <!-- =================================================================== -->
-   <section id="cast">
-   <title>Cast Operators</title>
-   
-   <section>
-   <title>Description</title>
-   <p>Pig Latin supports casts as shown in this table. </p>
-   <table>
-         <tr>
             <td>
                <p></p>
             </td>
             <td>
-               <p>to </p>
+               <p></p>
             </td>
             <td>
-               <p></p>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>error </p>
+            </td>
+            <td>
+               <p>boolean (bytearray cast as double) </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>chararray </p>
             </td>
             <td>
                <p></p>
@@ -4162,290 +4230,449 @@ grunt>
             <td>
                <p></p>
             </td>
+            <td>
+               <p>boolean </p>
+            </td>
+            <td>
+               <p>boolean (bytearray cast as chararray) </p>
+            </td>
          </tr>
          <tr>
             <td>
-               <p>from </p>
+               <p>bytearray </p>
             </td>
             <td>
-               <p>bag </p>
+               <p></p>
             </td>
             <td>
-               <p>tuple </p>
+               <p></p>
             </td>
             <td>
-               <p>map </p>
+               <p></p>
             </td>
             <td>
-               <p>int </p>
+               <p></p>
             </td>
             <td>
-               <p>long </p>
+               <p></p>
             </td>
             <td>
-               <p>float </p>
+               <p></p>
             </td>
             <td>
-               <p>double </p>
+               <p></p>
             </td>
             <td>
-               <p>chararray </p>
+               <p></p>
             </td>
             <td>
-               <p>bytearray </p>
+               <p>boolean </p>
             </td>
          </tr>
+   </table>
+   </section>
+   
+   <section id="types-table-matches">
+   <title>Types Table: matches operator</title>
+   <p>*Cast as chararray (the second argument must be chararray)</p>
+   <table>
          <tr>
             <td>
-               <p>bag </p>
+               <p></p>
             </td>
             <td>
-               <p></p>
+               <p>chararray </p>
             </td>
             <td>
-               <p>error </p>
+               <p>bytearray* </p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>chararray </p>
             </td>
             <td>
-               <p>error </p>
+               <p>boolean </p>
             </td>
             <td>
-               <p>error </p>
+               <p>boolean  </p>
+            </td>
+         </tr>
+         <tr>
+            <td>
+               <p>bytearray </p>
             </td>
             <td>
-               <p>error </p>
+               <p>boolean</p>
             </td>
             <td>
-               <p>error </p>
+               <p>boolean </p>
+            </td>
+         </tr>
+   </table>
+   </section></section></section>
+   
+
+  <!-- =================================================================== -->    
+   <section id="deref">
+   <title>Dereference Operators</title>
+   
+   <section>
+   <title>Description</title>
+   <table>
+      <tr>
+            <td>
+               <p>Operator</p>
             </td>
             <td>
-               <p>error </p>
+               <p>Symbol</p>
             </td>
             <td>
-               <p>error </p>
+               <p> Notes</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>tuple </p>
+               <p>tuple dereference      </p>
             </td>
             <td>
-               <p>error </p>
+               <p>tuple.id or tuple.(id,…)</p>
             </td>
             <td>
-               <p></p>
+               <p>Tuple dereferencing can be done by name (tuple.field_name) or position (tuple.$0). If a set of fields is dereferenced (tuple.(name1, name2) or tuple.($0, $1)), the expression represents a tuple composed of the specified fields. Note that if the dot operator is applied to a bytearray, the bytearray is assumed to be a tuple.</p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>bag dereference</p>
             </td>
             <td>
-               <p>error </p>
+               <p>bag.id or bag.(id,…)</p>
             </td>
             <td>
-               <p>error </p>
+               <p>Bag dereferencing can be done by name (bag.field_name) or position (bag.$0). If a set of fields is dereferenced (bag.(name1, name2) or bag.($0, $1)), the expression represents a bag composed of the specified fields.</p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>error </p>
+               <p>map dereference</p>
+            </td>
+            <td>
+               <p>map#'key'</p>
             </td>
             <td>
-               <p>error </p>
+               <p>Map dereferencing must be done by key (field_name#'key' or $0#'key'). If the pound operator is applied to a bytearray, the bytearray is assumed to be a map. If the key does not exist, the empty string is returned.</p>
+            </td>
+         </tr>
+   </table>
+   
+   <section>
+   <title>Example: Tuple</title>
+   <p>Suppose we have relation A.</p>
+
+<source>
+A = LOAD 'data' AS (f1:int, f2:tuple(t1:int,t2:int,t3:int));
+
+DUMP A;
+(1,(1,2,3))
+(2,(4,5,6))
+(3,(7,8,9))
+(4,(1,4,7))
+(5,(2,5,8))
+</source>
+   
+   <p>In this example dereferencing is used to retrieve two fields from tuple f2.</p>
+<source>
+X = FOREACH A GENERATE f2.t1,f2.t3;
+
+DUMP X;
+(1,3)
+(4,6)
+(7,9)
+(1,7)
+(2,8)
+</source>
+   </section>
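+   
+   <section>
+   <title>Example: Tuple by position</title>
+   <p>A short sketch of the positional and multi-field forms described in the table above, assuming the same relation A as in the tuple example. The aliases X and Y are chosen for illustration only.</p>
+<source>
+-- selects the same fields as f2.t1, f2.t3, but by position
+X = FOREACH A GENERATE f2.$0, f2.$2;
+
+-- dereferencing a set of fields yields a single tuple-valued field
+Y = FOREACH A GENERATE f2.(t1,t3);
+</source>
+   </section>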
+   
+   <section>
+   <title>Example: Bag</title>
+   <p>Suppose we have relation B, formed by grouping relation A (see the GROUP operator for information about the field names in relation B).</p>
+<source>
+A = LOAD 'data' AS (f1:int, f2:int,f3:int);
+
+DUMP A;
+(1,2,3)
+(4,2,1)
+(8,3,4)
+(4,3,3)
+(7,2,5)
+(8,4,3)
+
+B = GROUP A BY f1;
+
+DUMP B;
+(1,{(1,2,3)})
+(4,{(4,2,1),(4,3,3)})
+(7,{(7,2,5)})
+(8,{(8,3,4),(8,4,3)})
+
+ILLUSTRATE B;
+<em>etc …</em>
+----------------------------------------------------------
+| b   | group: int | a: bag({f1: int,f2: int,f3: int}) |
+----------------------------------------------------------
+</source>
+   
+<p>In this example dereferencing is used with relation B to project the first field (f1) of each tuple in the bag (a); the result is stored in relation X.</p>
+<source>
+X = FOREACH B GENERATE a.f1;
+
+DUMP X;
+({(1)})
+({(4),(4)})
+({(7)})
+({(8),(8)})
+</source>
+   
+   </section>
+   
+   <section>
+   <title>Example: Tuple and Bag</title>
+    <p>Suppose we have relation B, formed by grouping relation A  (see the GROUP operator for information about the field names in relation B).</p>
+
+   
+<source>
+A = LOAD 'data' AS (f1:int, f2:int, f3:int);
+
+DUMP A;
+(1,2,3)
+(4,2,1)
+(8,3,4)
+(4,3,3)
+(7,2,5)
+(8,4,3)
+
+B = GROUP A BY (f1,f2);
+
+DUMP B;
+((1,2),{(1,2,3)})
+((4,2),{(4,2,1)})
+((4,3),{(4,3,3)})
+((7,2),{(7,2,5)})
+((8,3),{(8,3,4)})
+((8,4),{(8,4,3)})
+
+ILLUSTRATE B;
+<em>etc …</em>
+-------------------------------------------------------------------------------
+| b     | group: tuple({f1: int,f2: int}) | a: bag({f1: int,f2: int,f3: int}) |
+-------------------------------------------------------------------------------
+|       | (8, 3)                                | {(8, 3, 4), (8, 3, 4)} |
+-------------------------------------------------------------------------------
+</source>   
+   
+   <p>In this example dereferencing is used to project a field (f1) from a tuple (group) and a field (f1) from a bag (a).</p>
+<source>
+X = FOREACH B GENERATE group.f1, a.f1;
+
+DUMP X;
+(1,{(1)})
+(4,{(4)})
+(4,{(4)})
+(7,{(7)})
+(8,{(8)})
+(8,{(8)})
+</source>
+   </section>
+   
+   <section>
+   <title>Example: Map</title>
+   <p>Suppose we have relation A. </p>
+
+<source>
+A = LOAD 'data' AS (f1:int, f2:map[]);
+
+DUMP A;
+(1,[open#apache])
+(2,[apache#hadoop])
+(3,[hadoop#pig])
+(4,[pig#grunt])
+</source>
+
+   <p>In this example dereferencing is used to look up the value of key 'open'.</p>
+
+<source>
+X = FOREACH A GENERATE f2#'open';
+
+DUMP X;
+(apache)
+()
+()
+()
+</source>
+  </section></section></section>
+  
+  <!-- =================================================================== -->    
+<section id="disambiguate">
+<title>Disambiguate Operator</title>
+
+<p>Use the disambiguate operator ( :: ) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.</p>
+
+<p>In this example, to disambiguate y,  use A::y or B::y.  In cases where there is no ambiguity, such as z, the :: is not necessary but is still supported.</p>
+
+<source>
+A = load 'data1' as (x, y);
+B = load 'data2' as (x, y, z);
+C = join A by x, B by x;
+D = foreach C generate y; -- which y?
+</source>
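+
+<p>As a minimal sketch, assuming the relations A, B, and C defined above, the ambiguous reference can be resolved by prefixing y with the alias of the relation it came from (the alias E is used here only for illustration).</p>
+
+<source>
+-- A::y and B::y name the two y fields in C; z is unambiguous and needs no prefix
+E = foreach C generate A::y, B::y, z;
+</source>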
+</section>
+
+    <!-- =================================================================== -->  
+   <section  id="flatten">
+   <title>Flatten Operator</title>
+   <p>The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples 
+   and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and 
+   result are different for each type of structure.</p>
+
+   <p>For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple 
+   of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).</p>
+
+   <p>For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a 
+   relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two 
+   tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. 
+   For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. 
+   If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).</p>
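+
+   <p>The cross product described above can be sketched as follows, assuming relation A holds the same data as in the Bag example earlier in this document. The output shown is what that data would be expected to produce; the order of tuples may differ.</p>
+<source>
+A = LOAD 'data' AS (f1:int, f2:int, f3:int);
+B = GROUP A BY f1;
+
+-- each tuple in bag A is paired with its group key
+X = FOREACH B GENERATE group, FLATTEN(A);
+
+DUMP X;
+(1,1,2,3)
+(4,4,2,1)
+(4,4,3,3)
+(7,7,2,5)
+(8,8,3,4)
+(8,8,4,3)
+</source>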
+   
+   <p>Also note that flattening an empty bag causes that row to be discarded; no output is generated. 
+   (See also <a href="perf.html#nulls">Drop Nulls Before a Join</a>.) </p>
+   
+   <source>
+grunt> cat empty.bag
+{}      1
+grunt> A = LOAD 'empty.bag' AS (b : bag{}, i : int);
+grunt> B = FOREACH A GENERATE flatten(b), i;
+grunt> DUMP B;
+grunt>
+</source>
+   
+   <p>For examples using the FLATTEN operator, see <a href="#flatten-example">FOREACH</a>.</p>
+   </section>
+
+  <!-- =================================================================== -->    
+<section id="null_operators">
+   <title>Null Operators</title>
+     
+   <section>
+   <title>Description</title>
+   <table>
+      <tr>
+            <td>
+               <p>Operator</p>
             </td>
             <td>
-               <p>error </p>
+               <p>Symbol</p>
             </td>
             <td>
-               <p>error </p>
+               <p> Notes</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>map </p>
-            </td>
-            <td>
-               <p>error </p>
+               <p id="is-null">is null </p>
             </td>
             <td>
-               <p>error </p>
+               <p>is null</p>
             </td>
             <td>
                <p></p>
             </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
          </tr>
          <tr>
             <td>
-               <p>int </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
+               <p id="is-not-null">is not null  </p>
             </td>
             <td>
-               <p>error </p>
+               <p>is not null  </p>
             </td>
             <td>
                <p></p>
             </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
          </tr>
-         <tr>
-            <td>
-               <p>long </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
+   </table>
+   <p></p>
+   <p>For a detailed discussion of nulls see <a href="#nulls">Nulls and Pig Latin</a>.</p>
+   <section>
+   <title>Example</title>
+<source>
+X = FILTER A BY f1 is not null;
+</source>
+   </section></section>
+   
+   <section id="types-table-nulls">
+   <title>Types Table</title>
+   <p>The null operators can be applied to all data types (see <a  href="#nulls">Nulls and Pig Latin</a>). </p>
+   </section>
+   </section>
+   
+  <!-- =================================================================== -->    
+   <section id="sign">
+   <title>Sign Operators</title>
+   
+   <section>
+   <title>Description</title>
+   <table>
+      <tr>
             <td>
-               <p>yes </p>
+               <p>Operator</p>
             </td>
             <td>
-               <p>yes </p>
+               <p>Symbol</p>
             </td>
             <td>
-               <p>error </p>
+               <p> Notes</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>float </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p></p>
-            </td>
-            <td>
-               <p>yes </p>
+               <p>positive       </p>
             </td>
             <td>
-               <p>yes </p>
+               <p>+</p>
             </td>
             <td>
-               <p>error </p>
+               <p> Has no effect.</p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>double </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p>yes </p>
-            </td>
-            <td>
-               <p></p>
+               <p>negative (negation)</p>
             </td>
             <td>
-               <p>yes </p>
+               <p> -</p>
             </td>
             <td>
-               <p>error </p>
+               <p> Changes the sign of a positive or negative number.</p>
             </td>
          </tr>
-         <tr>
-            <td>
-               <p>chararray </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>error </p>
-            </td>
-            <td>
-               <p>yes</p>
-            </td>
-            <td>
-               <p>yes</p>
-            </td>
-            <td>
-               <p>yes</p>
-            </td>
-            <td>
-               <p>yes</p>
-            </td>
+   </table>
+   
+   <section>
+   <title>Example</title>
+<source>
+A = LOAD 'data' as (x, y, z);
+
+B = FOREACH A GENERATE -x, y;
+</source>
+   
+   </section>
+   
+   <section id="types-table-negative">
+   <title>Types Table: negative ( - ) operator</title>
+   <table>
+      <tr>
             <td>
-               <p></p>
+               <p>bag </p>
             </td>
             <td>
                <p>error </p>
@@ -4453,262 +4680,71 @@ grunt>
          </tr>
          <tr>
             <td>
-               <p>bytearray </p>
+               <p>tuple </p>
             </td>
             <td>
-               <p>yes </p>
+               <p>error </p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>yes </p>
+               <p>map </p>
             </td>
             <td>
-               <p>yes </p>
+               <p>error </p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>yes </p>
+               <p>int </p>
             </td>
             <td>
-               <p>yes </p>
+               <p>int </p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>yes </p>
+               <p>long </p>
             </td>
             <td>
-               <p>yes </p>
+               <p>long </p>
             </td>
+         </tr>
+         <tr>
             <td>
-               <p>yes </p>
+               <p>float </p>
             </td>
             <td>
-               <p></p>
+               <p>float </p>
             </td>
          </tr>
-   </table>
-
-   <section>
-   <title>Syntax  </title>
-   <table>
-      <tr>
+         <tr>
             <td>
-               <p>{(data_type) |  (tuple(data_type))  | (bag{tuple(data_type)}) | (map[]) } field</p>
+               <p>double </p>
+            </td>
+            <td>
+               <p>double </p>
             </td>
          </tr>
-   </table>
-   </section>
-   
-   <section>
-   <title>Terms</title>
-   <table>
-      <tr>
+         <tr>
             <td>
-               <p>(data_type)</p>
+               <p>chararray </p>
             </td>
             <td>
-               <p>The data type you want to cast to, enclosed in parentheses. You can cast to any data type except bytearray (see the table above).</p>
+               <p>error </p>
             </td>
          </tr>
          <tr>
             <td>
-               <p>field</p>
+               <p>bytearray </p>
             </td>
             <td>
-               <p>The field whose type you want to change. </p>
-               <p>The field can be represented by positional notation or by name (alias). For example, if f1 is the first field and type int, you can cast to type long using (long)$0 or (long)f1.</p>
+               <p>double (as double) </p>
             </td>
          </tr>
    </table>
-   </section>
-   
-   <section>
-   <title>Usage</title>
-   <p>Cast operators enable you to cast or convert data from one type to another, as long as conversion is supported (see the table above). For example, suppose you have an integer field, myint, which you want to convert to a string. You can cast this field from int to chararray using (chararray)myint.</p>
-   <p>Please note the following:</p>
-   <ul>
-      <li>
-         <p>A field can be explicitly cast. Once cast, the field remains that type (it is not automatically cast back). In this example $0 is explicitly cast to int.</p>
-<source>
-B = FOREACH A GENERATE (int)$0 + 1;
-</source>
-      </li>
-   </ul>
-   <p></p>
-   <ul>
-      <li>
-         <p>Where possible, Pig performs implicit casts. In this example $0 is cast to int (regardless of underlying data) and $1 is cast to double.</p>
-<source>
-B = FOREACH A GENERATE $0 + 1, $1 + 1.0
-</source>
-      </li>
-   </ul>
-   <ul>
-      <li>
-         <p>When two bytearrays are used in arithmetic expressions or with built in aggregate functions (such as SUM) they are implicitly cast to double. If the underlying data is really int or long, you’ll get better performance by declaring the type or explicitly casting the data.</p>
-      </li>
-      <li>
-         <p>Downcasts may cause loss of data. For example casting from long to int may drop bits.</p>
-      </li>
-   </ul>
-   </section>
-   
-   <section>
-   <title>Examples</title>
-   <p>In this example an int is cast to type chararray (see relation X).</p>
-<source>
-A = LOAD 'data' AS (f1:int,f2:int,f3:int);
-
-DUMP A;
-(1,2,3)
-(4,2,1)
-(8,3,4)
-(4,3,3)
-(7,2,5)
-(8,4,3)
-
-B = GROUP A BY f1;
-
-DUMP B;
-(1,{(1,2,3)})
-(4,{(4,2,1),(4,3,3)})
-(7,{(7,2,5)})
-(8,{(8,3,4),(8,4,3)})
-
-DESCRIBE B;
-B: {group: int,A: {f1: int,f2: int,f3: int}}
-
-X = FOREACH B GENERATE group, (chararray)COUNT(A) AS total;
-(1,1)
-(4,2)
-(7,1)
-(8,2)
-
-DESCRIBE X;
-X: {group: int,total: chararray}
-</source>
-   
-   
-<p>In this example a bytearray (fld in relation A) is cast to type tuple.</p>
-<source>
-cat data;
-(1,2,3)
-(4,2,1)
-(8,3,4)
-
-A = LOAD 'data' AS fld:bytearray;
-
-DESCRIBE A;
-a: {fld: bytearray}
-
-DUMP A;
-((1,2,3))
-((4,2,1))
-((8,3,4))
-
-B = FOREACH A GENERATE (tuple(int,int,float))fld;
-
-DESCRIBE B;
-b: {(int,int,float)}
-
-DUMP B;
-((1,2,3))
-((4,2,1))
-((8,3,4))
-</source>
-   
-   <p>In this example a bytearray (fld in relation A) is cast to type bag.</p>
-<source>
-cat data;
-{(4829090493980522200L)}
-{(4893298569862837493L)}
-{(1297789302897398783L)}
-
-A = LOAD 'data' AS fld:bytearray;
-
-DESCRIBE A;
-A: {fld: bytearray}
-
-DUMP A;
-({(4829090493980522200L)})
-({(4893298569862837493L)})
-({(1297789302897398783L)})
-
-B = FOREACH A GENERATE (bag{tuple(long)})fld; 
-
-DESCRIBE B;
-B: {{(long)}}
-
-DUMP B;
-({(4829090493980522200L)})
-({(4893298569862837493L)})
-({(1297789302897398783L)})
-</source>
-
-
-   <p>In this example a bytearray (fld in relation A) is cast to type map.</p>
-<source>
-cat data;
-[open#apache]
-[apache#hadoop]
-[hadoop#pig]
-[pig#grunt]
-
-A = LOAD 'data' AS fld:bytearray;
-
-DESCRIBE A;
-A: {fld: bytearray}
-
-DUMP A;
-([open#apache])
-([apache#hadoop])
-([hadoop#pig])
-([pig#grunt])
-
-B = FOREACH A GENERATE ((map[])fld;
-
-DESCRIBE B;
-B: {map[ ]}
-
-DUMP B;
-([open#apache])
-([apache#hadoop])
-([hadoop#pig])
-([pig#grunt])
-</source>
-   
-</section></section></section> 
-
-<!-- =================================================================== -->
-   <section id="cast-relations">
-   <title>Casting Relations to Scalars</title>
-<p>Pig allows you to cast the elements of a single-tuple relation into a scalar value. 
-The tuple can be a single-field or multi-field tulple. 
-If the relation contains more than one tuple, however, a runtime error is generated: "Scalar has more than one row in the output". 
-</p>
-
-<p>The cast relation can be used in any place where an expression of the type would make sense, including FOREACH, FILTER, and SPLIT. Note that if an explicit cast is not used an implict cast will be inserted according to Pig rules. Also, when the schema can't be inferred bytearray is used.</p>  
- 
-<p>The primary use case for casting relations to scalars is the ability to use the values of global aggregates in follow up computations. </p> 
- 
-<p>In this example the percentage of clicks belonging to a particular user are computed. For the FOREACH statement, an explicit cast if used. If the SUM is not given a name, a position can be used as well (userid, clicks/(double)C.$0). </p>
-
-<source>
-A = load 'mydata' as (userid, clicks); 
-B = group A all; 
-C = foreach B genertate SUM(A.clicks) as total; 
-D = foreach A generate userid, clicks/(double)C.total; 
-dump D;
-</source>
-   
-<p>In this example a multi-field tuple is used. For the FILTER statement, Pig performs an implicit cast. For the FOREACH statement, 
-an explicit cast is used.</p>
-<source>
-A = load 'mydata' as (userid, clicks); 
-B = group A all; 
-C = foreach B genertate SUM(A.clicks) as total, COUNT(A) as cnt; 
-D = FILTER A by clicks > C.total/3 
-E = foreach D generate userid, clicks/(double)C.total, cnt; 
-dump E; 
-</source>
-</section>
+   </section></section></section>
+  
 </section>   
 
 <!-- =================================================================== -->
@@ -5045,7 +5081,7 @@ DUMP X;
          </tr>
          <tr>
             <td>
-               <p>nested_block</p>
+               <p id="nested-block">nested_block</p>
             </td>
             <td>
                <p>Nested FOREACH...GENERATE block used with a inner bag. Use this syntax:</p>
@@ -5146,7 +5182,7 @@ X = FOREACH B {
    </ul>
    </section>
    
-   <section>
+   <section id="projection">
    <title>Example: Projection</title>
    <p>In this example the asterisk (*) is used to project all tuples from relation A to relation X. Relation A and X are identical.</p>
 <source>
@@ -5239,8 +5275,8 @@ DUMP X;
    
    </section>
    
-   <section>
-   <title>Example: Flattening</title>
+   <section id="flatten-example">
+   <title>Example: Flatten</title>
    <p>In this example the <a href="#Flatten">FLATTEN</a> operator is used to eliminate nesting. </p>
 <source>
 X = FOREACH C GENERATE group, FLATTEN(A);
@@ -5465,8 +5501,8 @@ DUMP X;
          </tr>     
          
      <tr>      
-         <td>
-               <p>PARTITION BY partitioner</p>
+         <td >
+               <p id="partition-by-group">PARTITION BY partitioner</p>
             </td>
             <td>
              <p>Use this feature to specify the Hadoop Partitioner. The partitioner controls the partitioning of the keys of the intermediate map-outputs. </p>
@@ -5839,8 +5875,8 @@ public class SimpleCustomPartitioner ext
      </li>
    </ul>
    <p></p>
-   <p id="self-joins"><strong>Self-Joins</strong></p>
-   <p>To perform self-joins in Pig load the same data multiple times, under different aliases, to avoid naming conflicts.</p>  
+   <p id="self-joins"><strong>Self Joins</strong></p>
+   <p>To perform self joins in Pig, load the same data multiple times, under different aliases, to avoid naming conflicts.</p>  
    <p>In this example the same data is loaded twice using aliases A and B.</p>
    <source>
 grunt> A = load 'mydata';
@@ -6215,7 +6251,7 @@ DUMP X;
                <p>If you specify a directory name, all the files in the directory are loaded. </p>
                <p>You can use Hadoop globing to specify files at the file system or directory levels (see Hadoop
                   <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)">globStatus</a> for details on globing syntax).</p>
-                  <p><strong>Note:</strong> Pig uses Hadoop globbing so the functionality is IDENTICAL. However, when you run from the command line using the Hadoop fs command (rather than the Pig LOAD operator), the Unix shell may do some of the substitutions; this could alter the outcome giving the impression that globing works differently for Pig and Hadoop. For example:</p>
+                  <p id="load-glob"><strong>Note:</strong> Pig uses Hadoop globbing, so the functionality is identical. However, when you run from the command line using the Hadoop fs command (rather than the Pig LOAD operator), the Unix shell may perform some of the substitutions first; this can alter the outcome, giving the impression that globbing works differently in Pig and Hadoop. For example:</p>
                 <ul>
 					<li>This works <br></br>hadoop fs -ls /mydata/20110423{00,01,02,03,04,05,06,07,08,09,{10..23}}00//<strong>part </strong></li>
 					<li>This does not work <br></br>LOAD '/mydata/20110423{00,01,02,03,04,05,06,07,08,09,{10..23}}00//<strong>part </strong>'</li>
@@ -7028,7 +7064,7 @@ X = STREAM A THROUGH `stream.pl` as (f1:
       
       <tr>
              <td>
-               <p>ONSCHEMA </p>  
+               <p id="onschema">ONSCHEMA </p>  
             </td>
             <td>
                <p>Use the ONSCHEMA clause to base the union on named fields (rather than positional notation). 
@@ -7260,7 +7296,7 @@ DUMP U;
          </tr>
          <tr>
             <td>
-               <p>ship</p>
+               <p id="ship">ship</p>
             </td>
             <td>
             <p>For use with streaming.</p>
@@ -7278,7 +7314,7 @@ DUMP U;
          </tr>
          <tr>
             <td>
-               <p>cache</p>
+               <p id="cache">cache</p>
             </td>
             <td>
             <p>For use with streaming.</p>
@@ -7375,7 +7411,7 @@ interface PigToStream {
    
    </section>
    
-   <section>
+   <section id="ship-about">
    <title>About Ship </title>
    <p>Use the ship option to send streaming binary and supporting files, if any, from the client node to the compute nodes. Pig does not automatically ship dependencies; it is your responsibility to explicitly specify all the dependencies and to make sure that the software the processing relies on (for instance, perl or python) is installed on the cluster. Supporting files are shipped to the task's current working directory and only relative paths should be specified. Any pre-installed binaries should be specified in the PATH. </p>
    <p>Only files, not directories, can be specified with the ship option. One way to work around this limitation is to tar all the dependencies into a tar file that accurately reflects the structure needed on the compute nodes, then have a wrapper for your script that un-tars the dependencies prior to execution.</p>
@@ -7400,7 +7436,7 @@ OP = stream IP through 'CMD';
    </section>
    
    
-   <section>
+   <section id="cache-about">
    <title>About Cache</title>
    <p>The ship option works with binaries, jars, and small datasets. However, loading larger datasets at run time for every execution can severely impact performance. Instead, use the cache option to access large files already moved to and available on the compute nodes. Only files, not directories, can be specified with the cache option.</p>
    </section>
@@ -7555,7 +7591,7 @@ B = FOREACH A GENERATE myFunc($0);
    
    <p>Use the REGISTER statement inside a Pig script to specify a JAR file or a Python/JavaScript module. Pig supports JAR files and modules stored in local file systems as well as remote, distributed file systems such as HDFS and Amazon S3 (see <a href="start.html#Pig-Scripts">Pig Scripts</a>).</p>
    
-   <p>Additionally, JAR files stored in local file systems can be specified as a glob pattern using “*”. Pig will search for matching jars in the local file system, either the relative path (relative to your working directory) or the absolute path. Pig will pick up all JARs that match the glob.</p>
+   <p id="register-glob">Additionally, JAR files stored in local file systems can be specified as a glob pattern using “*”. Pig will search for matching jars in the local file system, either the relative path (relative to your working directory) or the absolute path. Pig will pick up all JARs that match the glob.</p>
    
    <p><strong>Command Line</strong></p>
    <p>You can register additional files (to use with your Pig script) via the command line using the -Dpig.additional.jars option.