Posted to commits@pig.apache.org by ol...@apache.org on 2011/05/23 22:20:49 UTC

svn commit: r1126714 [2/3] - in /pig/trunk: ./ src/docs/src/documentation/content/xdocs/

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/func.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/func.xml?rev=1126714&r1=1126713&r2=1126714&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/func.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/func.xml Mon May 23 20:20:49 2011
@@ -23,8 +23,8 @@
   </header>
   <body>
   
-<section>
-<title>About Built In Functions</title>
+<section id="built-in-functions">
+<title>Introduction</title>
 <p>
 Pig comes with a set of built in functions (the eval, load/store, math, string, bag and tuple functions). Two main properties differentiate built in functions from <a href="udf.html">user defined  functions</a> (UDFs). First, built in functions don't need to be registered because Pig knows where they are. Second, built in functions don't need to be qualified when they are used because Pig knows where to find them. 
 </p>	
@@ -32,8 +32,8 @@ Pig comes with a set of built in functio
 
 <!-- ================================================================== -->
 <!-- DYNAMIC INVOKERS -->
-<section>
-<title>About Dynamic Invokers</title>
+<section id="dynamic-invokers">
+<title>Dynamic Invokers</title>
 
 <p>Often you may need to use a simple function that is already provided by standard Java libraries, but for which a <a href="udf.html">user defined function</a> (UDF) has not been written. Dynamic invokers allow you to refer to Java functions without having to wrap them in custom UDFs, at the cost of doing some Java reflection on every function call. 
 </p>
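+<p>A minimal sketch of the pattern (this assumes the InvokeForString builtin and Java's URLDecoder; adapt to your Pig version):</p>
+<source>
+DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
+encoded_strings = LOAD 'encoded_strings.txt' AS (encoded:chararray);
+decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
+</source>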
@@ -60,10 +60,12 @@ decoded_strings = FOREACH encoded_string
   
 <!-- ======================================================== -->  
 <!-- EVAL FUNCTIONS -->    
-<section>
+<section id="eval-functions">
 <title>Eval Functions</title>
 
-<section>
+
+<!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->  
+<section id="avg">
 <title>AVG</title>
    <p>Computes the average of the numeric values in a single-column bag. </p>
    <section>
@@ -180,7 +182,8 @@ DUMP C;
    </table>
    </section></section>
    
-   <section>
+   <!-- ++++++++++++++++++++++++++++++++++++++++++++++ --> 
+   <section id="concat">
    <title>CONCAT</title>
    <p>Concatenates two expressions of identical type.</p>
    
@@ -213,7 +216,7 @@ DUMP C;
    <title>Usage</title>
    <p>Use the CONCAT function to concatenate two expressions. The result values of the two expressions must have identical types.</p>
    </section>
-  </section>
+
    
    <section>
    <title>Example</title>
@@ -234,8 +237,11 @@ DUMP X;
 (piglatin)
 </source>
 </section>
-   
-   <section >
+</section>
+  
+  
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->  
+   <section id="count">
    <title>COUNT</title>
    <p>Computes the number of elements in a bag. </p>
    <section>
@@ -270,7 +276,7 @@ DUMP X;
    The COUNT function follows SQL semantics and ignores nulls. 
     What this means is that a tuple in the bag will not be counted if the FIRST FIELD in this tuple is NULL. 
     If you want to include NULL values in the count computation, use 
-    <a href="#COUNT_STAR">COUNT_STAR</a>.
+    <a href="#count-star">COUNT_STAR</a>.
    </p>   
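+   <p>A minimal sketch of the difference (hypothetical data; the second row has a null first field):</p>
+<source>
+A = LOAD 'data' AS (f1:int, f2:int);  -- rows: (1,2) and (,3)
+B = GROUP A ALL;
+X = FOREACH B GENERATE COUNT(A), COUNT_STAR(A);  -- yields (1,2)
+</source>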
    
    <p>
@@ -363,7 +369,8 @@ DUMP X;
    </table>
    </section></section>
    
- <section>
+     <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+ <section id="count-star">
    <title>COUNT_STAR</title>
    <p>Computes the number of elements in a bag. </p>
    <section>
@@ -408,8 +415,9 @@ X = FOREACH B GENERATE COUNT_STAR(A);
 </source>
    </section>
     </section>
-   
-   <section>
+  
+    <!-- ++++++++++++++++++++++++++++++++++++++++++++++ --> 
+   <section id="diff">
    <title>DIFF</title>
    <p>Compares two fields in a tuple.</p>
    
@@ -471,8 +479,10 @@ grunt> dump x;
 ({(6,7),(2,2)})
 </source>
    </section></section>
-   
-<section>
+
+
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->   
+<section id="isempty">
    <title>IsEmpty</title>
    <p>Checks if a bag or map is empty.</p>
    
@@ -519,8 +529,9 @@ Y = filter X by IsEmpty(SSN_NAME);
 </source>
    </section></section>    
    
-   
-   <section>
+
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->   
+   <section id="max">
    <title>MAX</title>
    <p>Computes the maximum of the numeric values or chararrays in a single-column bag. MAX requires a preceding GROUP ALL statement for global maximums and a GROUP BY statement for group maximums.</p>
    
@@ -632,8 +643,10 @@ DUMP X;
          </tr> 
    </table>
    </section></section>
-   
-   <section>
+ 
+ 
+   <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->  
+   <section id="min">
    <title>MIN</title>
   <p>Computes the minimum of the numeric values or chararrays in a single-column bag. MIN requires a preceding GROUP ALL statement for global minimums and a GROUP BY statement for group minimums.</p>
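+   <p>A sketch of the global form (the schema is illustrative):</p>
+<source>
+A = LOAD 'data' AS (name:chararray, gpa:double);
+B = GROUP A ALL;
+X = FOREACH B GENERATE MIN(A.gpa);
+</source>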
    
@@ -747,7 +760,9 @@ DUMP X;
    </table>
    </section></section>
    
-   <section>
+   
+     <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="size">
    <title>SIZE</title>
    <p>Computes the number of elements based on any Pig data type. </p>
    
@@ -874,8 +889,10 @@ DUMP X;
             </td>
          </tr> 
    </table></section></section>
-   
-   <section>
+
+
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->   
+   <section id="sum">
    <title>SUM</title>
    <p>Computes the sum of the numeric values in a single-column bag. SUM requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.</p>
    
@@ -984,7 +1001,8 @@ DUMP X;
    </table>
    </section></section>
    
-   <section>
+     <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="tokenize">
    <title>TOKENIZE</title>
    <p>Splits a string and outputs a bag of words. </p>
    
@@ -1037,14 +1055,15 @@ DUMP X;
    
    </section></section></section>
    
-   <section>
+   <!-- ======================================================================== -->
+   <section id="load-store-functions">
    <title>Load/Store Functions</title>
    <p>Load/store functions determine how data goes into Pig and comes out of Pig. 
    Pig provides a set of built-in load/store functions, described in the sections below. 
    You can also write your own load/store functions  (see <a href="udf.html">User Defined Functions</a>).</p>
    
-
-   <section>
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="handling-compression">
    <title>Handling Compression</title>
 
 <p>Support for compression is determined by the load/store function. PigStorage and TextLoader support gzip and bzip compression for both read (load) and write (store). BinStorage does not support compression.</p>
@@ -1081,7 +1100,8 @@ store A into ‘myoutput.bz’; 
 
 </section>
 
-   <section>
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="binstorage">
    <title>BinStorage</title>
    <p>Loads and stores data in machine-readable format.</p>
    
@@ -1114,7 +1134,7 @@ store A into ‘myoutput.bz’; 
    <p>Pig uses BinStorage to load and store the temporary data that is generated between multiple MapReduce jobs.</p>
    <ul>
    <li>BinStorage works with data that is represented on disk in machine-readable format. 
-   BinStorage does NOT support <a href="#Handling+Compression">compression</a>.</li>
+   BinStorage does NOT support <a href="#handling-compression">compression</a>.</li>
    <li>BinStorage supports multiple locations (files, directories, globs) as input.</li>
    </ul>
     <p></p>
@@ -1199,8 +1219,53 @@ dump X; 
 <title>More Examples</title>
 </section>
 </section>
+ 
+ 
+
+   
+     <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="pigdump">
+   <title>PigDump</title>
+   <p>Stores data in UTF-8 format.</p>
+   
+   <section>
+   <title>Syntax</title>
+   <table>
+       <tr>
+            <td>
+               <p>PigDump()        </p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Terms</title>
+   <table>
+       <tr>
+            <td>
+               <p>none</p>
+            </td>
+            <td>
+               <p>no parameters</p>
+            </td>
+         </tr> 
+   </table></section>
+   
+   <section>
+   <title>Usage</title>
+   <p>PigDump stores data as tuples in human-readable UTF-8 format. </p></section>
    
    <section>
+   <title>Example</title>
+   <p>In this example PigDump is used with the STORE function.</p>
+<source>
+STORE X INTO 'output' USING PigDump();
+</source>
+   </section></section>
+   
+   
+      <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->  
+   <section id="pigstorage">
    <title>PigStorage</title>
    <p>Loads and stores data in UTF-8 format.</p>
    
@@ -1233,7 +1298,7 @@ dump X; 
    <title>Usage</title>
    <p>PigStorage is the default function for the LOAD and STORE operators and works with both simple and complex data types. </p>
    
-   <p>PigStorage supports structured text files (in human-readable UTF-8 format). PigStorage also supports <a href="#Handling+Compression">compression</a>.</p>
+   <p>PigStorage supports structured text files (in human-readable UTF-8 format). PigStorage also supports <a href="#handling-compression">compression</a>.</p>
    
     <p>PigStorage supports multiple locations (files, directories, globs) as input.</p>
 
@@ -1261,46 +1326,9 @@ STORE X INTO  'output' USING PigStorage(
 </source>
    </section></section>
    
-   <section>
-   <title>PigDump</title>
-   <p>Stores data in UTF-8 format.</p>
-   
-   <section>
-   <title>Syntax</title>
-   <table>
-       <tr>
-            <td>
-               <p>PigDump()        </p>
-            </td>
-         </tr> 
-   </table></section>
-   
-   <section>
-   <title>Terms</title>
-   <table>
-       <tr>
-            <td>
-               <p>none</p>
-            </td>
-            <td>
-               <p>no parameters</p>
-            </td>
-         </tr> 
-   </table></section>
-   
-   <section>
-   <title>Usage</title>
-   <p>PigDump stores data as tuples in human-readable UTF-8 format. </p></section>
-   
-   <section>
-   <title>Example</title>
-   <p>In this example PigDump is used with the STORE function.</p>
-<source>
-STORE X INTO 'output' USING PigDump();
-</source>
-   </section></section>
    
-   <section>
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+   <section id="textloader">
    <title>TextLoader</title>
    <p>Loads unstructured data in UTF-8 format.</p>
    
@@ -1330,7 +1358,7 @@ STORE X INTO 'output' USING PigDump();
    
    <section>
    <title>Usage</title>
-   <p>TextLoader works with unstructured data in UTF8 format. Each resulting tuple contains a single field with one line of input text. TextLoader also supports <a href="#Handling+Compression">compression</a>.</p>
+   <p>TextLoader works with unstructured data in UTF-8 format. Each resulting tuple contains a single field with one line of input text. TextLoader also supports <a href="#handling-compression">compression</a>.</p>
    <p>Currently, TextLoader support for compression is limited.</p>  
    <p>TextLoader cannot be used to store data.</p>
    </section>
@@ -1347,7 +1375,7 @@ A = LOAD 'data' USING TextLoader();
 <!-- ======================================================== -->  
 <!-- ======================================================== -->  
 <!-- Math Functions -->
-<section>
+<section id="math-functions">
 <title>Math Functions</title>
 
 <p>For general information about these functions, see the <a href="http://download.oracle.com/javase/6/docs/api/">Java API Specification</a>, 
@@ -1373,7 +1401,9 @@ A = LOAD 'data' USING TextLoader();
 	</li>
 </ul> 
  
-<section>
+ 
+   <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+<section id="abs">
    <title>ABS</title>
    <p>Returns the absolute value of an expression.</p>
 
@@ -1412,9 +1442,9 @@ A = LOAD 'data' USING TextLoader();
    
 </section>
 
-<!-- ======================================================== --> 
+  <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
     
-<section>
+<section id="acos">
    <title>ACOS</title>
    <p>Returns the arc cosine of an expression.</p>
 
@@ -1453,7 +1483,7 @@ Use the ACOS function to return the arc 
 </section>    
    
   <!-- ======================================================== -->     
-    <section>
+    <section id="asin">
    <title>ASIN</title>
    <p>Returns the arc sine of an expression.</p>
 
@@ -1494,7 +1524,7 @@ Use the ASIN function to return the arc 
    
   <!-- ======================================================== -->  
   
- <section>
+ <section id="atan">
    <title>ATAN</title>
    <p>Returns the arc tangent of an expression.</p>
 
@@ -1534,7 +1564,7 @@ Use the ATAN function to return the arc 
 
   <!-- ======================================================== -->  
   
- <section>
+ <section id="cbrt">
    <title>CBRT</title>
    <p>Returns the cube root of an expression.</p>
 
@@ -1574,7 +1604,7 @@ Use the CBRT function to return the cube
 
  <!-- ======================================================== -->  
   
- <section>
+ <section id="ceil">
    <title>CEIL</title>
    <p>Returns the value of an expression rounded up to the nearest integer.
 </p>
@@ -1696,12 +1726,12 @@ This function never decreases the result
 
    </section>
 </section>    
+
+  <!-- ======================================================== -->  
   
-<!-- ======================================================== -->  
-  
- <section>
-   <title>COSH</title>
-   <p>Returns the hyperbolic cosine of an expression.
+ <section id="cos">
+   <title>COS</title>
+   <p>Returns the trigonometric cosine of an expression.
 </p>
 
 <section>
@@ -1709,7 +1739,7 @@ This function never decreases the result
    <table>
        <tr>
             <td>
-               <p>COSH(expression)</p>
+               <p>COS(expression)</p>
             </td>
          </tr> 
    </table></section>
@@ -1722,7 +1752,7 @@ This function never decreases the result
                <p>expression</p>
             </td>
             <td>
-               <p>An expression whose result is type double.</p>
+               <p>An expression (angle) whose result is type double.</p>
             </td>
          </tr>
         
@@ -1732,16 +1762,17 @@ This function never decreases the result
 <section>
      <title>Usage</title>
      <p>
-Use the COSH function to return the hyperbolic cosine of an expression. 
+Use the COS function to return the trigonometric cosine of an expression.
      </p>
    </section>
+   
 </section>    
   
-  <!-- ======================================================== -->  
+<!-- ======================================================== -->  
   
- <section>
-   <title>COS</title>
-   <p>Returns the trigonometric cosine of an expression.
+ <section id="cosh">
+   <title>COSH</title>
+   <p>Returns the hyperbolic cosine of an expression.
 </p>
 
 <section>
@@ -1749,7 +1780,7 @@ Use the COSH function to return the hype
    <table>
        <tr>
             <td>
-               <p>COS(expression)</p>
+               <p>COSH(expression)</p>
             </td>
          </tr> 
    </table></section>
@@ -1762,7 +1793,7 @@ Use the COSH function to return the hype
                <p>expression</p>
             </td>
             <td>
-               <p>An expression (angle) whose result is type double.</p>
+               <p>An expression whose result is type double.</p>
             </td>
          </tr>
         
@@ -1772,15 +1803,16 @@ Use the COSH function to return the hype
 <section>
      <title>Usage</title>
      <p>
-Use the COS function to return the trigonometric cosine of an expression.
+Use the COSH function to return the hyperbolic cosine of an expression. 
      </p>
    </section>
-   
 </section>    
+  
+
 
 <!-- ======================================================== -->  
   
- <section>
+ <section id="exp">
    <title>EXP</title>
    <p>Returns Euler's number e raised to the power of x.
 </p>
@@ -1820,7 +1852,7 @@ Use the EXP function to return the value
   
 <!-- ======================================================== -->  
   
- <section>
+ <section id="floor">
    <title>FLOOR</title>
    <p>Returns the value of an expression rounded down to the nearest integer. 
 </p>
@@ -1945,7 +1977,7 @@ This function never increases the result
 </section>      
 <!-- ======================================================== -->  
   
- <section>
+ <section id="log">
    <title>LOG</title>
    <p>Returns the natural logarithm (base e) of an expression.</p>
 
@@ -1984,7 +2016,7 @@ Use the LOG function to return the natur
   
   <!-- ======================================================== -->  
   
- <section>
+ <section id="log10">
    <title>LOG10</title>
    <p>Returns the base 10 logarithm of an expression.</p>
 
@@ -2023,7 +2055,7 @@ Use the LOG10 function to return the bas
 
   <!-- ======================================================== -->  
   
- <section>
+ <section id="random">
    <title>RANDOM</title>
    <p>Returns a pseudo random number.</p>
 
@@ -2062,7 +2094,7 @@ Use the RANDOM function to return a pseu
   
 <!-- ======================================================== -->  
   
- <section>
+ <section id="round">
    <title>ROUND</title>
    <p>Returns the value of an expression rounded to an integer.</p>
 
@@ -2183,7 +2215,7 @@ Use the ROUND function to return the val
 </section>       
   
 <!-- ======================================================== -->  
- <section>
+ <section id="sin">
    <title>SIN</title>
    <p>Returns the sine of an expression.</p>
 
@@ -2221,7 +2253,7 @@ Use the SIN function to return the sine 
 </section>       
   
 <!-- ======================================================== -->  
- <section>
+ <section id="sinh">
    <title>SINH</title>
    <p>Returns the hyperbolic sine of an expression.</p>
 
@@ -2260,7 +2292,7 @@ Use the SINH function to return the hype
 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="sqrt">
    <title>SQRT</title>
    <p>Returns the positive square root of an expression.</p>
 
@@ -2298,7 +2330,7 @@ Use the SQRT function to return the posi
 </section>
 
 <!-- ======================================================== -->  
- <section>
+ <section id="tan">
    <title>TAN</title>
   <p>Returns the trigonometric tangent of an angle.</p>
 
@@ -2337,7 +2369,7 @@ Use the TAN function to return the trign
 </section>
 
 <!-- ======================================================== -->  
- <section>
+ <section id="tanh">
    <title>TANH</title>
    <p>Returns the hyperbolic tangent of an expression. </p>
 
@@ -2381,7 +2413,7 @@ Use the TANH function to return the hype
 <!-- ======================================================== -->   
 
 <!-- String Functions -->
-<section>
+<section id="string-functions">
 <title>String Functions</title>
 
 <p>For general information about these functions, see the <a href="http://download.oracle.com/javase/6/docs/api/">Java API Specification</a>, 
@@ -2400,7 +2432,8 @@ Use the TANH function to return the hype
 
 </ul>
  
- <section>
+   <!-- ++++++++++++++++++++++++++++++++++++++++++++++ -->
+ <section id="indexof">
    <title>INDEXOF</title>
    <p>Returns the index of the first occurrence of a character in a string, searching forward from a start index. </p>
 
@@ -2456,7 +2489,7 @@ Use the INDEXOF function to determine th
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="last-index-of">
    <title>LAST_INDEX_OF</title>
    <p>Returns the index of the last occurrence of a character in a string, searching backward from a start index. </p>
 
@@ -2512,7 +2545,7 @@ Use the LAST_INDEX_OF function to determ
 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="lcfirst">
    <title>LCFIRST</title>
    <p>Converts the first character in a string to lower case. </p>
 
@@ -2550,7 +2583,7 @@ Use the LCFIRST function to convert only
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="lower">
    <title>LOWER</title>
    <p>Converts all characters in a string to lower case. </p>
 
@@ -2589,7 +2622,7 @@ Use the LOWER function to convert all ch
 
 
 <!-- ======================================================== -->
- <section>
+ <section id="regex-extract">
    <title>REGEX_EXTRACT </title>
    <p>Performs regular expression matching and extracts the matched group defined by an index parameter. </p>
 
@@ -2659,7 +2692,7 @@ REGEX_EXTRACT('192.168.1.5:8020', '(.*)\
 </section>
 
 <!-- ======================================================== -->
- <section>
+ <section id="regex-extract-all">
    <title>REGEX_EXTRACT_ALL </title>
    <p>Performs regular expression matching and extracts all matched groups.</p>
 
@@ -2722,7 +2755,7 @@ REGEX_EXTRACT_ALL('192.168.1.5:8020', '(
 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="replace">
    <title>REPLACE</title>
    <p>Replaces existing characters in a string with new characters.</p>
 
@@ -2780,7 +2813,7 @@ REPLACE(string,'software','wiki');
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="strsplit">
    <title>STRSPLIT</title>
    <p>Splits a string around matches of a given regular expression. </p>
 
@@ -2837,7 +2870,7 @@ For example, given the string (open:sour
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="substring">
    <title>SUBSTRING</title>
    <p>Returns a substring from a given string. </p>
 
@@ -2895,7 +2928,7 @@ Given a field named alpha whose value is
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="trim">
    <title>TRIM</title>
    <p>Returns a copy of a string with leading and trailing white space removed.</p>
 
@@ -2933,7 +2966,7 @@ Use the TRIM function to remove leading 
 </section> 
 
 <!-- ======================================================== -->  
- <section>
+ <section id="ucfirst">
    <title>UCFIRST</title>
    <p>Returns a string with the first character converted to upper case. </p>
 
@@ -2971,7 +3004,7 @@ Use the UCFIRST function to convert only
 </section>
 
 <!-- ======================================================== -->  
- <section>
+ <section id="upper">
    <title>UPPER</title>
    <p>Returns a string converted to upper case. </p>
 
@@ -3015,12 +3048,12 @@ Use the UPPER function to convert all ch
 <!-- ======================================================== -->
 <!-- ======================================================== -->
 <!-- Other Functions -->
-<section>
+<section id="bag-tuple-functions">
 <title>Bag and Tuple Functions</title>
 
 
 <!-- ======================================================== -->
- <section>
+ <section id="tobag">
    <title>TOBAG</title>
    <p>Converts one or more expressions to type bag. </p>
 
@@ -3085,7 +3118,7 @@ DUMP b;
 </section>
 
  <!-- ======================================================== -->  
- <section>
+ <section id="topx">
    <title>TOP</title>
    <p>Returns the top-n tuples from a bag of tuples.</p>
 
@@ -3158,7 +3191,7 @@ topResults = FOREACH D {
 
 </section>
 <!-- ======================================================== -->  
- <section>
+ <section id="totuple">
    <title>TOTUPLE</title>
    <p>Converts one or more expressions to type tuple. </p>
 

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml?rev=1126714&r1=1126713&r2=1126714&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml Mon May 23 20:20:49 2011
@@ -24,14 +24,14 @@
   
 <!-- ================================================================== -->
 <!-- COMBINER -->
-<section>
+<section id="combiner">
 <title>Combiner</title> 
 
 <p>The Pig combiner is an optimizer that is invoked when the statements in your scripts are arranged in certain ways. The examples below demonstrate when the combiner is used and not used. Whenever possible, make sure the combiner is used as it frequently yields an order of magnitude improvement in performance. </p>
 
 <section>
 <title>When the Combiner is Used</title> 
-<p>The combiner is generally used in the case of non-nested foreach where all projections are either expressions on the group column or expressions on algebraic UDFs (see  <a href="#Make+Your+UDFs+Algebraic">Make Your UDFs Algebraic</a>).</p>
+<p>The combiner is generally used in the case of non-nested foreach where all projections are either expressions on the group column or expressions on algebraic UDFs (see <a href="#algebraic-interface">Make Your UDFs Algebraic</a>).</p>
 
 <p>Example:</p>
 
@@ -144,7 +144,7 @@ D = foreach B generate group, MIN (A.gpa
   
 <!-- ================================================================== -->
 <!-- MEMORY MANAGEMENT -->
-<section>
+<section id="memory-management">
 <title>Memory Management</title>
 
 <p>Pig allocates a fixed amount of memory to store bags and spills to disk as soon as the memory limit is reached. This is very similar to how Hadoop decides when to spill data accumulated by the combiner. </p>
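+<p>As an illustrative sketch only (this assumes the pig.cachedbag.memusage property, which controls the fraction of the heap available for bags, is supported by your release):</p>
+<source>
+$ java -cp $PIG_HOME/pig.jar -Dpig.cachedbag.memusage=0.1 org.apache.pig.Main myscript.pig
+</source>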
@@ -156,7 +156,7 @@ D = foreach B generate group, MIN (A.gpa
 
 <!-- ==================================================================== -->
 <!-- MULTI-QUERY EXECUTION-->
-<section>
+<section id="multi-query-execution">
 <title>Multi-Query Execution</title>
 <p>With multi-query execution Pig processes an entire script or a batch of statements at once.</p>
 
@@ -246,7 +246,7 @@ With multi-query execution, the script w
 
 
 <!-- ++++++++++++++++++++++++++++++++++++++++++ -->
-<section>
+<section id="store-dump">
 	<title>Store vs. Dump</title>
	<p>With multi-query execution, you want to use <a href="basic.html#STORE">STORE</a> to save (persist) your results. 
 	You do not want to use <a href="test.html#DUMP">DUMP</a> as it will disable multi-query execution and is likely to slow down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.) </p>
@@ -272,7 +272,9 @@ STORE C INTO 'output2';	
 </source>
 
 </section>
-<section>
+
+<!-- ++++++++++++++++++++++++++++++++++++++++++ -->
+<section id="error-handling">
 	<title>Error Handling</title>
 	<p>With multi-query execution Pig processes an entire script or a batch of statements at once. 
 	By default Pig tries to run all the jobs that result from that, regardless of whether some jobs fail during execution. 
@@ -302,7 +304,8 @@ $ pig -stop_on_failure myscript.pig
 </source>
 </section>
 
-<section>
+<!-- ++++++++++++++++++++++++++++++++++++++++++ -->
+<section id="backward-compatibility">
 	<title>Backward Compatibility</title>
 	
 	<p>Most existing Pig scripts will produce the same result with or without the multi-query execution. 
@@ -425,7 +428,7 @@ STORE Gtab INTO '/user/vxj/finalresult2'
 
 <!-- ==================================================================== -->
  <!-- OPTIMIZATION RULES -->
-<section>
+<section id="optimization-rules">
 <title>Optimization Rules</title>
 <p>Pig supports various optimization rules. By default optimization, and all optimization rules, are turned on. 
 To turn off optimization, use:</p>
@@ -437,7 +440,7 @@ pig -optimizer_off [opt_rule | all ]
 <p>Note that some rules are mandatory and cannot be turned off.</p>
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="ImplicitSplitInserter">
 <title>ImplicitSplitInserter</title>
 <p>Status: Mandatory</p>
 <p>
@@ -455,7 +458,7 @@ C = COGROUP A BY $0, B BY $0;
 </section>
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="LogicalExpressionSimplifier">
 <title>LogicalExpressionSimplifier</title>
 <p>This rule contains several types of simplifications.</p>
 
@@ -504,7 +507,7 @@ is simplified to non-filtering 
 
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="MergeForEach">
 <title>MergeForEach</title>
 <p>The objective of this rule is to merge together two foreach statements, if these preconditions are met:</p>
 <ul>
@@ -526,9 +529,8 @@ C = FOREACH A GENERATE a+b+5, c-b; 
 </source>
 </section>
 
-
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="OpLimitOptimizer">
 <title>OpLimitOptimizer</title>
 <p>
 The objective of this rule is to push the <a href="basic.html#LIMIT">LIMIT</a> operator up the data flow graph 
@@ -541,26 +543,13 @@ C = LIMIT B 10;
 </source>
 </section>
 
-<section>
-<title>PushUpFilters</title>
-<p>
-The objective of this rule is to push the <a href="basic.html#FILTER">FILTER</a> operators up the data flow graph. 
-As a result, the number of records that flow through the pipeline is reduced. 
-</p>
-<source>
-A = LOAD 'input';
-B = GROUP A BY $0;
-C = FILTER B BY $0 &lt; 10;
-</source>
-</section>
-
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="PushDownExplodes">
 <title>PushDownExplodes</title>
 <p>
 The objective of this rule is to reduce the number of records that flow through the pipeline by moving 
 <a href="basic.html#FOREACH">FOREACH</a> operators with a 
-<a href="basic.html#Flatten+Operator">FLATTEN</a> down the data flow graph. 
+<a href="basic.html#Flatten">FLATTEN</a> down the data flow graph. 
 In the example shown below, it would be more efficient to move the foreach after the join to reduce the cost of the join operation.
 </p>
 <source>
@@ -572,7 +561,23 @@ D = JOIN C BY $1, B BY $1;
 </section>
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="pushupfilters">
+<title>PushUpFilters</title>
+<p>
+The objective of this rule is to push the <a href="basic.html#FILTER">FILTER</a> operators up the data flow graph. 
+As a result, the number of records that flow through the pipeline is reduced. 
+</p>
+<source>
+A = LOAD 'input';
+B = GROUP A BY $0;
+C = FILTER B BY $0 &lt; 10;
+</source>
+</section>
+
+
+
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="StreamOptimizer">
 <title>StreamOptimizer</title>
 <p>
 Optimize when <a href="basic.html#LOAD">LOAD</a> precedes <a href="basic.html#STREAM">STREAM</a> 
@@ -584,13 +589,13 @@ around and to replace the storer/deseria
 </section>
 
 <!-- +++++++++++++++++++++++++++++++ -->
-<section>
+<section id="TypeCastInserter">
 <title>TypeCastInserter</title>
 <p>Status: Mandatory</p>
 <p>
 If you specify a <a href="basic.html#Schemas">schema</a> with the 
 <a href="basic.html#LOAD">LOAD</a> statement, the optimizer will perform a prefix projection of the columns 
-and <a href="basic.html#Cast+Operators">cast</a> the columns to the appropriate types. An example will illustrate the point. 
+and <a href="basic.html#Cast">cast</a> the columns to the appropriate types. An example will illustrate the point. 
 The LOAD statement (a) has a schema associated with it. The optimizer will insert a FOREACH operator that will project columns 0, 1 and 2 
 and also cast them to chararray, int and float respectively. 
 </p>
@@ -605,16 +610,17 @@ C = GROUP A By $0;
   
 <!-- ==================================================================== -->
 <!-- PERFORMANCE ENHANCERS-->
-<section>
+<section id="performance-enhancers">
 <title>Performance Enhancers</title>
 
 <section>
 <title>Use Optimization</title>
-<p>Pig supports various <a href="perf.html#Optimization+Rules">optimization rules</a> which are turned on by default. 
+<p>Pig supports various <a href="perf.html#optimization-rules">optimization rules</a> which are turned on by default. 
 Become familiar with these rules.</p>
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="types">
 <title>Use Types</title>
 
 <p>If types are not specified in the load statement, Pig assumes the type of <code>double</code> for numeric computations. 
@@ -634,7 +640,8 @@ B = foreach A generate t + u;
 <p>The second query will run more efficiently than the first. In some of our queries we have seen a 2x speedup. </p>
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="projection">
 <title>Project Early and Often </title>
 
 <p>Pig does not (yet) determine when a field is no longer needed and drop the field from the row. For example, say you have a query like: </p>
@@ -663,7 +670,8 @@ E = foreach D generate group, COUNT($1);
 <p>Depending on your data, this can produce significant time savings. In queries similar to the example shown here we have seen total time drop by 50%.</p>
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="filter">
 <title>Filter Early and Often</title>
 
 <p>As with early projection, in most cases it is beneficial to apply filters as early as possible to reduce the amount of data flowing through the pipeline. </p>
@@ -692,7 +700,8 @@ F = filter E by C.t == 1;
 
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="pipeline">
 <title>Reduce Your Operator Pipeline</title>
 
 <p>For clarity of your script, you might choose to split your projections into several steps, for instance: </p>
@@ -718,10 +727,11 @@ B = foreach A generate CONCAT(in#'k1', i
 
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="algebraic-interface">
 <title>Make Your UDFs Algebraic</title>
 
-<p>Queries that can take advantage of the combiner generally ran much faster (sometimes several times faster) than the versions that don't. The latest code significantly improves combiner usage; however, you need to make sure you do your part. If you have a UDF that works on grouped data and is, by nature, algebraic (meaning their computation can be decomposed into multiple steps) make sure you implement it as such. For details on how to write algebraic UDFs, see <a href="udf.html#Aggregate+Functions">Aggregate Functions</a>.</p>
+<p>Queries that can take advantage of the combiner generally run much faster (sometimes several times faster) than the versions that don't. The latest code significantly improves combiner usage; however, you need to make sure you do your part. If you have a UDF that works on grouped data and is, by nature, algebraic (meaning its computation can be decomposed into multiple steps) make sure you implement it as such. For details on how to write algebraic UDFs, see <a href="udf.html#algebraic-interface">Algebraic Interface</a>.</p>
 
 <source>
 A = load 'data' as (x, y, z)
@@ -733,15 +743,16 @@ C = foreach B generate group, MyUDF(A);
 <p>If <code>MyUDF</code> is algebraic, the query will use the combiner and run much faster. You can run the <code>explain</code> command on your query to make sure that the combiner is used. </p>
 </section>
 
-<section>
-<title>Implement the Aggregator Interface</title>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="accumulator-interface">
+<title>Use the Accumulator Interface</title>
 <p>
-If your UDF can't be made Algebraic but is able to deal with getting input in chunks rather than all at once, consider implementing the Aggregator interface to reduce the amount of memory used by your script.If your function <em>is</em> Algebraic and can be used on conjunction with Accumulator functions, you will need to implement the Accumulator interface as well as the Algebraic interface. For more information, see <a href="udf.html#Accumulator+Interface">Accumulator Interface</a>.
+If your UDF can't be made Algebraic but is able to deal with getting input in chunks rather than all at once, consider implementing the Accumulator interface to reduce the amount of memory used by your script. If your function <em>is</em> Algebraic and can be used in conjunction with Accumulator functions, you will need to implement the Accumulator interface as well as the Algebraic interface. For more information, see <a href="udf.html#Accumulator-Interface">Accumulator Interface</a>.
 </p>
 </section>
 
-
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="nulls">
 <title>Drop Nulls Before a Join</title>
 <p>With the introduction of nulls, join and cogroup semantics were altered to work with nulls. The semantic for cogrouping with nulls is that nulls from a given input are grouped together, but nulls across inputs are not grouped together. This preserves the semantics of grouping (nulls are collected together from a single input to be passed to aggregate functions like COUNT) and the semantics of join (nulls are not joined across inputs). Since flattening an empty bag results in an empty row (and no output), in a standard join the rows with a null key will always be dropped. </p>
 
@@ -775,7 +786,8 @@ C = join A1 by t, B1 by x;
 
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="join-optimizations">
 <title>Take Advantage of Join Optimizations</title>
 <p><strong>Regular Join Optimizations</strong></p>
 <p>Optimization for regular joins ensures that the last table in the join is not brought into memory but streamed through instead. Optimization reduces the amount of memory used which means you can avoid spilling the data and also should be able to scale your query to larger data volumes. </p>
@@ -789,12 +801,12 @@ C = join small by t, large by x;
 
 <p><strong>Specialized Join Optimizations</strong></p>
 <p>Optimization can also be achieved using fragment replicate joins, skewed joins, and merge joins. 
-For more information see <a href="perf.html#Specialized+Joins">Specialized Joins</a>.</p>
+For more information see <a href="perf.html#specialized-joins">Specialized Joins</a>.</p>
 
 </section>
 
-
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="parallel">
 <title>Use the Parallel Features</title>
 
 <p>You can set the number of reduce tasks for the MapReduce jobs generated by Pig using two parallel features. 
@@ -810,9 +822,9 @@ You can include the PARALLEL clause with
 <a href="basic.html#CROSS">CROSS</a>, 
 <a href="basic.html#DISTINCT">DISTINCT</a>, 
 <a href="basic.html#GROUP">GROUP</a>, 
-<a href="basic.html#JOIN+%28inner%29">JOIN (inner)</a>, 
-<a href="basic.html#JOIN+%28outer%29">JOIN (outer)</a>, and
-<a href="basic.html#ORDER+BY">ORDER BY</a>.
+<a href="basic.html#JOIN-inner">JOIN (inner)</a>, 
+<a href="basic.html#JOIN-outer">JOIN (outer)</a>, and
+<a href="basic.html#ORDER-BY">ORDER BY</a>.
 </p>
 
 <p>The number of reducers you need for a particular construct in Pig that forms a MapReduce boundary depends entirely on (1) your data and the number of intermediate keys you are generating in your mappers and (2) the partitioner and distribution of map (combiner) output keys. In the best cases we have seen that a reducer processing about 1 GB of data behaves efficiently.</p>
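+<p>A short sketch of both features (the values are illustrative, not recommendations):</p>
+<source>
+set default_parallel 20;          -- script-wide default number of reduce tasks
+B = GROUP A BY t PARALLEL 18;     -- per-operator override
+</source>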
@@ -848,8 +860,8 @@ STORE D INTO ‘mysortedcount’ U
 </source>
 </section>
 
-
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="limit">
 <title>Use the LIMIT Operator</title>
 <p>Often you are not interested in the entire output but rather a sample or top results. In such cases, using LIMIT can yield much better performance as we push the limit as high as possible to minimize the amount of data travelling through the pipeline. </p>
 <p>Sample: 
@@ -870,7 +882,8 @@ C = limit B 500;
 
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="distinct">
 <title>Prefer DISTINCT over GROUP BY/GENERATE</title>
 
 <p>To extract unique values from a column in a relation you can use DISTINCT or GROUP BY/GENERATE. DISTINCT is the preferred method; it is faster and more efficient.</p>
@@ -895,7 +908,8 @@ dump C; 
 </source>
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="compression">
 <title>Compress the Results of Intermediate Jobs</title>
 <p>If your Pig script generates a sequence of MapReduce jobs, you can compress the output of the intermediate jobs using LZO compression. (Use the <a href="test.html#EXPLAIN">EXPLAIN</a> operator to determine if your script produces multiple MapReduce Jobs.)</p>
 
@@ -923,7 +937,8 @@ java -cp $PIG_HOME/pig.jar 
 </source>
 </section>
 
-<section>
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="combine-files">
 <title>Combine Small Input Files</title>
 <p>Processing input (either user input or intermediate input) from multiple small files can be inefficient because a separate map has to be created for each file. Pig can now combine small files so that they are processed as a single map.</p>
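+<p>As a sketch (assuming the pig.splitCombination and pig.maxCombinedSplitSize properties are available in your release):</p>
+<source>
+$ java -cp $PIG_HOME/pig.jar -Dpig.splitCombination=true -Dpig.maxCombinedSplitSize=134217728 org.apache.pig.Main myscript.pig
+</source>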
 
@@ -948,10 +963,12 @@ java -cp $PIG_HOME/pig.jar 
   
 <!-- ==================================================================== -->
 <!-- SPECIALIZED JOINS-->
-  <section>
+  <section id="specialized-joins">
    <title>Specialized Joins</title>
 <!-- FRAGMENT REPLICATE JOINS-->
-<section>
+
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="replicated-joins">
 <title>Replicated Joins</title>
 <p>Fragment replicate join is a special type of join that works well if one or more relations are small enough to fit into main memory. 
 In such cases, Pig can perform a very efficient join because all of the hadoop work is done on the map side. In this type of join the 
@@ -960,7 +977,7 @@ don't, the process fails and an error is
  
 <section>
 <title>Usage</title>
-<p>Perform a replicated join with the USING clause (see <a href="basic.html#JOIN+%28inner%29">inner joins</a> and <a href="basic.html#JOIN+%28outer%29">outer joins</a>).
+<p>Perform a replicated join with the USING clause (see <a href="basic.html#JOIN-inner">JOIN (inner)</a> and <a href="basic.html#JOIN-outer">JOIN (outer)</a>).
 In this example, a large relation is joined with two smaller relations. Note that the large relation comes first followed by the smaller relations; 
 and, all small relations together must fit into main memory, otherwise an error is generated. </p>
 <source>
@@ -983,9 +1000,9 @@ gets 1 GB of memory. Please share your o
 </section>
 <!-- END FRAGMENT REPLICATE JOINS-->
 
-
+<!-- +++++++++++++++++++++++++++++++ -->
 <!-- SKEWED JOINS-->
-<section>
+<section id="skewed-joins">
 <title>Skewed Joins</title>
 
 <p>
@@ -1005,7 +1022,7 @@ associated with a given key is too large
 
 <section>
 <title>Usage</title>
-<p>Perform a skewed join with the USING clause (see <a href="basic.html#JOIN+%28inner%29">inner joins</a> and <a href="basic.html#JOIN+%28outer%29">outer joins</a>). </p>
+<p>Perform a skewed join with the USING clause (see <a href="basic.html#JOIN-inner">JOIN (inner)</a> and <a href="basic.html#JOIN-outer">JOIN (outer)</a>). </p>
 <source>
 big = LOAD 'big_data' AS (b1,b2,b3);
 massive = LOAD 'massive_data' AS (m1,m2,m3);
@@ -1035,9 +1052,9 @@ However, in most cases, skewed join ensu
 </section>
 </section><!-- END SKEWED JOINS-->
 
-
+<!-- +++++++++++++++++++++++++++++++ -->
 <!-- MERGE JOIN-->
-<section>
+<section id="merge-joins">
 <title>Merge Joins</title>
 
 <p>
@@ -1061,7 +1078,7 @@ and the right input of the join to be th
 
 <section>
 <title>Usage</title>
-<p>Perform a merge join with the USING clause (see <a href="basic.html#JOIN+%28inner%29">inner joins</a> and <a href="basic.html#JOIN+%28outer%29">outer joins</a>). </p>
+<p>Perform a merge join with the USING clause (see <a href="basic.html#JOIN-inner">JOIN (inner)</a> and <a href="basic.html#JOIN-outer">JOIN (outer)</a>). </p>
 <source>
 C = JOIN A BY a1, B BY b1, C BY c1 USING 'merge';
 </source>
@@ -1109,7 +1126,9 @@ C = join A by id left, B by id using 'me
 </section>
 </section>
 <!-- END MERGE JOIN -->
-<section>
+
+<!-- +++++++++++++++++++++++++++++++ -->
+<section id="specialized-joins-performance">
 <title>Performance Considerations</title>
 <p>Note the following:</p>
 <ul>

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=1126714&r1=1126713&r2=1126714&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/site.xml Mon May 23 20:20:49 2011
@@ -48,6 +48,7 @@ See http://forrest.apache.org/docs/linki
    <cmds label="Shell and Utility Commands" href="cmds.html" />
     <perform label="Performance and Efficiency" href="perf.html" />
     <test label="Testing and Diagnostics" href="test.html" />
+    <index label="Index" href="pig-index.html" />
     </docs>  
       
     <docs label="Zebra"> 

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/start.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/start.xml?rev=1126714&r1=1126713&r2=1126714&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/start.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/start.xml Mon May 23 20:20:49 2011
@@ -46,7 +46,7 @@
   </section>         
    
 <!-- ++++++++++++++++++++++++++++++++++ -->        
- <section>
+ <section id="download">
  <title>Download Pig</title>
 	<p>To get a Pig distribution, do the following:</p>
 	
@@ -73,7 +73,7 @@ Test the Pig installation with this simp
 </section>  
 
 <!-- ++++++++++++++++++++++++++++++++++ -->
-<section>
+<section id="build">
 <title>Build Pig</title>
       <p>To build pig, do the following:</p>
      <ol>
@@ -83,52 +83,46 @@ Test the Pig installation with this simp
 	  <li> Validate the pig.jar  by running a unit test: <code>ant test</code></li>
      </ol>
  </section>
-
 </section>
 
   <!-- ==================================================================== -->
     
    <!-- RUNNING PIG  -->
-   <section>
+   <section id="run">
 	<title>Running Pig </title> 
-	<p>You can run or execute Pig Latin statements in various ways.</p>
+	<p>You can run Pig (execute Pig Latin statements and Pig commands) using various modes.</p>
 	<table>
 	<tr>
-	<td><strong>Pig Latin Statements</strong></td>
+	<td></td>
     <td><strong>Local Mode</strong></td>
     <td><strong>Mapreduce Mode</strong></td>
 	</tr>
 	<tr>
-	<td>Grunt Shell (enter statements interactively or run Pig scripts)</td>
-    <td>yes</td>
-    <td>yes</td>
-	</tr>
-	<tr>
-	<td>Pig Scripts (run batch statements from command line or Grunt shell)</td>
+	<td><strong>Interactive Mode </strong></td>
     <td>yes</td>
     <td>yes</td>
 	</tr>
 	<tr>
-	<td>Embedded Pig (embed statements in a host language)</td>
+	<td><strong>Batch Mode</strong> </td>
     <td>yes</td>
     <td>yes</td>
 	</tr>
 	</table>
 	
 	<!-- ++++++++++++++++++++++++++++++++++ -->
-	   <section>
-	<title>Run Modes</title> 
-<p>Pig has two run modes or exectypes: </p>
+	   <section id="execution-modes">
+	<title>Execution Modes</title> 
+<p>Pig has two execution modes or exectypes: </p>
 <ul>
 <li><strong>Local Mode</strong> - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
 </li>
-<li><strong>Mapreduce Mode</strong> - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, <em>but don't need to</em>, specify it using the -x flag (pig -x mapreduce).
+<li><strong>Mapreduce Mode</strong> - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, <em>but don't need to</em>, specify it using the -x flag (pig OR pig -x mapreduce).
 </li>
 </ul>
 <p></p>
 
-<p>You can run the Grunt shell and your Pig scripts in either mode using the "pig" command (the bin/pig Perl script) and/or the "java" command (java -cp pig.jar ...).
-You can compile and run embedded programs in either mode using the conventions of the host langugage. </p>
+<p>You can run Pig in either mode using the "pig" command (the bin/pig shell script) or the "java" command (java -cp pig.jar ...).
+</p>
 
 
 <section>
@@ -138,9 +132,11 @@ You can compile and run embedded program
 <source>
 /* local mode */
 $ pig -x local ...
-
+ 
+ 
 /* mapreduce mode */
 $ pig ...
+or
 $ pig -x mapreduce ...
 </source>
 
@@ -149,8 +145,10 @@ $ pig -x mapreduce ...
 /* local mode */
 $ java -cp pig.jar org.apache.pig.Main -x local ...
 
+
 /* mapreduce mode */
 $ java -cp pig.jar org.apache.pig.Main ...
+or
 $ java -cp pig.jar org.apache.pig.Main -x mapreduce ...
 </source>
 
@@ -158,15 +156,14 @@ $ java -cp pig.jar org.apache.pig.Main -
 </section>
 
 <!-- ++++++++++++++++++++++++++++++++++ -->
-<section>
-<title>Grunt Shell</title>
-<p>Use Pig's Grunt shell to enter Pig Latin statements interactively. 
-You can also run Pig scripts from the Grunt shell 
-(see the <a href="cmds.html#run">run</a> and <a href="cmds.html#exec">exec</a> commands). </p>
+<section id="interactive-mode">
+<title>Interactive Mode</title>
+<p>You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.
+</p>
 
 <section>
 <title>Example</title>
-<p>These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the pig command (in local mode or mapreduce mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.</p>
+<p>These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or mapreduce mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.</p>
 <source>
 grunt&gt; A = load 'passwd' using PigStorage(':'); 
 grunt&gt; B = foreach A generate $0 as id; 
@@ -177,55 +174,29 @@ grunt&gt; dump B; 
 <source>
 $ pig -x local
 ... - Connecting to ...
-grunt>
+grunt> 
 </source>
 
 <p><strong>Mapreduce Mode</strong> </p>
 <source>
 $ pig -x mapreduce
+... - Connecting to ...
+grunt> 
+
 or
+
 $ pig 
 ... - Connecting to ...
-grunt>
+grunt> 
 </source>
 </section>
 </section>
 
 <!-- ++++++++++++++++++++++++++++++++++ -->
-<section>
-<title>Pig Scripts</title>
-<p>Use script files to run Pig Latin statements as batch jobs. With Pig scripts you can pass values to parameters using <a href="cont.html#Parameter+Substitution">parameter subtitution</a>. </p>
+<section id="batch-mode">
+<title>Batch Mode</title>
 
-   <section>
-   <title>Pig Scripts and Comments</title>
-   <p>You can include comments in Pig scripts:</p>
-   <ul>
-      <li>
-         <p>For multi-line comments use /* …. */</p>
-      </li>
-      <li>
-         <p>For single-line comments use --</p>
-      </li>
-   </ul>
-<source>
-/* myscript.pig
-My script is simple.
-It includes three Pig Latin statements.
-*/
-
-A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); -- loading data
-B = FOREACH A GENERATE name;  -- transforming data
-DUMP B;  -- retrieving results
-</source>   
-</section>
-
-<section>
-<title>Pig Scripts and DFS</title>
-<p>Pig supports running scripts (and Jar files) that are stored in HDFS, Amazon S3, and other distributed file systems. The script's full location URI is required (see <a href="basic.html#REGISTER">REGISTER</a> for information about Jar files). For example, to run a Pig script on HDFS, do the following:</p>
-<source>
-$ pig hdfs://nn.mydomain.com:9020/myscripts/script.pig
-</source> 
-</section>
+<p>You can run Pig in batch mode using <a href="#pig-scripts">Pig scripts</a> and the "pig" command (in local or mapreduce mode).</p>
 
 <section>
 <title>Example</title>
@@ -251,28 +222,65 @@ $ pig -x mapreduce id.pig
 </source>
 </section>
 
-</section>
+  <!-- ==================================================================== -->
+    
+   <!-- PIG SCRIPTS -->
+   <section id="pig-scripts">
+	<title>Pig Scripts</title>
+	
+<p>Use Pig scripts to place Pig Latin statements and Pig commands in a single file. While not required, it is good practice to identify the file using the *.pig extension.</p>	
+	
+<p>You can run Pig scripts from the command line and from the Grunt shell
+(see the <a href="cmds.html#run">run</a> and <a href="cmds.html#exec">exec</a> commands). </p>
+	
+<p>Pig scripts allow you to pass values to parameters using <a href="cont.html#Parameter-Sub">parameter substitution</a>. </p>
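+<p>For example (a sketch; the parameter name is illustrative):</p>
+<source>
+$ pig -param input=student.txt myscript.pig
+</source>
+<p>where myscript.pig refers to the parameter as $input, for example: A = LOAD '$input';</p>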
 
-<!-- ++++++++++++++++++++++++++++++++++ -->
-<section>
-<title>Embedded Pig</title>
-<p>You can embed Pig statements in a host language. Supported languages include Python, JavaScript, and Java (see <a href="cont.html">Control Structures</a>). </p>
+<!-- +++++++++++++++++++++++++++++++++++++++++++ -->	
+   <p id="comments"><strong>Comments in Scripts</strong></p>
+   
+   <p>You can include comments in Pig scripts:</p>
+   <ul>
+      <li>
+         <p>For multi-line comments use /* …. */</p>
+      </li>
+      <li>
+         <p>For single-line comments use --</p>
+      </li>
+   </ul>
+<source>
+/* myscript.pig
+My script is simple.
+It includes three Pig Latin statements.
+*/
 
+A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); -- loading data
+B = FOREACH A GENERATE name;  -- transforming data
+DUMP B;  -- retrieving results
+</source>   
+	
+<!-- +++++++++++++++++++++++++++++++++++++++++++ -->		
+
+<p id="dfs"><strong>Scripts and Distributed File Systems</strong></p>
+
+<p>Pig supports running scripts (and Jar files) that are stored in HDFS, Amazon S3, and other distributed file systems. The script's full location URI is required (see <a href="basic.html#REGISTER">REGISTER</a> for information about Jar files). For example, to run a Pig script on HDFS, do the following:</p>
+<source>
+$ pig hdfs://nn.mydomain.com:9020/myscripts/script.pig
+</source> 
+</section>	
 </section>
 </section>
 
-
   <!-- ==================================================================== -->
     
    <!-- PIG LATIN STATEMENTS -->
-   <section>
+   <section id="pl-statements">
 	<title>Pig Latin Statements</title>	
    <p>Pig Latin statements are the basic constructs you use to process data using Pig. 
    A Pig Latin statement is an operator that takes a <a href="basic.html#relations">relation</a> as input and produces another relation as output. 
    (This definition applies to all Pig Latin operators except LOAD and STORE which read data from and write data to the file system.) 
    Pig Latin statements may include <a href="basic.html#Expressions">expressions</a> and <a href="basic.html#Schemas">schemas</a>. 
    Pig Latin statements can span multiple lines and must end with a semi-colon ( ; ).  
-   By default, Pig Latin statements are processed using <a href="perf.html#Multi-Query+Execution">multi-query execution</a>.  
+   By default, Pig Latin statements are processed using <a href="perf.html#multi-query-execution">multi-query execution</a>.  
  </p>
    
    <p>Pig Latin statements are generally organized as follows:</p>
@@ -312,13 +320,13 @@ DUMP B;
 </ul>
   
    <!-- ++++++++++++++++++++++++++++++++++ -->   
-   <section>
+   <section id="data-load">
    <title>Loading Data</title>
-   <p>Use the  <a href="basic.html#LOAD">LOAD</a> operator and the <a href="func.html#Load%2FStore+Functions">load/store functions</a> to read data into Pig (PigStorage is the default load function).</p>
+   <p>Use the <a href="basic.html#LOAD">LOAD</a> operator and the <a href="func.html#load-store-functions">load/store functions</a> to read data into Pig (PigStorage is the default load function).</p>
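+   <p>For example (a sketch; the file and schema are illustrative):</p>
+<source>
+A = LOAD 'student' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);
+</source>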
    </section>
   
    <!-- ++++++++++++++++++++++++++++++++++ -->   
-   <section>
+   <section id="data-work-with">
    <title>Working with Data</title>
    <p>Pig allows you to transform data in many ways. As a starting point, become familiar with these operators:</p>
    <ul>
@@ -329,8 +337,8 @@ DUMP B;
       <li>
         <p>Use the <a href="basic.html#GROUP">GROUP</a> operator to group data in a single relation. 
         Use the <a href="basic.html#COGROUP">COGROUP</a>,
-         <a href="basic.html#JOIN+%28inner%29">inner JOIN</a>, and
-         <a href="basic.html#JOIN+%28outer%29">outer JOIN</a>
+         <a href="basic.html#join-inner">inner JOIN</a>, and
+         <a href="basic.html#join-outer">outer JOIN</a>
          operators  to group or join data in two or more relations.</p>
       </li>
       <li>
@@ -341,7 +349,7 @@ DUMP B;
    </section>
    
 <!-- ++++++++++++++++++++++++++++++++++ --> 
-      <section>
+      <section id="data-store">
    <title>Storing Intermediate Data</title>
 
       <p>Pig stores the intermediate data generated between MapReduce jobs in a temporary location on HDFS. 
@@ -350,16 +358,16 @@ DUMP B;
    as the hardcoded location in Pig 0.7.0 and earlier versions. </p>
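+      <p>For example, the temporary location can be overridden when Pig is invoked (a sketch, assuming the configurable pig.temp.dir property; the path is illustrative):</p>
+<source>
+$ pig -Dpig.temp.dir=/data/pigtmp myscript.pig
+</source>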
       </section>
    
-    <section>
+    <section id="data-results">
    <title>Storing Results</title>
-   <p>Use the  <a href="basic.html#STORE">STORE</a> operator and the <a href="func.html#Load%2FStore+Functions">load/store functions</a> 
+   <p>Use the  <a href="basic.html#STORE">STORE</a> operator and the <a href="udf.html#load-store-functions">load/store functions</a> 
    to write results to the file system (PigStorage is the default store function). </p>
 <p><strong>Note:</strong> During the testing/debugging phase of your implementation, you can use DUMP to display results to your terminal screen. 
-However, in a production environment you always want to use the STORE operator to save your results (see <a href="perf.html#Store+vs.+Dump">Store vs. Dump</a>).</p>   
+However, in a production environment you always want to use the STORE operator to save your results (see <a href="perf.html#Store-Dump">Store vs. Dump</a>).</p>   
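+<p>For example, a minimal STORE statement (a sketch; the relation name, output path, and delimiter are illustrative):</p>
+<source>
+STORE B INTO 'myoutput' USING PigStorage(',');
+</source>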
    </section> 
 
  <!-- ++++++++++++++++++++++++++++++++++ -->     
-   <section>
+   <section id="debug">
    <title>Debugging Pig Latin</title>
    <p>Pig Latin provides operators that can help you debug your Pig Latin statements:</p>
    <ul>
@@ -382,7 +390,7 @@ However, in a production environment you
 
 <!-- ================================================================== -->
 <!-- PIG PROPERTIES -->
-<section>
+<section id="properties">
 <title>Pig Properties</title>
    <p>Pig supports a number of Java properties that you can use to customize Pig behavior. You can retrieve a list of the properties using the <a href="cmds.html#help">help properties</a> command. All of these properties are optional; none are required. </p>
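+<p>For example, a property can be passed on the command line when invoking Pig (a sketch; the property and value shown are illustrative):</p>
+<source>
+$ pig -Dpig.exec.nocombiner=true myscript.pig
+</source>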
 <p></p>
@@ -411,36 +419,51 @@ However, in a production environment you
 
   <!-- ==================================================================== -->
   <!-- PIG TUTORIAL -->
-  <section>
+  <section id="tutorial">
 <title>Pig Tutorial </title>
 
-<p>The Pig tutorial shows you how to run two Pig scripts using Pig's local mode and mapreduce mode (see <a href="#Run+Modes">Run Modes</a>).</p>
+<p>The Pig tutorial shows you how to run Pig scripts using Pig's local mode and mapreduce mode (see <a href="#execution-modes">Execution Modes</a>).</p>
 
-<p>The Pig tutorial file, tutorial/pigtutorial.tar.gz, is part of the Pig distribution (see <a href="#Download+Pig">Download Pig</a>). The zipped file includes the tutorial JAR file, Pig scripts, and log/data files (see <a href="#Pig+Tutorial+Files">Pig Tutorial Files</a>). These files work with Hadoop 0.20.2 and include everything you need to run the Pig scripts, which are explained line-by-line (see <a href="#Pig+Script+1%3A+Query+Phrase+Popularity">Pig Script 1</a> and 
-<a href="#Pig+Script+2%3A+Temporal+Query+Phrase+Popularity">Pig Script 2</a>).</p>
-
-<p>To get started with the Pig tutorial, do the following preliminary tasks:</p>
+<p>To get started, do the following preliminary tasks:</p>
 
 <ol>
 <li>Make sure the JAVA_HOME environment variable is set to the root of your Java installation.</li>
-<li>Make sure that bin/pig is in your PATH (this enables you to run the scripts using the "pig" command).
+<li>Make sure your PATH includes bin/pig (this enables you to run the tutorials using the "pig" command). 
 <source>
-$ export PATH=/&lt;my-path-to-pig&gt;/pig-n.n.n/bin:$PATH 
+$ export PATH=/&lt;my-path-to-pig&gt;/pig-0.9.0/bin:$PATH 
 </source>
 </li>
 <li>Set the PIG_HOME environment variable:
 <source>
-$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-n.n.n 
+$ export PIG_HOME=/&lt;my-path-to-pig&gt;/pig-0.9.0 
 </source></li>
-<li>Copy the pigtutorial.tar.gz file from the tutorial directory of your Pig installation to your local directory. </li>
-<li>Unzip the Pig tutorial file; the files are stored in a newly created directory, pigtmp. 
+<li>Create the pigtutorial.tar.gz file:
+<ul>
+    <li>Move to the Pig tutorial directory (.../pig-0.9.0/tutorial).</li>
+	<li>Edit the build.xml file in the tutorial directory. 
+<source>
+Change this:   &lt;property name="pigjar" value="../pig.jar" /&gt;
+To this:       &lt;property name="pigjar" value="../pig-0.9.0-core.jar" /&gt;
+</source>
+	</li>
+	<li>Run the "ant" command from the tutorial directory. This will create the pigtutorial.tar.gz file.
+	</li>
+</ul>
+
+</li>
+<li>Copy the pigtutorial.tar.gz file from the Pig tutorial directory to your local directory. </li>
+<li>Unzip the pigtutorial.tar.gz file.
 <source>
 $ tar -xzf pigtutorial.tar.gz
 </source>
 </li>
+<li>A new directory named pigtmp is created. This directory contains the <a href="#Pig+Tutorial+Files">Pig Tutorial Files</a>. These files work with Hadoop 0.20.2 and include everything you need to run <a href="#pig-script-1">Pig Script 1</a> and <a href="#pig-script-2">Pig Script 2</a>.</li>
 </ol>
 
 
  <!-- ++++++++++++++++++++++++++++++++++ --> 
 <section>
 <title> Running the Pig Scripts in Local Mode</title>
@@ -646,7 +669,7 @@ $ hadoop fs -cat 'script1-hadoop-results
 </section>
 
  <!-- ++++++++++++++++++++++++++++++++++ -->   
-<section>
+<section id="pig-script-1">
 <title> Pig Script 1: Query Phrase Popularity</title>
 
 <p>The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a search query log file from the Excite search engine and finds search phrases that occur with particularly high frequency during certain times of the day. </p>
@@ -773,7 +796,7 @@ STORE ordered_uniq_frequency INTO '/tmp/
 </section>
 
  <!-- ++++++++++++++++++++++++++++++++++ -->   
-<section>
+<section id="pig-script-2">
 <title>Pig Script 2: Temporal Query Phrase Popularity</title>
 
 <p>The Temporal Query Phrase Popularity script (script2-local.pig or script2-hadoop.pig) processes a search query log file from the Excite search engine and compares the frequency of occurrence of search phrases across two time periods separated by twelve hours. </p>

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/test.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/test.xml?rev=1126714&r1=1126713&r2=1126714&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/test.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/test.xml Mon May 23 20:20:49 2011
@@ -24,11 +24,11 @@
 
 <!-- =========================================================================== -->
 <!-- DIAGNOSTIC OPERATORS -->    
-<section>
+<section id="diagnostic-ops">
 	<title>Diagnostic Operators</title>
 	
  <!-- +++++++++++++++++++++++++++++++++++++++ --> 
-   <section>
+   <section id="describe">
    <title>DESCRIBE</title>
    <p>Returns the schema of a relation.</p>
    
@@ -126,7 +126,7 @@ D: {age: bytearray} 
 </section></section>
    
  <!-- +++++++++++++++++++++++++++++++++++++++ -->   
- <section>
+ <section id="dump">
    <title>DUMP</title>
   <p>Dumps or displays results to the screen.</p>
    
@@ -159,7 +159,7 @@ D: {age: bytearray} 
    
    <p>
    Note that production scripts SHOULD NOT use DUMP as it will disable multi-query optimizations and is likely to slow down execution 
-   (see <a href="perf.html#Store+vs.+Dump">Store vs. Dump</a>).
+   (see <a href="perf.html#Store-Dump">Store vs. Dump</a>).
    </p>
    </section>
    
@@ -186,7 +186,7 @@ DUMP B;
 </section></section>      
    
  <!-- +++++++++++++++++++++++++++++++++++++++ -->
-   <section>
+   <section id="explain">
    <title>EXPLAIN</title>
    <p>Displays execution plans.</p>
    
@@ -250,7 +250,7 @@ DUMP B;
               <p>-param param_name = param_value</p>
             </td>
             <td>
-               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>.</p>
+               <p>See <a href="cont.html#Parameter-Sub">Parameter Substitution</a>.</p>
             </td>
          </tr>
 
@@ -259,7 +259,7 @@ DUMP B;
               <p>-param_file file_name</p>
             </td>
             <td>
-               <p>See <a href="cont.html#Parameter+Substitution">Parameter Substitution</a>. </p>
+               <p>See <a href="cont.html#Parameter-Sub">Parameter Substitution</a>. </p>
             </td>
          </tr>
       
@@ -339,7 +339,7 @@ Local Rearrange[tuple]{chararray}(false)
   
   
  <!-- +++++++++++++++++++++++++++++++++++++++ -->
-      <section>
+      <section id="illustrate">
    <title>ILLUSTRATE</title>
    <p>Displays a step-by-step execution of a sequence of statements.</p>
 
@@ -503,7 +503,7 @@ grunt> illustrate -script visits.pig
 
 <!-- =========================================================================== -->
 <!-- DIAGNOSTIC OPERATORS -->    
-<section>
+<section id="mapreduce-job-ids">
 <title>Pig Scripts and MapReduce Job IDs</title>
    <p>Complex Pig scripts often generate many MapReduce jobs. To help you debug a script, Pig prints a summary of the execution that shows which relations (aliases) are mapped to each MapReduce job. </p>
 <source>
@@ -519,7 +519,7 @@ job_201004271216_12714 1 1 3 3 3 12 12 1
 
 <!-- ==================================================================== -->
 <!-- PIG STATISTICS-->
-<section>
+<section id="pig-statistics">
 <title>Pig Statistics</title>
 
 <p>Pig Statistics is a framework for collecting and storing script-level statistics for Pig Latin. Characteristics of Pig Latin scripts and the resulting MapReduce jobs are collected while the script is executed. These statistics are then available for Pig users and tools using Pig (such as Oozie) to retrieve after the job is done.</p>
@@ -534,7 +534,7 @@ job_201004271216_12714 1 1 3 3 3 12 12 1
 
 <p>The Pig statistics are available here: <a href="http://pig.apache.org/docs/r0.9.0/api/">http://pig.apache.org/docs/r0.9.0/api/</a></p>
 
-<p>The stats classes are in the package: org.apache.pig.tools.pigstats</p>
+<p id="stats-classes">The stats classes are in the package: org.apache.pig.tools.pigstats</p>
 <ul>
 <li>PigStats</li>
 <li>JobStats</li>
@@ -589,7 +589,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.script.id</p>
+<p id="pig-script-id">pig.script.id</p>
 </td>
 <td>
 <p>The UUID for the script. All jobs spawned by the script have the same script ID.</p>
@@ -597,7 +597,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.script</p>
+<p id="pig-script">pig.script</p>
 </td>
 <td>
 <p>The base64 encoded script text.</p>
@@ -605,7 +605,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.command.line</p>
+<p id="pig-command-line">pig.command.line</p>
 </td>
 <td>
 <p>The command line used to invoke the script.</p>
@@ -613,7 +613,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.hadoop.version</p>
+<p id="pig-hadoop-version">pig.hadoop.version</p>
 </td>
 <td>
 <p>The Hadoop version installed.</p>
@@ -621,7 +621,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.version</p>
+<p id="pig-version">pig.version</p>
 </td>
 <td>
 <p>The Pig version used.</p>
@@ -629,7 +629,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.input.dirs</p>
+<p id="pig-input-dirs">pig.input.dirs</p>
 </td>
 <td>
 <p>A comma-separated list of input directories for the job.</p>
@@ -637,7 +637,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.map.output.dirs</p>
+<p id="pig-map-output-dirs">pig.map.output.dirs</p>
 </td>
 <td>
 <p>A comma-separated list of output directories in the map phase of the job.</p>
@@ -645,7 +645,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.reduce.output.dirs</p>
+<p id="pig-reduce-output-dirs">pig.reduce.output.dirs</p>
 </td>
 <td>
 <p>A comma-separated list of output directories in the reduce phase of the job.</p>
@@ -653,7 +653,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.parent.jobid</p>
+<p id="pig-parent-jobid">pig.parent.jobid</p>
 </td>
 <td>
 <p>A comma-separated list of parent job ids.</p>
@@ -661,7 +661,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.script.features</p>
+<p id="pig-script-features">pig.script.features</p>
 </td>
 <td>
 <p>A list of Pig features used in the script.</p>
@@ -669,7 +669,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.job.feature</p>
+<p id="pig-job-feature">pig.job.feature</p>
 </td>
 <td>
 <p>A list of Pig features used in the job.</p>
@@ -677,7 +677,7 @@ public interface PigProgressNotification
 </tr>
 <tr>
 <td>
-<p>pig.alias</p>
+<p id="pig-alias">pig.alias</p>
 </td>
 <td>
 <p>The alias associated with the job.</p>
@@ -688,7 +688,7 @@ public interface PigProgressNotification
 
 
 <!-- +++++++++++++++++++++++++++++++++++++++ -->
-<section>
+<section id="hadoop-job-history-loader">
 <title>Hadoop Job History Loader</title>
 <p>The HadoopJobHistoryLoader in Piggybank loads Hadoop job history files and job XML files from the file system. For each MapReduce job, the loader produces a tuple with schema (j:map[], m:map[], r:map[]). The first map in the schema contains job-related entries. Here are some of the important key names in the map: </p>
 
@@ -778,7 +778,7 @@ dump e;
 <!-- =========================================================================== -->
 <!-- PIGUNIT -->    
 
-  <section>
+  <section id="pigunit">
       <title>PigUnit</title>
       <p>PigUnit is a simple xUnit framework that enables you to easily test your Pig scripts.
         With PigUnit you can perform unit testing, regression testing, and rapid prototyping. 
@@ -827,7 +827,7 @@ $pig_trunk ant pigunit-jar   
       
      <p>The example included here computes the top N most common queries. 
        The Pig script, top_queries.pig, is similar to the 
-        <a href="start.html#Pig+Script+1%3A+Query+Phrase+Popularity">Query Phrase Popularity</a> 
+        <a href="start.html#pig-script-1">Query Phrase Popularity</a> 
        script in the Pig tutorial. It expects as input a file of queries and a parameter n (n is 2 in our case, to compute a top 2). 
       </p>
       
@@ -995,7 +995,7 @@ test.runScript();
 <!-- =========================================================================== -->
 <!-- PENNY -->    
 
-  <section>
+  <section id="penny">
       <title>Penny</title>
       <p><strong>Note:</strong> <em>Penny is an experimental feature.</em></p>
       <p></p>