Posted to commits@pig.apache.org by gd...@apache.org on 2012/10/17 04:22:51 UTC

svn commit: r1399082 - in /pig/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/basic.xml

Author: gdfm
Date: Wed Oct 17 02:22:51 2012
New Revision: 1399082

URL: http://svn.apache.org/viewvc?rev=1399082&view=rev
Log:
PIG-2947: Documentation for Rank operator (xalan via azaroth)

Modified:
    pig/trunk/CHANGES.txt
    pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml

Modified: pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?rev=1399082&r1=1399081&r2=1399082&view=diff
==============================================================================
--- pig/trunk/CHANGES.txt (original)
+++ pig/trunk/CHANGES.txt Wed Oct 17 02:22:51 2012
@@ -45,6 +45,8 @@ PIG-1891 Enable StoreFunc to make intell
 
 IMPROVEMENTS
 
+PIG-2947: Documentation for Rank operator (xalan via azaroth)
+
PIG-2910: Add function to read schema from output of Schema.toString() (initialcontext via thejas)
 
 PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)

Modified: pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml?rev=1399082&r1=1399081&r2=1399082&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml Wed Oct 17 02:22:51 2012
@@ -6906,7 +6906,162 @@ DUMP X;
 </source>
    
    </section></section>
+    <!-- =================================================================== -->     
+    <section id="rank">
+        <title>RANK</title>
+        <p>Returns each tuple with its rank within a relation.</p>
+        
+        <section>
+            <title>Syntax</title>
+            <table>
+                <tr> 
+                    <td>
+                        <p>alias = RANK alias [ BY { * [ASC|DESC] | field_alias [ASC|DESC] [, field_alias [ASC|DESC] …] } [DENSE] ];</p>
+                    </td>
+                </tr> 
+            </table>
+        </section>
+    
+        
+        <section>
+            <title>Terms</title>
+            <table>
+                <tr>
+                    <td>
+                        <p>alias</p>
+                    </td>
+                    <td>
+                        <p>The name of a relation.</p>
+                    </td>
+                </tr>
+                <tr>
+                    <td>
+                        <p>*</p>
+                    </td>
+                    <td>
+                        <p>The designator for a tuple.</p>
+                    </td>
+                </tr>
+                <tr>
+                    <td>
+                        <p>field_alias</p>
+                    </td>
+                    <td>
+                        <p>A field in the relation. The field must be a simple type.</p>
+                    </td>
+                </tr>
+                <tr>
+                    <td>
+                        <p>ASC</p>
+                    </td>
+                    <td>
+                        <p>Sort in ascending order.</p>
+                    </td>
+                </tr>
+                <tr>
+                    <td>
+                        <p>DESC</p>
+                    </td>
+                    <td>
+                        <p>Sort in descending order.</p>
+                    </td>
+                </tr>
+                
+                <tr>
+                    <td>
+                        <p>DENSE</p>
+                    </td>
+                    <td>
+                        <p>No gap in the ranking values. </p>
+                    </td>
+                </tr> 
+            </table>
+        </section>
+        
+        <section>
+            <title>Usage</title>
+            <p>When no sort field is specified, the RANK operator simply prepends a sequential value to each tuple.</p>
+            <p>Otherwise, the RANK operator sorts the relation on the specified field (or set of fields). The rank of a tuple is one plus the number of tuples that sort strictly before it. If two or more tuples tie on the sorting field values, they receive the same rank, and the rank that follows a tie skips ahead by the number of tied tuples.</p>
+            <p><strong>NOTE:</strong> When using the option <strong>DENSE</strong>, ties do not cause gaps in ranking values: the rank of a tuple is one plus the number of distinct rank values preceding it.</p>
+
+        </section>  
+        
+        <section>
+            <title>Examples</title>
+            <p>Suppose we have relation A.</p>
+            <source>
+A = load 'data' AS (f1:chararray,f2:int,f3:chararray);
    
+DUMP A;
+(David,1,N)
+(Tete,2,N)
+(Ranjit,3,M)
+(Ranjit,3,P)
+(David,4,Q)
+(David,4,Q)
+(Jillian,8,Q)
+(JaePak,7,Q)
+(Michael,8,T)
+(Jillian,8,Q)
+(Jose,10,V)
+            </source>
+            <p>In this example, the RANK operator does not change the order of the relation and simply prepends a sequential value to each tuple.</p>
+            <source>
+B = rank A;
+
+dump B;
+(1,David,1,N)
+(2,Tete,2,N)
+(3,Ranjit,3,M)
+(4,Ranjit,3,P)
+(5,David,4,Q)
+(6,David,4,Q)
+(7,Jillian,8,Q)
+(8,JaePak,7,Q)
+(9,Michael,8,T)
+(10,Jillian,8,Q)
+(11,Jose,10,V)
+            </source>
+            
+            <p>In this example, the RANK operator works with the f1 and f2 fields, each with a different sort order. RANK sorts the relation on these fields and 
+                prepends the rank value to each tuple. Tuples that tie on both fields receive the same rank, and the rank that follows a tie skips ahead, leaving a gap.</p>
+            <source>
+C = rank A by f1 DESC, f2 ASC;
+                                
+dump C;
+(1,Tete,2,N)
+(2,Ranjit,3,M)
+(2,Ranjit,3,P)
+(4,Michael,8,T)
+(5,Jose,10,V)
+(6,Jillian,8,Q)
+(6,Jillian,8,Q)
+(8,JaePak,7,Q)
+(9,David,1,N)
+(10,David,4,Q)
+(10,David,4,Q)                
+            </source>
+            
+            <p>This is the same as the previous example, but with DENSE. In this case there are no gaps in ranking values.</p>
+            <source>
+C = rank A by f1 DESC, f2 ASC DENSE;
+
+dump C;
+(1,Tete,2,N)
+(2,Ranjit,3,M)
+(2,Ranjit,3,P)
+(3,Michael,8,T)
+(4,Jose,10,V)
+(5,Jillian,8,Q)
+(5,Jillian,8,Q)
+(6,JaePak,7,Q)
+(7,David,1,N)
+(8,David,4,Q)
+(8,David,4,Q)
+            </source>
+            
+        </section>
+    </section>
 
 
 <!-- =========================================================================== -->