Posted to commits@pig.apache.org by ch...@apache.org on 2012/12/15 21:29:09 UTC
svn commit: r1422348 - in /pig/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/func.xml
Author: cheolsoo
Date: Sat Dec 15 20:29:08 2012
New Revision: 1422348
URL: http://svn.apache.org/viewvc?rev=1422348&view=rev
Log:
PIG-3085: Errors and lacks in document "Built In Functions" (miyakawataku via cheolsoo)
Modified:
pig/trunk/CHANGES.txt
pig/trunk/src/docs/src/documentation/content/xdocs/func.xml
Modified: pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?rev=1422348&r1=1422347&r2=1422348&view=diff
==============================================================================
--- pig/trunk/CHANGES.txt (original)
+++ pig/trunk/CHANGES.txt Sat Dec 15 20:29:08 2012
@@ -64,6 +64,8 @@ PIG-3013: BinInterSedes improve chararra
BUG FIXES
+PIG-3085: Errors and lacks in document "Built In Functions" (miyakawataku via cheolsoo)
+
PIG-3084: Improve exceptions messages in POPackage (julien)
PIG-3072: Pig job reporting negative progress (knoguchi via rohini)
Modified: pig/trunk/src/docs/src/documentation/content/xdocs/func.xml
URL: http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/func.xml?rev=1422348&r1=1422347&r2=1422348&view=diff
==============================================================================
--- pig/trunk/src/docs/src/documentation/content/xdocs/func.xml (original)
+++ pig/trunk/src/docs/src/documentation/content/xdocs/func.xml Sat Dec 15 20:29:08 2012
@@ -410,7 +410,7 @@ DUMP X;
<section>
<title>Example</title>
- <p>In this example COUNT_STAR is used the count the tuples in a bag.</p>
+ <p>In this example COUNT_STAR is used to count the tuples in a bag.</p>
<source>
X = FOREACH B GENERATE COUNT_STAR(A);
</source>
@@ -1339,7 +1339,7 @@ dump X;
<p>Use JsonStorage to store JSON data.</p>
- <p>Note that there is no concept of delimit in JsonLoader or JsonStorer. The data is encoded in standard JSON format. JsonLoader optionally takes a schema as the construct argument.</p>
+ <p>Note that there is no concept of delimit in JsonLoader or JsonStorage. The data is encoded in standard JSON format. JsonLoader optionally takes a schema as the construct argument.</p>
</section>
<section>
@@ -1453,7 +1453,7 @@ STORE X INTO 'output' USING PigDump();
<p><strong>Load/Store Statements</strong></p>
<p>Load statements - PigStorage expects data to be formatted using field delimiters, either the tab character ('\t') or other specified character.</p>
- <p>Store statements - PigStorage outputs data using field deliminters, either the tab character ('\t') or other specified character, and the line feed record delimiter ('\n'). </p>
+ <p>Store statements - PigStorage outputs data using field delimiters, either the tab character ('\t') or other specified character, and the line feed record delimiter ('\n'). </p>
<p><strong>Field/Record Delimiters</strong></p>
<p>Field Delimiters â For load and store statements the default field delimiter is the tab character ('\t'). You can use other characters as field delimiters, but separators such as ^A or Ctrl-A should be represented in Unicode (\u0001) using UTF-16 encoding (see Wikipedia <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a>, <a href="http://en.wikipedia.org/wiki/Unicode">Unicode</a>, and <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>).</p>
@@ -1470,7 +1470,7 @@ STORE X INTO 'output' USING PigDump();
<p>If the noschema option is NOT specified, and a schema is found, it gets loaded when loading data.</p>
-<p>Note that regardless of whether or not you store the schema, you always need to specify the correct delimiter to read your data. If you store reading delimiter "#" and then load using the default delimiter, your data will not be parsed correctly.</p>
+<p>Note that regardless of whether or not you store the schema, you always need to specify the correct delimiter to read your data. If you store using delimiter "#" and then load using the default delimiter, your data will not be parsed correctly.</p>
<p><strong>Record Provenance</strong></p>
<p>If tagPath or tagFile option is specified, PigStorage will add a pseudo-column INPUT_FILE_PATH or INPUT_FILE_NAME respectively to the beginning of the record. As the name suggests, it is the input file path/name containing this particular record. Please note tagsource is deprecated.</p>
@@ -1511,7 +1511,7 @@ A = LOAD 'student' USING PigStorage('\t'
A = LOAD 'student' AS (name: chararray, age:int, gpa: float);
</source>
- <p>In this example PigStorage stores the contents of X into files with fields that are delimited with an asterisk ( * ). The STORE function specifies that the files will be located in a directory named output and that the files will be named part-nnnnn (for example, part-00000).</p>
+ <p>In this example PigStorage stores the contents of X into files with fields that are delimited with an asterisk ( * ). The STORE statement specifies that the files will be located in a directory named output and that the files will be named part-nnnnn (for example, part-00000).</p>
<source>
STORE X INTO 'output' USING PigStorage('*');
</source>
@@ -1708,8 +1708,8 @@ STORE A INTO 'hbase://users_table' USING
<section id="math-functions">
<title>Math Functions</title>
-<p>For general information about these functions, see the <a href="http://download.oracle.com/javase/6/docs/api/">Java API Specification</a>,
-<a href="http://download.oracle.com/javase/6/docs/api/java/lang/Math.html">Class Math</a>. Note the following:</p>
+<p>For general information about these functions, see the <a href="http://docs.oracle.com/javase/6/docs/api/">Java API Specification</a>,
+<a href="http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html">Class Math</a>. Note the following:</p>
<ul>
<li>
@@ -2464,7 +2464,7 @@ Use the ROUND function to return the val
<p>x</p>
</td>
<td>
- <p>CEIL(x)</p>
+ <p>ROUND(x)</p>
</td>
</tr>
@@ -2746,8 +2746,8 @@ Use the TANH function to return the hype
<section id="string-functions">
<title>String Functions</title>
-<p>For general information about these functions, see the <a href="http://download.oracle.com/javase/6/docs/api/">Java API Specification</a>,
-<a href="http://download.oracle.com/javase/6/docs/api/java/lang/String.html">Class String</a>. Note the following:</p>
+<p>For general information about these functions, see the <a href="http://docs.oracle.com/javase/6/docs/api/">Java API Specification</a>,
+<a href="http://docs.oracle.com/javase/6/docs/api/java/lang/String.html">Class String</a>. Note the following:</p>
<ul>
<li>
@@ -2821,14 +2821,14 @@ Use the INDEXOF function to determine th
<!-- ======================================================== -->
<section id="last-index-of">
<title>LAST_INDEX_OF</title>
- <p>Returns the index of the last occurrence of a character in a string, searching backward from a start index. </p>
+ <p>Returns the index of the last occurrence of a character in a string, searching backward from the end of the string. </p>
<section>
<title>Syntax</title>
<table>
<tr>
<td>
- <p>LAST_INDEX_OF(expression)</p>
+ <p>LAST_INDEX_OF(string, 'character')</p>
</td>
</tr>
</table>
@@ -2853,22 +2853,13 @@ Use the INDEXOF function to determine th
<p>The character being searched for, in quotes.</p>
</td>
</tr>
- <tr>
- <td>
- <p>startIndex</p>
- </td>
- <td>
- <p>The index from which to begin the backward search.</p>
- <p>The string index begins with zero (0).</p>
- </td>
- </tr>
</table>
</section>
<section>
<title>Usage</title>
<p>
-Use the LAST_INDEX_OF function to determine the index of the last occurrence of a character in a string. The backward search for the character begins at the designated start index.
+Use the LAST_INDEX_OF function to determine the index of the last occurrence of a character in a string. The backward search for the character begins at the end of the string.
</p>
</section>
</section>
@@ -3031,7 +3022,7 @@ REGEX_EXTRACT('192.168.1.5:8020', '(.*):
<table>
<tr>
<td>
- <p>REGEX_EXTRACT (string, regex)</p>
+ <p>REGEX_EXTRACT_ALL (string, regex)</p>
</td>
</tr>
</table>
@@ -3137,7 +3128,7 @@ Use the REPLACE function to replace exis
</p>
<p>
For example, to change "open source software" to "open source wiki" use this statement:
-REPLACE(string,'software','wiki');
+REPLACE(string,'software','wiki')
</p>
<p>
Note that the REPLACE function is internally implemented using
@@ -3189,10 +3180,12 @@ by prefixing them with double backslashe
</tr>
<tr>
<td>
- <p>Limit</p>
+ <p>limit</p>
</td>
<td>
- <p>The number of times the pattern (the compiled representation of the regular expression) is applied.</p>
+ <p>If the value is positive, the pattern (the compiled representation of the regular expression) is applied at most limit-1 times, therefore the value of the argument means the maximum length of the result tuple. The last element of the result tuple will contain all input after the last match.</p>
+ <p>If the value is negative, no limit is applied for the length of the result tuple.</p>
+ <p>If the value is zero, no limit is applied for the length of the result tuple too, and trailing empty strings (if any) will be removed.</p>
</td>
</tr>
</table>
@@ -3392,7 +3385,7 @@ Use the UPPER function to convert all ch
<title>Datetime Functions</title>
<p>
-For general information about datetime type operations, see the <a href="http://download.oracle.com/javase/6/docs/api/">Java API Specification</a>,
+For general information about datetime type operations, see the <a href="http://docs.oracle.com/javase/6/docs/api/">Java API Specification</a>,
<a href="http://docs.oracle.com/javase/6/docs/api/java/util/Date.html">Java Date class</a>, and <a href="http://joda-time.sourceforge.net/apidocs/index.html">JODA DateTime class</a>.
And for the information of ISO date and time formats, please refer to <a href="http://www.w3.org/TR/NOTE-datetime">Date and Time Formats</a>.
</p>
@@ -4580,7 +4573,7 @@ In this example the top 10 occurrences a
<source>
A = LOAD 'data' as (first: chararray, second: chararray);
B = GROUP A BY (first, second);
-C = FOREACH B generate FLATTEN(group), COUNT(*) as count;
+C = FOREACH B generate FLATTEN(group), COUNT(A) as count;
D = GROUP C BY first; // again group by first
topResults = FOREACH D {
result = TOP(10, 2, C); // and retain top 10 occurrences of 'second' in first