Posted to commits@crunch.apache.org by bu...@apache.org on 2014/06/03 17:43:41 UTC

svn commit: r911105 - in /websites/staging/crunch/trunk/content: ./ user-guide.html

Author: buildbot
Date: Tue Jun  3 15:43:41 2014
New Revision: 911105

Log:
Staging update by buildbot for crunch

Modified:
    websites/staging/crunch/trunk/content/   (props changed)
    websites/staging/crunch/trunk/content/user-guide.html

Propchange: websites/staging/crunch/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Jun  3 15:43:41 2014
@@ -1 +1 @@
-1589941
+1599620

Modified: websites/staging/crunch/trunk/content/user-guide.html
==============================================================================
--- websites/staging/crunch/trunk/content/user-guide.html (original)
+++ websites/staging/crunch/trunk/content/user-guide.html Tue Jun  3 15:43:41 2014
@@ -463,8 +463,8 @@ framework won't kill it,</li>
 <li><code>setStatus(String status)</code> and <code>getStatus</code> for setting and retrieving task status information, and</li>
 <li><code>getTaskAttemptID()</code> for accessing the current <code>TaskAttemptID</code> information.</li>
 </ul>
-<p>DoFns also have a number of helper methods for working with <a href="http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html">Hadoop Counters</a>, all named <code>increment</code>. Counters are an incredibly useful way of keeping track of the state of long running data pipelines and detecting any exceptional conditions that
-occur during processing, and they are supported in both the MapReduce-based and in-memory Crunch pipeline contexts. You can retrive the value of the Counters
+<p>DoFns also have a number of helper methods for working with <a href="http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html">Hadoop Counters</a>, all named <code>increment</code>. Counters are an incredibly useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that
+occur during processing, and they are supported in both the MapReduce-based and in-memory Crunch pipeline contexts. You can retrieve the value of the Counters
 in your client code at the end of a MapReduce pipeline by getting them from the <a href="apidocs/0.9.0/org/apache/crunch/PipelineResult.StageResult.html">StageResult</a>
 objects returned by Crunch at the end of a run.</p>
 <ul>
@@ -474,7 +474,7 @@ objects returned by Crunch at the end of
 <li><code>increment(Enum&lt;?&gt; counterName, long value)</code> increments the value of the given counter by the given value.</li>
 </ul>
 <p>(Note that there was a change in the Counters API from Hadoop 1.0 to Hadoop 2.0, and thus we do not recommend that you work with the
-Counter classes directly in yoru Crunch pipelines (the two <code>getCounter</code> methods that were defined in DoFn are both deprecated) so that you will not be
+Counter classes directly in your Crunch pipelines (the two <code>getCounter</code> methods that were defined in DoFn are both deprecated) so that you will not be
 required to recompile your job jars when you move from a Hadoop 1.0 cluster to a Hadoop 2.0 cluster.)</p>
 <p><a name="doplan"></a></p>
 <h3 id="configuring-the-crunch-planner-and-mapreduce-jobs-with-dofns">Configuring the Crunch Planner and MapReduce Jobs with DoFns</h3>
@@ -562,7 +562,7 @@ PTypes that can be constructed out of ot
 call on a PCollection will be a PTable instead of a PCollection, and only the PTable interface has the groupByKey method that
 can be used to kick off a shuffle on the cluster.</p>
 <pre>
-  public static class InidicatorFn&lt;T&gt; extends MapFn&lt;T, Pair&lt;T, Boolean&gt;&gt; {
+  public static class IndicatorFn&lt;T&gt; extends MapFn&lt;T, Pair&lt;T, Boolean&gt;&gt; {
     public Pair&lt;T, Boolean&gt; map(T input) { ... }
   }
 
@@ -1129,7 +1129,7 @@ in the Apache Pig book.</p>
 Crunch APIs have a number of utilities for performing fully distributed sorts as well as
 more advanced patterns like secondary sorts.</p>
 <p><a name="stdsort"></a></p>
-<h4 id="standard-and-reveserse-sorting">Standard and Reveserse Sorting</h4>
+<h4 id="standard-and-reverse-sorting">Standard and Reverse Sorting</h4>
 <p>The <a href="apidocs/0.9.0/org/apache/crunch/lib/Sort.html">Sort</a> API methods contain utility functions
 for sorting the contents of PCollections and PTables whose contents implement the <code>Comparable</code>
 interface. By default, MapReduce does not perform total sorts on its keys during a shuffle; instead