Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2012/02/24 17:10:12 UTC

[Hama Wiki] Update of "BSPModel" by thomasjungblut

The "BSPModel" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/BSPModel?action=diff&rev1=14&rev2=15

  
  There are also {{{setup()}}} and {{{cleanup()}}}, which are called at the beginning and at the end of your computation, respectively.
  
- {{{cleanup()}}} is '''guranteed''' to run after the computation or in case of failure.
+ {{{cleanup()}}} is '''guaranteed''' to run after the computation or in case of failure. (In 0.4.0 it actually is not; we expect this to be fixed in 0.5.0.)
  
  You can simply override the functions you need from the BSP class.
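
The intended guarantee (cleanup runs whether or not the computation fails) can be pictured with a plain try/finally. This is only a sketch of the idea in standard Java; `LifecycleSketch`, `Task`, `run` and `trace` are hypothetical names for illustration and are not part of Hama's API or its actual task runner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hama.
class LifecycleSketch {
  interface Task {
    void setup() throws Exception;
    void compute() throws Exception;
    void cleanup() throws Exception;
  }

  // Runs setup, then the computation, and always attempts cleanup afterwards.
  static void run(Task task) throws Exception {
    try {
      task.setup();
      task.compute();
    } finally {
      task.cleanup();
    }
  }

  // Records the order of calls for a task whose computation may fail.
  static List<String> trace(boolean failInCompute) {
    List<String> calls = new ArrayList<>();
    Task task = new Task() {
      public void setup() { calls.add("setup"); }
      public void compute() {
        calls.add("compute");
        if (failInCompute) throw new IllegalStateException("simulated failure");
      }
      public void cleanup() { calls.add("cleanup"); }
    };
    try {
      run(task);
    } catch (Exception e) {
      calls.add("failed");
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(trace(false));
    System.out.println(trace(true));
  }
}
```

In both traces the cleanup step appears after the compute step, which is the behaviour the page describes as the goal.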
  
@@ -35, +35 @@

  
  Since Hama 0.4.0 we provide an input and output system for BSP jobs.
  
+ We chose the key/value model from Hadoop, since we want to provide a coherent API to widely used products like Hadoop MapReduce (SequenceFiles) and HBase (column storage).
- TODO: Some blahblah about key value and stuff
- What's in case when no input is configured? and stuff like that should be documented here..
- 
  
  == Input ==
  
@@ -111, +109 @@

  
  === Custom Inputformat ===
  
- You can implement your own inputformat blabla
+ You can implement your own input format. It is similar to Hadoop MapReduce's input formats, so you can use the existing literature to get started.
  
  == Output ==
  
  === Configuring Output ===
  
+ As with the input, you can configure the output while setting up your BSPJob.
+ 
+ 
+ {{{
+     job.setOutputKeyClass(Text.class);
+     job.setOutputValueClass(DoubleWritable.class);
+ 
+     job.setOutputFormat(TextOutputFormat.class);
+ 
+     FileOutputFormat.setOutputPath(job, TMP_OUTPUT);
+ }}}
+ 
+ As you can see, there are three major sections.
+ 
+ The first section is about setting the classes for output key and output value.
+ 
+ The second section is about setting the format of your output. In this case it is TextOutputFormat, which writes each key separated from its value by a tab ('\t'). Each record (key + value) is terminated by a newline ('\n').
+ 
+ The third and last section is about setting the path where your output should go.
+ You can use the static method of your chosen output format, or the convenience method in BSPJob:
+ 
+ {{{
+  job.setOutputPath(new Path("/tmp/out"));
+ }}}
+ 
+ If you don't provide output, no output folder or collector will be allocated.
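
To make the text record layout described above concrete, here is a minimal plain-Java sketch of one record as a tab-separated key and value followed by a newline. `TextRecordSketch` and `formatRecord` are hypothetical names used only for illustration; this is not the TextOutputFormat implementation, and it assumes the key and value are rendered with their ordinary string form.

```java
// Hypothetical illustration of the record layout; not Hama or Hadoop code.
class TextRecordSketch {
  // One record: key, a tab, the value, then a newline.
  static String formatRecord(String key, double value) {
    return key + "\t" + value + "\n";
  }

  public static void main(String[] args) {
    // Prints the key and the value on one line, separated by a tab.
    System.out.print(formatRecord("Estimated value of PI is", 3.14));
  }
}
```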
+ 
- === Using Input ===
+ === Using Output ===
+ 
+ From your BSP, you can output like this:
+ 
+ {{{
+  @Override
+  public void bsp(
+         BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
+         throws IOException, SyncException, InterruptedException {
+ 
+      peer.write(new Text("Estimated value of PI is"), new DoubleWritable(3.14));
+ 
+  }
+ }}}
+ 
+ Note that you can always write output, even from the {{{setup()}}} or {{{cleanup()}}} methods!
  
  === Custom Outputformat ===
+ 
+ You can implement your own output format. It is similar to Hadoop MapReduce's output formats, so you can use the existing literature to get started.
  
  == Implementation notes ==
  
@@ -198, +240 @@

  
  = Counters =
  
+ Just like in Hadoop MapReduce you can use Counters.
+ 
+ Counters are basically enums that you can only increment. You can use them to track meaningful metrics in your code, e.g. how often a loop has been executed.
+ 
+ From your BSP code you can use counters like this:
+ 
+ {{{
+     // enum definition
+     enum LoopCounter{
+       LOOPS
+     }
+ 
+     @Override
+     public void bsp(
+         BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
+         throws IOException, SyncException, InterruptedException {
+       for (int i = 0; i < iterations; i++) {
+         // details omitted
+         peer.getCounter(LoopCounter.LOOPS).increment(1L);
+       }
+       // rest omitted
+     }
+ }}}
+ 
+ In 0.4.0, counters cannot be used for flow control, since they are not synchronized during the sync phase. Watch [[https://issues.apache.org/jira/browse/HAMA-515|HAMA-515]] for details.
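
The idea of enum-keyed values that only ever grow can be modelled in plain Java as below. This is a local, single-process sketch: `CounterSketch` and its methods are hypothetical names, it is not Hama's counter class, and it says nothing about how counter values are aggregated across peers.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not Hama's counter implementation.
class CounterSketch {
  enum LoopCounter { LOOPS }

  // One running total per enum constant.
  private final Map<LoopCounter, Long> totals = new EnumMap<>(LoopCounter.class);

  // The only mutation offered: add an amount to the running total.
  void increment(LoopCounter counter, long amount) {
    totals.merge(counter, amount, Long::sum);
  }

  // Reads the current total; a counter that was never touched reads as zero.
  long get(LoopCounter counter) {
    return totals.getOrDefault(counter, 0L);
  }

  public static void main(String[] args) {
    CounterSketch counters = new CounterSketch();
    int iterations = 5;
    for (int i = 0; i < iterations; i++) {
      counters.increment(LoopCounter.LOOPS, 1L);
    }
    System.out.println(LoopCounter.LOOPS + " = " + counters.get(LoopCounter.LOOPS));
  }
}
```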
+ 
  == Setup and Cleanup ==
  
  == Combiners ==
@@ -244, +312 @@

  
  }}}
  
- == Job Configuration and Submission ==
- 
- TODO:
-