Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/03 23:54:30 UTC

[Hadoop Wiki] Update of "Hbase/Cascading" by ChrisWensel

The following page has been changed by ChrisWensel:
http://wiki.apache.org/hadoop/Hbase/Cascading

New page:
[http://www.cascading.org/ Cascading] is an alternative API to Hadoop MapReduce. Under the covers it uses MapReduce during execution, but during development, users don't have to think in MapReduce to create solutions for execution on Hadoop.

Cascading now supports reading data from and writing data to an HBase cluster.

Detailed information and access to the source code can be found on the [http://www.cascading.org/modules.html Cascading Modules] page.

A simple example (see the GitHub repo for a more up-to-date API):

{{{#!java
// read data from the default filesystem
// emits two fields: "offset" and "line"
// inputFileLhs is assumed to be a String path to the input file, defined elsewhere
Tap source = new Hfs( new TextLine(), inputFileLhs );

// store data in an HBase cluster
// accepts fields "num", "lower", and "upper"
// will automatically scope incoming fields to their proper family name, "left" or "right"
Fields keyFields = new Fields( "num" );
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" ) };
Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );

// a simple pipe assembly to parse the input into fields
// a real app would likely chain multiple Pipes together for more complex processing
Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );

// "plan" a cluster executable Flow
// this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
// properties is assumed to be a java.util.Properties instance, defined elsewhere
Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );

// start the flow, and block until complete
parseFlow.complete();

// open an iterator on the HBase table the flow just wrote to
TupleEntryIterator iterator = parseFlow.openSink();

while( iterator.hasNext() )
  {
  // print out each tuple from HBase
  System.out.println( "iterator.next() = " + iterator.next() );
  }

iterator.close();
}}}
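
To make the field-to-family scoping in the comments above more concrete, here is a plain-Java sketch that does not use Cascading or HBase at all. It splits a line into "num", "lower", and "upper", treats "num" as the row key, and labels the other two fields with their family names. The class name FamilyScopingSketch, its methods, and the "family:field" column naming are assumptions made for illustration; they are not part of the Cascading HBase module, and the actual storage layout may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; it does not use Cascading or HBase classes.
public class FamilyScopingSketch
  {
  // Mirrors the example above: one key field, and one value field per family.
  public static final String KEY_FIELD = "num";
  public static final String[] FIELD_NAMES = {"num", "lower", "upper"};
  public static final String[] FAMILY_NAMES = {"left", "right"};
  public static final String[][] VALUE_FIELDS = {{"lower"}, {"upper"}};

  // Split a line on single spaces and pair each piece with its field name,
  // roughly what the RegexSplitter step does in the example.
  public static Map<String, String> parse( String line )
    {
    String[] parts = line.split( " " );

    if( parts.length != FIELD_NAMES.length )
      throw new IllegalArgumentException( "expected " + FIELD_NAMES.length + " fields: " + line );

    Map<String, String> tuple = new LinkedHashMap<>();

    for( int i = 0; i < FIELD_NAMES.length; i++ )
      tuple.put( FIELD_NAMES[ i ], parts[ i ] );

    return tuple;
    }

  // Qualify each value field with its family name (assumed "family:field" form).
  // The key field is left out because it serves as the row key.
  public static Map<String, String> scope( Map<String, String> tuple )
    {
    Map<String, String> columns = new LinkedHashMap<>();

    for( int f = 0; f < FAMILY_NAMES.length; f++ )
      for( String field : VALUE_FIELDS[ f ] )
        columns.put( FAMILY_NAMES[ f ] + ":" + field, tuple.get( field ) );

    return columns;
    }

  public static void main( String[] args )
    {
    Map<String, String> tuple = parse( "1 a A" );

    System.out.println( "row key = " + tuple.get( KEY_FIELD ) );
    System.out.println( "columns = " + scope( tuple ) );
    }
  }
```

For the input line "1 a A", the sketch treats "1" as the row key and yields the columns left:lower=a and right:upper=A, which is the shape the three arguments to HBaseScheme describe.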

Note that the "hBaseTap" above can be used as either a sink or a source in a Flow, so another Flow could be created to read and process the data stored in HBase.