Posted to user@hadoop.apache.org by Ian Brooks <i....@sensewhere.com> on 2014/03/13 14:48:31 UTC

Streaming a subset of HBase data

Hi,

I'm trying to use hadoop-streaming-2.2.0.jar to export a subset of data (a time range) from HBase to a mapper and reducer application written in another language. However, I have been unable to get anything but all of the data from the HBase table.

Looking at the code and forums, it seems that because hadoop-streaming doesn't support the new API, it isn't possible to give it scan parameters to set the time range or other filters. I found some classes online (http://cp1985chenpeng.iteye.com/blog/1315076) that implement the functionality of the newer API in a way that hadoop-streaming seems to be OK with, but when it gets to the mapreduce.Job part of processing, it still just returns the whole table rather than the rows within the time range I am specifying.

Is there a known way to do this?

-- 
-Ian Brooks