Posted to common-user@hadoop.apache.org by smallufo <sm...@gmail.com> on 2008/06/03 13:56:54 UTC

Input Data from DB or Memory rather than HDFS

Hi
I have a question: what if my data is not originally located in HDFS?
What if my data comes from a DB or from memory?
I should implement a DatabaseInputFormat that implements
InputFormat<int rowIndex, MyData value>, right?
But how do I implement getSplits() and getRecordReader()?
I have looked into the sample source code for a long time, but I still don't
know how to "split" the data.

Is there any example code demonstrating input that comes from a DB or from
objects in memory rather than HDFS?

Thanks a lot.

Re: Input Data from DB or Memory rather than HDFS

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 3, 2008, at 4:56 AM, smallufo wrote:

> What if my data comes from a DB or from memory?
> I should implement a DatabaseInputFormat that implements
> InputFormat<int rowIndex, MyData value>, right?

Yes

> But how do I implement getSplits() and getRecordReader()?
> I have looked into the sample source code for a long time, but I still
> don't know how to "split" the data.

For most tables, I would choose key ranges for the splits. For  
example, if your primary key was name, choose split points that  
divide the table into roughly equal parts.

name < 'b' -> mapper 0
'b' <= name < 'c'  -> mapper 1

or whatever makes sense for your data.
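The key-range idea above can be sketched in plain Java. This is a minimal, hypothetical sketch, not Hadoop API code: it assumes you already have a sorted sample of primary-key values (for example from a query like SELECT name FROM mytable ORDER BY name, where mytable is made up), and the names KeyRangeSplits, KeyRange, computeSplits, mapperFor and whereClause are all invented for illustration. In a real InputFormat, each KeyRange would presumably be wrapped in an InputSplit returned from getSplits(), and the record reader for that split would query only the rows matching whereClause(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyRangeSplits {

    /** Hypothetical half-open key range [start, end); a null bound means unbounded. */
    static final class KeyRange {
        final String start; // inclusive lower bound, or null for no lower bound
        final String end;   // exclusive upper bound, or null for no upper bound

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        /** True if the key falls inside this range, using Java String ordering. */
        boolean contains(String key) {
            return (start == null || key.compareTo(start) >= 0)
                && (end == null || key.compareTo(end) < 0);
        }

        /** WHERE-clause fragment with ? placeholders; bind start, then end (if non-null). */
        String whereClause(String column) {
            List<String> parts = new ArrayList<>();
            if (start != null) parts.add(column + " >= ?");
            if (end != null) parts.add(column + " < ?");
            return parts.isEmpty() ? "1=1" : String.join(" AND ", parts);
        }

        @Override
        public String toString() {
            return "[" + (start == null ? "-inf" : start) + ", "
                + (end == null ? "+inf" : end) + ")";
        }
    }

    /**
     * Chooses up to numSplits contiguous ranges from a sorted key sample so that
     * each range holds roughly the same number of sampled keys.
     */
    static List<KeyRange> computeSplits(List<String> sortedSample, int numSplits) {
        List<KeyRange> ranges = new ArrayList<>();
        String previous = null;
        if (!sortedSample.isEmpty()) {
            for (int i = 1; i < numSplits; i++) {
                // Evenly spaced position in the sample; always < sortedSample.size().
                String point = sortedSample.get(i * sortedSample.size() / numSplits);
                // Skip a repeated split point; it would produce an empty range [p, p).
                if (previous != null && point.compareTo(previous) <= 0) {
                    continue;
                }
                ranges.add(new KeyRange(previous, point));
                previous = point;
            }
        }
        ranges.add(new KeyRange(previous, null)); // the last range is open-ended
        return ranges;
    }

    /** Index of the mapper (split) whose range contains the key. */
    static int mapperFor(List<KeyRange> splits, String key) {
        for (int i = 0; i < splits.size(); i++) {
            if (splits.get(i).contains(key)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no split covers key " + key);
    }

    public static void main(String[] args) {
        // Made-up sample of primary-key values, already sorted.
        List<String> sample = Arrays.asList(
            "alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi");
        List<KeyRange> splits = computeSplits(sample, 4);
        for (int i = 0; i < splits.size(); i++) {
            KeyRange r = splits.get(i);
            System.out.println("mapper " + i + ": " + r + "  WHERE " + r.whereClause("name"));
        }
    }
}
```

Because the first range has no lower bound and the last has no upper bound, every key maps to exactly one mapper, even keys that never appeared in the sample. One caveat: String.compareTo orders by UTF-16 code units, which may not match the database's collation, so in practice the split points should be compared the same way the database compares them.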

> Is there any example code demonstrating input that comes from a DB or
> from objects in memory rather than HDFS?

Take a look at the HBase table splitter:

http://tinyurl.com/48s76f

-- Owen