Posted to common-user@hadoop.apache.org by smallufo <sm...@gmail.com> on 2008/06/03 13:56:54 UTC
Input Data from DB or Memory rather than HDFS
Hi
I have a question: what if my data is not originally stored in HDFS?
What if my data comes from a DB or from memory?
I should implement a DatabaseInputFormat that implements InputFormat<K, V>,
with the row index as the key and MyData as the value, right?
But how do I implement getSplits() and getRecordReader()?
I have looked into the sample source code for a long time, but I still don't
know how to "split" the data.
Is there any example code demonstrating input that comes from a DB or from
objects in memory?
Thanks a lot.
Re: Input Data from DB or Memory rather than HDFS
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 3, 2008, at 4:56 AM, smallufo wrote:
> What if my data comes from a DB or from memory?
> I should implement a DatabaseInputFormat that implements InputFormat<K, V>,
> with the row index as the key and MyData as the value, right?
Yes
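For data addressed by a row index, as in the question, one simple option is to make each split a contiguous range of row indexes. The sketch below is plain Java with no Hadoop dependency: the class RowIndexSplits, its IndexRange type, and the getSplits/readSplit helpers are hypothetical stand-ins that only mirror the two roles of an InputFormat (getSplits() decides which rows each mapper gets, getRecordReader() walks the rows of one split). They are not the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java sketch (no Hadoop classes) of the two jobs an
 * InputFormat does for row-indexed data: choosing splits and reading one split.
 */
public class RowIndexSplits {

    /** One split: the half-open row range [start, end). */
    static final class IndexRange {
        final long start;
        final long end;

        IndexRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    /**
     * Role of getSplits(): divide rows [0, numRows) into at most numSplits
     * contiguous ranges whose sizes differ by at most one row.
     */
    static List<IndexRange> getSplits(long numRows, int numSplits) {
        List<IndexRange> splits = new ArrayList<>();
        if (numRows <= 0 || numSplits <= 0) {
            return splits;
        }
        long n = Math.min(numSplits, numRows);   // never create empty splits
        long base = numRows / n;
        long extra = numRows % n;                // the first 'extra' splits get one more row
        long start = 0;
        for (long i = 0; i < n; i++) {
            long size = base + (i < extra ? 1 : 0);
            splits.add(new IndexRange(start, start + size));
            start += size;
        }
        return splits;
    }

    /** Role of the record reader: emit (rowIndex, value) pairs for one split. */
    static <T> List<Map.Entry<Long, T>> readSplit(List<T> rows, IndexRange split) {
        List<Map.Entry<Long, T>> records = new ArrayList<>();
        for (long i = split.start; i < split.end; i++) {
            records.add(new AbstractMap.SimpleEntry<Long, T>(i, rows.get((int) i)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e", "f", "g");
        for (IndexRange split : getSplits(rows.size(), 3)) {
            System.out.println(split.start + ".." + split.end + " -> " + readSplit(rows, split));
        }
    }
}
```

With 7 rows and 3 splits, main prints the ranges 0..3, 3..5 and 5..7. Keep in mind that objects held in the memory of the submitting process are not visible to map tasks running in other JVMs, so in a real job the rows must be reachable from the tasks (or carried inside the splits, which in Hadoop are Writable).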
> But how do I implement getSplits() and getRecordReader()?
> I have looked into the sample source code for a long time, but I still
> don't know how to "split" the data.
For most tables, I would choose key ranges for the splits. For
example, if your primary key was name, choose split points that
divide the table into roughly equal parts.
name < 'b' -> mapper 0
'b' <= name < 'c' -> mapper 1
and so on, or whatever boundaries make sense for your data.
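The key-range idea above can be sketched in plain Java, assuming you can read a sorted sample of the primary key (for example with a query ordered by name) and that the sample is sorted in Java's String order, which a database collation may not match. chooseSplitPoints picks split points so that each range holds about the same number of sampled keys, and mapperFor reproduces the name < 'b' / 'b' <= name < 'c' assignment. The class and method names are hypothetical; nothing here is part of the Hadoop or HBase API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of key-range splits for a table whose primary key is
 * a string column such as "name". Not part of the Hadoop or HBase API.
 */
public class KeyRangeSplits {

    /**
     * Picks up to numSplits - 1 split points from a sorted sample of primary
     * keys so that each range holds roughly the same number of sampled keys.
     */
    static List<String> chooseSplitPoints(List<String> sortedKeys, int numSplits) {
        List<String> points = new ArrayList<>();
        int n = sortedKeys.size();
        if (n == 0) {
            return points;   // no sample: a single range covers the whole table
        }
        for (int i = 1; i < numSplits; i++) {
            String point = sortedKeys.get((int) ((long) i * n / numSplits));
            // Skip a repeated point, which would otherwise produce an empty range.
            if (points.isEmpty() || !points.get(points.size() - 1).equals(point)) {
                points.add(point);
            }
        }
        return points;
    }

    /**
     * Mapper index for a key: mapper 0 gets key < points[0], mapper i gets
     * points[i-1] <= key < points[i], and the last mapper gets the rest.
     */
    static int mapperFor(String key, List<String> splitPoints) {
        int pos = Collections.binarySearch(splitPoints, key);
        // A key equal to a split point belongs to the range that starts there.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<String> sample = List.of("alice", "amy", "bill", "bob", "carol", "chris");
        List<String> points = chooseSplitPoints(sample, 3);
        System.out.println("split points: " + points);
        for (String name : List.of("adam", "bill", "bob", "dave")) {
            System.out.println(name + " -> mapper " + mapperFor(name, points));
        }
    }
}
```

Each range then becomes one split, and its record reader could restrict its query with WHERE name >= ? AND name < ?, binding the two split points through a PreparedStatement rather than concatenating them into the SQL.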
>
> Is there any example code demonstrating input that comes from a DB or
> from objects in memory?
Take a look at the HBase table splitter:
http://tinyurl.com/48s76f
-- Owen