Posted to user@hbase.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2014/06/26 17:24:28 UTC

Store data in HBase with a MapReduce.

I have a question.
I want to run a MapReduce job and store the output of my reduce phase in
HBase.

For a map-only job I can use HFileOutputFormat.configureIncrementalLoad(pJob,
table); but I don't know how to do this when I also have a reduce phase of
my own, since configureIncrementalLoad installs its own reducer.
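For context, a minimal driver sketch of the map-only bulk-load path being described (a sketch against the 0.94-era API in use on this thread; MyDriver, MyMapper and "mytable" are illustrative placeholders, not from the original mail):

```java
// Sketch only: class and table names are placeholders.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "hbase-bulk-load");
job.setJarByClass(MyDriver.class);
job.setMapperClass(MyMapper.class);   // emits (ImmutableBytesWritable, Put)
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);

HTable table = new HTable(conf, "mytable");
// Wires up a TotalOrderPartitioner plus a sort reducer (e.g. PutSortReducer)
// and sizes the reduce phase to the table's current region count:
HFileOutputFormat.configureIncrementalLoad(job, table);
```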

Re: Store data in HBase with a MapReduce.

Posted by Stack <st...@duboce.net>.
On Fri, Jun 27, 2014 at 12:22 AM, Guillermo Ortiz <ko...@gmail.com>
wrote:

> If I have to... how many reducers should I have?



Depends.  Best if you can have zero.  Otherwise, try default partitioning
and go from there?



> As many as the number of
> regions? I have read about HRegionPartitioner, but it has some
> limitations: you have to be sure that no region is going to split
> while you're putting new data into your table.



It just looks at region boundaries when calculating partitions:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.html#73




> Is it only for performance?
> What could happen if you put too much data into your table and a region
> splits while you are using HRegionPartitioner?
>
>
It'll keep on writing over the split.
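If a reduce phase is kept, the direct-write setup under discussion could be sketched as follows, with one reducer per region (a sketch against the API of that era; MyReducer and "mytable" are placeholders):

```java
// Sketch only: one reducer per region via HRegionPartitioner.
Job job = new Job(conf, "write-to-hbase");
job.setReducerClass(MyReducer.class);  // emits (ImmutableBytesWritable, Put)
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "mytable");
job.setPartitionerClass(HRegionPartitioner.class);

HTable table = new HTable(conf, "mytable");
// One reducer per region as of job-setup time. If a region splits mid-job,
// writes still land correctly; two regions simply share one reducer.
job.setNumReduceTasks(table.getRegionsInfo().size());
```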
St.Ack





Re: Store data in HBase with a MapReduce.

Posted by Guillermo Ortiz <ko...@gmail.com>.
If I have to... how many reducers should I have? As many as the number of
regions? I have read about HRegionPartitioner, but it has some
limitations: you have to be sure that no region is going to split
while you're putting new data into your table. Is it only for performance?
What could happen if you put too much data into your table and a region
splits while you are using HRegionPartitioner?



Re: Store data in HBase with a MapReduce.

Posted by Stack <st...@duboce.net>.
Be sure to read http://hbase.apache.org/book.html#d3314e5975 if you have not
already, Guillermo.  Avoid the reduce phase if you can.

St.Ack



Re: Store data in HBase with a MapReduce.

Posted by Wellington Chevreuil <we...@gmail.com>.
Hi Guillermo,

You can use TableOutputFormat as the output format for your job; then, in
your reducer, you just need to write Put objects.

In your driver:

Job job = new Job(conf);
…
job.setReducerClass(AverageReducer.class);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "table");
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);
...

In your reducer, just create the corresponding Puts and write them:

byte[] rowKey = ...;  // Put requires the row key in its constructor
Put put = new Put(rowKey);
ImmutableBytesWritable key = new ImmutableBytesWritable(rowKey);
...
context.write(key, put);
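Put together, such a reducer might look like the sketch below. The averaging logic, the key/value types, and the column family "cf" / qualifier "avg" are invented for illustration, not taken from this thread:

```java
// Illustrative reducer: averages per-key values and writes one Put per key.
public class AverageReducer
    extends Reducer<Text, DoubleWritable, ImmutableBytesWritable, Put> {
  @Override
  protected void reduce(Text key, Iterable<DoubleWritable> values,
      Context context) throws IOException, InterruptedException {
    double sum = 0;
    long count = 0;
    for (DoubleWritable v : values) {
      sum += v.get();
      count++;
    }
    byte[] row = Bytes.toBytes(key.toString());
    Put put = new Put(row);
    // Column family "cf" and qualifier "avg" are placeholder names.
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("avg"),
        Bytes.toBytes(sum / count));
    context.write(new ImmutableBytesWritable(row), put);
  }
}
```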


Cheers,
Wellington.



Re: Store data in HBase with a MapReduce.

Posted by Ted Yu <yu...@gmail.com>.
Depending on the map output value class, you can override the corresponding
XXXSortReducer (e.g. PutSortReducer) so that your custom logic is added.
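For instance, when the map output value class is Put, the reducer installed by configureIncrementalLoad is PutSortReducer; a subclass can then be swapped in. This is only a sketch, and MyPutSortReducer is a hypothetical class name:

```java
// Sketch: hypothetical subclass injecting custom logic into the sort phase.
public class MyPutSortReducer extends PutSortReducer {
  // Override reduce(...) here to filter or transform the Puts
  // before they are sorted and written out as HFiles.
}

// In the driver, after configureIncrementalLoad(job, table) has run,
// replace the default reducer with the subclass:
job.setReducerClass(MyPutSortReducer.class);
```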

Cheers

