Posted to user@hbase.apache.org by Doğacan Güney <do...@gmail.com> on 2009/07/07 16:49:21 UTC

MapReduce in hbase 0.20

Hi list,

In current trunk, TableReducer is defined like this:

....
public abstract class TableReducer<KEYIN, VALUEIN>
extends Reducer<KEYIN, VALUEIN, ImmutableBytesWritable, Put>
....

As VALUEOUT is a Put, I guess one cannot delete columns using collect()
(as we could with BatchUpdate). I can still create Deletes in #reduce and
call table.delete directly, but that seems unintuitive to me. Am I missing
something here, or is this the intended behavior?

Regards,
Doğacan Güney

Re: MapReduce in hbase 0.20

Posted by stack <st...@duboce.net>.
2009/7/7 Doğacan Güney <do...@gmail.com>

> On Tue, Jul 7, 2009 at 19:52, stack <st...@duboce.net> wrote:
>
> Anyway, I guess we are close to the 0.20 release. So if this is
> unnecessarily complex, the solution you described works just fine for me.
>
Yeah, nothing too fancy at this stage.  Please take a look at
https://issues.apache.org/jira/browse/HBASE-1626 if you have a chance.
Input appreciated.

St.Ack

Re: MapReduce in hbase 0.20

Posted by Doğacan Güney <do...@gmail.com>.
On Tue, Jul 7, 2009 at 19:52, stack <st...@duboce.net> wrote:

> 2009/7/7 Doğacan Güney <do...@gmail.com>
>
> > ...
> > > What would you suggest, Doğacan?  Maybe we should add Marker interfaces
> > > to Put and Delete and then change TableReducer to take the Marker?
> > >
> >
> > Sure, that's a good idea.
> >
> > I haven't studied hadoop 0.20's API much yet, so I am not sure if this
> > can be done, but could hbase have its own ReduceContext class? If this
> > is possible, then maybe we can just expose the HTable instance through
> > the context and allow the user to do whatever they want on the table
> > (and throw an exception if context.write is called). I think this would
> > be much simpler to understand than the write/collect() calls (e.g.
> > TableOutputFormat ignores the collect-ed keys). Does this make sense?
>
>
> This is an interesting idea.  Sketch out more of how it would work.
> Currently the HTable is made in the TableOutputWriter.  Would we then
> change the Reducer input so it took any Writable rather than IBW and Put?
>

Yes, so TableReducer would look something like this:

class TableReducer<K extends WritableComparable<?>, V extends Writable>
extends Reducer<K, V, WritableComparable, Writable> { /* ..... */ };

And in general, you would write your reduce method like this:

void reduce(K key, Iterable<V> values, HbaseContext context) {
    Put put = new Put(row);
    // add stuff to put

    Delete delete = new Delete(row);
    // add stuff to delete

    context.putToTable(put);
    context.deleteFromTable(delete);
    // and context.write may raise an exception
}
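As a self-contained illustration of that context idea, here is a mock in plain Java. Note that HbaseContext, putToTable, deleteFromTable, and the stub Put/Delete classes below are hypothetical stand-ins for the proposal, not the real HBase 0.20 API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for HBase's Put and Delete classes.
class Put {
    final byte[] row;
    Put(byte[] row) { this.row = row; }
}

class Delete {
    final byte[] row;
    Delete(byte[] row) { this.row = row; }
}

// Sketch of the proposed context: it accepts puts and deletes directly
// (where a real one would forward to HTable.put / HTable.delete) and
// rejects the generic write() call with an exception.
class HbaseContext {
    final List<Put> puts = new ArrayList<Put>();
    final List<Delete> deletes = new ArrayList<Delete>();

    void putToTable(Put put) { puts.add(put); }
    void deleteFromTable(Delete delete) { deletes.add(delete); }

    void write(Object key, Object value) {
        throw new UnsupportedOperationException(
            "use putToTable/deleteFromTable instead of write()");
    }
}
```

The point of the sketch is only the shape of the API: mutations go through two explicit, typed methods, and the generic write path is closed off so the keys-ignored confusion cannot arise.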

Anyway, I guess we are close to the 0.20 release. So if this is unnecessarily
complex, the solution you described works just fine for me.


>
> St.Ack
>



-- 
Doğacan Güney

Re: MapReduce in hbase 0.20

Posted by stack <st...@duboce.net>.
2009/7/7 Doğacan Güney <do...@gmail.com>

> ...
> > What would you suggest, Doğacan?  Maybe we should add Marker interfaces to
> > Put and Delete and then change TableReducer to take the Marker?
> >
>
> Sure, that's a good idea.
>
> I haven't studied hadoop 0.20's API much yet, so I am not sure if this can
> be done, but could hbase have its own ReduceContext class? If this is
> possible, then maybe we can just expose the HTable instance through the
> context and allow the user to do whatever they want on the table (and
> throw an exception if context.write is called). I think this would be much
> simpler to understand than the write/collect() calls (e.g. TableOutputFormat
> ignores the collect-ed keys). Does this make sense?


This is an interesting idea.  Sketch out more of how it would work.  Currently
the HTable is made in the TableOutputWriter.  Would we then change the Reducer
input so it took any Writable rather than IBW and Put?

St.Ack

Re: MapReduce in hbase 0.20

Posted by Doğacan Güney <do...@gmail.com>.
On Tue, Jul 7, 2009 at 18:57, stack <st...@duboce.net> wrote:

> 2009/7/7 Doğacan Güney <do...@gmail.com>
>
> > Hi list,
> >
> > In current trunk, TableReducer is defined like this:
> >
> > ....
> > public abstract class TableReducer<KEYIN, VALUEIN>
> > extends Reducer<KEYIN, VALUEIN, ImmutableBytesWritable, Put>
> > ....
> >
> > As VALUEOUT is a Put, I guess one cannot delete columns using collect()
> > (as we could with BatchUpdate). I can still create Deletes in #reduce and
> > call table.delete directly, but that seems unintuitive to me. Am I
> > missing something here, or is this the intended behavior?
>
>
>
> That's the intended behavior for that class.  Put and Delete do not share a
> common ancestor other than Writable, so it's a little awkward.
>
> What would you suggest, Doğacan?  Maybe we should add Marker interfaces to
> Put and Delete and then change TableReducer to take the Marker?
>

Sure, that's a good idea.

I haven't studied hadoop 0.20's API much yet, so I am not sure if this can be
done, but could hbase have its own ReduceContext class? If this is possible,
then maybe we can just expose the HTable instance through the context and
allow the user to do whatever they want on the table (and throw an exception
if context.write is called). I think this would be much simpler to understand
than the write/collect() calls (e.g. TableOutputFormat ignores the collect-ed
keys). Does this make sense?


>
> Now is a good time to bring this up before it gets set in stone by the
> 0.20.0 release.
>
> Thanks for looking at this.
>

No problem :) Hbase 0.20 is shaping up to be really awesome, btw :)


>
> St.Ack
>



-- 
Doğacan Güney

Re: MapReduce in hbase 0.20

Posted by stack <st...@duboce.net>.
2009/7/7 Doğacan Güney <do...@gmail.com>

> Hi list,
>
> In current trunk, TableReducer is defined like this:
>
> ....
> public abstract class TableReducer<KEYIN, VALUEIN>
> extends Reducer<KEYIN, VALUEIN, ImmutableBytesWritable, Put>
> ....
>
> As VALUEOUT is a Put, I guess one cannot delete columns using collect()
> (as we could with BatchUpdate). I can still create Deletes in #reduce and
> call table.delete directly, but that seems unintuitive to me. Am I missing
> something here, or is this the intended behavior?



That's the intended behavior for that class.  Put and Delete do not share a
common ancestor other than Writable, so it's a little awkward.

What would you suggest, Doğacan?  Maybe we should add Marker interfaces to
Put and Delete and then change TableReducer to take the Marker?
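For illustration, a minimal sketch of that marker-interface idea in plain Java. Here the Row interface name and the simplified Put/Delete stubs are hypothetical, not necessarily what HBase would actually ship:

```java
// Hypothetical marker interface shared by both mutation types.
interface Row {}

// Simplified stand-ins for HBase's Put and Delete classes.
class Put implements Row {
    final String row;
    Put(String row) { this.row = row; }
}

class Delete implements Row {
    final String row;
    Delete(String row) { this.row = row; }
}

// The output side can then accept the marker type and dispatch on the
// concrete class, roughly the way a TableOutputFormat record writer might.
class RowWriter {
    String handle(Row op) {
        if (op instanceof Put) {
            return "put:" + ((Put) op).row;       // would call HTable.put
        } else if (op instanceof Delete) {
            return "delete:" + ((Delete) op).row; // would call HTable.delete
        }
        throw new IllegalArgumentException("unsupported operation type");
    }
}
```

With a TableReducer declared against the marker type, a single reduce method could emit both puts and deletes through the normal write path, and the writer sorts out which table call to make.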

Now is a good time to bring this up before it gets set in stone by the
0.20.0 release.

Thanks for looking at this.

St.Ack