Posted to common-user@hadoop.apache.org by Marc Harris <mh...@jumptap.com> on 2008/02/01 05:52:50 UTC

Deleting hbase rows in a reduce

I notice from HADOOP-2508 and HADOOP-2611 that the problem of deleting
rows inside a reduce is already being considered, with an improvement
planned for 0.17, scheduled for release in April.

Is there any way to delete rows in a reduce in the meantime, using
functionality in 0.15.3?

Also, I was a little confused by the discussion in HADOOP-2611. There
was mention that BatchUpdate and RowMutation were essentially the same.
However, I don't see what the difference is between either of them and
calling methods (put, delete, etc.) directly on HTable. Is it just that
there is a desire to use the same interface from within a MapReduce job
as from within an external client? What am I missing?

Thanks,
- Marc

Re: Deleting hbase rows in a reduce

Posted by edward yoon <ed...@udanax.org>.
> You got it.  Changing TableOutputFormat (TOF) so it takes a modified
> BatchUpdate, one that allows specification of a timestamp, will let an
> MR job do deletes and puts and specify timestamps.

Should HADOOP-2508 be closed?

On 2/1/08, stack <st...@duboce.net> wrote:
> Marc Harris wrote:
> > I notice from HADOOP-2508 and HADOOP-2611 that the problem of deleting
> > rows inside a reduce is already being considered, with an improvement
> > planned for 0.17, scheduled for release in April.
> >
> > Is there any way to delete rows in a reduce in the meantime, using
> > functionality in 0.15.3?
> >
>
> Marc, there is a candidate for 0.16.0 release:
> http://people.apache.org/~nigel/hadoop-0.16.0-candidate-0/.  Use that
> instead of 0.15.3 if you can.
>
> As for deleting rows in a reduce in the meantime: it doesn't look like
> it's possible, not without overriding TableOutputFormat.
>
> > Also, I was a little confused by the discussion in HADOOP-2611. There
> > was mention that BatchUpdate and RowMutation were essentially the same.
> > However, I don't see what the difference is between either of them and
> > calling methods (put, delete, etc.) directly on HTable. Is it just that
> > there is a desire to use the same interface from within a MapReduce job
> > as from within an external client? What am I missing?
> >
>
> You got it.  Changing TableOutputFormat (TOF) so it takes a modified
> BatchUpdate, one that allows specification of a timestamp, will let an
> MR job do deletes and puts and specify timestamps.
>
> St.Ack
>


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Deleting hbase rows in a reduce

Posted by stack <st...@duboce.net>.
Marc Harris wrote:
> I notice from HADOOP-2508 and HADOOP-2611 that the problem of deleting
> rows inside a reduce is already being considered, with an improvement
> planned for 0.17, scheduled for release in April.
>
> Is there any way to delete rows in a reduce in the meantime, using
> functionality in 0.15.3?
>   

Marc, there is a candidate for 0.16.0 release: 
http://people.apache.org/~nigel/hadoop-0.16.0-candidate-0/.  Use that 
instead of 0.15.3 if you can.

As for deleting rows in a reduce in the meantime: it doesn't look like
it's possible, not without overriding TableOutputFormat.

> Also, I was a little confused by the discussion in HADOOP-2611. There
> was mention that BatchUpdate and RowMutation were essentially the same.
> However, I don't see what the difference is between either of them and
> calling methods (put, delete, etc.) directly on HTable. Is it just that
> there is a desire to use the same interface from within a MapReduce job
> as from within an external client? What am I missing?
>   

You got it.  Changing TableOutputFormat (TOF) so it takes a modified
BatchUpdate, one that allows specification of a timestamp, will let an
MR job do deletes and puts and specify timestamps.

St.Ack
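
For readers following the thread, here is a rough sketch of the idea
discussed above: the reduce emits one update object per row, carrying
puts, a delete marker and a timestamp, and the output side applies it to
the table. Every type in it (RowUpdate, ToyTable, ToyTableWriter) is a
hypothetical stand-in written for illustration. None of them is the real
BatchUpdate, HTable or TableOutputFormat API, and the "DELETE" marker
value is likewise an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins only: these are NOT the real HBase classes.
class ReduceDeleteSketch {

    /** One row's changes plus a caller-chosen timestamp (the BatchUpdate-like idea). */
    static final class RowUpdate {
        final String row;
        final long timestamp;
        final Map<String, String> puts = new TreeMap<>();
        boolean wholeRowDelete = false;

        RowUpdate(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }

        RowUpdate put(String column, String value) {
            puts.put(column, value);
            return this;
        }

        RowUpdate deleteRow() {
            wholeRowDelete = true;
            return this;
        }
    }

    /** Toy in-memory table (row -> column -> value); stands in for the real table client. */
    static final class ToyTable {
        final Map<String, Map<String, String>> rows = new TreeMap<>();
        final Map<String, Long> lastTimestamp = new TreeMap<>();

        void commit(RowUpdate update) {
            if (update.wholeRowDelete) {
                rows.remove(update.row);
            }
            if (!update.puts.isEmpty()) {
                rows.computeIfAbsent(update.row, r -> new TreeMap<>()).putAll(update.puts);
            }
            // Remember the timestamp the caller supplied for this row.
            lastTimestamp.put(update.row, update.timestamp);
        }
    }

    /** Stands in for the output format's record writer: reduce output lands here. */
    static final class ToyTableWriter {
        private final ToyTable table;

        ToyTableWriter(ToyTable table) {
            this.table = table;
        }

        void write(RowUpdate update) {
            table.commit(update);
        }
    }

    /** A toy reduce: a "DELETE" value drops the row, otherwise store a count. */
    static void reduce(String row, List<String> values, long timestamp, ToyTableWriter out) {
        RowUpdate update = new RowUpdate(row, timestamp);
        if (values.contains("DELETE")) {
            update.deleteRow();
        } else {
            update.put("info:count", String.valueOf(values.size()));
        }
        out.write(update);
    }

    /** Seeds two rows, then runs the toy reduce over each of them. */
    static ToyTable demo() {
        ToyTable table = new ToyTable();
        ToyTableWriter out = new ToyTableWriter(table);
        out.write(new RowUpdate("a", 1L).put("info:count", "7"));
        out.write(new RowUpdate("b", 1L).put("info:count", "1"));
        reduce("a", Arrays.asList("DELETE"), 2L, out);
        reduce("b", Arrays.asList("x", "y"), 2L, out);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(demo().rows);
    }
}
```

The point of the design is that the reduce never touches the table
directly. It only describes the change, so the same update type could
in principle be used from an MR job and from an external client.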