Posted to user@hbase.apache.org by schnitzi <ma...@fastsearch.com> on 2009/06/25 05:45:16 UTC

Transactions and Map/Reduce

I have some scenarios involving map/reduce that update HBase tables.  (They
all involve creating or modifying a number of individual records, so I'm
planning to do it all in the mapper.)

What I'm hoping to do is to only commit at the end of the map/reduce, if it
succeeds.  If I start a TransactionManager transaction in the mapper and
commit it there, in my understanding, it will only guarantee atomicity for
the set of rows that that mapper updates.
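A toy Java model can make the scoping concrete. It is not HBase code: `PerMapperCommit`, `runMapTask`, and the `HashMap` standing in for the table are all hypothetical. Each simulated map task buffers its writes and applies them together when the task finishes, which is roughly what a transaction started and committed inside one mapper amounts to. The second task fails, yet the first task's rows are already visible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not HBase code): each map task buffers its writes and commits
 * them when the task finishes, like a transaction scoped to one mapper.
 */
public class PerMapperCommit {

    /** Stands in for the HBase table: row key -> value. */
    static final Map<String, String> table = new HashMap<>();

    /** Runs one simulated map task: buffer all writes, commit only if the task did not fail. */
    static boolean runMapTask(List<String> rows, String value, boolean fail) {
        Map<String, String> pending = new HashMap<>();
        for (String row : rows) {
            pending.put(row, value);
        }
        if (fail) {
            return false; // this task aborts; only its own pending writes are discarded
        }
        table.putAll(pending); // "commit": this task's rows become visible together
        return true;
    }

    public static void main(String[] args) {
        boolean ok1 = runMapTask(List.of("row1", "row2"), "new", false);
        boolean ok2 = runMapTask(List.of("row3", "row4"), "new", true);
        // The job as a whole failed, yet task 1's rows were already committed.
        System.out.println("job succeeded: " + (ok1 && ok2));
        System.out.println("row1 committed: " + table.containsKey("row1"));
    }
}
```

Whether the overall job succeeded plays no part in what the first task committed, which is exactly the limitation described above.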

Has anyone attempted anything like this before?  Is there a way to do it?

Also, is there any general documentation about transactions in HBase besides
the javadoc and the test cases?  There doesn't seem to be any on the wiki... 
I'm muddling through without it but it would be nice to see some.


Cheers
Mark
-- 
View this message in context: http://www.nabble.com/Transactions-and-Map-Reduce-tp24196501p24196501.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Transactions and Map/Reduce

Posted by schnitzi <ma...@fastsearch.com>.
Thanks for the quick reply.  My alternate approach is to just let the mappers
do their writes, and to retrieve values based on the timestamp of the last
*successful* run, which should effectively filter out the results of failed
runs.  It sounds like it may be my only option.
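A minimal sketch of the timestamp idea, assuming every cell keeps multiple versions and each run writes all of its cells with a single run timestamp. `LastSuccessfulRunRead` and its methods are hypothetical; a `TreeMap` per row stands in for the versions of one HBase cell. Readers return the newest version at or before the last run that was marked successful.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model of versioned cells: row -> (timestamp -> value). Each job run
 * writes with its own run timestamp; reads ignore versions newer than the
 * last run that completed successfully.
 */
public class LastSuccessfulRunRead {

    private final Map<String, NavigableMap<Long, String>> cells = new HashMap<>();
    private long lastSuccessfulRun = Long.MIN_VALUE; // no successful run yet

    /** A mapper writes a value stamped with its run's timestamp. */
    void write(String row, long runTimestamp, String value) {
        cells.computeIfAbsent(row, k -> new TreeMap<>()).put(runTimestamp, value);
    }

    /** Called once the whole M/R job has finished successfully. */
    void markRunSucceeded(long runTimestamp) {
        lastSuccessfulRun = Math.max(lastSuccessfulRun, runTimestamp);
    }

    /** Newest version at or before the last successful run, or null if there is none. */
    String read(String row) {
        NavigableMap<Long, String> versions = cells.get(row);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> e = versions.floorEntry(lastSuccessfulRun);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        LastSuccessfulRunRead store = new LastSuccessfulRunRead();
        store.write("a", 100L, "v1"); // run at t=100 ...
        store.markRunSucceeded(100L); // ... succeeds
        store.write("a", 200L, "v2"); // run at t=200 fails and is never marked
        System.out.println(store.read("a")); // prints v1: the failed run's write is ignored
    }
}
```

Against a real table the read would presumably be a time-range-restricted `Get` or `Scan` (`setTimeRange`), keeping in mind that its upper bound is exclusive. Because the cap is only an upper bound, a cell written by an earlier failed run and not rewritten by a later successful run would still be visible; accepting only versions whose timestamp belongs to a recorded successful run avoids that.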

I suppose it might also be possible to generate the output to a separate
table, and then merge it back into the main table only if the m/r finishes
successfully, but that sounds a lot hairier...
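The staging-table alternative can be sketched the same way. Again this is a toy model with hypothetical names, where two `HashMap`s stand in for the main and staging tables: mappers write only to staging, and a step run after the job copies staging into main if the job succeeded and discards it otherwise.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of "write to a separate table, merge on success".
 * The maps stand in for the main and staging HBase tables.
 */
public class StagingMerge {

    final Map<String, String> mainTable = new HashMap<>();
    final Map<String, String> stagingTable = new HashMap<>();

    /** Mappers write only to the staging table. */
    void mapperWrite(String row, String value) {
        stagingTable.put(row, value);
    }

    /** After the job: merge staging into main on success, then clear staging either way. */
    void finishJob(boolean jobSucceeded) {
        if (jobSucceeded) {
            // putAll stands in for copying each staged row into the main table;
            // against HBase that copy is a series of per-row writes, not one atomic step.
            mainTable.putAll(stagingTable);
        }
        stagingTable.clear();
    }

    public static void main(String[] args) {
        StagingMerge m = new StagingMerge();
        m.mapperWrite("row1", "x");
        m.finishJob(false);
        System.out.println(m.mainTable.isEmpty()); // prints true: the failed run left main untouched
        m.mapperWrite("row1", "y");
        m.finishJob(true);
        System.out.println(m.mainTable.get("row1")); // prints y
    }
}
```

The extra cost is the copy itself; since HBase only guarantees atomicity per row, a failure partway through the merge could still leave the main table partially updated.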


Thanks
Mark





Re: Transactions and Map/Reduce

Posted by Clint Morgan <cl...@troove.net>.
Unfortunately, there is no more documentation on transactions.

I've never used transactions in M/Rs. You are correct that if you only
start/commit a transaction in the mapper, then you will only get atomicity across
the individual map task.

One thing to keep in mind is that, in the current impl, all the writes of a
pending transaction are kept in memory. A big M/R job could quickly blow up
with an OOME.
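To put rough numbers on the memory concern: with a hypothetical 10 million rows at about 1 KB each, a single pending transaction would hold around 10 GB of writes (ignoring per-object overhead). The sketch below is a toy model, not part of HBase's transactional API: `BoundedPendingWrites` and its byte budget are hypothetical, and it rejects a write once the buffered total would exceed the budget instead of growing until the JVM throws `OutOfMemoryError`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a pending transaction that keeps every uncommitted write in
 * memory. The byte budget is a hypothetical guard for illustration only.
 */
public class BoundedPendingWrites {

    private final List<byte[]> pending = new ArrayList<>();
    private final long maxPendingBytes;
    private long pendingBytes = 0;

    BoundedPendingWrites(long maxPendingBytes) {
        this.maxPendingBytes = maxPendingBytes;
    }

    /** Buffers a write, failing fast once the in-memory total would exceed the budget. */
    void add(byte[] write) {
        if (pendingBytes + write.length > maxPendingBytes) {
            throw new IllegalStateException(
                "pending transaction would exceed " + maxPendingBytes + " bytes");
        }
        pending.add(write);
        pendingBytes += write.length;
    }

    long pendingBytes() {
        return pendingBytes;
    }

    public static void main(String[] args) {
        // Hypothetical sizing: 10 million rows at ~1 KB each.
        long rows = 10_000_000L;
        long bytesPerRow = 1_024L;
        System.out.println("estimated pending bytes: " + rows * bytesPerRow);

        BoundedPendingWrites tx = new BoundedPendingWrites(4_096L);
        for (int i = 0; i < 4; i++) {
            tx.add(new byte[1_024]); // fills the budget exactly
        }
        try {
            tx.add(new byte[1]); // one byte over the budget
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```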

-clint
