Posted to user@hbase.apache.org by Pavel Lysov <pa...@gmail.com> on 2008/07/29 18:51:58 UTC

to mapreduce or not to mapreduce?

Hey all!

I'd like to ask you to take a look at what I have and advise whether
the map reduce approach is the right direction to take.

There's a MESSAGES table; each message has a sender and a recipient. It
works nicely so far, and next I want to get the following info:

Total of messages user X has sent
Total of messages user X has received
Total of messages in the system

It would be a USERS table, with USER_ID as the row key and a 'messages'
column family:
   messages:total_sent 345
   messages:total_received 543

Similar to the above, I'd create a SYSTEM table with a 'messages:total'
column that will hold the total count of messages.
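
To make the layout concrete, here is a small sketch using plain Python dicts. It is not HBase client code: the row keys "user_x" and "global" are made up for illustration, the SYSTEM value is a placeholder, and the user counts are just the sample values above.

```python
# Plain-Python model of the proposed layout; this is not HBase client code.
# Each table maps row key -> {"family:qualifier": value}.
# The row keys "user_x" and "global" are hypothetical.
users = {
    "user_x": {"messages:total_sent": 345, "messages:total_received": 543},
}
system = {
    "global": {"messages:total": 0},  # placeholder value
}


def get_count(table, row_key, column):
    """Return the counter stored in a cell, or 0 if the row or cell is missing."""
    return table.get(row_key, {}).get(column, 0)


print(get_count(users, "user_x", "messages:total_sent"))  # prints 345
```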

Next, I think I should implement a map reduce job that will update
'messages:total_sent' and 'messages:total_received' for every user: the
map adds a one to the output collector for the given user id, and the
reduce sums them up and updates the user's row. Is it a good idea to do
it like that? Could it cause any problems if two or more concurrent
reduce jobs try to update the same row?
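
Roughly, the job I have in mind looks like the sketch below. It is plain Python that only imitates the map, shuffle and reduce steps in memory; it does not use the Hadoop or HBase APIs, and the sample messages and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows from MESSAGES: (sender, recipient).
messages = [("alice", "bob"), ("alice", "carol"), ("bob", "alice")]


def map_message(sender, recipient):
    """Map step: emit one (key, 1) pair per counter the message touches."""
    yield (sender, "messages:total_sent"), 1
    yield (recipient, "messages:total_received"), 1


def shuffle(pairs):
    """Group emitted values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the ones for a single (user, column) key."""
    return key, sum(values)


def run_job(rows):
    """Run map, shuffle and reduce in memory and build the USERS rows."""
    emitted = [pair for s, r in rows for pair in map_message(s, r)]
    users = defaultdict(dict)
    for key, values in shuffle(emitted).items():
        (user_id, column), total = reduce_counts(key, values)
        users[user_id][column] = total
    return dict(users)


users = run_job(messages)
print(users["alice"])
```

In this sketch each (user, column) pair is summed in exactly one reduce call. Whether that holds up once several jobs run against the real table is part of what I am asking.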

A similar question for the SYSTEM table: suppose there are a bunch of
reduce jobs that try to update the messages:total column at the same time?
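
The situation I am worried about looks roughly like this. The sketch is a deterministic illustration in plain Python with no real concurrency and no HBase calls; the starting value and the two partial counts are invented.

```python
# Deterministic illustration of a lost update; no real concurrency or HBase.
system = {"messages:total": 100}  # hypothetical starting value

# Two hypothetical reducers each read the current total before either writes.
read_a = system["messages:total"]
read_b = system["messages:total"]

# Each adds its own partial count and writes the result back.
system["messages:total"] = read_a + 30
system["messages:total"] = read_b + 12  # overwrites reducer A's write

naive_total = system["messages:total"]  # 112: the 30 from reducer A is lost
expected_total = 100 + 30 + 12          # 142: what the column should hold

print(naive_total, expected_total)  # prints 112 142
```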

I think table locks would help there, but it seems I am missing some
basic understanding of how all of that is supposed to work. Could you
please advise?

I appreciate your help!
Pavel Lysov
pavlikus@gmail.com