Posted to dev@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2009/07/08 15:37:14 UTC

[jira] Commented: (HBASE-1626) Allow emitting Deletes out of new TableReducer

    [ https://issues.apache.org/jira/browse/HBASE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728695#action_12728695 ] 

Lars George commented on HBASE-1626:
------------------------------------

There seem to be two issues buried here. One is to be able to "generalize" the class that is handed into the reduce phase. The other is how to access a table. For the latter - correct me if I am wrong, Doğacan - you seem to have got hold of the wrong end of the stick. Instead of extending TableReducer and making use of a table in the IdentityTableReducer, you leave that as is and simply add a custom TableReducer that creates the table in the "setup()" method, does the puts etc. in the "reduce()" call, and flushes and closes the table in the "cleanup()" method.
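
To make the setup()/reduce()/cleanup() split concrete, here is a minimal sketch of that lifecycle. It deliberately does not use the HBase classes: SketchTable and LifecycleReducerSketch are hypothetical stand-ins made up for illustration, and only the three hook names mirror the Hadoop Reducer. Treat it as an outline of the pattern, not as the actual TableReducer code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a table handle with a client-side write buffer.
// This is NOT the HBase HTable API; it only mimics put/flush/close.
final class SketchTable {
    private final List<String> buffer = new ArrayList<String>();
    final List<String> committed = new ArrayList<String>();
    boolean closed = false;

    void put(String row) { buffer.add("put:" + row); }

    void flush() { committed.addAll(buffer); buffer.clear(); }

    void close() { flush(); closed = true; }
}

// Hypothetical reducer exposing the same three lifecycle hooks as a Hadoop Reducer.
final class LifecycleReducerSketch {
    private SketchTable table;

    // Called once per task: open the table here, not per record.
    void setup() { table = new SketchTable(); }

    // Called once per key: only issue the writes.
    void reduce(String key, Iterable<String> values) {
        for (String v : values) {
            table.put(key + "/" + v);
        }
    }

    // Called once per task: flush buffered writes and release the table.
    SketchTable cleanup() { table.close(); return table; }

    // Drives the hooks in the order the framework would call them.
    static SketchTable runOnce() {
        LifecycleReducerSketch r = new LifecycleReducerSketch();
        r.setup();
        r.reduce("row1", Arrays.asList("a", "b"));
        r.reduce("row2", Arrays.asList("c"));
        return r.cleanup();
    }
}
```

The point of the split is that the table is opened and closed once per task, while reduce() stays limited to issuing writes.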

In other words, you do not need to do anything but create a simple job that uses IdentityTableReducer together with TableOutputFormat - which takes care of the table.put(). Unless I am missing something, that is pretty much what you are doing. Use the TableMapReduceUtil class to set up the job and also the name of the table etc.

The crucial part is abstracting the type of the class the reducer actually receives, so instead of assuming a Put it should accept a Delete as well if possible. I think Stack has that down 100% in his patch. So with his patch together with the above classes you are fine.

Question for Stack:
{code}
+      if (value instanceof Put) this.table.put(new Put((Put)value));
+      else if (value instanceof Delete) this.table.delete(new Delete((Delete)value));
{code}

Why do that and not

{code}
+      if (value instanceof Put) this.table.put((Put) value);
+      else if (value instanceof Delete) this.table.delete((Delete) value);
{code}

Just wondering if there is a reason to create a new object. Are they cached in the framework, so that the shared object reference causes them to be modified before they are written? They are already written to an intermediate output during the map/reduce crossover, so they are already copies.
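
One possible reason, offered as a guess and not as a statement about what the patch intends: Hadoop's reduce-side iterator deserializes each value into the same object instance, so anything that holds on to the reference (for example a client-side write buffer) would see it overwritten by the next value. The sketch below illustrates that effect with hypothetical stand-ins; MutableOp and ReuseSketch are made up for this example and are not HBase or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value with a copy constructor, mimicking new Put(Put).
final class MutableOp {
    String row;
    MutableOp(String row) { this.row = row; }
    MutableOp(MutableOp other) { this.row = other.row; }
}

final class ReuseSketch {
    // Returns an Iterable that refills ONE shared instance on every next(),
    // the way a framework may reuse a value object between records.
    static Iterable<MutableOp> reusing(final List<String> rows) {
        final MutableOp shared = new MutableOp("");
        return new Iterable<MutableOp>() {
            public Iterator<MutableOp> iterator() {
                final Iterator<String> it = rows.iterator();
                return new Iterator<MutableOp>() {
                    public boolean hasNext() { return it.hasNext(); }
                    public MutableOp next() { shared.row = it.next(); return shared; }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Buffers every value, by reference or by copy, then reports the rows
    // the buffer would write if it were flushed after the loop.
    static List<String> bufferedRows(List<String> rows, boolean copy) {
        List<MutableOp> buffer = new ArrayList<MutableOp>();
        for (MutableOp op : reusing(rows)) {
            buffer.add(copy ? new MutableOp(op) : op);
        }
        List<String> out = new ArrayList<String>();
        for (MutableOp op : buffer) {
            out.add(op.row);
        }
        return out;
    }
}
```

With rows r1, r2, r3, buffering by reference yields r3 three times, while buffering copies yields r1, r2, r3. Whether this applies here depends on whether the table buffers writes instead of sending them immediately, which I have not checked against the patch.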

> Allow emitting Deletes out of new TableReducer
> ----------------------------------------------
>
>                 Key: HBASE-1626
>                 URL: https://issues.apache.org/jira/browse/HBASE-1626
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Lars George
>             Fix For: 0.20.0
>
>         Attachments: deletes.patch, table-reduce.patch
>
>
> Doğacan Güney (nutch) wants to emit Delete from TableReduce.  Currently we only do Put.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.