Posted to issues@hbase.apache.org by "Dave Beech (JIRA)" <ji...@apache.org> on 2012/10/22 17:50:12 UTC

[jira] [Created] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Dave Beech created HBASE-7024:
---------------------------------

             Summary: TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass
                 Key: HBASE-7024
                 URL: https://issues.apache.org/jira/browse/HBASE-7024
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
            Reporter: Dave Beech
            Priority: Minor


The various initTableMapperJob methods in TableMapReduceUtil take outputKeyClass and outputValueClass parameters which need to extend WritableComparable and Writable respectively. 

Because of this, it is not convenient to use an alternative serialization like Avro. (I wanted to set these parameters to AvroKey and AvroValue). 

The methods in the MapReduce API to set map output key and value types do not impose this restriction, so is there a reason to do it here?
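
To illustrate the restriction, here is a minimal sketch; it is not HBase or Hadoop code. The Writable and WritableComparable interfaces, TextKey, AvroKeyStandIn, and the two init methods are hypothetical stand-ins that only mirror the shape of the signatures discussed above. The unbounded variant follows the Class<?> parameter style of Hadoop's Job.setMapOutputKeyClass.

```java
public class OutputClassBounds {
    // Hypothetical stand-ins for Hadoop's Writable and WritableComparable.
    interface Writable {}
    interface WritableComparable<T> extends Writable, Comparable<T> {}

    // A Writable-style key, which the bounded signature accepts.
    static final class TextKey implements WritableComparable<TextKey> {
        @Override
        public int compareTo(TextKey other) { return 0; }
    }

    // Stand-in for an Avro wrapper type: it is not a Writable.
    static final class AvroKeyStandIn<T> {}

    // Shape of the restricted helper: the key class must be a WritableComparable.
    static String initBounded(Class<? extends WritableComparable<?>> keyClass) {
        return keyClass.getSimpleName();
    }

    // Shape of an unrestricted setter: any class is accepted at compile time.
    static String initUnbounded(Class<?> keyClass) {
        return keyClass.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(initBounded(TextKey.class));
        // initBounded(AvroKeyStandIn.class); // would not compile: outside the bound
        System.out.println(initUnbounded(AvroKeyStandIn.class));
    }
}
```

With the bounded signature the only workaround is an unchecked cast at the call site, which is the inconvenience this report is about.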


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485287#comment-13485287 ] 

Hudson commented on HBASE-7024:
-------------------------------

Integrated in HBase-TRUNK #3490 (See [https://builds.apache.org/job/HBase-TRUNK/3490/])
    HBASE-7024 TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass (Revision 1402710)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java

                


[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Beech updated HBASE-7024:
------------------------------

    Issue Type: Improvement  (was: Bug)
    


[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Beech updated HBASE-7024:
------------------------------

    Status: Patch Available  (was: Open)
    


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481663#comment-13481663 ] 

stack commented on HBASE-7024:
------------------------------

Not that I know of.  My guess is that it is historical and long since addressed over in Hadoop.
                


[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Beech updated HBASE-7024:
------------------------------

    Attachment: HBASE-7024.patch

OK, thanks. Can I propose this patch? It simply removes the "extends Writable/WritableComparable" bounds from the outputKeyClass and outputValueClass parameters.
                


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482268#comment-13482268 ] 

Hadoop QA commented on HBASE-7024:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12550434/HBASE-7024.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 82 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3124//console

This message is automatically generated.
                


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482283#comment-13482283 ] 

Ted Yu commented on HBASE-7024:
-------------------------------

+1 on patch. 
                


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482723#comment-13482723 ] 

stack commented on HBASE-7024:
------------------------------

What does hadoop do for base/example task classes?  Does this patch work w/ h1 and h2?   The keys and values have to be serializable.  I'd think that there would be something else the  MR framework would expect us to implement so it could do the serializing?  Thanks Dave
                


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485297#comment-13485297 ] 

Hudson commented on HBASE-7024:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #240 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/240/])
    HBASE-7024 TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass (Revision 1402710)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java

                


[jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483096#comment-13483096 ] 

Dave Beech commented on HBASE-7024:
-----------------------------------

Thanks Ted, Stack. 

Stack - you are right that keys and values have to be serializable, but they don't have to be Serializable in the Java interface sense. The Job/JobConf classes in Hadoop accept absolutely any class. Map tasks use Hadoop's SerializationFactory to work out which serializer class to use (WritableSerialization is the default, but you can specify custom ones, like AvroSerialization, through the io.serializations job setting).

The point is that Hadoop doesn't care at all what type your map output key and value classes are, so long as you have provided a serializer which works with them. If you haven't, the job dies horribly (no surprise there).
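
As a rough sketch of that lookup, assuming a much simplified model rather than Hadoop's actual classes: each registered serialization says whether it accepts a given class, and the first one that accepts it is used. Every name below (Serialization, find, TextKey, AvroKeyStandIn, the WRITABLE and AVRO instances) is a hypothetical stand-in, and the registered list plays the role of the configured serializations setting. Throwing when nothing matches is a choice made for this sketch; how a real job fails is not modeled here.

```java
import java.util.Arrays;
import java.util.List;

public class SerializerLookup {
    // Simplified stand-in: a serialization only reports which classes it handles.
    interface Serialization {
        boolean accept(Class<?> c);
        String name();
    }

    interface Writable {}                          // stand-in for Hadoop's Writable
    static final class TextKey implements Writable {}
    static final class AvroKeyStandIn {}           // not a Writable

    // Accepts any class that implements the Writable stand-in.
    static final Serialization WRITABLE = new Serialization() {
        @Override public boolean accept(Class<?> c) { return Writable.class.isAssignableFrom(c); }
        @Override public String name() { return "writable"; }
    };

    // Accepts only the Avro stand-in type.
    static final Serialization AVRO = new Serialization() {
        @Override public boolean accept(Class<?> c) { return c == AvroKeyStandIn.class; }
        @Override public String name() { return "avro"; }
    };

    // Returns the first registered serialization that accepts the class.
    static Serialization find(List<Serialization> registered, Class<?> c) {
        for (Serialization s : registered) {
            if (s.accept(c)) {
                return s;
            }
        }
        throw new IllegalStateException("No serialization accepts " + c.getName());
    }

    public static void main(String[] args) {
        List<Serialization> registered = Arrays.asList(WRITABLE, AVRO);
        System.out.println(find(registered, TextKey.class).name());
        System.out.println(find(registered, AvroKeyStandIn.class).name());
    }
}
```

In this model the declared type of the key class never matters; only whether some registered serialization accepts it.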

I haven't tested with Hadoop 2 yet, no, but I'd be very surprised if this patch broke anything. If they'd changed this behaviour in Hadoop I'm sure there'd be tons of regression problems with mapreduce jobs that need custom serializers.  

                


[jira] [Updated] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-7024:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.96.0
     Hadoop Flags: Incompatible change,Reviewed
           Status: Resolved  (was: Patch Available)

Thanks Dave for the patch.  Committed to trunk.  Aligns w/ our purging of Writables.  Thanks too for looking into how Hadoop handles this.
                
