Posted to issues@hbase.apache.org by "Jan Lukavsky (JIRA)" <ji...@apache.org> on 2011/08/29 18:20:37 UTC

[jira] [Commented] (HBASE-3578) TableInputFormat does not setup the configuration for HBase mapreduce jobs correctly

    [ https://issues.apache.org/jira/browse/HBASE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092950#comment-13092950 ] 

Jan Lukavsky commented on HBASE-3578:
-------------------------------------

Hi,

I think the fix for this issue causes problems when a job wants to change HBase-specific options. E.g.:

{noformat}
Configuration conf = HBaseConfiguration.create();

// change keyvalue size
conf.setInt("hbase.client.keyvalue.maxsize", 20971520);

Job job = new Job(conf, ...);

TableMapReduceUtil.initTableMapperJob(...);

// the submitted job does not see the changed option; it uses the value from hbase-site or hbase-default
job.submit();

{noformat}

Although in this case it could be fixed by moving the setInt() call after initTableMapperJob(), this is impossible when a user wants to change an option using GenericOptionsParser and -D, which makes that cool feature useless.

In the 0.20.x era this code behaved as expected. The fix should not overwrite options that are already set; it should only read them from the HBase resources when they are missing. I have attached a patch that I think will fix this.
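
To illustrate the "only read them if they are missing" behaviour, here is a minimal sketch. It is not the attached patch: plain java.util.Map objects stand in for the Hadoop Configuration objects so that it runs without Hadoop or HBase on the classpath, the class and method names (ConfigMerge, addMissing) are hypothetical, and the values are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Plain maps stand in for Hadoop Configuration objects here (an assumption
// made so the sketch runs without Hadoop or HBase on the classpath).
class ConfigMerge {

    // Copies each default into jobConf only when the job has not set that key,
    // so values chosen by the user (in code or via -D) are never overwritten.
    static void addMissing(Map<String, String> jobConf, Map<String, String> defaults) {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            jobConf.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Option set explicitly by the job before the HBase resources are loaded.
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("hbase.client.keyvalue.maxsize", "20971520");

        // Illustrative values standing in for hbase-default / hbase-site.
        Map<String, String> siteDefaults = new HashMap<>();
        siteDefaults.put("hbase.client.keyvalue.maxsize", "10485760");
        siteDefaults.put("hbase.zookeeper.quorum", "localhost");

        addMissing(jobConf, siteDefaults);

        // The job's own value survives; the missing key is filled in.
        System.out.println(jobConf.get("hbase.client.keyvalue.maxsize"));
        System.out.println(jobConf.get("hbase.zookeeper.quorum"));
    }
}
```

With this merge order the user's value wins and the HBase resources only supply keys the job did not set, which is how I would expect the job setup to behave.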




> TableInputFormat does not setup the configuration for HBase mapreduce jobs correctly
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-3578
>                 URL: https://issues.apache.org/jira/browse/HBASE-3578
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.0, 0.90.1
>            Reporter: Dan Harvey
>            Assignee: Dan Harvey
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3578.patch, HBASE-3578.patch, mapreduce_configuration.patch
>
>
> In 0.20.x and earlier, TableMapReduceUtil (and other Input/OutputFormat classes) used to set up the HTable with an HBaseConfiguration object. Now that this has been deprecated in #HBASE-2036, they are constructed with Hadoop configuration objects, which do not contain the configuration XML file resources required to set up HBase. I think it is currently expected that this is done when constructing the job, but as it needs to be done for every HBase mapreduce job it would be cleaner if the TableMapReduceUtil class did this whilst setting up the TableInput/OutputFormat classes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira