Posted to dev@oozie.apache.org by "Abhishek Bafna (JIRA)" <ji...@apache.org> on 2017/01/06 07:32:59 UTC

[jira] [Commented] (OOZIE-2619) Make Hive action defaults to match hive defaults when running from command line

    [ https://issues.apache.org/jira/browse/OOZIE-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803855#comment-15803855 ] 

Abhishek Bafna commented on OOZIE-2619:
---------------------------------------

[~venkatnrangan] Can you please rebase the patch? Thanks.

> Make  Hive action defaults to match hive defaults when running from command line
> --------------------------------------------------------------------------------
>
>                 Key: OOZIE-2619
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2619
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.0, 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>         Attachments: OOZIE-2619.patch
>
>
> Over a few patches, we have done a few fixes to make Oozie Hive actions easier for users.
> One of them was OOZIE-2051, which allows default Hive and Tez site XML configs to be added to Hive actions automatically by introducing an action-specific configuration directory under the Oozie conf/action-conf directory. As a bonus, in an Ambari-managed cluster, the hive-site changes made as part of the Hive components are automatically reflected in the Oozie Hive action defaults.
> But there is one issue pending for Oozie hive actions.
> Oozie Hive jobs launched via the Hive action have historically been restricted to one reducer by default (there are also a few other restrictions, in terms of split sizes etc.).   This is because of the way Oozie action config management is done and how Hive determines the number of reducers.   Hive uses mapreduce.job.reduces to decide whether the reducer count is determined dynamically (when this parameter is initialized to the invalid value -1) or set explicitly by the user.   In HiveConf, this is internally set to -1 if it is not set in hive-site.xml or in one of the set statements.
> Oozie, when it prepares the action configuration, has mapreduce.job.reduces set to 1 (from mapred-default).   As part of the Hive action, Oozie also writes the prepared action configuration (the action.xml) as hive-site.xml, with the value for mapreduce.job.reduces set to 1. Hive therefore sees an explicit value of 1 and does not determine the reducer count dynamically.
> There are a few ways to overcome this issue, true to Oozie being very
> flexible with lots of options :).  I may be missing a few other
> options here!
> # Explicitly set the mapreduce.job.reduces parameter in the configuration element of the action
>     Every Hive workflow configuration has to be changed
> #  Add the parameter to a job-xml for the action
>     Once again affects all actions
> #  Set the parameter to disable loading of the default *-site.xml
> files as provided by OOZIE-2205
>    We need to make sure that the *-site.xml files are otherwise available to the containers - either have the hadoop conf directory (typically /etc/hadoop/conf) in the mapred framework classpath or explicitly make the files available using other mechanisms (as files, archives, in sharelib etc.).   The big issue is that this affects rolling upgrades once you add an explicit config dependency
> Unfortunately we can't use the default action config addition introduced in OOZIE-2051 for adding one more configuration file to the oozie hive action conf directory with hive MR defaults.
> The config files under action-conf/hive/*.xml or action-conf/hive.xml are all merged using the method injectDefaults, which updates the target only if the property does not already exist in the target configuration map.   In our case, mapreduce.job.reduces already exists in the action default configuration (coming from mapred-default.xml) and hence does not get overwritten from the action-conf/hive configuration files.
> The fix (essentially a one-line code change) is to use the copy method of XConfiguration to copy the action-default config instead of using the injectDefaults method, and then provide the action-default/hive.xml with the required mapred Hive parameters set to the initial values Hive expects.
> This patch introduces a change that has potential backward compatibility issues.
> * If the action-conf/<action>.xml currently has entries that were no-ops so far, they can be added to the action configuration.
> * Hive will work as expected when run as an Oozie action without users needing to resort to changes!
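
For anyone following along, here is a minimal sketch of the merge behavior the description refers to. It uses plain java.util.Map objects instead of the real XConfiguration class, so the class name MergeSketch and all helper methods are hypothetical stand-ins for illustration only. The value 1 (from mapred-default) and the value -1 (Hive's expected initial value) are taken from the description above; the exact contents of the defaults file are an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain maps stand in for Hadoop/Oozie configuration objects.
// None of these methods are the actual Oozie implementation.
public class MergeSketch {
    public static final String REDUCES = "mapreduce.job.reduces";

    // Models the behavior the issue attributes to injectDefaults:
    // a default is added only when the key is absent from the target.
    public static void injectDefaults(Map<String, String> defaults, Map<String, String> target) {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            target.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    // Models the behavior the issue attributes to copy:
    // every source entry is written to the target, replacing existing values.
    public static void copy(Map<String, String> source, Map<String, String> target) {
        target.putAll(source);
    }

    // Action configuration as described in the issue: mapred-default already set 1.
    public static Map<String, String> actionConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "1");
        return conf;
    }

    // Hypothetical action defaults file carrying Hive's expected initial value.
    public static Map<String, String> hiveActionDefaults() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "-1");
        return conf;
    }

    // Per the issue, -1 signals that Hive should determine reducers dynamically.
    public static boolean reducersDeterminedDynamically(Map<String, String> conf) {
        return "-1".equals(conf.get(REDUCES));
    }

    public static void main(String[] args) {
        Map<String, String> injected = actionConf();
        injectDefaults(hiveActionDefaults(), injected);
        // The existing value 1 survives, so Hive stays pinned to one reducer.
        System.out.println("injectDefaults: " + injected.get(REDUCES)
                + ", dynamic=" + reducersDeterminedDynamically(injected));

        Map<String, String> copied = actionConf();
        copy(hiveActionDefaults(), copied);
        // The value is overwritten with -1, restoring Hive's own default behavior.
        System.out.println("copy: " + copied.get(REDUCES)
                + ", dynamic=" + reducersDeterminedDynamically(copied));
    }
}
```

The same overwrite semantics are what make the backward compatibility note relevant: any entry in an action defaults file that was previously ignored because the key already existed would start taking effect under the copy-style merge.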



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)