Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2014/08/01 00:09:40 UTC

[jira] [Updated] (OOZIE-1954) Add a way for the MapReduce action to be configured by Java code

     [ https://issues.apache.org/jira/browse/OOZIE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1954:
---------------------------------

    Attachment: OOZIE-1954.patch

The patch adds an {{OozieActionConfigurator}} class with one method that receives a {{JobConf}} object and can throw an {{OozieActionConfiguratorException}} (slightly different from the original proposal in the Description above).  Implementations can update the {{JobConf}} object however they need to; if they need to throw an Exception, they can wrap it in an {{OozieActionConfiguratorException}}.
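
As a rough illustration (not code taken from the patch), a user implementation might look something like the sketch below. To compile without Hadoop or Oozie on the classpath, it declares stand-ins for {{OozieActionConfigurator}} and {{OozieActionConfiguratorException}} and uses {{java.util.Properties}} in place of {{JobConf}}. The method name {{configure}}, the {{ExampleConfigurator}} class, and the {{example.partitions}} key are assumptions made for the example.

```java
import java.util.Properties;

// Stand-in for the exception type added by the patch (the real one ships with Oozie).
class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Stand-in for the configurator type. The real method receives a Hadoop JobConf;
// Properties is used here only so the sketch runs without Hadoop.
// The method name "configure" is an assumption.
interface OozieActionConfigurator {
    void configure(Properties actionConf) throws OozieActionConfiguratorException;
}

// Hypothetical user implementation: decision logic that is awkward to express in XML.
class ExampleConfigurator implements OozieActionConfigurator {
    @Override
    public void configure(Properties actionConf) throws OozieActionConfiguratorException {
        try {
            // "example.partitions" is a made-up input key for this sketch.
            int partitions = Integer.parseInt(actionConf.getProperty("example.partitions", "1"));
            // Derive the reducer count from the input, never going below one.
            actionConf.setProperty("mapred.reduce.tasks", String.valueOf(Math.max(1, partitions)));
        } catch (NumberFormatException e) {
            // Wrap any failure in the configurator exception, as the comment above describes.
            throw new OozieActionConfiguratorException("Invalid example.partitions value", e);
        }
    }
}

class ConfiguratorDemo {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("example.partitions", "4");
        new ExampleConfigurator().configure(conf);
        System.out.println("mapred.reduce.tasks = " + conf.getProperty("mapred.reduce.tasks"));
    }
}
```

With the real classes, the body of {{configure}} would operate on the {{JobConf}} directly, for instance by calling a helper such as {{HFileOutputFormat.configureIncrementalLoad}} from the Description.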

As I suggested in the Description above, I made this generic enough to work with any action type, but only the MapReduce action currently uses or allows it.  I don't think the other action types really need this feature currently.

I've added unit tests and even a modified "map-reduce" example.  The documentation also explains how to use this feature.

I also tried it out in an actual cluster, including some error cases.

I'll try to get this up on ReviewBoard, but it's not liking the patch for some reason.

> Add a way for the MapReduce action to be configured by Java code
> ----------------------------------------------------------------
>
>                 Key: OOZIE-1954
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1954
>             Project: Oozie
>          Issue Type: New Feature
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: OOZIE-1954.patch
>
>
> With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc), it becomes impractical to use the MapReduce action and users must instead use the Java action. The problem is that these components require a lot of extra configuration that is often hidden from the user in Java code (e.g. {{HFileOutputFormat.configureIncrementalLoad(job, table);}}), which can also include decision logic, serialization, and other things that we can't do in an XML file directly.
> One way to solve this problem is to allow the user to give the MR action some Java code that would do this configuration, similar to how we allow the {{<job-xml>}} field to specify an external XML file of configuration properties.
> In more detail, we could have an interface; something like this:
> {code}
> public interface OozieActionConfigurator {
>      public void updateOozieActionConfiguration(Configuration conf);
> }
> {code}
> that the user can implement, package in a jar, and include with their MR action (i.e. add a "{{<config-class>}}" field that lets them specify the class name). To protect the Oozie server from running user code (which could do anything it wants really), it would have to be run in the Launcher Job. The Launcher Job could call this method after it loads the configuration prepared by the Oozie server.
> Another place this will be helpful is with users who use the Java action to launch MR jobs and expect a bunch of things to be done for them that are not (e.g. delegation token propagation, config loading, returning the hadoop job to Oozie, etc). These are all done with the MR action, so the more users we can move to the MR action from the Java action, the less they'll run into these difficulties.
> Some of this may change slightly as I try to actually implement this (e.g. have to handle throwing exceptions etc).  And one thing I may do is keep this general enough that it should be compatible with all action types in case we want to add this to any of them in the future; though for now, the schema would only accept it for the MapReduce action.
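
The Launcher Job step from the quoted Description (load the configuration prepared by the Oozie server, then call the user's class by name) could be sketched roughly as follows. This is not the launcher code from the patch: the types are stand-ins, {{java.util.Properties}} replaces the Hadoop configuration, and {{LauncherSketch}}, {{JobNameConfigurator}}, and the {{example.action.config.class}} key are hypothetical names chosen for the example.

```java
import java.util.Properties;

// Stand-ins for the Oozie types, so the sketch runs without Oozie or Hadoop.
class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface OozieActionConfigurator {
    // Method name is an assumption; the real method receives a JobConf.
    void configure(Properties actionConf) throws OozieActionConfiguratorException;
}

// Hypothetical user class that would normally live in the user's jar.
class JobNameConfigurator implements OozieActionConfigurator {
    @Override
    public void configure(Properties actionConf) {
        String name = actionConf.getProperty("mapred.job.name", "job");
        actionConf.setProperty("mapred.job.name", name + "-configured");
    }
}

class LauncherSketch {
    // Hypothetical key under which the server would pass the <config-class> value.
    static final String CONFIG_CLASS_KEY = "example.action.config.class";

    // Runs the user's configurator, if one was named, against the loaded configuration.
    static void applyConfigurator(Properties actionConf) throws OozieActionConfiguratorException {
        String className = actionConf.getProperty(CONFIG_CLASS_KEY);
        if (className == null) {
            return; // No configurator requested; leave the configuration as prepared.
        }
        try {
            OozieActionConfigurator configurator = Class.forName(className)
                    .asSubclass(OozieActionConfigurator.class)
                    .getDeclaredConstructor()
                    .newInstance();
            configurator.configure(actionConf);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Loading or instantiating the user class failed; report it uniformly.
            throw new OozieActionConfiguratorException("Could not run " + className, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties(); // stands in for the server-prepared configuration
        conf.setProperty("mapred.job.name", "demo");
        conf.setProperty(CONFIG_CLASS_KEY, JobNameConfigurator.class.getName());
        applyConfigurator(conf);
        System.out.println(conf.getProperty("mapred.job.name"));
    }
}
```

Because the user class is only loaded and invoked inside this launcher-side method, the server itself never executes user code, which is the protection the Description asks for.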


