Posted to dev@oozie.apache.org by "Harsh J (Created) (JIRA)" <ji...@apache.org> on 2012/01/17 23:35:39 UTC

[jira] [Created] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Provide a way to use 'uber' jars with Oozie MR actions
------------------------------------------------------

                 Key: OOZIE-654
                 URL: https://issues.apache.org/jira/browse/OOZIE-654
             Project: Oozie
          Issue Type: Improvement
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Minor


Right now, if you have custom MR code in a jar that contains a {{lib/}} folder carrying more dependent jars (a structure known as an 'uber' jar), and you submit the job via a regular 'hadoop jar' command, these lib/*.jars get picked up by the framework because the supplied jar is specified explicitly via conf.setJarByClass or conf.setJar. That is, if the user's uber jar goes to the JT as the mapred.jar, the framework handles it properly and the lib/*.jars are all placed on the classpath.

Distributed cache jars do not have this effect, because the MR framework does not treat them as uber jars and does not extract and use their internal lib/ directories.
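
To make the layout concrete, here is a minimal sketch that lists the nested jars under lib/ inside a given jar, using only java.util.jar. The class and method names (UberJarLayoutSketch, isBundledLib, bundledLibs) are made up for illustration and are not part of Hadoop or Oozie; the framework's own extraction logic is not shown here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only (not Hadoop or Oozie code).
final class UberJarLayoutSketch {

    // True for entries such as "lib/guava.jar": a jar file stored under lib/.
    static boolean isBundledLib(String entryName) {
        return entryName.startsWith("lib/") && entryName.endsWith(".jar");
    }

    // Returns the names of all nested jars under lib/ in the given jar file.
    static List<String> bundledLibs(String jarPath) throws IOException {
        List<String> libs = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isBundledLib(name)) {
                    libs.add(name);
                }
            }
        }
        return libs;
    }
}
```

Calling UberJarLayoutSketch.bundledLibs on a local copy of the job jar shows which dependencies would only become visible if that jar is treated as an uber jar.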

We should have a way in Oozie to let users promote one of their jars to an uber jar, as an option.

Proposal: Have an optional oozie-prefixed config, or an optional element in the MR action XML, that lets the user specify which class should be loaded and passed to setJarByClass(...). This must be a class at the top level of the uber jar; it can be any class inside the targeted jar, just not one from a jar under lib/. We then call jobConf.setJarByClass(loadedCls) and run the job.
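
As a rough illustration of what the proposal relies on: setJarByClass works by finding the jar that contains the given class. The sketch below is a simplified stand-in written with the JDK only, not Hadoop's actual code; the names ContainingJarSketch, jarPathOf and findContainingJar are hypothetical, and URL-decoding of the path is deliberately left out.

```java
import java.net.URL;

// Simplified stand-in for the "find the jar containing this class" step.
// Hypothetical names; not Hadoop's implementation.
final class ContainingJarSketch {

    // Extracts the jar path from a resource URL such as
    // "jar:file:/tmp/my-uber.jar!/com/example/MyMapper.class".
    // Returns null when the resource does not come from a jar.
    // Note: percent-encoded characters in the path are not decoded here.
    static String jarPathOf(String resourceUrl) {
        if (!resourceUrl.startsWith("jar:")) {
            return null;
        }
        String path = resourceUrl.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');
        return bang < 0 ? null : path.substring(0, bang);
    }

    // Loads the named class without initializing it and reports which jar
    // it was found in, or null if it was not loaded from a jar.
    static String findContainingJar(String className, ClassLoader loader)
            throws ClassNotFoundException {
        Class<?> cls = Class.forName(className, false, loader);
        String resource = cls.getName().replace('.', '/') + ".class";
        URL url = loader.getResource(resource);
        return url == null ? null : jarPathOf(url.toString());
    }
}
```

This also shows why the class has to live at the top level of the uber jar: a class loaded from a jar under lib/ would resolve to that nested jar rather than to the uber jar itself.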

Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443777#comment-13443777 ] 

Robert Kanter commented on OOZIE-654:
-------------------------------------

The test failure is unrelated (looks like OOZIE-971 fixes this).

The 4 lines with trailing spaces are from the Apache license comment block in the two new files (this is how they are all formatted):
{code}
* with the License.  You may obtain a copy of the License at
* 
*      http://www.apache.org/licenses/LICENSE-2.0
* 
* Unless required by applicable law or agreed to in writing, software
{code}
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Harsh J (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192218#comment-13192218 ] 

Harsh J commented on OOZIE-654:
-------------------------------

Can anyone comment on which approach would be more welcome: XML change, or a simple jobconf entry that does it?
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>

        

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444504#comment-13444504 ] 

Robert Kanter commented on OOZIE-654:
-------------------------------------

The failed test looks unrelated.
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Mohammad Kamrul Islam (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192499#comment-13192499 ] 

Mohammad Kamrul Islam commented on OOZIE-654:
---------------------------------------------

Yep, that would be inconvenient.
What about passing the required info as part of the configuration, which Oozie will then interpret?


                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>

        

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444427#comment-13444427 ] 

Alejandro Abdelnur commented on OOZIE-654:
------------------------------------------

* Unless I'm missing something, if the uberjar property has scheme://HOST:PORT it will not be set; this seems like a miss, no?
* Is the createLauncherConf() method always called after setupActionConf()? The logic of looking for the uberjar in the actionconf seems to imply that. Wouldn't it be better to have a private method that does the lookup/set of the uberjar, and call that method from both setupActionConf and createLauncherConf?
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Mohammad Kamrul Islam (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192433#comment-13192433 ] 

Mohammad Kamrul Islam commented on OOZIE-654:
---------------------------------------------

I prefer the XML tag option.
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>

        

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444193#comment-13444193 ] 

Alejandro Abdelnur commented on OOZIE-654:
------------------------------------------

* please trim the trailing spaces in the license notice
* please add a comment to the pom.xml exclusion stating which versions of Hadoop support uber jars in HDFS (1.2.0+ & 2.2.0+)
* the uberjar should also be set in the launcher config, as the client invocation may need the classes/jars in the uberjar.
* if the uberjar path is relative, it should resolve against the WF app path.
* if the uberjar path is absolute (with no authority), it should resolve against the <name-node> (which must always be present, right?). If I read the code correctly, I think you are doing this already.
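
The path rules in the last two bullets could be sketched roughly as below, using only java.net.URI. This is an illustration, not the code in the patch: the class and method names are hypothetical, nameNode and appPath stand in for the <name-node> value and the WF application directory, and returning a path that already has a scheme and authority unchanged is an assumption on my part.

```java
import java.net.URI;

// Hypothetical sketch of the resolution rules discussed above; not Oozie code.
final class UberJarPathSketch {

    static URI resolve(String uberJar, URI nameNode, URI appPath) {
        URI jar = URI.create(uberJar);
        if (jar.isOpaque()) {
            // e.g. "file:foo.jar" has no hierarchical path to resolve
            throw new IllegalArgumentException("Unsupported uber jar path: " + uberJar);
        }
        // Assumption: a path with scheme and authority is used as given.
        if (jar.getScheme() != null && jar.getAuthority() != null) {
            return jar;
        }
        // Absolute path with no authority: qualify it with the name node.
        if (jar.getPath().startsWith("/")) {
            return URI.create(nameNode.getScheme() + "://"
                    + nameNode.getAuthority() + jar.getPath());
        }
        // Relative path: resolve against the workflow application directory.
        String base = appPath.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base).resolve(jar);
    }
}
```

For example, with a name node of hdfs://nn:8020 and an app path of hdfs://nn:8020/user/joe/app (both made-up values), a relative value like lib/my-uber.jar ends up under the app directory, while /apps/my-uber.jar is qualified with the name node only.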
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444452#comment-13444452 ] 

Alejandro Abdelnur commented on OOZIE-654:
------------------------------------------

got it, thx. +1 pending test-patch report.
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444492#comment-13444492 ] 

Hadoop QA commented on OOZIE-654:
---------------------------------

Testing JIRA OOZIE-654

Cleaning local svn workspace

{code}
----------------------------

+1 PATCH_APPLIES
   CLEAN cleaned target directories
+1 RAW_PATCH_ANALYSIS
    +1 the patch does not introduce any @author tags
    +1 the patch does not introduce any tabs
    +1 the patch does not introduce any trailing spaces
    +1 the patch does not introduce any line longer than 132
    +1 the patch does adds/modifies 3 testcase(s)
+1 RAT
    +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
    +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
    +1 HEAD compiles
    +1 patch compiles
    +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
    +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
    +1 the patch does not modify JPA files
+1 TESTS
   Tests run: 902
   Tests failures: 1
   Tests errors: 0
+1 DISTRO
    +1 distro tarball builds with the patch 

----------------------------
{code}

The full output of the test-patch run is available at

   https://builds.apache.org/job/oozie-trunk-precommit-build/70/
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Updated] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-654:
--------------------------------

    Attachment: OOZIE-654.patch

New patch addresses all of the issues you pointed out; also added some more tests.  

And I fixed a bug in my patch where it would actually submit the uber jar with a streaming or pipes job; now it will ignore the uber jar and log a warning.  
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443770#comment-13443770 ] 

Hadoop QA commented on OOZIE-654:
---------------------------------

Testing JIRA OOZIE-654

Cleaning local svn workspace

{code}
----------------------------

+1 PATCH_APPLIES
   CLEAN cleaned target directories
-1 RAW_PATCH_ANALYSIS
    +1 the patch does not introduce any @author tags
    +1 the patch does not introduce any tabs
    -1 the patch contains 4 line(s) with trailing spaces
    +1 the patch does not introduce any line longer than 132
    +1 the patch does adds/modifies 3 testcase(s)
+1 RAT
    +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
    +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
    +1 HEAD compiles
    +1 patch compiles
    +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
    +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
    +1 the patch does not modify JPA files
+1 TESTS
   Tests run: 902
   Tests failures: 1
   Tests errors: 0
+1 DISTRO
    +1 distro tarball builds with the patch 

----------------------------
{code}

The full output of the test-patch run is available at

   https://builds.apache.org/job/oozie-trunk-precommit-build/66/
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Assigned] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J reassigned OOZIE-654:
-----------------------------

    Assignee:     (was: Harsh J)
    
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Priority: Minor
>

[jira] [Updated] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-654:
--------------------------------

    Attachment: OOZIE-654.patch

If the version of Hadoop that Oozie is using doesn't include MAPREDUCE-4408, the MapReduce task will fail when Oozie tries to use an uber jar.  To guard against this, I've added a property to oozie-default.xml, "oozie.action.mapreduce.uber.jar.enable" (default is false); when it is disabled, Oozie will not allow a workflow with an uber jar (it gives a "nicer" and more obvious error message up front, instead of submitting the job and leaving you to look through the logs wondering what happened).

I've modified the previous tests to look at the oozie.action.mapreduce.uber.jar.enable property.  There is an additional test in TestMapReduceActionExecutorUberJar.java which actually submits a job with an uber jar and verifies that it worked correctly; currently, this test is disabled by default (i.e. "mvn test" won't run it) because it fails against the current release of Hadoop (because it doesn't have MAPREDUCE-4408).  However, I have tested it locally against branch-1 and trunk.  
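
For readers following along, the guard described above could look roughly like the sketch below. This is not the code from the attached patch: the class and method names are hypothetical, plain maps stand in for the Oozie server and action configurations, and the action-level property name "oozie.mapreduce.uber.jar" is an assumption (only the enable flag is named in this comment).

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of the enable-flag guard; not the OOZIE-654 patch itself. */
final class UberJarGuardSketch {

    // Property named in the comment above (default is false).
    static final String ENABLE_KEY = "oozie.action.mapreduce.uber.jar.enable";

    // Assumed name of the action-level property that points at the uber jar.
    static final String UBER_JAR_KEY = "oozie.mapreduce.uber.jar";

    /** Fails fast when an uber jar is requested but the server has the feature disabled. */
    static void check(Map<String, String> serverConf, Map<String, String> actionConf) {
        String uberJar = actionConf.get(UBER_JAR_KEY);
        boolean requested = uberJar != null && !uberJar.trim().isEmpty();
        // parseBoolean(null) is false, which matches the "default is false" behaviour.
        boolean enabled = Boolean.parseBoolean(serverConf.get(ENABLE_KEY));
        if (requested && !enabled) {
            throw new IllegalStateException(
                "An uber jar was specified (" + uberJar + ") but " + ENABLE_KEY + " is not enabled");
        }
    }

    public static void main(String[] args) {
        Map<String, String> action =
            Collections.singletonMap(UBER_JAR_KEY, "/user/me/app/uber.jar");

        // Feature enabled: the check passes silently.
        check(Collections.singletonMap(ENABLE_KEY, "true"), action);

        // Feature left at its default: the check rejects the action before submission.
        try {
            check(Collections.<String, String>emptyMap(), action);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of checking before submission is the one made above: the user sees the reason immediately instead of finding a failed task in the logs.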
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Assigned] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter reassigned OOZIE-654:
-----------------------------------

    Assignee: Robert Kanter
    
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192460#comment-13192460 ] 

Alejandro Abdelnur commented on OOZIE-654:
------------------------------------------

Well, the XML tag option means having to rev the schema.
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444432#comment-13444432 ] 

Robert Kanter commented on OOZIE-654:
-------------------------------------

- The code that looks at the scheme://HOST:PORT stuff doesn't actually call setJar(); that happens later in the MapReduceMain class.  All this code does is resolve the uber jar URL.  And because it's writing to the same property that it's reading, if the uber jar already has scheme://HOST:PORT, then there's nothing to resolve and we can leave it as is.
- Yes, the createLauncherConf() method is always called after the setupActionConf() method; this happens in JavaActionExecutor.submitLauncher().  I could do what you said with a private method, but that's going to parse through the <configuration> and <job-xml> twice to do the same thing; wouldn't it be better to only do it once?  
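
To illustrate the first point, here is a minimal sketch of that kind of resolution. It is not the patch code: the class name, method name and example hosts are hypothetical, and it assumes an unqualified uber jar path is simply prefixed with the name node URI. Because an already-qualified value is returned unchanged, running it twice on the same property is harmless.

```java
import java.net.URI;

/** Hypothetical sketch of uber jar URL resolution; not the OOZIE-654 patch itself. */
final class UberJarPathSketch {

    /**
     * Returns uberJar unchanged when it already carries scheme://HOST:PORT;
     * otherwise prefixes its path with the given name node URI.
     */
    static String resolve(String uberJar, String nameNode) {
        URI uri = URI.create(uberJar);
        if (uri.getScheme() != null && uri.getAuthority() != null) {
            return uberJar; // already fully qualified, nothing to resolve
        }
        String base = nameNode.endsWith("/")
            ? nameNode.substring(0, nameNode.length() - 1)
            : nameNode;
        String path = uri.getPath();
        return base + (path.startsWith("/") ? path : "/" + path);
    }

    public static void main(String[] args) {
        String nameNode = "hdfs://nn.example.com:8020"; // example value only
        String once = resolve("/user/me/app/uber.jar", nameNode);
        String twice = resolve(once, nameNode);
        System.out.println(once);
        System.out.println(once.equals(twice)); // second pass leaves the value as is
    }
}
```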
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>

[jira] [Commented] (OOZIE-654) Provide a way to use 'uber' jars with Oozie MR actions

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444542#comment-13444542 ] 

Mona Chitnis commented on OOZIE-654:
------------------------------------

+1 nice work
                
> Provide a way to use 'uber' jars with Oozie MR actions
> ------------------------------------------------------
>
>                 Key: OOZIE-654
>                 URL: https://issues.apache.org/jira/browse/OOZIE-654
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Harsh J
>            Assignee: Robert Kanter
>            Priority: Minor
>             Fix For: trunk
>
>         Attachments: OOZIE-654.patch, OOZIE-654.patch, OOZIE-654.patch
>
>