Posted to dev@oozie.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2012/11/27 05:55:58 UTC

[jira] [Created] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Alejandro Abdelnur created OOZIE-1089:
-----------------------------------------

             Summary: DistributedCache workaround for Hadoop 2.0.2-alpha
                 Key: OOZIE-1089
                 URL: https://issues.apache.org/jira/browse/OOZIE-1089
             Project: Oozie
          Issue Type: Bug
          Components: workflow
    Affects Versions: 3.3.0
            Reporter: Alejandro Abdelnur
            Assignee: Alejandro Abdelnur
             Fix For: 3.3.0


As explained in MAPREDUCE-4820, Hadoop 2.0.2-alpha introduced a duplicate check that exposes a change of behavior in how the distributed cache works in Hadoop 2 (as opposed to Hadoop 1).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505096#comment-13505096 ] 

Roman Shaposhnik commented on OOZIE-1089:
-----------------------------------------

[~rohini] The policy we have in Bigtop is to test RCs against stable components. We don't test RC-to-RC or trunk-to-trunk since that presents too much of a moving target. Hadoop 2.0.2 was released in mid-October -- that's when we switched. It then took us a couple more weeks to start testing Oozie's RC and another week or so to file an issue. Given the 100% volunteer nature of Bigtop, this strikes me as a rather expedient pace of testing (yes, I'd love to have Oozie developers monitoring my Oozie test runs in Bigtop so that issues could be identified sooner).

It would be very nice if somebody could volunteer to test trunk-to-trunk or RC-to-RC -- Bigtop has all the mechanisms in place, but we need eyeballs.
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504702#comment-13504702 ] 

Alejandro Abdelnur commented on OOZIE-1089:
-------------------------------------------

Mohammad, 

Oozie does not add any duplicate entries to the DC.

The problem lies in how YARN handles the distributed cache, combined with a duplicate check introduced in MRApps.

Let me explain MAPREDUCE-4820 again here, using Oozie terminology:

* The Oozie ActionExecutor creates the jobconfs for both the launcher job and the action job.
* Both jobconfs are configured with the corresponding DistributedCache entries.
* The DistributedCache entries are identical in both.
* The DistributedCache entries are required for both (for the launcher to submit the action job, and for the action job to run).
* Because of the way YARN works (and this changed from Hadoop 1), all JARs in the distributed cache are symlinked into the task's running directory.
* Because of the way MRApps works (for job submission), it injects into the distributed cache all JARs in the current directory and in the lib/ directory.
* Because the launcher job runs MRApps again (to submit the action job), the duplication happens between the entries in the distributed cache and those in the task's current directory.

The workaround flushes the distributed cache entries of the action jobconf, rightfully assuming (in the case of Hadoop 2) that they will already be in the current directory of the launcher task and thus be added implicitly to the distributed cache of the action job.
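
To make the chain of events above concrete, here is a minimal, self-contained simulation in plain Java. It does not use any Hadoop or Oozie classes; the class and method names (DistributedCacheDuplication, effectiveCache, duplicates) and the JAR names are hypothetical and exist only for illustration. It assumes, as described above, that job submission merges the explicitly configured cache entries with every JAR found in the submitting task's working directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistributedCacheDuplication {

    // Models job submission from inside the launcher task: the explicitly
    // configured cache entries plus every JAR found in the working directory.
    // (Illustrative model only, not the real MRApps logic.)
    static List<String> effectiveCache(List<String> configuredEntries, List<String> workingDirJars) {
        List<String> all = new ArrayList<>(configuredEntries);
        all.addAll(workingDirJars);
        return all;
    }

    // Returns the names that occur more than once, in first-seen order.
    static Set<String> duplicates(List<String> entries) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String entry : entries) {
            if (!seen.add(entry)) {
                dups.add(entry);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // The launcher's cache entries end up symlinked into its working directory.
        List<String> workingDirJars = Arrays.asList("pig.jar", "udfs.jar");
        // The action jobconf carries the same entries explicitly.
        List<String> actionEntries = Arrays.asList("pig.jar", "udfs.jar");

        // Without the workaround: every entry shows up twice.
        System.out.println(duplicates(effectiveCache(actionEntries, workingDirJars)));
        // prints [pig.jar, udfs.jar]

        // With the workaround: the action's explicit entries are flushed first.
        System.out.println(duplicates(effectiveCache(new ArrayList<String>(), workingDirJars)));
        // prints []
    }
}
```

The second call shows why flushing the action's explicit entries is enough in this model: the JARs still reach the action job through the working directory, but each name appears only once.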

Because of this, there is nothing to be done by Oozie other than the workaround.

I think the correct fix for MAPREDUCE-4820 is to dedup instead of fail. I'll be posting a patch momentarily, but until a Hadoop 2 release that includes the fix is available, we need the workaround.

Hope this explains things clearly.

                

[jira] [Updated] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated OOZIE-1089:
--------------------------------------

    Attachment:     (was: OOZIE-1089-trunk.patch)
    

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504778#comment-13504778 ] 

Rohini Palaniswamy commented on OOZIE-1089:
-------------------------------------------

+1 Nonbinding.
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504914#comment-13504914 ] 

Alejandro Abdelnur commented on OOZIE-1089:
-------------------------------------------

Mohammad, on your clarification question: yes, it is only for logging. Hadoop 2.0.2-alpha is 'alpha', so it is OK if certain things don't work; without this patch, nothing would work.

Rohini, I agree on the TTL for this. That is why I opened OOZIE-1090 at the same time I created this JIRA.
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505066#comment-13505066 ] 

Roman Shaposhnik commented on OOZIE-1089:
-----------------------------------------

[~kamrul]
bq. My concern was about supporting Hadoop 2.x in 3.3. The reasons are: Oozie was not tested against 2.x. I heard Pig streaming will also not work because of some unrelated issue. Moreover, I'm not sure how stable Hadoop 2.x alpha is.

The combination of Oozie 3.3.0 and Hadoop 2.0.2-alpha has been tested quite a bit in Bigtop on 9 different Linux distributions. Even though Bigtop's efforts, for some reason, tend not to be much appreciated by project developers, this is way more integration testing than most other profiles in Oozie get.

                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504873#comment-13504873 ] 

Mohammad Kamrul Islam commented on OOZIE-1089:
----------------------------------------------

I agree with your use case. I was talking about a different use case that our QE found earlier.
Consider that I have a pig.jar in three places: wf/lib, user/lib, and share/lib. All have the same file name. Will the DC allow adding duplicate JAR file names?
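
The scenario in this question can be sketched in plain Java as follows. This is only an illustration of how distinct paths collide on the same file name; it says nothing about how Hadoop itself resolves or rejects such entries. The class name JarNameCollisions, its methods, and the extra udfs.jar entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JarNameCollisions {

    // Returns the last path component, e.g. "wf/lib/pig.jar" -> "pig.jar".
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Groups paths by file name and keeps only names claimed by more than one path.
    static Map<String, List<String>> collisions(List<String> paths) {
        Map<String, List<String>> byName = new TreeMap<>();
        for (String path : paths) {
            byName.computeIfAbsent(baseName(path), k -> new ArrayList<>()).add(path);
        }
        byName.values().removeIf(group -> group.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "wf/lib/pig.jar", "user/lib/pig.jar", "share/lib/pig.jar", "wf/lib/udfs.jar");
        System.out.println(collisions(paths));
        // prints {pig.jar=[wf/lib/pig.jar, user/lib/pig.jar, share/lib/pig.jar]}
    }
}
```

A check of this kind could be run over the library locations before submission to report which file names would be ambiguous.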



                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505085#comment-13505085 ] 

Rohini Palaniswamy commented on OOZIE-1089:
-------------------------------------------

[~rvs],
   
bq. The combination of Oozie 3.3.0 and Hadoop 2.0.2-alpha has been tested quite a bit in Bigtop on 9 different Linux distributions.
   I don't get it. MAPREDUCE-4503 went in during August. Without this patch, Oozie 3.3.0 could not have run against 2.0.2-alpha before. How could you possibly have tested it?
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504870#comment-13504870 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.    {color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.    Tests run: 922
.    Tests failed: 0
.    Tests errors: 1

.    The patch failed the following testcases:

.      

{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/206/
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504911#comment-13504911 ] 

Rohini Palaniswamy commented on OOZIE-1089:
-------------------------------------------

[~tucu00]
Just to confirm, will this patch be reverted once MAPREDUCE-4820 is fixed and the next release of 2.0.x is out? Since it's hacky and is there to support an alpha version of Hadoop, it would be good to keep the TTL of this patch in the Oozie codebase as short as possible.
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504374#comment-13504374 ] 

Alejandro Abdelnur commented on OOZIE-1089:
-------------------------------------------

Riding on the fact that all JARs in the current directory of the launcher job will be implicitly part of the distributed cache of the action job, we can create a workaround until MAPREDUCE-4820 is fixed. The workaround would remove the distributed cache settings from the action configuration.

This workaround behavior would only be activated if a flag is set in oozie-site.xml.
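
A rough sketch of what such a flag-gated workaround could look like, using plain Java maps as stand-ins for the site and action configurations. Everything here is an assumption made for illustration: the flag name example.distributed.cache.workaround is hypothetical, the class CacheWorkaroundSketch is not part of Oozie, and the choice of the Hadoop 1 style keys mapred.cache.files and mapred.cache.archives as "the distributed cache settings" is a guess, not something taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheWorkaroundSketch {

    // Hypothetical flag name, used for illustration only.
    static final String FLAG = "example.distributed.cache.workaround";

    // Assumed cache-related keys (Hadoop 1 style property names); the keys the
    // actual patch removes may differ.
    static final String[] CACHE_KEYS = {"mapred.cache.files", "mapred.cache.archives"};

    // Returns a copy of the action configuration; when the flag is set in the
    // site configuration, the cache settings are stripped from the copy.
    static Map<String, String> prepareActionConf(Map<String, String> siteConf,
                                                 Map<String, String> actionConf) {
        Map<String, String> result = new HashMap<>(actionConf);
        if (Boolean.parseBoolean(siteConf.getOrDefault(FLAG, "false"))) {
            for (String key : CACHE_KEYS) {
                result.remove(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> actionConf = new HashMap<>();
        actionConf.put("mapred.cache.files", "hdfs:///apps/wf/lib/pig.jar");
        actionConf.put("mapred.job.name", "demo-action");

        Map<String, String> siteConf = new HashMap<>();
        // Flag absent: the cache settings are left untouched.
        System.out.println(prepareActionConf(siteConf, actionConf).containsKey("mapred.cache.files"));
        // prints true

        siteConf.put(FLAG, "true");
        // Flag set: the cache settings are removed, other settings survive.
        System.out.println(prepareActionConf(siteConf, actionConf).containsKey("mapred.cache.files"));
        // prints false
    }
}
```

Keeping the behavior off by default means deployments on Hadoop 1 are unaffected unless the flag is set explicitly.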

                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504384#comment-13504384 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:red}-1{color} Patch failed to apply to head of branch

----------------------------
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504910#comment-13504910 ] 

Mohammad Kamrul Islam commented on OOZIE-1089:
----------------------------------------------

This was the root cause, and that's why we asked the Hadoop team to retract the patch from 0.23.
End users can't enforce the file names in share/lib. We need to find a resolution at the Oozie level too. Which one of the two options (mentioned above) would be good for this? (By the way, we could fix it in Oozie in a later release.)


About the patch:
My concern was about supporting Hadoop 2.x in 3.3. The reasons are: Oozie was not tested against 2.x. I heard Pig streaming will also not work because of some unrelated issue. Moreover, I'm not sure how stable Hadoop 2.x alpha is.

Having said that, if it is a must from your side, I'm good to go. +1.


One coding clarification question:
The following line of code is only for writing a log in the Launcher Main, right?
launcherConf.setBoolean("oozie.hadoop-2.0.2-alpha.workaround.for.distributed.cache", true); 

                

[jira] [Updated] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated OOZIE-1089:
--------------------------------------

    Attachment: OOZIE-1089-trunk.patch

Uploading patch for trunk.
                

[jira] [Updated] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated OOZIE-1089:
--------------------------------------

    Attachment: OOZIE-1089.patch
    

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504819#comment-13504819 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:red}-1{color} Patch failed to apply to head of branch

----------------------------
                

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504929#comment-13504929 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.    {color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.    Tests run: 922
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/207/
                
> DistributedCache workaround for Hadoop 2.0.2-alpha
> --------------------------------------------------
>
>                 Key: OOZIE-1089
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1089
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 3.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 3.3.0
>
>         Attachments: OOZIE-1089.patch, OOZIE-1089-trunk.patch
>
>
> As explained in MAPREDUCE-4820, Hadoop 2.0.2-alpha introduced a duplicate check that exposes an change of behavior in how the distributed-cache works in Hadoop 2 (as opposed to Hadoop-1).

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504444#comment-13504444 ] 

Mohammad Kamrul Islam commented on OOZIE-1089:
----------------------------------------------

Is there any way for Oozie to enforce the uniqueness of JARs in the DistributedCache (DC)? Currently, if the same JAR file is included in multiple places (such as wf/lib, user/lib, share/lib), there is no issue. With the new Hadoop DC behavior, it will be an issue, and that will break Oozie backward compatibility.

From that perspective, can you please comment on the two options I proposed earlier?
                
[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504880#comment-13504880 ] 

Alejandro Abdelnur commented on OOZIE-1089:
-------------------------------------------

Test failure is unrelated.

Mohammad, regarding your question about duplicate JARs, I don't know if it will work; I assume it won't. There is a workaround at the user level: remove the duplicates.

Are we OK with this patch for branch-3.3?
                
[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504439#comment-13504439 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:red}-1{color} Patch failed to apply to head of branch

----------------------------
                
[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504655#comment-13504655 ] 

Hadoop QA commented on OOZIE-1089:
----------------------------------

Testing JIRA OOZIE-1089

Cleaning local svn workspace

----------------------------

{color:red}-1{color} Patch failed to apply to head of branch

----------------------------
                
[jira] [Updated] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated OOZIE-1089:
--------------------------------------

    Attachment: OOZIE-1089-trunk.patch

This time the patch was generated using --no-prefix.
                
[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504379#comment-13504379 ] 

Roman Shaposhnik commented on OOZIE-1089:
-----------------------------------------

A strong +1 -- this seems to be a non-invasive change that is easy to remove once we no longer need the workaround. It doesn't change the default behavior, and it was tested on Hadoop 2.0.2, Hadoop 1.1.0 and Hadoop 1.0.X.

Now, if I could make one more request, it would be to RELNOTE this workaround in the Oozie docs for the upcoming release.
                
> DistributedCache workaround for Hadoop 2.0.2-alpha
> --------------------------------------------------
>
>                 Key: OOZIE-1089
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1089
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 3.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 3.3.0
>
>         Attachments: OOZIE-1089.patch
>
>
> As explained in MAPREDUCE-4820, Hadoop 2.0.2-alpha introduced a duplicate check that exposes a change of behavior in how the distributed-cache works in Hadoop 2 (as opposed to Hadoop-1).

[jira] [Commented] (OOZIE-1089) DistributedCache workaround for Hadoop 2.0.2-alpha

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504410#comment-13504410 ] 

Mohammad Kamrul Islam commented on OOZIE-1089:
----------------------------------------------

I was considering two alternative options:
Option 1: Before adding any JAR file to the DC, we can check whether a JAR with the same filename is already in the DC. If so, we can skip adding it. This way we avoid duplicate files in the DC.

Option 2: Oozie can collect all the JARs in a local data structure (say, a HashSet). At the end, Oozie can add those JARs (from the HashSet) to the classpath.
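
A minimal sketch of how the two options might look in code, assuming a JAR is identified by its filename alone. The class JarDedupSketch and its methods are hypothetical names used only for illustration; the sketch deliberately does not call the Hadoop DistributedCache API, and it is not the approach taken by the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Oozie or Hadoop: it de-duplicates JAR paths
// by filename before they would be registered in the DistributedCache.
class JarDedupSketch {

    // Returns the filename component of a slash-separated path.
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Collects paths in a local structure keyed by filename (Option 2) and
    // skips any path whose filename was already seen (Option 1).
    // The first occurrence wins and insertion order is preserved.
    static List<String> dedupeByFileName(List<String> jarPaths) {
        Map<String, String> firstByName = new LinkedHashMap<>();
        for (String path : jarPaths) {
            firstByName.putIfAbsent(fileName(path), path);
        }
        return new ArrayList<>(firstByName.values());
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "wf/lib/a.jar", "user/lib/a.jar", "share/lib/b.jar");
        // A real launcher would add each surviving path to the classpath here.
        System.out.println(dedupeByFileName(jars));
    }
}
```

Keeping the first occurrence is an arbitrary choice in this sketch; which copy should win when the same filename appears in wf/lib, user/lib and share/lib is a policy question the options leave open.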

Comments?

 
                