Posted to user@oozie.apache.org by Daniel Zhang <ja...@hotmail.com> on 2019/03/21 22:06:22 UTC

oozie 5.0.0 on AWS EMR

Hi, oozier:

Since release 5.15.0, AWS EMR ships with Oozie 5.0.0, an upgrade from Oozie 4.3.

Unfortunately, we found that one nice feature is broken for us on Oozie 5.0.0.

On Oozie 4.3, we put our Oozie applications in an S3 bucket that serves as our release repository, and in the Oozie application properties file we simply use the following:

appBaseDir=${s3.app.bucket}/oozieJobs/${appName}

The Oozie 4.3 runtime loads all the application code from S3 while still using the Oozie sharelib from HDFS, and the whole application workflow works perfectly.

After EMR 5.15.0 upgraded to Oozie 5.0.0, we can no longer use S3 as our application repository. The same application works fine if it is stored in HDFS, but if it is stored in S3, we get the following error message:

Caused by: org.apache.oozie.workflow.WorkflowException: E0712: Could not create lib paths list for application [s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml], Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
        at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:258)
        at org.apache.oozie.command.wf.SubmitXCommand.execute(SubmitXCommand.java:168)
        ... 36 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib, expected: s3://bucket-name
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
        at com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:740)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:347)
        at org.apache.oozie.service.WorkflowAppService.getLibFiles(WorkflowAppService.java:301)
        at org.apache.oozie.service.WorkflowAppService.createProtoActionConf(WorkflowAppService.java:202)
        ... 37 more

It looks like when we configure the application path in S3 via appBaseDir=${s3.app.bucket}/oozieJobs/${appName}, Oozie 5.0 complains that it can no longer load the sharelib from the HDFS URI, even though all the sharelib files are indeed stored in the correct HDFS location, as specified in the error message.
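
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of ownership check behind a "Wrong FS" error. It is not Hadoop's or Oozie's actual code: the class WrongFsSketch and its methods owns and checkPath are hypothetical names, and the comparison is simplified to scheme and authority using only java.net.URI. It shows that a filesystem bound to s3://bucket-name accepts the workflow path but rejects the HDFS sharelib path, which is consistent with the message in the stack trace above.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical illustration only; this is NOT Hadoop's FileSystem.checkPath.
public class WrongFsSketch {

    // Returns true when the path is unqualified or shares the filesystem's scheme and authority.
    public static boolean owns(URI fsRoot, URI path) {
        if (path.getScheme() == null) {
            return true; // unqualified paths are treated as belonging to this filesystem
        }
        return path.getScheme().equalsIgnoreCase(fsRoot.getScheme())
                && Objects.equals(path.getAuthority(), fsRoot.getAuthority());
    }

    // Throws with a message shaped like the one in the stack trace when the path is foreign.
    public static void checkPath(URI fsRoot, URI path) {
        if (!owns(fsRoot, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsRoot);
        }
    }

    public static void main(String[] args) {
        URI s3 = URI.create("s3://bucket-name");
        URI app = URI.create("s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml");
        URI shareLib = URI.create("hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib");

        checkPath(s3, app); // passes: same scheme and authority
        try {
            checkPath(s3, shareLib); // fails: an HDFS path handed to the S3-bound filesystem
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the check depends on which filesystem object receives the path, not on whether the path itself exists.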

Searching on this error message, I found the following commit in Oozie 5.0 (OOZIE-2944 "Shell action example does not work with Oozie on Yarn on h…"):
https://github.com/apache/oozie/commit/5998c18fde1da769e91e3ef1bcca484723730c76#diff-d4e9af2c1e2ddeae544be6182b948109

Since the error comes from the FileSystem used in core/src/main/java/org/apache/oozie/service/WorkflowAppService.java, I think MAYBE the above commit is causing it?

In 5.0.0, line 202 uses the "fs" created on line 177 from the "conf" that is built on line 169, as follows: https://github.com/apache/oozie/blob/branch-5.0/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L166

    URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
    Configuration conf = has.createConfiguration(uri.getAuthority());


But in 4.3.0, at https://github.com/apache/oozie/blob/branch-4.3/core/src/main/java/org/apache/oozie/service/WorkflowAppService.java#L167, the same code reads:

    URI uri = new URI(jobConf.get(OozieClient.APP_PATH));
    Configuration conf = has.createJobConf(uri.getAuthority());


I am NOT 100% sure, but the above code indeed leads to the FileSystem that eventually complains "Wrong FS" in my case, and the above commit changes how "conf" is created, from createJobConf to createConfiguration.

So my question is: do you think the above change is causing my issue? If so, I believe there is a reason for that commit, but is there also a solution for my use case?
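
One possible direction, offered only as a sketch and not as a confirmed fix for Oozie: choose the filesystem from each path's own scheme and authority instead of reusing the one derived from the application path. In Hadoop, this kind of lookup is what Path.getFileSystem(Configuration) provides. The example below imitates the idea with plain Java; PerPathFsSketch, register, and resolve are hypothetical names, and the registry of labels stands in for real FileSystem instances.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the labels stand in for real FileSystem objects.
public class PerPathFsSketch {

    // Maps "scheme://authority" to a label describing the filesystem that owns it.
    private final Map<String, String> registry = new HashMap<>();

    // Registers a filesystem root such as "s3://bucket-name".
    public void register(String root, String label) {
        registry.put(key(URI.create(root)), label);
    }

    // Picks the filesystem from the path itself, not from some other path's filesystem.
    public String resolve(String path) {
        String label = registry.get(key(URI.create(path)));
        if (label == null) {
            throw new IllegalArgumentException("No filesystem registered for: " + path);
        }
        return label;
    }

    private static String key(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        PerPathFsSketch fs = new PerPathFsSketch();
        fs.register("s3://bucket-name", "S3 application repository");
        fs.register("hdfs://ip-172-31-72-175.ec2.internal:8020", "HDFS sharelib");

        System.out.println(fs.resolve("s3://bucket-name/oozieJobs/ourAppName/workflow/workflow.xml"));
        System.out.println(fs.resolve("hdfs://ip-172-31-72-175.ec2.internal:8020/user/oozie/share/lib"));
    }
}
```

With a lookup like this, the application path and the sharelib path can live on different filesystems without either one being checked against the other.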

Thanks

Yong


Re: oozie 5.0.0 on AWS EMR

Posted by Daniel Zhang <ja...@hotmail.com>.
I am not sure I can easily reproduce this error case on my laptop.
I am sure that I can set up Oozie on my laptop using the local disk as the underlying OS file system for Oozie.
The issue is that on Oozie 5.0 I need to simulate two different file systems: one configured in Oozie for the sharelib, and another configured in Oozie as the application base dir.

Yong


Re: oozie 5.0.0 on AWS EMR

Posted by Peter Cseh <ge...@cloudera.com.INVALID>.
Hi Yong,
The usage of local filesystems is strictly prohibited in Oozie 5.0.
I'd guess you have an hdfs://somenode as fs.defaultFS and you're providing
the S3 credentials for the job only.
I'll try to carve out some time to reproduce and fix this, but I can't
promise you anything soon due to other priorities.
Once we have the reproduction steps, we should file a Jira for this.

gp

-- 
*Peter Cseh *| Software Engineer
cloudera.com <https://www.cloudera.com>


Re: oozie 5.0.0 on AWS EMR

Posted by ve...@gmail.com.
Hi Yong

Have you also tried s3a in place of s3?


- 
Suresh.



Re: oozie 5.0.0 on AWS EMR

Posted by Peter Cseh <ge...@cloudera.com.INVALID>.
Hey Yong,

Thanks for reporting this issue!
If I see correctly, your Oozie is set up to talk to an HDFS instance and to
S3 as well. This is not a scenario I'm too familiar with.
Could you give us some easy-to-follow steps to reproduce this?
Thanks
gp
