Posted to user@oozie.apache.org by purna pradeep <pu...@gmail.com> on 2018/05/15 14:11:08 UTC
Oozie for spark jobs without Hadoop
Hi,
Would like to know if I can use the Spark action in Oozie without having a
Hadoop cluster?
I want to use Oozie to schedule Spark jobs on a Kubernetes cluster.
I’m a beginner with Oozie.
Thanks
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
+Peter
On Wed, May 16, 2018 at 11:29 AM purna pradeep <pu...@gmail.com>
wrote:
> Peter,
>
> I have tried to specify dataset with uri starting with s3://, s3a:// and
> s3n:// and I am getting exception
>
>
>
> Exception occurred: E0904: Scheme [s3] not supported in uri
> [s3://mybucket/input.data] Making the job failed
>
> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
> supported in uri [s3://mybucket/input.data]
>
> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
> at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
> at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
> at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
> at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
> at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
> at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
> at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
> at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
> at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
> at org.apache.oozie.command.XCommand.call(XCommand.java:290)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> Is S3 support specific to the CDH distribution, or should it work in Apache
> Oozie as well? I’m not using CDH yet.
>
> On Wed, May 16, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com> wrote:
>
>> I think it should be possible for Oozie to poll S3. Check out this
>> <https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
>> description on how to make it work in jobs; something similar should work
>> on the server side as well.
>>
>> On Tue, May 15, 2018 at 4:43 PM, purna pradeep <pu...@gmail.com>
>> wrote:
>>
>> > Thanks Andras,
>> >
>> > Also, I would like to know whether Oozie supports AWS S3 as an input
>> > event: polling for a dependency file before kicking off a Spark action
>> >
>> >
>> > For example: I don’t want to kick off a Spark action until a file has
>> > arrived at a given AWS S3 location
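A coordinator dataset of the kind being asked about could be declared roughly like this (a minimal sketch; the bucket, path, frequency, and done-flag are hypothetical, and the S3 scheme would first have to be supported by the Oozie server):

```xml
<datasets>
  <!-- Hypothetical bucket and path; requires the s3a scheme to be
       enabled and the hadoop-aws classes on the server classpath -->
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-01T00:00Z" timezone="UTC">
    <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
    <done-flag>input.data</done-flag>
  </dataset>
</datasets>
```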
>> >
>> > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>> andras.piros@cloudera.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Oozie needs HDFS to store workflow, coordinator, or bundle
>> definitions,
>> > as
>> > > well as sharelib files in a safe, distributed and scalable way. Oozie
>> > needs
>> > > YARN to run almost all of its actions, Spark action being no
>> exception.
>> > >
>> > > At the moment it's not feasible to install Oozie without those Hadoop
>> > > components. For instructions on how to install Oozie, please *see here
>> > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>> > >
>> > > Regards,
>> > >
>> > > Andras
>> > >
>> > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>> purna2pradeep@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > Would like to know if I can use sparkaction in oozie without having
>> > > Hadoop
>> > > > cluster?
>> > > >
>> > > > I want to use oozie to schedule spark jobs on Kubernetes cluster
>> > > >
>> > > > I’m a beginner in oozie
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> *Peter Cseh *| Software Engineer
>> cloudera.com <https://www.cloudera.com>
>>
>> [image: Cloudera] <https://www.cloudera.com/>
>>
>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera
>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> ------------------------------
>>
>
Re: Oozie for spark jobs without Hadoop
Posted by Artem Ervits <ar...@gmail.com>.
Here's some related info
https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
On Wed, May 16, 2018, 3:45 PM purna pradeep <pu...@gmail.com> wrote:
> Peter,
>
> I got rid of this error by adding
> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>
> But I’m getting below error now
>
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
> must be specified by setting the fs.s3.awsAccessKeyId and
> fs.s3.awsSecretAccessKey properties (respectively)
>
> I have tried adding the AWS access and secret keys in
> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>
>
>
>
> On Wed, May 16, 2018 at 2:30 PM purna pradeep <pu...@gmail.com>
> wrote:
>
> >
> > I have tried this, just added s3 instead of *
> >
> > <property>
> >   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
> >   <value>hdfs,hftp,webhdfs,s3</value>
> > </property>
> >
> >
> > Getting below error
> >
> > java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> >
> > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
> > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
> > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
> > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
> > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
> > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
> > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
> > at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
> > at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
> >
> >
> > On Wed, May 16, 2018 at 2:19 PM purna pradeep <pu...@gmail.com>
> > wrote:
> >
> >> This is what is in the logs
> >>
> >> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
> >> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
> >>
> >> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
> >> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
> >>
> >>
> >> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
> >> wrote:
> >>
> >>> That's strange, this exception should not happen in that case.
> >>> Can you check the server logs for messages like this?
> >>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
> >>> LOG.info("Loaded default urihandler {0}",
> >>> defaultHandler.getClass().getName());
> >>> Thanks
> >>>
> >>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
> purna2pradeep@gmail.com>
> >>> wrote:
> >>>
> >>>> This is what I already have in my oozie-site.xml
> >>>>
> >>>> <property>
> >>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
> >>>>   <value>*</value>
> >>>> </property>
> >>>>
> >>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
> >>>> wrote:
> >>>>
> >>>>> You'll have to configure
> >>>>> oozie.service.HadoopAccessorService.supported.filesystems (default:
> >>>>> hdfs,hftp,webhdfs), which lists the filesystems supported for
> >>>>> federation. If the wildcard "*" is specified, then ALL file schemes
> >>>>> will be allowed.
> >>>>>
> >>>>> For testing purposes it's ok to put * in there in oozie-site.xml
> >>>>>
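In oozie-site.xml, the property being discussed would look roughly like this (a sketch; listing s3a here only takes effect if the matching hadoop-aws classes are actually on the server classpath):

```xml
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <!-- "*" allows all schemes; listing schemes explicitly is safer in production -->
  <value>hdfs,hftp,webhdfs,s3a</value>
</property>
```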
> >>>
>
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
Can you try configuring the access keys via environment variables in the
server?
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_environment_variables
It's possible that we don't propagate the coordinator action's
configuration properly to the polling code.
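Assuming the standard variable names from the hadoop-aws documentation, that would mean exporting the credentials in the environment that launches the Oozie server (the values below are placeholders):

```shell
# Placeholder credentials; export them before starting the Oozie server
# (e.g. in oozie-env.sh) so the S3A connector's
# EnvironmentVariableCredentialsProvider can pick them up.
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```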
On Thu, May 17, 2018 at 8:53 PM, purna pradeep <pu...@gmail.com>
wrote:
> Ok, I got past this error
>
> by rebuilding Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
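The rebuild described above would look roughly like this (a sketch: mkdistro.sh is Oozie's distribution build script, but the exact profile flags depend on your environment):

```shell
# Rebuild the Oozie distribution, overriding the httpclient/httpcore
# versions so the S3A connector's dependencies line up.
bin/mkdistro.sh -DskipTests \
    -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
```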
>
> now getting this error
>
>
>
> ACTION[0000000-180517144113498-oozie-xjt0-C@1] org.apache.oozie.service.HadoopAccessorException:
> E0902: Exception occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider
> : com.amazonaws.SdkClientException: Unable to load credentials from
> service endpoint]
>
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on cmsegmentation-qa: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider
> : com.amazonaws.SdkClientException: Unable to load credentials from
> service endpoint]
>
> On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
> wrote:
>
>>
>> Peter,
>>
>> Also When I submit a job with new http client jar, I get
>>
>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>> server. No of retries = 1. Exception = Could not authenticate,
>> Authentication failed, status: 500, message: Server Error```
>>
>>
>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Ok I have tried this
>>>
>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>> stops loading.
>>>
>>>
>>>
>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> Purna,
>>>>
>>>> Based on
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>> you should try to go for s3a.
>>>> You'll have to include the AWS SDK as well if I see it correctly:
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>> Also, the property names are slightly different, so you'll have to
>>>> change the example I've given.
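For reference, the s3a equivalents of the credential properties in the hadoop-aws documentation are fs.s3a.access.key and fs.s3a.secret.key; a sketch of the changed configuration block (placeholder values):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>[YOURKEYID]</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>[YOURKEY]</value>
</property>
```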
>>>>
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2pradeep@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I’m using the latest Oozie 5.0.0 and I have tried the below changes, but
>>>>> no luck.
>>>>>
>>>>> Is this for s3 or s3a?
>>>>>
>>>>> I’m using s3, but if this is for s3a, do you know which jar I need to
>>>>> include? I mean the hadoop-aws jar, or any other jar if required.
>>>>>
>>>>> hadoop-aws-2.8.3.jar is what I’m using.
>>>>>
>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Ok, I've found it:
>>>>>>
>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>> dependencies:
>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>>> It passes the coordinator action's configuration and even does
>>>>>> impersonation to check for the dependencies:
>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>
>>>>>> Have you tried the following in the coordinator xml:
>>>>>>
>>>>>> <action>
>>>>>>     <workflow>
>>>>>>         <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>>         <configuration>
>>>>>>             <property>
>>>>>>                 <name>fs.s3.awsAccessKeyId</name>
>>>>>>                 <value>[YOURKEYID]</value>
>>>>>>             </property>
>>>>>>             <property>
>>>>>>                 <name>fs.s3.awsSecretAccessKey</name>
>>>>>>                 <value>[YOURKEY]</value>
>>>>>>             </property>
>>>>>>         </configuration>
>>>>>>     </workflow>
>>>>>> </action>
>>>>>>
>>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>>
>>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Great progress there purna! :)
>>>>>>>>
>>>>>>>> Have you tried adding these properties to the coordinator's
>>>>>>>> configuration? We usually use the action config to build up the
>>>>>>>> connection to the distributed file system.
>>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>>> work!
>>>>>>>>
>>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I have
>>>>>>>> to check the code in more depth first.
>>>>>>>> gp
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Peter,
>>>>>>>>>
>>>>>>>>> I got rid of this error by adding
>>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>>
>>>>>>>>> But I’m getting below error now
>>>>>>>>>
>>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>>
>>>>>>>>> I have tried adding AWS access ,secret keys in
>>>>>>>>>
>>>>>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have tried this ,just added s3 instead of *
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>
>>>>>>>>>> <name>oozie.service.HadoopAccessorService.
>>>>>>>>>> supported.filesystems</name>
>>>>>>>>>>
>>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Getting below error
>>>>>>>>>>
>>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.conf.Configuration.getClass(
>>>>>>>>>> Configuration.java:2369)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
>>>>>>>>>> FileSystem.java:2793)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(
>>>>>>>>>> FileSystem.java:2810)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(
>>>>>>>>>> FileSystem.java:100)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
>>>>>>>>>> FileSystem.java:2849)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(
>>>>>>>>>> FileSystem.java:2831)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>>
>>>>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(
>>>>>>>>>> HadoopAccessorService.java:625)
>>>>>>>>>>
>>>>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(
>>>>>>>>>> HadoopAccessorService.java:623
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is what is in the logs
>>>>>>>>>>>
>>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>>> SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.
>>>>>>>>>>> FSURIHandler]
>>>>>>>>>>>
>>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>>> SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.
>>>>>>>>>>> FSURIHandler
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.
>>>>>>>>>>>>> supported.filesystems</name>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>>
>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>>>>>> hdfs,hftp,webhdfs Enlist
>>>>>>>>>>>>>> the different filesystems supported for federation. If
>>>>>>>>>>>>>> wildcard "*" is
>>>>>>>>>>>>>> specified, then ALL file schemes will be allowed.properly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Peter,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I have tried to specify dataset with uri starting with
>>>>>>>>>>>>>> s3://, s3a:// and
>>>>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904:
>>>>>>>>>>>>>> Scheme [s3] not
>>>>>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>> > URIHandlerService.java:185)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>> > URIHandlerService.java:168)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>> > URIHandlerService.java:160)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>> createEarlyURIs(
>>>>>>>>>>>>>> > CoordCommandUtils.java:465)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>> > separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>> > materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>> materializeOneInstance(
>>>>>>>>>>>>>> > CoordCommandUtils.java:546)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.
>>>>>>>>>>>>>> CoordMaterializeTransitionXCom
>>>>>>>>>>>>>> > mand.materializeActions(CoordMaterializeTransitionXCom
>>>>>>>>>>>>>> mand.java:492)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.coord.
>>>>>>>>>>>>>> CoordMaterializeTransitionXCom
>>>>>>>>>>>>>> > mand.materialize(CoordMaterializeTransitionXCom
>>>>>>>>>>>>>> mand.java:362)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.MaterializeTransitionXCommand.
>>>>>>>>>>>>>> execute(
>>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:73)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.command.MaterializeTransitionXCommand.
>>>>>>>>>>>>>> execute(
>>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:29)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at org.apache.oozie.command.
>>>>>>>>>>>>>> XCommand.call(XCommand.java:290)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.
>>>>>>>>>>>>>> java:266)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > org.apache.oozie.service.CallableQueueService$
>>>>>>>>>>>>>> CallableWrapper.run(
>>>>>>>>>>>>>> > CallableQueueService.java:181)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>>>>>>>>>>>> > ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>>>>>>>>>>>> > ThreadPoolExecutor.java:624)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Is S3 support specific to CDH distribution or should it
>>>>>>>>>>>>>> work in Apache
>>>>>>>>>>>>>> > Oozie as well? I’m not using CDH yet so
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check
>>>>>>>>>>>>>> out this
>>>>>>>>>>>>>> > > <
>>>>>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>>>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > description on how to make it work in jobs, something
>>>>>>>>>>>>>> similar should work
>>>>>>>>>>>>>> > > on the server side as well
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws S3
>>>>>>>>>>>>>> as input events
>>>>>>>>>>>>>> > > to
>>>>>>>>>>>>>> > > > poll for a dependency file before kicking off a spark
>>>>>>>>>>>>>> action
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > For example: I don’t want to kick off a spark action
>>>>>>>>>>>>>> until a file is
>>>>>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > wrote:
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>>>>> bundle
>>>>>>>>>>>>>> > definitions,
>>>>>>>>>>>>>> > > > as
>>>>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>>>>> > > > needs
>>>>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>>>>> being no
>>>>>>>>>>>>>> > exception.
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie
>>>>>>>>>>>>>> without those Hadoop
>>>>>>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html
>>>>>>>>>>>>>> >*.
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > Andras
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in
>>>>>>>>>>>>>> oozie without having
>>>>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > [image: Cloudera on Twitter] <
>>>>>>>>>>>>>> https://twitter.com/cloudera> [image:
>>>>>>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>>>>>>> [image:
>>>>>>>>>>>>>> > Cloudera
>>>>>>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>>>>>>> > > ------------------------------
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Here you go!
- Add oozie.service.HadoopAccessorService.supported.filesystems as * in
oozie-site.xml
- Include hadoop-aws-2.8.3.jar
- Rebuild Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
- Set JETTY_OPTS with the proxy values
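Put together, the steps above amount to something like the sketch below. Only the property name, the jar version, and the Maven version flags come from this thread; the proxy host/port, the libext location, and the use of bin/mkdistro.sh are illustrative assumptions and will differ per environment.

```
# Sketch of the steps above -- adjust paths and versions to your environment.

# 1. In oozie-site.xml, allow all URI schemes (including s3a) for
#    coordinator input dependencies:
#      <property>
#        <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
#        <value>*</value>
#      </property>

# 2. Rebuild Oozie against the newer httpclient/httpcore that s3a needs
#    (assumes building from source with the bundled script):
bin/mkdistro.sh -DskipTests -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9

# 3. Put the S3 connector on Oozie's classpath (libext is one common spot):
cp hadoop-aws-2.8.3.jar "$OOZIE_HOME/libext/"

# 4. Let the Jetty server reach S3 through the proxy when polling dependencies:
export JETTY_OPTS="-Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080"
```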
On Sat, May 19, 2018 at 2:17 AM Peter Cseh <ge...@cloudera.com> wrote:
> Wow, great work!
> Can you please summarize the required steps? This would be useful for
> others so we probably should add it to our documentation.
> Thanks in advance!
> Peter
>
> On Fri, May 18, 2018 at 11:33 PM, purna pradeep <pu...@gmail.com>
> wrote:
>
>> I got this fixed by setting jetty_opts with proxy values.
>>
>> Thanks Peter!!
>>
>> On Thu, May 17, 2018 at 4:05 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Ok I fixed this by adding aws keys in oozie
>>>
>>> But I’m getting below error
>>>
>>> I have tried setting proxy in core-site.xml but no luck
>>>
>>>
>>> 2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
>>> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
>>> JOB[0000000-180517144113498-oozie-xjt0-C] ACTION[0000000-
>>> 180517144113498-oozie-xjt0-C@2] org.apache.oozie.service.HadoopAccessorException:
>>> E0902: Exception occurred: [doesBucketExist on cmsegmentation-qa:
>>> com.amazonaws.SdkClientException: Unable to execute HTTP request:
>>> Connect to mybucket.s3.amazonaws.com:443
>>> <http://cmsegmentation-qa.s3.amazonaws.com:443/> [mybucket.
>>> s3.amazonaws.com/52.216.165.155
>>> <http://cmsegmentation-qa.s3.amazonaws.com/52.216.165.155>] failed:
>>> connect timed out]
>>>
>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>> occurred: [doesBucketExist on cmsegmentation-qa:
>>> com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect
>>> to mybucket.s3.amazonaws.com:443
>>> <http://cmsegmentation-qa.s3.amazonaws.com:443/> [mybucket
>>> .s3.amazonaws.com
>>> <http://cmsegmentation-qa.s3.amazonaws.com/52.216.165.155> failed:
>>> connect timed out]
>>>
>>> at
>>> org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
>>>
>>> at
>>> org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
>>> at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
>>>
>>> But now I’m getting this error
>>>
>>>
>>>
>>> On Thu, May 17, 2018 at 2:53 PM purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>> Ok I got passed this error
>>>>
>>>> By rebuilding Oozie with -Dhttpclient.version=4.5.5
>>>> -Dhttpcore.version=4.4.9
>>>>
>>>> now getting this error
>>>>
>>>>
>>>>
>>>> ACTION[0000000-180517144113498-oozie-xjt0-C@1]
>>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>>> occurred: [doesBucketExist on mybucketcom.amazonaws.AmazonClientException:
>>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>>> EnvironmentVariableCredentialsProvider
>>>> SharedInstanceProfileCredentialsProvider :
>>>> com.amazonaws.SdkClientException: Unable to load credentials from service
>>>> endpoint]
>>>>
>>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>>> occurred: [doesBucketExist on cmsegmentation-qa:
>>>> com.amazonaws.AmazonClientException: No AWS Credentials provided by
>>>> BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
>>>> SharedInstanceProfileCredentialsProvider :
>>>> com.amazonaws.SdkClientException: Unable to load credentials from service
>>>> endpoint]
>>>>
>>>> On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Peter,
>>>>>
>>>>> Also, when I submit a job with the new httpclient jar, I get
>>>>>
>>>>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>>>>> server. No of retries = 1. Exception = Could not authenticate,
>>>>> Authentication failed, status: 500, message: Server Error```
>>>>>
>>>>>
>>>>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <
>>>>> purna2pradeep@gmail.com> wrote:
>>>>>
>>>>>> Ok I have tried this
>>>>>>
>>>>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>>>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>>>>> stops loading.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Purna,
>>>>>>>
>>>>>>> Based on
>>>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>>>>> you should try to go for s3a.
>>>>>>> You'll have to include the aws-jdk as well if I see it correctly:
>>>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>>>>> Also, the property names are slightly different so you'll have to
>>>>>>> change the example I've given.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>
>>>>>>>> Peter,
>>>>>>>>
>>>>>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no
>>>>>>>> luck
>>>>>>>>
>>>>>>>> Is this for s3 or s3a ?
>>>>>>>>
>>>>>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>>>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>>>>>
>>>>>>>> Hadoop-aws-2.8.3.jar is what I’m using
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Ok, I've found it:
>>>>>>>>>
>>>>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>>>>> dependencies:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>>>>>> It passes the coordinator action's configuration and even does
>>>>>>>>> impersonation to check for the dependencies:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>>>>
>>>>>>>>> Have you tried the following in the coordinator xml:
>>>>>>>>>
>>>>>>>>> <action>
>>>>>>>>> <workflow>
>>>>>>>>>
>>>>>>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>>>>> <configuration>
>>>>>>>>> <property>
>>>>>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>>>>>> <value>[YOURKEYID]</value>
>>>>>>>>> </property>
>>>>>>>>> <property>
>>>>>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>>>>>> <value>[YOURKEY]</value>
>>>>>>>>> </property>
>>>>>>>>> </configuration>
>>>>>>>>> </workflow>
>>>>>>>>> </action>
>>>>>>>>>
>>>>>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Great progress there purna! :)
>>>>>>>>>>>
>>>>>>>>>>> Have you tried adding these properites to the coordinator's
>>>>>>>>>>> configuration? we usually use the action config to build up connection to
>>>>>>>>>>> the distributed file system.
>>>>>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>>>>>> work!
>>>>>>>>>>>
>>>>>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I
>>>>>>>>>>> have to check the code in more depth first.
>>>>>>>>>>> gp
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> I got rid of this error by adding
>>>>>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>>>>>
>>>>>>>>>>>> But I’m getting below error now
>>>>>>>>>>>>
>>>>>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and
>>>>>>>>>>>> Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried adding AWS access ,secret keys in
>>>>>>>>>>>>
>>>>>>>>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have tried this, just added s3 instead of *
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>>>>>
>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Getting below error
>>>>>>>>>>>>>
>>>>>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>>>>>>>
>>>>>>>>>>>>> at
>>>>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is what is in the logs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>>>>>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>>>>>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's strange, this exception should not happen in that
>>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>>>>>>>>>> The default is hdfs,hftp,webhdfs: "Enlist the different filesystems
>>>>>>>>>>>>>>>>> supported for federation. If wildcard "*" is specified, then ALL
>>>>>>>>>>>>>>>>> file schemes will be allowed."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > Peter,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > I have tried to specify dataset with uri starting with
>>>>>>>>>>>>>>>>> s3://, s3a:// and
>>>>>>>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in
>>>>>>>>>>>>>>>>> uri
>>>>>>>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904:
>>>>>>>>>>>>>>>>> Scheme [s3] not
>>>>>>>>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Is S3 support specific to CDH distribution or should it
>>>>>>>>>>>>>>>>> work in Apache
>>>>>>>>>>>>>>>>> > Oozie as well? I’m not using CDH yet so
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3.
>>>>>>>>>>>>>>>>> Check out this
>>>>>>>>>>>>>>>>> > > <https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
>>>>>>>>>>>>>>>>> > > description on how to make it work in jobs, something
>>>>>>>>>>>>>>>>> similar should work
>>>>>>>>>>>>>>>>> > > on the server side as well
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws
>>>>>>>>>>>>>>>>> S3 as input events
>>>>>>>>>>>>>>>>> > > to
>>>>>>>>>>>>>>>>> > > > poll for a dependency file before kicking off a
>>>>>>>>>>>>>>>>> spark action
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > > For example: I don’t want to kick off a spark action
>>>>>>>>>>>>>>>>> until a file is
>>>>>>>>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > > wrote:
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator,
>>>>>>>>>>>>>>>>> or bundle
>>>>>>>>>>>>>>>>> > definitions,
>>>>>>>>>>>>>>>>> > > > as
>>>>>>>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>>>>>>>> > > > needs
>>>>>>>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark
>>>>>>>>>>>>>>>>> action being no
>>>>>>>>>>>>>>>>> > exception.
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie
>>>>>>>>>>>>>>>>> without those Hadoop
>>>>>>>>>>>>>>>>> > > > > components. For how to install Oozie, please *see here
>>>>>>>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > Andras
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in
>>>>>>>>>>>>>>>>> oozie without having
>>>>>>>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>
>
>
>
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
Wow, great work!
Can you please summarize the required steps? This would be useful for
others so we probably should add it to our documentation.
Thanks in advance!
Peter
On Fri, May 18, 2018 at 11:33 PM, purna pradeep <pu...@gmail.com>
wrote:
> I got this fixed by setting jetty_opts with proxy values.
>
> Thanks Peter!!
>
> On Thu, May 17, 2018 at 4:05 PM purna pradeep <pu...@gmail.com>
> wrote:
>
>> Ok I fixed this by adding aws keys in oozie
>>
>> But I’m getting below error
>>
>> I have tried setting proxy in core-site.xml but no luck
>>
>>
>> 2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
>> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
>> JOB[0000000-180517144113498-oozie-xjt0-C] ACTION[0000000-180517144113498
>> -oozie-xjt0-C@2] org.apache.oozie.service.HadoopAccessorException:
>> E0902: Exception occurred: [doesBucketExist on cmsegmentation-qa:
>> com.amazonaws.SdkClientException: Unable to execute HTTP request:
>> Connect to mybucket.s3.amazonaws.com:443
>> <http://cmsegmentation-qa.s3.amazonaws.com:443/> [mybucket.s3.amazonaws.
>> com/52.216.165.155
>> <http://cmsegmentation-qa.s3.amazonaws.com/52.216.165.155>] failed:
>> connect timed out]
>>
>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>> occurred: [doesBucketExist on cmsegmentation-qa: com.amazonaws.SdkClientException:
>> Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
>> <http://cmsegmentation-qa.s3.amazonaws.com:443/> [mybucket
>> .s3.amazonaws.com
>> <http://cmsegmentation-qa.s3.amazonaws.com/52.216.165.155> failed:
>> connect timed out]
>>
>> at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
>>
>> at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
>> at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
>>
>> But now I’m getting this error
>>
>>
>>
>> On Thu, May 17, 2018 at 2:53 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Ok I got passed this error
>>>
>>> By rebuilding Oozie with -Dhttpclient.version=4.5.5
>>> -Dhttpcore.version=4.4.9
>>>
>>> now getting this error
>>>
>>>
>>>
>>> ACTION[0000000-180517144113498-oozie-xjt0-C@1] org.apache.oozie.service.HadoopAccessorException:
>>> E0902: Exception occurred: [doesBucketExist on mybucketcom.amazonaws.AmazonClientException:
>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider
>>> : com.amazonaws.SdkClientException: Unable to load credentials from
>>> service endpoint]
>>>
>>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>>> occurred: [doesBucketExist on cmsegmentation-qa: com.amazonaws.AmazonClientException:
>>> No AWS Credentials provided by BasicAWSCredentialsProvider
>>> EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider
>>> : com.amazonaws.SdkClientException: Unable to load credentials from
>>> service endpoint]
>>>
>>> On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Peter,
>>>>
>>>> Also, when I submit a job with the new httpclient jar, I get
>>>>
>>>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>>>> server. No of retries = 1. Exception = Could not authenticate,
>>>> Authentication failed, status: 500, message: Server Error```
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ok I have tried this
>>>>>
>>>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>>>> stops loading.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Purna,
>>>>>>
>>>>>> Based on
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>>>> you should try to go for s3a.
>>>>>> You'll have to include the aws-jdk as well if I see it correctly:
>>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>>>> Also, the property names are slightly different so you'll have to
>>>>>> change the example I've given.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>> Peter,
>>>>>>>
>>>>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no
>>>>>>> luck
>>>>>>>
>>>>>>> Is this for s3 or s3a ?
>>>>>>>
>>>>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>>>>
>>>>>>> Hadoop-aws-2.8.3.jar is what I’m using
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ok, I've found it:
>>>>>>>>
>>>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>>>> dependencies:
>>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>>>>> It passes the coordinator action's configuration and even does
>>>>>>>> impersonation to check for the dependencies:
>>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>>>
>>>>>>>> Have you tried the following in the coordinator xml:
>>>>>>>>
>>>>>>>> <action>
>>>>>>>> <workflow>
>>>>>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>>>> <configuration>
>>>>>>>> <property>
>>>>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>>>>> <value>[YOURKEYID]</value>
>>>>>>>> </property>
>>>>>>>> <property>
>>>>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>>>>> <value>[YOURKEY]</value>
>>>>>>>> </property>
>>>>>>>> </configuration>
>>>>>>>> </workflow>
>>>>>>>> </action>
>>>>>>>>
>>>>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Great progress there purna! :)
>>>>>>>>>>
>>>>>>>>>> Have you tried adding these properites to the coordinator's
>>>>>>>>>> configuration? we usually use the action config to build up connection to
>>>>>>>>>> the distributed file system.
>>>>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>>>>> work!
>>>>>>>>>>
>>>>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I
>>>>>>>>>> have to check the code in more depth first.
>>>>>>>>>> gp
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Peter,
>>>>>>>>>>>
>>>>>>>>>>> I got rid of this error by adding
>>>>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>>>>
>>>>>>>>>>> But I’m getting below error now
>>>>>>>>>>>
>>>>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and
>>>>>>>>>>> Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>>>>
>>>>>>>>>>> I have tried adding AWS access ,secret keys in
>>>>>>>>>>>
>>>>>>>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried this ,just added s3 instead of *
>>>>>>>>>>>>
>>>>>>>>>>>> <property>
>>>>>>>>>>>>
>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>
>>>>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>>>>
>>>>>>>>>>>> </property>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Getting below error
>>>>>>>>>>>>
>>>>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>>>>>>
>>>>>>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is what is in the logs
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>>>>> SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>>>>> SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>>>>>>>> properly. It enlists the different filesystems supported for
>>>>>>>>>>>>>>>> federation (default: hdfs,hftp,webhdfs). If wildcard "*" is
>>>>>>>>>>>>>>>> specified, then ALL file schemes will be allowed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Peter,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > I have tried to specify dataset with uri starting with
>>>>>>>>>>>>>>>> s3://, s3a:// and
>>>>>>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
>>>>>>>>>>>>>>>> > supported in uri [s3://mybucket/input.data]
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Is S3 support specific to CDH distribution or should it
>>>>>>>>>>>>>>>> work in Apache
>>>>>>>>>>>>>>>> > Oozie as well? I’m not using CDH yet so
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3.
>>>>>>>>>>>>>>>> Check out this
>>>>>>>>>>>>>>>> > > <https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
>>>>>>>>>>>>>>>> > > description on how to make it work in jobs, something
>>>>>>>>>>>>>>>> similar should work
>>>>>>>>>>>>>>>> > > on the server side as well
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws
>>>>>>>>>>>>>>>> S3 as input events
>>>>>>>>>>>>>>>> > > to
>>>>>>>>>>>>>>>> > > > poll for a dependency file before kicking off a spark
>>>>>>>>>>>>>>>> action
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > For example: I don’t want to kick off a spark action
>>>>>>>>>>>>>>>> until a file is
>>>>>>>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > wrote:
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>>>>>>> bundle
>>>>>>>>>>>>>>>> > definitions,
>>>>>>>>>>>>>>>> > > > as
>>>>>>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>>>>>>> > > > needs
>>>>>>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>>>>>>> being no
>>>>>>>>>>>>>>>> > exception.
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie
>>>>>>>>>>>>>>>> without those Hadoop
>>>>>>>>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > Andras
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in
>>>>>>>>>>>>>>>> oozie without having
>>>>>>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > [image: Cloudera on Twitter] <
>>>>>>>>>>>>>>>> https://twitter.com/cloudera> [image:
>>>>>>>>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/
>>>>>>>>>>>>>>>> cloudera> [image:
>>>>>>>>>>>>>>>> > Cloudera
>>>>>>>>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > ------------------------------
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
I got this fixed by setting jetty_opts with proxy values.
Thanks Peter!!
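For readers hitting the same connect-timeout, the fix above amounts to passing JVM proxy system properties to Oozie's embedded Jetty. A minimal sketch, assuming a proxy at proxy.example.com:8080 (hypothetical host/port); the variable is typically exported in conf/oozie-env.sh:

```shell
# Append JVM proxy settings to the options passed to Oozie's embedded Jetty.
# proxy.example.com:8080 is a placeholder for your real proxy host/port.
export JETTY_OPTS="${JETTY_OPTS} -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080"
echo "$JETTY_OPTS"
```

After restarting the Oozie server, the S3 client inherits these JVM-wide proxy properties.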
On Thu, May 17, 2018 at 4:05 PM purna pradeep <pu...@gmail.com>
wrote:
> Ok I fixed this by adding aws keys in oozie
>
> But I’m getting below error
>
> I have tried setting proxy in core-site.xml but no luck
>
>
> 2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
> SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
> JOB[0000000-180517144113498-oozie-xjt0-C]
> ACTION[0000000-180517144113498-oozie-xjt0-C@2]
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on mybucket: com.amazonaws.SdkClientException:
> Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
> [mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]
>
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on mybucket: com.amazonaws.SdkClientException:
> Unable to execute HTTP request: Connect to mybucket.s3.amazonaws.com:443
> [mybucket.s3.amazonaws.com/52.216.165.155] failed: connect timed out]
>
> at
> org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
>
> at
> org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
> at
> org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
>
> But now I’m getting this error
>
>
>
> On Thu, May 17, 2018 at 2:53 PM purna pradeep <pu...@gmail.com>
> wrote:
>
>> Ok I got past this error
>>
>> By rebuilding oozie with -Dhttpclient.version=4.5.5
>> -Dhttpcore.version=4.4.9
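The rebuild step above can be sketched as follows; bin/mkdistro.sh is Oozie's standard distro build script, and the two version overrides are the ones reported in this thread:

```shell
# Rebuild the Oozie distribution against newer HttpClient/HttpCore.
# Run from the root of the Oozie source tree; -DskipTests shortens the build.
bin/mkdistro.sh -DskipTests -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
```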
>>
>> now getting this error
>>
>>
>>
>> ACTION[0000000-180517144113498-oozie-xjt0-C@1]
>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>> occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
>> No AWS Credentials provided by BasicAWSCredentialsProvider
>> EnvironmentVariableCredentialsProvider
>> SharedInstanceProfileCredentialsProvider :
>> com.amazonaws.SdkClientException: Unable to load credentials from service
>> endpoint]
>>
>> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
>> occurred: [doesBucketExist on mybucket:
>> com.amazonaws.AmazonClientException: No AWS Credentials provided by
>> BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
>> SharedInstanceProfileCredentialsProvider :
>> com.amazonaws.SdkClientException: Unable to load credentials from service
>> endpoint]
>>
>> On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>>
>>> Peter,
>>>
>>> Also When I submit a job with new http client jar, I get
>>>
>>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>>> server. No of retries = 1. Exception = Could not authenticate,
>>> Authentication failed, status: 500, message: Server Error```
>>>
>>>
>>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>> Ok I have tried this
>>>>
>>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>>> stops loading.
>>>>
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Purna,
>>>>>
>>>>> Based on
>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>>> you should try to go for s3a.
>>>>> You'll have to include the aws-sdk as well if I see it correctly:
>>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>>> Also, the property names are slightly different so you'll have to
>>>>> change the example I've given.
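To make the property-name difference concrete: a sketch of the earlier coordinator action configuration rewritten for s3a, using the credential keys from the Hadoop S3A documentation (fs.s3a.access.key / fs.s3a.secret.key); the app-path and placeholders are carried over from the earlier example:

```xml
<action>
  <workflow>
    <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    <configuration>
      <property>
        <name>fs.s3a.access.key</name>
        <value>[YOURKEYID]</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>[YOURKEY]</value>
      </property>
    </configuration>
  </workflow>
</action>
```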
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <
>>>>> purna2pradeep@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no
>>>>>> luck
>>>>>>
>>>>>> Is this for s3 or s3a ?
>>>>>>
>>>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>>>
>>>>>> Hadoop-aws-2.8.3.jar is what I’m using
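Where these jars end up matters: a sketch of installing the connector into Oozie's libext/ and rebuilding the WAR. The oozie-setup.sh prepare-war step is standard; the aws-java-sdk jar name is an assumption — hadoop-aws 2.8.3 needs the exact SDK version declared in its POM:

```shell
# Drop the S3A connector and the matching AWS SDK into Oozie's libext/,
# then rebuild the Oozie WAR so they land on the server classpath.
# aws-java-sdk-*.jar is a placeholder; check hadoop-aws's POM for the version.
cp hadoop-aws-2.8.3.jar aws-java-sdk-*.jar libext/
bin/oozie-setup.sh prepare-war
```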
>>>>>>
>>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, I've found it:
>>>>>>>
>>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>>> dependencies:
>>>>>>>
>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>>>> It passes the coordinator action's configuration and even does
>>>>>>> impersonation to check for the dependencies:
>>>>>>>
>>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>>
>>>>>>> Have you tried the following in the coordinator xml:
>>>>>>>
>>>>>>> <action>
>>>>>>> <workflow>
>>>>>>>
>>>>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>>> <configuration>
>>>>>>> <property>
>>>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>>>> <value>[YOURKEYID]</value>
>>>>>>> </property>
>>>>>>> <property>
>>>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>>>> <value>[YOURKEY]</value>
>>>>>>> </property>
>>>>>>> </configuration>
>>>>>>> </workflow>
>>>>>>> </action>
>>>>>>>
>>>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Great progress there purna! :)
>>>>>>>>>
>>>>>>>>> Have you tried adding these properites to the coordinator's
>>>>>>>>> configuration? we usually use the action config to build up connection to
>>>>>>>>> the distributed file system.
>>>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>>>> work!
>>>>>>>>>
>>>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I
>>>>>>>>> have to check the code in more depth first.
>>>>>>>>> gp
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Peter,
>>>>>>>>>>
>>>>>>>>>> I got rid of this error by adding
>>>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>>>
>>>>>>>>>> But I’m getting below error now
>>>>>>>>>>
>>>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>>>
>>>>>>>>>> I have tried adding AWS access ,secret keys in
>>>>>>>>>>
>>>>>>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have tried this ,just added s3 instead of *
>>>>>>>>>>>
>>>>>>>>>>> <property>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>
>>>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>>>
>>>>>>>>>>> </property>
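Note that the ClassNotFoundException below names S3AFileSystem: when the s3a connector (hadoop-aws plus the AWS SDK) is the implementation on the classpath, the scheme to enlist would be s3a rather than s3 — a sketch:

```xml
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3a</value>
</property>
```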
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Getting below error
>>>>>>>>>>>
>>>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>>>>>
>>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>>>>>
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is what is in the logs
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>>>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>>>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>>>>>>> properly. It enlists the different filesystems supported for
>>>>>>>>>>>>>>> federation (default: hdfs,hftp,webhdfs). If wildcard "*" is
>>>>>>>>>>>>>>> specified, then ALL file schemes will be allowed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Peter,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > I have tried to specify dataset with uri starting with
>>>>>>>>>>>>>>> s3://, s3a:// and
>>>>>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904:
>>>>>>>>>>>>>>> Scheme [s3] not
>>>>>>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>>> > URIHandlerService.java:185)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>>> > URIHandlerService.java:168)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>>>> > URIHandlerService.java:160)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(
>>>>>>>>>>>>>>> > CoordCommandUtils.java:465)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>>> > separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>>>> > materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(
>>>>>>>>>>>>>>> > CoordCommandUtils.java:546)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:73)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:29)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(
>>>>>>>>>>>>>>> > CallableQueueService.java:181)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>>>>>>>>>>>>> > ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at
>>>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>>>>>>>>>>>>> > ThreadPoolExecutor.java:624)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it
>>>>>>>>>>>>>>> work in Apache
>>>>>>>>>>>>>>> > Oozie as well? I’m not using CDH yet.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3.
>>>>>>>>>>>>>>> Check out this
>>>>>>>>>>>>>>> > > <
>>>>>>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>>>>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > description on how to make it work in jobs, something
>>>>>>>>>>>>>>> similar should work
>>>>>>>>>>>>>>> > > on the server side as well
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws
>>>>>>>>>>>>>>> S3 as input events
>>>>>>>>>>>>>>> > > to
>>>>>>>>>>>>>>> > > > poll for a dependency file before kicking off a spark
>>>>>>>>>>>>>>> action
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > For example: I don’t want to kick off a spark action
>>>>>>>>>>>>>>> until a file is
>>>>>>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > wrote:
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>>>>>> bundle
>>>>>>>>>>>>>>> > definitions,
>>>>>>>>>>>>>>> > > > as
>>>>>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>>>>>> > > > needs
>>>>>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>>>>>> being no
>>>>>>>>>>>>>>> > exception.
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie
>>>>>>>>>>>>>>> without those Hadoop
>>>>>>>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html
>>>>>>>>>>>>>>> >*.
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > Andras
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in
>>>>>>>>>>>>>>> oozie without having
>>>>>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > [image: Cloudera on Twitter] <
>>>>>>>>>>>>>>> https://twitter.com/cloudera> [image:
>>>>>>>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>>>>>>>> [image:
>>>>>>>>>>>>>>> > Cloudera
>>>>>>>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>>>>>>>> > > ------------------------------
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Ok, I fixed this by adding the AWS keys in Oozie.
But I’m getting the error below.
I have tried setting the proxy in core-site.xml, but no luck:
2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 -
SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-]
JOB[0000000-180517144113498-oozie-xjt0-C]
ACTION[0000000-180517144113498-oozie-xjt0-C@2]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception
occurred: [doesBucketExist on cmsegmentation-qa:
com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to
mybucket.s3.amazonaws.com:443 [mybucket.s3.amazonaws.com/52.216.165.155]
failed: connect timed out]
at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:630)
at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:594)
at org.apache.oozie.dependency.FSURIHandler.getFileSystem(FSURIHandler.java:184)
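For the connect timeout above: if the Oozie host can only reach S3 through a proxy, the s3a connector takes its proxy settings from Hadoop configuration rather than from JVM proxy flags. A minimal core-site.xml sketch — the host and port are placeholder values, and this assumes the s3a:// scheme is being used:

```xml
<!-- core-site.xml: proxy settings honored by the s3a connector (hadoop-aws).
     proxy.example.com and 8080 are placeholders for your environment. -->
<property>
  <name>fs.s3a.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8080</value>
</property>
```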
On Thu, May 17, 2018 at 2:53 PM purna pradeep <pu...@gmail.com>
wrote:
> Ok, I got past this error
>
> By rebuilding Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
>
> now getting this error
>
>
>
> ACTION[0000000-180517144113498-oozie-xjt0-C@1]
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider
> SharedInstanceProfileCredentialsProvider :
> com.amazonaws.SdkClientException: Unable to load credentials from service
> endpoint]
>
> org.apache.oozie.service.HadoopAccessorException: E0902: Exception
> occurred: [doesBucketExist on cmsegmentation-qa:
> com.amazonaws.AmazonClientException: No AWS Credentials provided by
> BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
> SharedInstanceProfileCredentialsProvider :
> com.amazonaws.SdkClientException: Unable to load credentials from service
> endpoint]
>
> On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
> wrote:
>
>>
>> Peter,
>>
>> Also When I submit a job with new http client jar, I get
>>
>> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
>> server. No of retries = 1. Exception = Could not authenticate,
>> Authentication failed, status: 500, message: Server Error```
>>
>>
>> On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Ok I have tried this
>>>
>>> It appears that s3a support requires httpclient 4.4.x and oozie is
>>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>>> stops loading.
>>>
>>>
>>>
>>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> Purna,
>>>>
>>>> Based on
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>>> you should try to go for s3a.
>>>> You'll have to include the aws-jdk as well if I see it correctly:
>>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>>> Also, the property names are slightly different so you'll have to
>>>> change the example I've given.
>>>>
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2pradeep@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no
>>>>> luck
>>>>>
>>>>> Is this for s3 or s3a ?
>>>>>
>>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>>
>>>>> Hadoop-aws-2.8.3.jar is what I’m using
>>>>>
>>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Ok, I've found it:
>>>>>>
>>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>>> dependencies:
>>>>>>
>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>>> It passes the coordinator action's configuration and even does
>>>>>> impersonation to check for the dependencies:
>>>>>>
>>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>>
>>>>>> Have you tried the following in the coordinator xml:
>>>>>>
>>>>>> <action>
>>>>>> <workflow>
>>>>>>
>>>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>>> <configuration>
>>>>>> <property>
>>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>>> <value>[YOURKEYID]</value>
>>>>>> </property>
>>>>>> <property>
>>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>>> <value>[YOURKEY]</value>
>>>>>> </property>
>>>>>> </configuration>
>>>>>> </workflow>
>>>>>> </action>
>>>>>>
>>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>>
>>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Great progress there purna! :)
>>>>>>>>
>>>>>>>> Have you tried adding these properites to the coordinator's
>>>>>>>> configuration? we usually use the action config to build up connection to
>>>>>>>> the distributed file system.
>>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>>> work!
>>>>>>>>
>>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I have
>>>>>>>> to check the code in more depth first.
>>>>>>>> gp
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Peter,
>>>>>>>>>
>>>>>>>>> I got rid of this error by adding
>>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>>
>>>>>>>>> But I’m getting below error now
>>>>>>>>>
>>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>>
>>>>>>>>> I have tried adding AWS access ,secret keys in
>>>>>>>>>
>>>>>>>>> oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have tried this, just added s3 instead of *
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>
>>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Getting below error
>>>>>>>>>>
>>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>>>>
>>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>>>>
>>>>>>>>>> at
>>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is what is in the logs
>>>>>>>>>>>
>>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>>>>
>>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>>
>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>>>>>> properly. Its default is hdfs,hftp,webhdfs; the description
>>>>>>>>>>>>>> reads: "Enlist the different filesystems supported for
>>>>>>>>>>>>>> federation. If wildcard "*" is specified, then ALL file
>>>>>>>>>>>>>> schemes will be allowed."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Ok, I got past this error by rebuilding Oozie with
-Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9.
Now I’m getting this error:
ACTION[0000000-180517144113498-oozie-xjt0-C@1]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception
occurred: [doesBucketExist on mybucket: com.amazonaws.AmazonClientException:
No AWS Credentials provided by BasicAWSCredentialsProvider
EnvironmentVariableCredentialsProvider
SharedInstanceProfileCredentialsProvider :
com.amazonaws.SdkClientException: Unable to load credentials from service
endpoint]
org.apache.oozie.service.HadoopAccessorException: E0902: Exception
occurred: [doesBucketExist on cmsegmentation-qa:
com.amazonaws.AmazonClientException: No AWS Credentials provided by
BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
SharedInstanceProfileCredentialsProvider :
com.amazonaws.SdkClientException: Unable to load credentials from service
endpoint]
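The provider chain named in the message above (BasicAWSCredentialsProvider, environment variables, instance profile) found no keys anywhere it looked. With the s3a connector, the simplest fix is static credentials in core-site.xml — the values below are placeholders, not real keys:

```xml
<!-- core-site.xml: static credentials for the s3a connector.
     YOUR_ACCESS_KEY_ID / YOUR_SECRET_ACCESS_KEY are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

Alternatively, the environment-variable provider in that chain picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY if they are set in the Oozie server's environment.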
On Thu, May 17, 2018 at 12:24 PM purna pradeep <pu...@gmail.com>
wrote:
>
> Peter,
>
> Also When I submit a job with new http client jar, I get
>
> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
> server. No of retries = 1. Exception = Could not authenticate,
> Authentication failed, status: 500, message: Server Error```
>
>
> On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
> wrote:
>
>> Ok I have tried this
>>
>> It appears that s3a support requires httpclient 4.4.x and oozie is
>> bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI
>> stops loading.
>>
>>
>>
>> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>> wrote:
>>
>>> Purna,
>>>
>>> Based on
>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>>> you should try to go for s3a.
>>> You'll have to include the aws-jdk as well if I see it correctly:
>>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>>> Also, the property names are slightly different so you'll have to change
>>> the example I've given.
>>>
>>>
>>>
>>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>> Peter,
>>>>
>>>> I’m using latest oozie 5.0.0 and I have tried below changes but no luck
>>>>
>>>> Is this for s3 or s3a ?
>>>>
>>>> I’m using s3 but if this is for s3a do you know which jar I need to
>>>> include I mean Hadoop-aws jar or any other jar if required
>>>>
>>>> Hadoop-aws-2.8.3.jar is what I’m using
>>>>
>>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Ok, I've found it:
>>>>>
>>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>>> dependencies:
>>>>>
>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>>> It passes the coordinator action's configuration and even does
>>>>> impersonation to check for the dependencies:
>>>>>
>>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>>
>>>>> Have you tried the following in the coordinator xml:
>>>>>
>>>>> <action>
>>>>> <workflow>
>>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>>> <configuration>
>>>>> <property>
>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>> <value>[YOURKEYID]</value>
>>>>> </property>
>>>>> <property>
>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>> <value>[YOURKEY]</value>
>>>>> </property>
>>>>> </configuration>
>>>>> </workflow>
>>>>> </action>
>>>>>
>>>>> Based on the source this should be able to poll s3 periodically.
>>>>>
>>>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <
>>>>> purna2pradeep@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>>>
>>>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Great progress there purna! :)
>>>>>>>
>>>>>>> Have you tried adding these properites to the coordinator's
>>>>>>> configuration? we usually use the action config to build up connection to
>>>>>>> the distributed file system.
>>>>>>> Although I'm not sure we're using these when polling the
>>>>>>> dependencies for coordinators, but I'm excited about you trying to make it
>>>>>>> work!
>>>>>>>
>>>>>>> I'll get back with a - hopefully - more helpful answer soon, I have
>>>>>>> to check the code in more depth first.
>>>>>>> gp
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>
>>>>>>>> Peter,
>>>>>>>>
>>>>>>>> I got rid of this error by adding
>>>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>>>
>>>>>>>> But I’m getting below error now
>>>>>>>>
>>>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>>>
>>>>>>>> I have tried adding the AWS access and secret keys in
>>>>>>>>
>>>>>>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have tried this, just added s3 instead of *
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>
>>>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>>>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Getting below error
>>>>>>>>>
>>>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>>>>>>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>>>
>>>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is what is in the logs
>>>>>>>>>>
>>>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>>>
>>>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <
>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>>>> Arrays.toString(classes));
>>>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>>>
>>>>>>>>>>>> <property>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>>>
>>>>>>>>>>>> <value>*</value>
>>>>>>>>>>>>
>>>>>>>>>>>> </property>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You'll have to configure
>>>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>>>>>> Its default value, hdfs,hftp,webhdfs, enlists
>>>>>>>>>>>>> the different filesystems supported for federation. If the
>>>>>>>>>>>>> wildcard "*" is
>>>>>>>>>>>>> specified, then ALL file schemes will be allowed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For testing purposes it's ok to put * in there in
>>>>>>>>>>>>> oozie-site.xml
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Peter,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I have tried to specify dataset with uri starting with
>>>>>>>>>>>>> s3://, s3a:// and
>>>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904:
>>>>>>>>>>>>> Scheme [s3] not
>>>>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>> > URIHandlerService.java:185)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>> > URIHandlerService.java:168)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>>>>>>>>>>> > URIHandlerService.java:160)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> >
>>>>>>>>>>>>> org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(
>>>>>>>>>>>>> > CoordCommandUtils.java:465)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>> > separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>>>>>>>>>>> > materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> >
>>>>>>>>>>>>> org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(
>>>>>>>>>>>>> > CoordCommandUtils.java:546)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>>>>>>>>>>>>> >
>>>>>>>>>>>>> mand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>>>>>>>>>>>>> > mand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> >
>>>>>>>>>>>>> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:73)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> >
>>>>>>>>>>>>> org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>>>>>>>>>>> > MaterializeTransitionXCommand.java:29)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> >
>>>>>>>>>>>>> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(
>>>>>>>>>>>>> > CallableQueueService.java:181)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>>>>>>>>>>> > ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at
>>>>>>>>>>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>>>>>>>>>>> > ThreadPoolExecutor.java:624)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it work
>>>>>>>>>>>>> in Apache
>>>>>>>>>>>>> > Oozie as well? I’m not using CDH yet.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check
>>>>>>>>>>>>> out this
>>>>>>>>>>>>> > > <
>>>>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > description on how to make it work in jobs; something
>>>>>>>>>>>>> similar should work
>>>>>>>>>>>>> > > on the server side as well.
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > > I would also like to know if Oozie supports AWS S3
>>>>>>>>>>>>> as input events
>>>>>>>>>>>>> > > to
>>>>>>>>>>>>> > > > poll for a dependency file before kicking off a Spark
>>>>>>>>>>>>> action
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > > For example: I don’t want to kick off a Spark action
>>>>>>>>>>>>> until a file has
>>>>>>>>>>>>> > > > arrived at a given AWS S3 location
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > > wrote:
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>>>> bundle
>>>>>>>>>>>>> > definitions,
>>>>>>>>>>>>> > > > as
>>>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>>>> > > > needs
>>>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>>>> being no
>>>>>>>>>>>>> > exception.
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie
>>>>>>>>>>>>> without those Hadoop
>>>>>>>>>>>>> > > > > components. For how to install Oozie, please *see here
>>>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html
>>>>>>>>>>>>> >*.
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > Andras
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in oozie
>>>>>>>>>>>>> without having
>>>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>>>> > > > > >
>>>>>>>>>>>>> > > > >
>>>>>>>>>>>>> > > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > --
>>>>>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>>>>>>>>>>>>> [image:
>>>>>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>>>>>> [image:
>>>>>>>>>>>>> > Cloudera
>>>>>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>>>>>> > > ------------------------------
>>>>>>>>>>>>> > >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Peter,
Also, when I submit a job with the new httpclient jar, I get
```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
server. No of retries = 1. Exception = Could not authenticate,
Authentication failed, status: 500, message: Server Error```
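
Stepping back to the thread's original goal (don't start the job until a file
lands in S3), the gating logic itself is easy to sketch outside Oozie. This is
an illustrative stand-alone sketch, not Oozie's own mechanism; the `exists`
checker is injected, so any S3 client call (for example boto3's `head_object`)
could back it:

```python
import time
from typing import Callable


def wait_for_dependency(exists: Callable[[], bool],
                        poll_seconds: float = 60.0,
                        max_attempts: int = 10) -> bool:
    """Poll until exists() reports the dependency, giving up after max_attempts."""
    for _ in range(max_attempts):
        if exists():
            return True   # dependency arrived; safe to launch the job
        time.sleep(poll_seconds)
    return False          # timed out waiting for the file
```

In Oozie itself this role is played by coordinator input-events, once the
server can resolve the S3 URI scheme.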
On Thu, May 17, 2018 at 12:14 PM purna pradeep <pu...@gmail.com>
wrote:
> Ok I have tried this
>
> It appears that s3a support requires httpclient 4.4.x and Oozie is bundled
> with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops
> loading.
>
>
>
> On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com> wrote:
>
>> Purna,
>>
>> Based on
>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
>> you should try to go for s3a.
>> You'll have to include the AWS SDK (aws-java-sdk) as well if I see it correctly:
>> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
>> Also, the property names are slightly different so you'll have to change
>> the example I've given.
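>>
>> (For reference, with s3a the corresponding credential properties would look
>> roughly like this; property names per the Hadoop docs, values are
>> placeholders:)
>>
>> <property>
>> <name>fs.s3a.access.key</name>
>> <value>[YOURKEYID]</value>
>> </property>
>> <property>
>> <name>fs.s3a.secret.key</name>
>> <value>[YOURKEY]</value>
>> </property>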
>>
>>
>>
>> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Peter,
>>>
>>> I’m using the latest Oozie 5.0.0 and I have tried the changes below, but no luck.
>>>
>>> Is this for s3 or s3a?
>>>
>>> I’m using s3, but if this is for s3a, do you know which jar I need to
>>> include? I mean the hadoop-aws jar, or any other jar, if required.
>>>
>>> hadoop-aws-2.8.3.jar is what I’m using
>>>
>>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> Ok, I've found it:
>>>>
>>>> If you are using 4.3.0 or newer this is the part which checks for
>>>> dependencies:
>>>>
>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>>> It passes the coordinator action's configuration and even does
>>>> impersonation to check for the dependencies:
>>>>
>>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>>
>>>> Have you tried the following in the coordinator xml:
>>>>
>>>> <action>
>>>> <workflow>
>>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>> <configuration>
>>>> <property>
>>>> <name>fs.s3.awsAccessKeyId</name>
>>>> <value>[YOURKEYID]</value>
>>>> </property>
>>>> <property>
>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>> <value>[YOURKEY]</value>
>>>> </property>
>>>> </configuration>
>>>> </workflow>
>>>> </action>
>>>>
>>>> Based on the source this should be able to poll s3 periodically.
>>>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Ok I have tried this
It appears that s3a support requires httpclient 4.4.x and Oozie is bundled
with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops
loading.
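
The conflict comes down to version ordering: the s3a code path needs
httpclient 4.4.x, while Oozie bundles 4.3.6. A small, hypothetical
illustration of that check (a plain numeric comparison of dotted version
strings, not a real dependency resolver):

```python
def version_at_least(bundled: str, required: str) -> bool:
    """True if dotted version string `bundled` is >= `required`."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(bundled) >= parse(required)


# The httpclient bundled with Oozie vs. what s3a support needs:
print(version_at_least("4.3.6", "4.4.0"))  # False: the bundled jar is too old
```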
On Thu, May 17, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com> wrote:
> Purna,
>
> Based on
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
> you should try to go for s3a.
> You'll have to include the aws-jdk as well if I see it correctly:
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
> Also, the property names are slightly different so you'll have to change
> the example I've given.
>
>
>
> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <pu...@gmail.com>
> wrote:
>
>> Peter,
>>
>> I’m using latest oozie 5.0.0 and I have tried below changes but no luck
>>
>> Is this for s3 or s3a ?
>>
>> I’m using s3 but if this is for s3a do you know which jar I need to
>> include I mean Hadoop-aws jar or any other jar if required
>>
>> Hadoop-aws-2.8.3.jar is what I’m using
>>
>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com> wrote:
>>
>>> Ok, I've found it:
>>>
>>> If you are using 4.3.0 or newer this is the part which checks for
>>> dependencies:
>>>
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>> It passes the coordinator action's configuration and even does
>>> impersonation to check for the dependencies:
>>>
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>
>>> Have you tried the following in the coordinator xml:
>>>
>>> <action>
>>> <workflow>
>>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>> <configuration>
>>> <property>
>>> <name>fs.s3.awsAccessKeyId</name>
>>> <value>[YOURKEYID]</value>
>>> </property>
>>> <property>
>>> <name>fs.s3.awsSecretAccessKey</name>
>>> <value>[YOURKEY]</value>
>>> </property>
>>> </configuration>
>>> </workflow>
>>> </action>
>>>
>>> Based on the source this should be able to poll s3 periodically.
>>>
>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2pradeep@gmail.com
>>> > wrote:
>>>
>>>>
>>>> I have tried with coordinator's configuration too but no luck ☹️
>>>>
>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Great progress there purna! :)
>>>>>
>>>>> Have you tried adding these properties to the coordinator's
>>>>> configuration? We usually use the action config to build up the connection to
>>>>> the distributed file system.
>>>>> Although I'm not sure we're using these when polling the dependencies
>>>>> for coordinators, but I'm excited about you trying to make it work!
>>>>>
>>>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>>>> check the code in more depth first.
>>>>> gp
>>>>>
>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>> purna2pradeep@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> I got rid of this error by adding
>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>
>>>>>> But I’m getting below error now
>>>>>>
>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>
>>>>>> I have tried adding AWS access and secret keys in
>>>>>>
>>>>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have tried this, just added s3 instead of *
>>>>>>>
>>>>>>> <property>
>>>>>>>
>>>>>>>
>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>
>>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>>
>>>>>>> </property>
>>>>>>>
>>>>>>>
>>>>>>> Getting below error
>>>>>>>
>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>
>>>>>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>
>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>
>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>
>>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is what is in the logs
>>>>>>>>
>>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>>
>>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>>> Arrays.toString(classes));
>>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>
>>>>>>>>>> <value>*</value>
>>>>>>>>>>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> You'll have to configure
>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>>> (default: hdfs,hftp,webhdfs) properly. It lists the different
>>>>>>>>>>> filesystems supported for federation. If the wildcard "*" is
>>>>>>>>>>> specified, then ALL file schemes will be allowed.
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Peter,
>>>>>>>>>>> >
>>>>>>>>>>> > I have tried to specify dataset with uri starting with s3://,
>>>>>>>>>>> s3a:// and
>>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>>> >
>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme
>>>>>>>>>>> [s3] not
>>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>> >
>>>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>> >
>>>>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>>> >
>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>> >
>>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>> >
>>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Is S3 support specific to CDH distribution or should it work
>>>>>>>>>>> in Apache
>>>>>>>>>>> > Oozie as well? I’m not using CDH yet so
>>>>>>>>>>> >
>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check
>>>>>>>>>>> out this
>>>>>>>>>>> > > <
>>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>>>> > > >
>>>>>>>>>>> > > description on how to make it work in jobs, something
>>>>>>>>>>> similar should work
>>>>>>>>>>> > > on the server side as well
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>>> > > wrote:
>>>>>>>>>>> > >
>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws S3 as
>>>>>>>>>>> input events
>>>>>>>>>>> > > to
>>>>>>>>>>> > > > poll for a dependency file before kicking off a spark
>>>>>>>>>>> action
>>>>>>>>>>> > > >
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > For example: I don’t want to kick off a spark action until
>>>>>>>>>>> a file is
>>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > wrote:
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>> bundle
>>>>>>>>>>> > definitions,
>>>>>>>>>>> > > > as
>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>> > > > needs
>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>> being no
>>>>>>>>>>> > exception.
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without
>>>>>>>>>>> those Hadoop
>>>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Andras
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in oozie
>>>>>>>>>>> without having
>>>>>>>>>>> > > > > Hadoop
>>>>>>>>>>> > > > > > cluster?
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>>> Kubernetes cluster
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > --
>>>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>> > >
>>>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>>>> > >
>>>>>>>>>>> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>>>>>>>>>>> [image:
>>>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>>>> [image:
>>>>>>>>>>> > Cloudera
>>>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>>>> > > ------------------------------
>>>>>>>>>>> > >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
Purna,
Based on
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
you should try to go for s3a.
You'll have to include the AWS SDK (aws-java-sdk) as well if I see it correctly:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
Also, the property names are slightly different so you'll have to change
the example I've given.
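
For reference, the s3a equivalents of the fs.s3 credential properties in the earlier coordinator example would look roughly like this (property names follow the Hadoop 2.8 s3a documentation; the app path and key placeholders are carried over from the example above):

```xml
<action>
  <workflow>
    <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    <configuration>
      <!-- s3a reads fs.s3a.access.key / fs.s3a.secret.key instead of
           the fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey pair -->
      <property>
        <name>fs.s3a.access.key</name>
        <value>[YOURKEYID]</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>[YOURKEY]</value>
      </property>
    </configuration>
  </workflow>
</action>
```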
On Thu, May 17, 2018 at 4:16 PM, purna pradeep <pu...@gmail.com>
wrote:
> Peter,
>
> I’m using latest oozie 5.0.0 and I have tried below changes but no luck
>
> Is this for s3 or s3a ?
>
> I’m using s3 but if this is for s3a do you know which jar I need to
> include I mean Hadoop-aws jar or any other jar if required
>
> Hadoop-aws-2.8.3.jar is what I’m using
>
> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com> wrote:
>
>> Ok, I've found it:
>>
>> If you are using 4.3.0 or newer this is the part which checks for
>> dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>> It passes the coordinator action's configuration and even does
>> impersonation to check for the dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>
>> Have you tried the following in the coordinator xml:
>>
>> <action>
>> <workflow>
>> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>> <configuration>
>> <property>
>> <name>fs.s3.awsAccessKeyId</name>
>> <value>[YOURKEYID]</value>
>> </property>
>> <property>
>> <name>fs.s3.awsSecretAccessKey</name>
>> <value>[YOURKEY]</value>
>> </property>
>> </configuration>
>> </workflow>
>> </action>
>>
>> Based on the source this should be able to poll s3 periodically.
>>
>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>>
>>> I have tried with coordinator's configuration too but no luck ☹️
>>>
>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> Great progress there purna! :)
>>>>
>>>> Have you tried adding these properties to the coordinator's
>>>> configuration? We usually use the action config to build up the connection to
>>>> the distributed file system.
>>>> Although I'm not sure we're using these when polling the dependencies
>>>> for coordinators, but I'm excited about you trying to make it work!
>>>>
>>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>>> check the code in more depth first.
>>>> gp
>>>>
>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2pradeep@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I got rid of this error by adding
>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>
>>>>> But I’m getting below error now
>>>>>
>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>
>>>>> I have tried adding AWS access and secret keys in
>>>>>
>>>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <pu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I have tried this, just added s3 instead of *
>>>>>>
>>>>>> <property>
>>>>>>
>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>
>>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>
>>>>>> </property>
>>>>>>
>>>>>>
>>>>>> Getting below error
>>>>>>
>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>
>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>
>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>
>>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>> This is what is in the logs
>>>>>>>
>>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>>> SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>
>>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>>> SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>> LOG.info("Loaded urihandlers {0}",
>>>>>>>> Arrays.toString(classes));
>>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>>> defaultHandler.getClass().getName());
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>
>>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>
>>>>>>>>> <value>*</value>
>>>>>>>>>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You'll have to configure
>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>>>>>>>> (default: hdfs,hftp,webhdfs) properly. It lists the different
>>>>>>>>>> filesystems supported for federation. If the wildcard "*" is
>>>>>>>>>> specified, then ALL file schemes will be allowed.
>>>>>>>>>>
>>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> > Peter,
>>>>>>>>>> >
>>>>>>>>>> > I have tried to specify dataset with uri starting with s3://,
>>>>>>>>>> s3a:// and
>>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>> >
>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme
>>>>>>>>>> [s3] not
>>>>>>>>>> > supported in uri [s3:// mybucket /input.data]
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>> >
>>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>> >
>>>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>> >
>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>> >
>>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>> >
>>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Is S3 support specific to CDH distribution or should it work in
>>>>>>>>>> Apache
>>>>>>>>>> > Oozie as well? I’m not using CDH yet so
>>>>>>>>>> >
>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out
>>>>>>>>>> this
>>>>>>>>>> > > <
>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>>> > > >
>>>>>>>>>> > > description on how to make it work in jobs, something similar
>>>>>>>>>> should work
>>>>>>>>>> > > on the server side as well
>>>>>>>>>> > >
>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>>> > > wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>> > > >
>>>>>>>>>> > > > Also I also would like to know if oozie supports Aws S3 as
>>>>>>>>>> input events
>>>>>>>>>> > > to
>>>>>>>>>> > > > poll for a dependency file before kicking off a spark action
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > For example: I don’t want to kick off a spark action until
>>>>>>>>>> a file is
>>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>>> > > >
>>>>>>>>>> > > > wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > > Hi,
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>>>>> > definitions,
>>>>>>>>>> > > > as
>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>> scalable way. Oozie
>>>>>>>>>> > > > needs
>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action being
>>>>>>>>>> no
>>>>>>>>>> > exception.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without
>>>>>>>>>> those Hadoop
>>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Regards,
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Andras
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>>> > > > > wrote:
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > > Hi,
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > Would like to know if I can use sparkaction in oozie
>>>>>>>>>> without having
>>>>>>>>>> > > > > Hadoop
>>>>>>>>>> > > > > > cluster?
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on
>>>>>>>>>> Kubernetes cluster
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > Thanks
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>
>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Peter,
I’m using the latest Oozie 5.0.0 and I have tried the below changes, but no luck.
Is this for s3 or s3a?
I’m using s3, but if this is for s3a, do you know which jar I need to
include? I mean the hadoop-aws jar, or any other jar if required.
Hadoop-aws-2.8.3.jar is what I’m using
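
For context, with Hadoop 2.8.x the s3a connector needs more than hadoop-aws alone: it pulls in the split AWS Java SDK and joda-time. A plausible setup (jar versions are assumptions matching hadoop-aws 2.8.3; verify them with `mvn dependency:tree` on hadoop-aws before copying) would be:

```shell
# Sketch: drop the s3a connector and its SDK dependencies into Oozie's libext/.
# All versions below are assumptions for Hadoop 2.8.3; verify before use.
cp hadoop-aws-2.8.3.jar \
   aws-java-sdk-core-1.10.6.jar \
   aws-java-sdk-s3-1.10.6.jar \
   aws-java-sdk-kms-1.10.6.jar \
   joda-time-2.9.4.jar \
   "$OOZIE_HOME/libext/"
# Restart the server so the jars land on its classpath.
bin/oozied.sh stop && bin/oozied.sh start
```

Remember that oozie.service.HadoopAccessorService.supported.filesystems also has to list s3a (or *) for the coordinator to accept s3a:// URIs.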
On Wed, May 16, 2018 at 5:19 PM Peter Cseh <ge...@cloudera.com> wrote:
> Ok, I've found it:
>
> If you are using 4.3.0 or newer this is the part which checks for
> dependencies:
>
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
> It passes the coordinator action's configuration and even does
> impersonation to check for the dependencies:
>
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>
> Have you tried the following in the coordinator xml:
>
> <action>
> <workflow>
> <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
> <configuration>
> <property>
> <name>fs.s3.awsAccessKeyId</name>
> <value>[YOURKEYID]</value>
> </property>
> <property>
> <name>fs.s3.awsSecretAccessKey</name>
> <value>[YOURKEY]</value>
> </property>
> </configuration>
> </workflow>
> </action>
>
> Based on the source this should be able to poll s3 periodically.
>
> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <pu...@gmail.com>
> wrote:
>
>>
>> I have tried with coordinator's configuration too but no luck ☹️
>>
>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com> wrote:
>>
>>> Great progress there purna! :)
>>>
>>> Have you tried adding these properties to the coordinator's
>>> configuration? We usually use the action config to build up the connection to
>>> the distributed file system.
>>> Although I'm not sure we're using these when polling the dependencies
>>> for coordinators, but I'm excited about you trying to make it work!
>>>
>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>> check the code in more depth first.
>>> gp
>>>
>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>> Peter,
>>>>
>>>> I got rid of this error by adding
>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>
>>>> But I’m getting below error now
>>>>
>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>>> Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>
>>>> I have tried adding AWS access and secret keys in
>>>>
>>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <pu...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I have tried this, just adding s3 instead of *
>>>>>
>>>>> <property>
>>>>>
>>>>>
>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>
>>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>>
>>>>> </property>
>>>>>
>>>>>
>>>>> Getting below error
>>>>>
>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>
>>>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <pu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This is what is in the logs
>>>>>>
>>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>
>>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>> Can you check the server logs for messages like this?
>>>>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>>> defaultHandler.getClass().getName());
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>
>>>>>>>>
>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>
>>>>>>>> <value>*</value>
>>>>>>>>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You'll have to configure
>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>> Per its description, the default is hdfs,hftp,webhdfs, and it lists
>>>>>>>>> the different filesystems supported for federation. If wildcard "*"
>>>>>>>>> is specified, then ALL file schemes will be allowed.
>>>>>>>>>
>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> > Peter,
>>>>>>>>> >
>>>>>>>>> > I have tried to specify dataset with uri starting with s3://,
>>>>>>>>> s3a:// and
>>>>>>>>> > s3n:// and I am getting exception
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>>>> > [s3://mybucket/input.data]. Making the job failed.
>>>>>>>>> >
>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data]
>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it
>>>>>>>>> > work in Apache Oozie as well? I'm not using CDH yet.
>>>>>>>>> >
>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>>> >
>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out
>>>>>>>>> this
>>>>>>>>> > > <
>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>>>> > > >
>>>>>>>>> > > description on how to make it work in jobs, something similar
>>>>>>>>> should work
>>>>>>>>> > > on the server side as well
>>>>>>>>> > >
>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>>> > > wrote:
>>>>>>>>> > >
>>>>>>>>> > > > Thanks Andras,
>>>>>>>>> > > >
>>>>>>>>> > > > Also I also would like to know if oozie supports Aws S3 as
>>>>>>>>> input events
>>>>>>>>> > > to
>>>>>>>>> > > > poll for a dependency file before kicking off a spark action
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > > For example: I don’t want to kick off a spark action until a
>>>>>>>>> file is
>>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>>> > > >
>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>> > andras.piros@cloudera.com
>>>>>>>>> > > >
>>>>>>>>> > > > wrote:
>>>>>>>>> > > >
>>>>>>>>> > > > > Hi,
>>>>>>>>> > > > >
>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>>>> > definitions,
>>>>>>>>> > > > as
>>>>>>>>> > > > > well as sharelib files in a safe, distributed and scalable
>>>>>>>>> way. Oozie
>>>>>>>>> > > > needs
>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action being
>>>>>>>>> no
>>>>>>>>> > exception.
>>>>>>>>> > > > >
>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without
>>>>>>>>> those Hadoop
>>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Regards,
>>>>>>>>> > > > >
>>>>>>>>> > > > > Andras
>>>>>>>>> > > > >
>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>>> > > > > wrote:
>>>>>>>>> > > > >
>>>>>>>>> > > > > > Hi,
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > Would like to know if I can use sparkaction in oozie
>>>>>>>>> without having
>>>>>>>>> > > > > Hadoop
>>>>>>>>> > > > > > cluster?
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on Kubernetes
>>>>>>>>> cluster
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > Thanks
>>>>>>>>> > > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > > --
>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>> > >
>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>> > >
>>>>>>>>> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>>>>>>>>> [image:
>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>> [image:
>>>>>>>>> > Cloudera
>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>> > > ------------------------------
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>>
>>>
>>>
>
>
>
>
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
Ok, I've found it:
If you are using 4.3.0 or newer this is the part which checks for
dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
It passes the coordinator action's configuration and even does
impersonation to check for the dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
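To make the failure mode concrete, here is a simplified sketch of the whitelist check that produces the E0904 quoted further down in this thread. The class and method names below are invented for illustration and are not the real Oozie code:

```java
import java.net.URI;
import java.util.Set;

// Illustrative only: a stripped-down version of the scheme whitelist check
// behind error E0904. The real logic lives in Oozie's URIHandlerService;
// this class and its names are made up for the example.
public class SchemeCheck {
    static String getUriHandler(String uri, Set<String> supported) {
        String scheme = URI.create(uri).getScheme();
        if (!supported.contains("*") && !supported.contains(scheme)) {
            throw new IllegalArgumentException(
                "E0904: Scheme [" + scheme + "] not supported in uri [" + uri + "]");
        }
        return "handler-for-" + scheme;
    }

    public static void main(String[] args) {
        // With a default-like whitelist, s3 URIs are rejected:
        try {
            getUriHandler("s3://mybucket/input.data", Set.of("hdfs", "hftp", "webhdfs"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // After adding s3 (or "*") to the supported list, the lookup succeeds:
        System.out.println(getUriHandler("s3://mybucket/input.data",
                Set.of("hdfs", "hftp", "webhdfs", "s3")));
    }
}
```

So E0904 goes away once the scheme is whitelisted; whether the resulting handler can actually reach S3 is a separate classpath-and-credentials question.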
Have you tried the following in the coordinator xml:
<action>
<workflow>
<app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
<configuration>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>[YOURKEYID]</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>[YOURKEY]</value>
</property>
</configuration>
</workflow>
</action>
Based on the source this should be able to poll s3 periodically.
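A dataset sketch to go with that, in case it helps. The bucket, dates, frequency and done-flag below are placeholders, and the s3a:// variant assumes the hadoop-aws connector (with its fs.s3a.* credential property names) is available to the Oozie server; this is untested here:

```xml
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-17T00:00Z" timezone="UTC">
    <!-- The scheme used here must also appear in
         oozie.service.HadoopAccessorService.supported.filesystems -->
    <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
    <done-flag>input.data</done-flag>
  </dataset>
</datasets>
<input-events>
  <data-in name="in" dataset="input">
    <instance>${coord:current(0)}</instance>
  </data-in>
</input-events>
```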
On Wed, May 16, 2018 at 10:57 PM, purna pradeep <pu...@gmail.com>
wrote:
>
> I have tried with coordinator's configuration too but no luck ☹️
>
> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com> wrote:
>
>> Great progress there purna! :)
>>
>> Have you tried adding these properties to the coordinator's
>> configuration? We usually use the action config to build up the
>> connection to the distributed file system.
>> Although I'm not sure we use these when polling the dependencies for
>> coordinators, I'm excited about you trying to make it work!
>>
>> I'll get back with a - hopefully - more helpful answer soon, I have to
>> check the code in more depth first.
>> gp
>>
>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> Peter,
>>>
>>> I got rid of this error by adding
>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>
>>> But I’m getting below error now
>>>
>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>> Key must be specified by setting the fs.s3.awsAccessKeyId and
>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>
>> I have tried adding the AWS access and secret keys in
>>
>> oozie-site.xml, Hadoop's core-site.xml, and hadoop-config.xml
>>>
>>>
>>>
>>>
>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I have tried this, just adding s3 instead of *
>>>>
>>>> <property>
>>>>
>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>
>>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>>
>>>> </property>
>>>>
>>>>
>>>> Getting below error
>>>>
>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>
>>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>
>>>>
>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <pu...@gmail.com>
>>>> wrote:
>>>>
>>>>> This is what is in the logs
>>>>>
>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 -
>>>>> SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.
>>>>> FSURIHandler]
>>>>>
>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 -
>>>>> SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.
>>>>> FSURIHandler
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> That's strange, this exception should not happen in that case.
>>>>>> Can you check the server logs for messages like this?
>>>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>> LOG.info("Loaded default urihandler {0}",
>>>>>> defaultHandler.getClass().getName());
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>> purna2pradeep@gmail.com> wrote:
>>>>>>
>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>
>>>>>>> <property>
>>>>>>>
>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>
>>>>>>> <value>*</value>
>>>>>>>
>>>>>>> </property>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> You'll have to configure
>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>> Per its description, the default is hdfs,hftp,webhdfs, and it lists
>>>>>>>> the different filesystems supported for federation. If wildcard "*"
>>>>>>>> is specified, then ALL file schemes will be allowed.
>>>>>>>>
>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Peter,
>>>>>>>> >
>>>>>>>> > I have tried to specify dataset with uri starting with s3://,
>>>>>>>> s3a:// and
>>>>>>>> > s3n:// and I am getting exception
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>>> > [s3://mybucket/input.data]. Making the job failed.
>>>>>>>> >
>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data]
>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Is S3 support specific to the CDH distribution, or should it
>>>>>>>> > work in Apache Oozie as well? I'm not using CDH yet.
>>>>>>>> >
>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>> gezapeti@cloudera.com> wrote:
>>>>>>>> >
>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out
>>>>>>>> this
>>>>>>>> > > <
>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>>> > > >
>>>>>>>> > > description on how to make it work in jobs, something similar
>>>>>>>> should work
>>>>>>>> > > on the server side as well
>>>>>>>> > >
>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>> purna2pradeep@gmail.com>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > > > Thanks Andras,
>>>>>>>> > > >
>>>>>>>> > > > Also I also would like to know if oozie supports Aws S3 as
>>>>>>>> input events
>>>>>>>> > > to
>>>>>>>> > > > poll for a dependency file before kicking off a spark action
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > > For example: I don’t want to kick off a spark action until a
>>>>>>>> file is
>>>>>>>> > > > arrived on a given AWS s3 location
>>>>>>>> > > >
>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>> > andras.piros@cloudera.com
>>>>>>>> > > >
>>>>>>>> > > > wrote:
>>>>>>>> > > >
>>>>>>>> > > > > Hi,
>>>>>>>> > > > >
>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>>> > definitions,
>>>>>>>> > > > as
>>>>>>>> > > > > well as sharelib files in a safe, distributed and scalable
>>>>>>>> way. Oozie
>>>>>>>> > > > needs
>>>>>>>> > > > > YARN to run almost all of its actions, Spark action being no
>>>>>>>> > exception.
>>>>>>>> > > > >
>>>>>>>> > > > > At the moment it's not feasible to install Oozie without
>>>>>>>> those Hadoop
>>>>>>>> > > > > components. How to install Oozie please *find here
>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>> > > > >
>>>>>>>> > > > > Regards,
>>>>>>>> > > > >
>>>>>>>> > > > > Andras
>>>>>>>> > > > >
>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>>> > > > > wrote:
>>>>>>>> > > > >
>>>>>>>> > > > > > Hi,
>>>>>>>> > > > > >
>>>>>>>> > > > > > Would like to know if I can use sparkaction in oozie
>>>>>>>> without having
>>>>>>>> > > > > Hadoop
>>>>>>>> > > > > > cluster?
>>>>>>>> > > > > >
>>>>>>>> > > > > > I want to use oozie to schedule spark jobs on Kubernetes
>>>>>>>> cluster
>>>>>>>> > > > > >
>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>> > > > > >
>>>>>>>> > > > > > Thanks
>>>>>>>> > > > > >
>>>>>>>> > > > >
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>
>>
>>
>>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
I have tried with coordinator's configuration too but no luck ☹️
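For reference, one mismatch worth ruling out (a general Hadoop note, not something confirmed in this thread): the credential property names differ per connector. The older s3:// and s3n:// filesystems read fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey, while the s3a:// S3AFileSystem from hadoop-aws reads fs.s3a.access.key / fs.s3a.secret.key. A core-site.xml sketch for s3a, with placeholder keys:

```xml
<!-- core-site.xml visible to the Oozie server; keys are placeholders.
     fs.s3a.impl is usually already set in core-default.xml and is shown
     here only for clarity. -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>[YOURKEYID]</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>[YOURKEY]</value>
</property>
```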
On Wed, May 16, 2018 at 3:54 PM Peter Cseh <ge...@cloudera.com> wrote:
> Great progress there purna! :)
>
> Have you tried adding these properties to the coordinator's configuration?
> We usually use the action config to build up the connection to the
> distributed file system.
> Although I'm not sure we use these when polling the dependencies for
> coordinators, I'm excited about you trying to make it work!
>
> I'll get back with a - hopefully - more helpful answer soon, I have to
> check the code in more depth first.
> gp
>
> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <pu...@gmail.com>
> wrote:
>
>> Peter,
>>
>> I got rid of this error by adding
>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>
>> But I’m getting below error now
>>
>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>> Key must be specified by setting the fs.s3.awsAccessKeyId and
>> fs.s3.awsSecretAccessKey properties (respectively)
>>
>> I have tried adding the AWS access and secret keys in
>>
>> oozie-site.xml, Hadoop's core-site.xml, and hadoop-config.xml
>>
>>
>>
>>
>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>>
>>> I have tried this, just adding s3 instead of *
>>>
>>> <property>
>>>
>>>
>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>
>>> <value>hdfs,hftp,webhdfs,s3</value>
>>>
>>> </property>
>>>
>>>
>>> Getting below error
>>>
>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>
>>> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>> at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>
>>>
>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <pu...@gmail.com>
>>> wrote:
>>>
>>>> This is what is in the logs
>>>>
>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
>>>> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>
>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
>>>> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>
>>>>
>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>>> wrote:
>>>>
>>>>> That's strange, this exception should not happen in that case.
>>>>> Can you check the server logs for messages like this?
>>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>> LOG.info("Loaded default urihandler {0}",
>>>>> defaultHandler.getClass().getName());
>>>>> Thanks
>>>>>
>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>> purna2pradeep@gmail.com> wrote:
>>>>>
>>>>>> This is what I already have in my oozie-site.xml
>>>>>>
>>>>>> <property>
>>>>>>
>>>>>>
>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>
>>>>>> <value>*</value>
>>>>>>
>>>>>> </property>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> You'll have to configure
>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>> Per its description, the default is hdfs,hftp,webhdfs, and it lists
>>>>>>> the different filesystems supported for federation. If wildcard "*"
>>>>>>> is specified, then ALL file schemes will be allowed.
>>>>>>>
>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Peter,
>>>>>>> >
>>>>>>> > I have tried to specify dataset with uri starting with s3://,
>>>>>>> s3a:// and
>>>>>>> > s3n:// and I am getting exception
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>> > [s3://mybucket/input.data]. Making the job failed.
>>>>>>> >
>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data]
>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>> > at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>> > at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>> > at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>> > at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>> > at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>> > at java.lang.Thread.run(Thread.java:748)
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > Is S3 support specific to the CDH distribution, or should it
>>>>>>> > work in Apache Oozie as well? I'm not using CDH yet.
>>>>>>> >
>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out
>>>>>>> this
>>>>>>> > > <
>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>>>>> > x/topics/admin_oozie_s3.html
>>>>>>> > > >
>>>>>>> > > description on how to make it work in jobs, something similar
>>>>>>> should work
>>>>>>> > > on the server side as well
>>>>>>> > >
>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>> purna2pradeep@gmail.com>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > > > Thanks Andras,
>>>>>>> > > >
>>>>>>> > > > I would also like to know whether Oozie supports AWS S3 as input
>>>>>>> > > > events, to poll for a dependency file before kicking off a Spark
>>>>>>> > > > action.
>>>>>>> > > >
>>>>>>> > > > For example: I don't want to kick off a Spark action until a
>>>>>>> > > > file has arrived at a given AWS S3 location.
>>>>>>> > > >
>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>> > andras.piros@cloudera.com
>>>>>>> > > >
>>>>>>> > > > wrote:
>>>>>>> > > >
>>>>>>> > > > > Hi,
>>>>>>> > > > >
>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>> > > > > definitions, as well as sharelib files, in a safe, distributed,
>>>>>>> > > > > and scalable way. Oozie needs YARN to run almost all of its
>>>>>>> > > > > actions, the Spark action being no exception.
>>>>>>> > > > >
>>>>>>> > > > > At the moment it's not feasible to install Oozie without those
>>>>>>> > > > > Hadoop components. Installation instructions are here:
>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>
>>>>>>> > > > >
>>>>>>> > > > > Regards,
>>>>>>> > > > >
>>>>>>> > > > > Andras
>>>>>>> > > > >
>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>> > > purna2pradeep@gmail.com>
>>>>>>> > > > > wrote:
>>>>>>> > > > >
>>>>>>> > > > > > Hi,
>>>>>>> > > > > >
>>>>>>> > > > > > I would like to know whether I can use the Spark action in
>>>>>>> > > > > > Oozie without having a Hadoop cluster.
>>>>>>> > > > > >
>>>>>>> > > > > > I want to use Oozie to schedule Spark jobs on a Kubernetes
>>>>>>> > > > > > cluster.
>>>>>>> > > > > >
>>>>>>> > > > > > I'm a beginner in Oozie.
>>>>>>> > > > > >
>>>>>>> > > > > > Thanks
>>>>>>> > > > > >
>>>>>>> > > > >
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>> > >
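[Editor's note] Purna's question above about polling S3 for a dependency file would, in coordinator terms, look roughly like the dataset sketch below. This is a hypothetical illustration only: the bucket, path, dates, and done-flag are placeholders, and it can only work once the Oozie server actually supports the s3a scheme, which is what the rest of this thread is about.

```xml
<!-- Hypothetical sketch: bucket, path, dates, and done-flag are placeholders;
     requires the Oozie server to support the s3a URI scheme. -->
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-15T00:00Z" timezone="UTC">
    <uri-template>s3a://mybucket/input/${YEAR}/${MONTH}/${DAY}</uri-template>
    <done-flag>input.data</done-flag>
  </dataset>
</datasets>
```

A coordinator's <input-events> would then reference this dataset, and the action would not materialize until the URI resolves.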
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
Great progress there purna! :)

Have you tried adding these properties to the coordinator's configuration?
We usually use the action config to build up the connection to the
distributed file system. I'm not sure we use these when polling the
dependencies for coordinators, but I'm excited that you're trying to make
it work!

I'll get back with a hopefully more helpful answer soon; I have to check
the code in more depth first.

gp
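[Editor's note] A sketch of what "these properties" might look like in the coordinator's <configuration> block. The property names fs.s3a.access.key and fs.s3a.secret.key are the S3A connector's names from hadoop-aws, not something stated in this thread, so verify them against the connector version actually deployed; the values are placeholders.

```xml
<!-- Assumed S3A property names (from hadoop-aws); values are placeholders. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```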
>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> That's strange, this exception should not happen in that case.
>>>> Can you check the server logs for messages like this?
>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>> LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>>>> Thanks
>>>>
>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2pradeep@gmail.com
>>>> > wrote:
>>>>
>>>>> This is what I already have in my oozie-site.xml:
>>>>>
>>>>> <property>
>>>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>   <value>*</value>
>>>>> </property>
>>>>>
>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> You'll have to configure
>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>> Its default value is "hdfs,hftp,webhdfs"; it enlists the different
>>>>>> filesystems supported for federation. If wildcard "*" is specified,
>>>>>> then ALL file schemes will be allowed.
>>>>>>
>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml.
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Peter,

I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar.

But I'm getting the error below now:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
must be specified by setting the fs.s3.awsAccessKeyId and
fs.s3.awsSecretAccessKey properties (respectively)

I have tried adding the AWS access and secret keys in oozie-site.xml,
hadoop core-site.xml, and hadoop-config.xml.
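[Editor's note] The exception names the exact properties it wants. For the legacy s3:// scheme, a core-site.xml entry would look like the sketch below; the property names are taken verbatim from the error message, and the values are placeholders. If the s3a:// scheme is used instead, the S3A names fs.s3a.access.key and fs.s3a.secret.key apply.

```xml
<!-- Property names taken from the exception above; values are placeholders. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```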
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
I have tried this, just adding s3 instead of *:

<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3</value>
</property>

Getting the error below:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
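[Editor's note] org.apache.hadoop.fs.s3a.S3AFileSystem ships in the hadoop-aws artifact, which is not on the Oozie server classpath by default; elsewhere in this thread purna resolves exactly this by adding hadoop-aws-2.8.3.jar. If the scheme-to-class mapping is also missing, a core-site.xml entry like the following sketch (a standard Hadoop property) binds s3a:// to that class:

```xml
<!-- Sketch: binds the s3a:// scheme to its FileSystem implementation;
     the hadoop-aws jar must also be on the Oozie server classpath. -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```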
On Wed, May 16, 2018 at 2:19 PM purna pradeep <pu...@gmail.com>
wrote:
> This is what is in the logs
>
> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>
> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
> Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>
>
> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com> wrote:
>
>> That's strange, this exception should not happen in that case.
>> Can you check the server logs for messages like this?
>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>> LOG.info("Loaded default urihandler {0}",
>> defaultHandler.getClass().getName());
>> Thanks
>>
>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <pu...@gmail.com>
>> wrote:
>>
>>> This is what I already have in my oozie-site.xml
>>>
>>> <property>
>>>
>>>
>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>
>>> <value>*</value>
>>>
>>> </property>
>>>
>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <ge...@cloudera.com>
>>> wrote:
>>>
>>>> You'll have to configure
>>>> oozie.service.HadoopAccessorService.supported.filesystems
>>>> hdfs,hftp,webhdfs Enlist
>>>> the different filesystems supported for federation. If wildcard "*" is
>>>> specified, then ALL file schemes will be allowed.properly.
>>>>
>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>
>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2pradeep@gmail.com
>>>> >
>>>> wrote:
>>>>
>>>> > Peter,
>>>> >
>>>> > I have tried to specify dataset with uri starting with s3://, s3a://
>>>> and
>>>> > s3n:// and I am getting exception
>>>> >
>>>> >
>>>> >
>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>> > [s3://mybucket/input.data] Making the job failed
>>>> >
>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3]
>>>> not
>>>> > supported in uri [s3:// mybucket /input.data]
>>>> >
>>>> > at
>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>> > URIHandlerService.java:185)
>>>> >
>>>> > at
>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>> > URIHandlerService.java:168)
>>>> >
>>>> > at
>>>> > org.apache.oozie.service.URIHandlerService.getURIHandler(
>>>> > URIHandlerService.java:160)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(
>>>> > CoordCommandUtils.java:465)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>> > separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.coord.CoordCommandUtils.
>>>> > materializeInputDataEvents(CoordCommandUtils.java:731)
>>>> >
>>>> > at
>>>> >
>>>> org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(
>>>> > CoordCommandUtils.java:546)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>>>> > mand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.coord.CoordMaterializeTransitionXCom
>>>> > mand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>> > MaterializeTransitionXCommand.java:73)
>>>> >
>>>> > at
>>>> > org.apache.oozie.command.MaterializeTransitionXCommand.execute(
>>>> > MaterializeTransitionXCommand.java:29)
>>>> >
>>>> > at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>> >
>>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>> >
>>>> > at
>>>> > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(
>>>> > CallableQueueService.java:181)
>>>> >
>>>> > at
>>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>> > ThreadPoolExecutor.java:1149)
>>>> >
>>>> > at
>>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>> > ThreadPoolExecutor.java:624)
>>>> >
>>>> > at java.lang.Thread.run(Thread.java:748)
>>>> >
>>>> >
>>>> >
>>>> > Is S3 support specific to CDH distribution or should it work in Apache
>>>> > Oozie as well? I’m not using CDH yet so
>>>> >
>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com>
>>>> wrote:
>>>> >
>>>> > > I think it should be possible for Oozie to poll S3. Check out this
>>>> > > <
>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-
>>>> > x/topics/admin_oozie_s3.html
>>>> > > >
>>>> > > description on how to make it work in jobs, something similar
>>>> should work
>>>> > > on the server side as well
>>>> > >
>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>> purna2pradeep@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > > > Thanks Andras,
>>>> > > >
>>>> > > > Also I also would like to know if oozie supports Aws S3 as input
>>>> events
>>>> > > to
>>>> > > > poll for a dependency file before kicking off a spark action
>>>> > > >
>>>> > > >
>>>> > > > For example: I don’t want to kick off a spark action until a file
>>>> is
>>>> > > > arrived on a given AWS s3 location
>>>> > > >
>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>> > andras.piros@cloudera.com
>>>> > > >
>>>> > > > wrote:
>>>> > > >
>>>> > > > > Hi,
>>>> > > > >
>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>> > definitions,
>>>> > > > as
>>>> > > > > well as sharelib files in a safe, distributed and scalable way.
>>>> Oozie
>>>> > > > needs
>>>> > > > > YARN to run almost all of its actions, Spark action being no
>>>> > exception.
>>>> > > > >
>>>> > > > > At the moment it's not feasible to install Oozie without those
>>>> Hadoop
>>>> > > > > components. How to install Oozie please *find here
>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>> > > > >
>>>> > > > > Regards,
>>>> > > > >
>>>> > > > > Andras
>>>> > > > >
>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>> > > purna2pradeep@gmail.com>
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > > > Hi,
>>>> > > > > >
>>>> > > > > > Would like to know if I can use sparkaction in oozie without
>>>> having
>>>> > > > > Hadoop
>>>> > > > > > cluster?
>>>> > > > > >
>>>> > > > > > I want to use oozie to schedule spark jobs on Kubernetes
>>>> cluster
>>>> > > > > >
>>>> > > > > > I’m a beginner in oozie
>>>> > > > > >
>>>> > > > > > Thanks
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Peter Cseh | Software Engineer
>>>> > > cloudera.com <https://www.cloudera.com>
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
This is what is in the logs:
2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost]
Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost]
Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
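To pull these entries out of a server log yourself, a grep like the following works (the sample line is taken from this thread; the real file usually lives under the Oozie server's logs/ directory, so adjust the path for your install):

```shell
# Write a sample line in the format the server logs, then extract the
# handler list the same way you would against the real oozie.log.
log=$(mktemp)
cat > "$log" <<'EOF'
2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
EOF
# -o prints only the matching part: the list of loaded URI handlers
grep -o 'Loaded urihandlers \[[^]]*\]' "$log"
rm -f "$log"
```

If only FSURIHandler shows up, the scheme support comes entirely from the HadoopAccessorService filesystem whitelist discussed below in the thread.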
On Wed, May 16, 2018 at 12:27 PM Peter Cseh <ge...@cloudera.com> wrote:
> That's strange, this exception should not happen in that case.
> Can you check the server logs for messages like this?
> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
> LOG.info("Loaded default urihandler {0}",
> defaultHandler.getClass().getName());
> Thanks
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
That's strange, this exception should not happen in that case.
Can you check the server logs for messages like this?
LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
LOG.info("Loaded default urihandler {0}",
defaultHandler.getClass().getName());
Thanks
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
This is what I already have in my oozie-site.xml:

<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>*</value>
</property>
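For reference, the coordinator dataset this property is meant to unlock would look something like the following (a sketch only; the bucket, names, dates, and frequency here are made up):

```xml
<datasets>
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="2018-05-01T00:00Z" timezone="UTC">
        <!-- s3a:// is the scheme registered by the Hadoop S3A connector -->
        <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
        <!-- coordinator waits until this file exists in the directory -->
        <done-flag>input.data</done-flag>
    </dataset>
</datasets>
```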
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
You'll have to configure
oozie.service.HadoopAccessorService.supported.filesystems properly. Its
description reads: "hdfs,hftp,webhdfs - Enlist the different filesystems
supported for federation. If wildcard "*" is specified, then ALL file
schemes will be allowed."
For testing purposes it's ok to put * in there in oozie-site.xml.
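For example, the oozie-site.xml entry could use the wildcard or enumerate the schemes explicitly (the exact list below is illustrative, not a recommendation):

```xml
<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <!-- "*" allows every scheme; alternatively, list them explicitly -->
    <value>hdfs,hftp,webhdfs,s3a</value>
</property>
```

Note that oozie-site.xml is read at server startup, so the Oozie server has to be restarted for the change to take effect.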
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Peter,
I have tried to specify a dataset with a URI starting with s3://, s3a://,
and s3n://, and I am getting this exception:
Exception occurred:E0904: Scheme [s3] not supported in uri
[s3://mybucket/input.data] Making the job failed
org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
supported in uri [s3://mybucket/input.data]
at
org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
at
org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
at
org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
at
org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
at
org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
at
org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
at
org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
at
org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
at
org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
at
org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
at
org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
at org.apache.oozie.command.XCommand.call(XCommand.java:290)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Is S3 support specific to the CDH distribution, or should it work in Apache
Oozie as well? I'm not using CDH yet.
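For context, the E0904 error is raised when the scheme of a dataset URI is matched against the schemes of the registered URI handlers. A rough sketch of that check, as a hypothetical simplification and not Oozie's actual code:

```python
from urllib.parse import urlparse

# With only FSURIHandler loaded and the default filesystem whitelist,
# the supported schemes are roughly these (assumption for illustration):
SUPPORTED_SCHEMES = {"hdfs", "hftp", "webhdfs"}

def get_uri_handler(uri: str) -> str:
    """Mimic URIHandlerService.getURIHandler's scheme check."""
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        # Mirrors: E0904: Scheme [s3] not supported in uri [...]
        raise ValueError(f"E0904: Scheme [{scheme}] not supported in uri [{uri}]")
    return f"FSURIHandler({uri})"

try:
    get_uri_handler("s3://mybucket/input.data")
except ValueError as e:
    print(e)
```

In other words, unless the server actually picks up the widened supported.filesystems setting, any s3/s3a/s3n URI is rejected before the coordinator materializes its input events.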
On Wed, May 16, 2018 at 10:28 AM Peter Cseh <ge...@cloudera.com> wrote:
> I think it should be possible for Oozie to poll S3. Check out this
> description on how to make it work in jobs:
> https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
> Something similar should work on the server side as well.
Re: Oozie for spark jobs without Hadoop
Posted by "Shikin, Igor" <Ig...@capitalone.com>.
Hi Peter,
I am working with Purna. I have tried to specify a dataset with a URI starting with s3://, s3a://, and s3n://, and I am getting this exception:
Exception occurred:E0904: Scheme [s3] not supported in uri [s3://cmsegmentation-qa/oozie-test/input.data] Making the job failed
org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://cmsegmentation-qa/oozie-test/input.data]
at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
at org.apache.oozie.command.XCommand.call(XCommand.java:290)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Is S3 support specific to the CDH distribution, or should it work in Apache Oozie as well?
Thanks!
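For context, E0904 is raised because Oozie's URIHandlerService rejects URI schemes it has not been configured to support. A rough sketch of the oozie-site.xml change the Cloudera S3 guide linked in this thread describes (property name taken from that guide, so treat it as an assumption to verify against your Oozie version; the hadoop-aws connector jars and S3 credentials would also need to be available to the Oozie server):

```xml
<!-- oozie-site.xml: sketch only; verify the property name for your Oozie version -->
<property>
  <!-- Filesystem schemes the Oozie server is allowed to resolve (e.g. for
       coordinator input dependencies); add s3a alongside the default hdfs -->
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,s3a</value>
</property>
```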
On 5/16/18, 10:29 AM, "Peter Cseh" <ge...@cloudera.com> wrote:
I think it should be possible for Oozie to poll S3. Check out this
<https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
description of how to make it work in jobs; something similar should work
on the server side as well
On Tue, May 15, 2018 at 4:43 PM, purna pradeep <pu...@gmail.com>
wrote:
> Thanks Andras,
>
> Also, I would like to know if Oozie supports AWS S3 as input events, to
> poll for a dependency file before kicking off a Spark action.
>
>
> For example: I don’t want to kick off a Spark action until a file has
> arrived at a given AWS S3 location.
>
> On Tue, May 15, 2018 at 10:17 AM Andras Piros <an...@cloudera.com>
> wrote:
>
> > Hi,
> >
> > Oozie needs HDFS to store workflow, coordinator, or bundle definitions,
> as
> > well as sharelib files in a safe, distributed and scalable way. Oozie
> needs
> > YARN to run almost all of its actions, Spark action being no exception.
> >
> > At the moment it's not feasible to install Oozie without those Hadoop
> > components. Please *find installation instructions here
> > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
> >
> > Regards,
> >
> > Andras
> >
> > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <pu...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Would like to know if I can use the Spark action in Oozie without
> > > having a Hadoop cluster?
> > >
> > > I want to use oozie to schedule spark jobs on Kubernetes cluster
> > >
> > > I’m a beginner in oozie
> > >
> > > Thanks
> > >
> >
>
--
*Peter Cseh *| Software Engineer
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
Re: Oozie for spark jobs without Hadoop
Posted by Peter Cseh <ge...@cloudera.com>.
I think it should be possible for Oozie to poll S3. Check out this
<https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html>
description of how to make it work in jobs; something similar should work
on the server side as well
On Tue, May 15, 2018 at 4:43 PM, purna pradeep <pu...@gmail.com>
wrote:
> Thanks Andras,
>
> Also, I would like to know if Oozie supports AWS S3 as input events, to
> poll for a dependency file before kicking off a Spark action.
>
>
> For example: I don’t want to kick off a Spark action until a file has
> arrived at a given AWS S3 location.
>
> On Tue, May 15, 2018 at 10:17 AM Andras Piros <an...@cloudera.com>
> wrote:
>
> > Hi,
> >
> > Oozie needs HDFS to store workflow, coordinator, or bundle definitions,
> as
> > well as sharelib files in a safe, distributed and scalable way. Oozie
> needs
> > YARN to run almost all of its actions, Spark action being no exception.
> >
> > At the moment it's not feasible to install Oozie without those Hadoop
> > components. Please *find installation instructions here
> > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
> >
> > Regards,
> >
> > Andras
> >
> > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <pu...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Would like to know if I can use the Spark action in Oozie without
> > > having a Hadoop cluster?
> > >
> > > I want to use oozie to schedule spark jobs on Kubernetes cluster
> > >
> > > I’m a beginner in oozie
> > >
> > > Thanks
> > >
> >
>
--
*Peter Cseh *| Software Engineer
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
Re: Oozie for spark jobs without Hadoop
Posted by purna pradeep <pu...@gmail.com>.
Thanks Andras,
Also, I would like to know if Oozie supports AWS S3 as input events, to
poll for a dependency file before kicking off a Spark action.
For example: I don’t want to kick off a Spark action until a file has
arrived at a given AWS S3 location.
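A rough sketch of how such a trigger might be expressed as an Oozie coordinator dataset dependency, assuming the S3 scheme were supported by the server (bucket name, dates, and ${appPath} are invented placeholders):

```xml
<coordinator-app name="s3-triggered-app" frequency="${coord:days(1)}"
                 start="2018-05-15T00:00Z" end="2018-06-15T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Hypothetical dataset: the uri-template scheme must be one Oozie accepts -->
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="2018-05-15T00:00Z" timezone="UTC">
      <uri-template>s3a://mybucket/input.data</uri-template>
      <!-- Empty done-flag: the existence of the data path itself marks readiness -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- The workflow (e.g. a Spark action) is held until this instance exists -->
    <data-in name="input-ready" dataset="input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${appPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```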
On Tue, May 15, 2018 at 10:17 AM Andras Piros <an...@cloudera.com>
wrote:
> Hi,
>
> Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as
> well as sharelib files in a safe, distributed and scalable way. Oozie needs
> YARN to run almost all of its actions, Spark action being no exception.
>
> At the moment it's not feasible to install Oozie without those Hadoop
> components. Please *find installation instructions here
> <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>
> Regards,
>
> Andras
>
> On Tue, May 15, 2018 at 4:11 PM, purna pradeep <pu...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Would like to know if I can use the Spark action in Oozie without
> > having a Hadoop cluster?
> >
> > I want to use oozie to schedule spark jobs on Kubernetes cluster
> >
> > I’m a beginner in oozie
> >
> > Thanks
> >
>
Re: Oozie for spark jobs without Hadoop
Posted by Andras Piros <an...@cloudera.com>.
Hi,
Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as
well as sharelib files in a safe, distributed and scalable way. Oozie needs
YARN to run almost all of its actions, Spark action being no exception.
At the moment it's not feasible to install Oozie without those Hadoop
components. Please *find installation instructions here
<https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
Regards,
Andras
On Tue, May 15, 2018 at 4:11 PM, purna pradeep <pu...@gmail.com>
wrote:
> Hi,
>
> Would like to know if I can use the Spark action in Oozie without having
> a Hadoop cluster?
>
> I want to use oozie to schedule spark jobs on Kubernetes cluster
>
> I’m a beginner in oozie
>
> Thanks
>