Posted to user@spark.apache.org by "Segerlind, Nathan L" <na...@intel.com> on 2014/04/09 23:10:56 UTC

is it possible to initiate Spark jobs from Oozie?

Howdy.

Is it possible to initiate Spark jobs from Oozie (presumably as a java action)? If so, are there known limitations to this?  And would anybody have a pointer to an example?

Thanks,
Nate


Re: is it possible to initiate Spark jobs from Oozie?

Posted by Shivani Rao <ra...@gmail.com>.
I have mucked around with this a little bit. The first step to make this
happen is to build a fat jar. I wrote a quick blog post
<http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html>
documenting my learning curve with respect to that.

The next step is to schedule this as a java action. Since your code will
need to reference the Spark as well as the Hadoop libraries, it is best to
supply those with your java action. To do this, you will need to place
these jars in the "lib" folder next to the workflow definition.

So if <my-test-folder>/workflow.xml contains your java action, then
<my-test-folder>/lib would contain the following jars:
a) your Spark lib jar
b) your Spark app jar

However, this is where I got stuck. I got some time-out errors thrown by
Akka when attempting to create a Spark context. This could be due to one
of the following two reasons:

a) The "setJars" function that needs to be called before a Spark context is
created is probably not finding the right jar. I am a little clueless on
how to do this. As mentioned in the Spark documentation
<http://spark.apache.org/docs/0.9.1/spark-standalone.html>, we need to
specify the jar explicitly. However, given that Oozie copies everything
into a tmp folder, I am not sure how to specify this path so that the data
node that is executing "java -cp <path-to-fat-jar>:<path-to-libs>
<mainclassname>" would know where to find the containing jar.
b) My Oozie is running on a different machine and attempting to launch the
Spark job on a different cluster. Maybe that's what the time-out error
means. I still don't know.
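
For reason (a), one possible way around the unknown tmp path is to ask the JVM at runtime where the application class was loaded from, instead of hard-coding a path. The sketch below uses only the Java standard library; the class name JarLocator and the method jarPathOf are hypothetical names chosen for illustration. Whether the returned path is the right thing to pass to "setJars" inside an Oozie launcher has not been verified here, so treat it as a starting point under that assumption.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;

/** Hypothetical helper: finds where a class was loaded from at runtime. */
public class JarLocator {

    /**
     * Returns the absolute path of the jar (or class directory) the given
     * class was loaded from, or null if the JVM does not expose a file path.
     */
    public static String jarPathOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null; // e.g. bootstrap classes such as java.lang.String
        }
        try {
            return new File(source.getLocation().toURI()).getAbsolutePath();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // the location is not a plain file: URI
        }
    }

    public static void main(String[] args) {
        String jar = jarPathOf(JarLocator.class);
        System.out.println("Loaded from: " + jar);
        // Assumption: this value would then be handed to Spark before the
        // context is created, e.g. via SparkConf.setJars(new String[] { jar }).
    }
}
```

Spark itself appears to offer a similar helper (SparkContext.jarOfClass), which may be worth checking against the version in use before writing your own.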

So in summary, the limitations are:
a) Need to find a way to specify the path to the jar in the "setJars" function
b) Need to have Oozie running on the same cluster as Spark
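
To help tell limitation (b) apart from (a), a quick check is whether the machine running the java action can open a TCP connection to the Spark master at all. The following is a minimal sketch using only the Java standard library; the class name MasterReachability is hypothetical, and the default of localhost:7077 is an assumption (7077 is the usual standalone master port, but your cluster may differ). Note that a successful connection in this direction does not prove the cluster can connect back to the driver, which Spark also requires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic: can this host reach the Spark master port? */
public class MasterReachability {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unknown hosts, and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real master host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7077;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 3000));
    }
}
```

Running this as its own java action from the same workflow would show what the launcher node sees, which may differ from what you see on your workstation.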

I will keep you updated

Shivani




On Thu, Apr 10, 2014 at 8:52 AM, Mayur Rustagi <ma...@gmail.com> wrote:

> I don't think Oozie will do failure detection etc. for a Spark job as of
> yet. You should be able to trigger it from Oozie (worst case as a shell
> script).


-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Re: is it possible to initiate Spark jobs from Oozie?

Posted by Mayur Rustagi <ma...@gmail.com>.
I don't think Oozie will do failure detection etc. for a Spark job as of
yet. You should be able to trigger it from Oozie (worst case as a shell
script).

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>
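
For the shell-script fallback mentioned above, a basic form of failure detection can be had by wrapping the launch command in a small Java main class that turns a non-zero exit status into an exception. As far as I know, Oozie treats a java action as failed when its main method throws, but that behavior should be confirmed against the Oozie version in use. The class name CommandLauncher is hypothetical, and the command to run is whatever script starts your Spark job; nothing here is Spark-specific.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper: runs a command and fails loudly on a bad exit status. */
public class CommandLauncher {

    /** Runs the command, streams its output to this process, returns its exit status. */
    public static int run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO(); // child stdout/stderr show up in the action's logs
        Process process = builder.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            throw new IllegalArgumentException(
                    "usage: CommandLauncher <command> [args...]");
        }
        int exitCode = run(Arrays.asList(args));
        if (exitCode != 0) {
            // Throwing (rather than exiting quietly) is what signals failure.
            throw new IllegalStateException(
                    "Command exited with status " + exitCode);
        }
    }
}
```

This only detects failures that the launched script reports through its exit status, so the script itself has to exit non-zero when the Spark job fails.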



On Thu, Apr 10, 2014 at 2:56 AM, Konstantin Kudryavtsev <
kudryavtsev.konstantin@gmail.com> wrote:

> I believe you need to write a custom action or use the java action.

Re: is it possible to initiate Spark jobs from Oozie?

Posted by Konstantin Kudryavtsev <ku...@gmail.com>.
I believe you need to write a custom action or use the java action.