Posted to user@spark.apache.org by Haopu Wang <HW...@qilinsoft.com> on 2016/06/16 09:36:18 UTC

Can I control the execution of Spark jobs?

Hi,

Suppose I have a Spark application which is doing several ETL types of things.

I understand Spark can analyze the program and generate several jobs to execute.

The question is: is it possible to control the dependency between these jobs?

Thanks!

 


Re: Can I control the execution of Spark jobs?

Posted by Alonso Isidoro Roman <al...@gmail.com>.
Hi Wang,

maybe you can consider using an integration framework like Apache Camel to
orchestrate the different jobs...
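
For example, something along these lines (just a sketch in Scala, assuming
camel-core and camel-exec are on the classpath and spark-submit is on the
PATH; every path, class name, and jar here is made up, not from your setup):

import org.apache.camel.builder.RouteBuilder
import org.apache.camel.impl.DefaultCamelContext

object JobChain extends App {
  val context = new DefaultCamelContext()
  context.addRoutes(new RouteBuilder {
    override def configure(): Unit = {
      // Hypothetical chain: when the first job drops its _SUCCESS marker,
      // submit the second job via spark-submit (camel-exec component).
      from("file:/data/etl/stage1?fileName=_SUCCESS&noop=true")
        .to("exec:spark-submit?args=--class etl.SecondJob /jobs/second.jar")
        .log("second job submitted after the first one finished")
    }
  })
  context.start()
  Thread.sleep(Long.MaxValue) // keep the JVM alive so the route keeps polling
}

The nice part of pushing the dependency into a route like this is that the
two Spark jobs stay independent applications.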

Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman


Re: Can I control the execution of Spark jobs?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

Ahh, that makes sense now.

Spark works like this by default. You simply run your first pipeline and
then the next one (and perhaps a few more). Since the pipelines are
executed serially (one after another) on the driver, you implicitly create
a dependency between the Spark jobs. You need no special steps to get it.

pipeline == load a dataset, transform it, and save it to persistent storage
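
For instance (a minimal sketch; the paths and column names are made up):
the second pipeline below cannot start before the first one's save has
finished, simply because the driver executes the statements in order.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TwoPipelines extends App {
  val sc = new SparkContext(new SparkConf().setAppName("two-pipelines"))
  val sqlContext = new SQLContext(sc)

  // Pipeline 1: load a dataset, transform it, save it to persistent storage.
  val raw = sqlContext.read.json("/data/in/events")
  raw.filter("status = 'ok'").write.parquet("/data/out/clean-events")

  // Pipeline 2 starts only after pipeline 1's save above has returned,
  // so the dependency between the Spark jobs is implicit in program order.
  val clean = sqlContext.read.parquet("/data/out/clean-events")
  clean.groupBy("userId").count().write.parquet("/data/out/event-counts")

  sc.stop()
}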

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





RE: Can I control the execution of Spark jobs?

Posted by Haopu Wang <HW...@qilinsoft.com>.
Jacek,

For example, one ETL job saves raw events and updates a file.
The other job then uses that file's content to process the data set.

In this case, the first job has to finish before the second one starts. That's what I mean by dependency. Any suggestions/comments are appreciated.
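
In code, the scenario I mean looks roughly like this (a sketch with
made-up paths, column names, and control-file format):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DependentJobs extends App {
  val sc = new SparkContext(new SparkConf().setAppName("dependent-jobs"))
  val sqlContext = new SQLContext(sc)

  // Job 1: save the raw events and update a small control file.
  val events = sqlContext.read.json("/data/in/raw-events")
  events.write.parquet("/data/out/raw-events")
  sc.parallelize(Seq("watermark=2016-06-16"), 1)
    .saveAsTextFile("/data/out/control")

  // Job 2 must not start before the control file above exists: it reads
  // the file's content and uses it to process the data set.
  val watermark = sc.textFile("/data/out/control")
    .first().stripPrefix("watermark=")
  sqlContext.read.parquet("/data/out/raw-events")
    .filter(s"eventDate <= '$watermark'")
    .write.parquet("/data/out/processed")

  sc.stop()
}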





Re: Can I control the execution of Spark jobs?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

When you say "several ETL types of things", what is this exactly? What
would an example of "dependency between these jobs" be?

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


