Posted to user@beam.apache.org by "Gershi, Noam " <no...@citi.com> on 2019/12/31 12:04:11 UTC
Scheduling apache beam pipelines
Hi,
What is the best way to schedule Apache Beam pipelines (execute the same pipeline once per day, on the previous day's data)?
Will it be a different solution per runner?
Noam Gershi
Software Developer
T: +972 (3) 7405718
Re: Scheduling apache beam pipelines
Posted by Soliman ElSaber <so...@mindvalley.com>.
Hi,
As Magnus suggested, you need to use a scheduler.
We are using *Apache Airflow*.
If you are using *GCP*, then you can use *Composer*. That will make your
life easier, but it is a bit more costly than hosting *Airflow* on your own
*server*.
Another workaround on *GCP* is using *Cloud Scheduler* to call a *Cloud Function* which
creates and runs the *Dataflow* job. That is exactly like having a *cron*
job that starts your Apache Beam job.
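
To make the cron-style approach concrete, here is a minimal sketch of a launcher script that a scheduler (cron, Airflow, or a Cloud Function) could call once per day. It only computes the previous day's date window and builds the command line. The file name `my_pipeline.py` and the `--start_date` / `--end_date` options are hypothetical and would have to be defined by your own pipeline; `--runner` is the standard Beam pipeline option, so the same launcher can be reused across runners.

```python
import datetime as dt
import subprocess
import sys


def previous_day_window(today):
    """Return (start, end) dates covering the day before `today`; end is exclusive."""
    start = today - dt.timedelta(days=1)
    return start, today


def build_command(today, runner="DirectRunner"):
    """Build the argv that launches the pipeline for the previous day's data."""
    start, end = previous_day_window(today)
    return [
        sys.executable,
        "my_pipeline.py",                      # hypothetical pipeline entry point
        f"--runner={runner}",                  # standard Beam pipeline option
        f"--start_date={start.isoformat()}",   # hypothetical custom option
        f"--end_date={end.isoformat()}",       # hypothetical custom option
    ]


def launch(today, runner="DirectRunner"):
    """Run the pipeline as a subprocess and raise if it exits with an error."""
    subprocess.run(build_command(today, runner), check=True)
```

The scheduler would then call something like `launch(dt.date.today(), runner="DataflowRunner")`. Passing the date in explicitly, rather than reading the clock inside the pipeline, should make it easier to re-run a failed day.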
All the best...
On Tue, Dec 31, 2019 at 8:09 PM Magnus Runesson <ma...@linuxalert.org>
wrote:
> Hi!
>
> You probably want to take a look at a scheduler such as Airflow
> (https://airflow.apache.org/) or Luigi
> (https://luigi.readthedocs.io/en/stable/index.html). If you are on Google,
> they have Cloud Composer (https://cloud.google.com/composer/), which is
> Airflow underneath. On AWS, Glue is probably an option.
>
> /Magnus
--
Soliman ElSaber
Data Engineer
www.mindvalley.com
Re: Scheduling apache beam pipelines
Posted by Magnus Runesson <ma...@linuxalert.org>.
Hi!
You probably want to take a look at a scheduler such as
Airflow (https://airflow.apache.org/) or
Luigi (https://luigi.readthedocs.io/en/stable/index.html). If you are on
Google, they have Cloud Composer (https://cloud.google.com/composer/),
which is Airflow underneath. On AWS, Glue is probably an option.
/Magnus