Posted to users@hop.apache.org by Sothy Yogarajah <so...@eikyu.ch> on 2021/09/07 14:51:14 UTC

Best Practice for scheduling multiple workflow

Hello Dear Hop-Users,

I am currently trying to figure out a simple solution to the following 
problem. I would like to hear your opinions, better solutions, and 
best-practice approaches from the community.

Problem: As a Data Engineer, how can I have multiple workflows that I 
have created run consecutively at certain times (during the night)?

My solution: I create a shell script for each workflow and run the 
shell scripts via cron jobs.

Example of the shell-script:
----------
#!/bin/sh

# kill all running java processes
killall java

# get the directory that holds this shell-script and all workflow files
path_to_workflow="$(dirname "$(realpath "$0")")"
echo "files are under: $path_to_workflow"

# workflow to run
workflow="workflow_01.hwf"

# run workflow with hop-run
"$HOME/hop/hop-run.sh" \
   --project="hop_jobroom_candidates" \
   --file="$path_to_workflow/$workflow" \
   --runconfig="local"
----------

As you can imagine, this gets somewhat tedious if I have to manage, let's 
say, more than 50 workflows. How are you, dear Hop users, solving this 
problem?

--

Best Regards from Switzerland

Sothy.



-- 
EIKYU GmbH
Sempacherstrasse 14
CH-4053 Basel
www.eikyu.ch

Re: Best Practice for scheduling multiple workflow

Posted by Hans Van Akelyen <ha...@gmail.com>.
Hi Sothy,

As with most answers, it depends.
Hop workflows and pipelines can be scheduled in numerous ways. If you
already have scheduling software in your organization, you can add
the workflows to it, e.g. Airflow, Luigi, ...

That being said, cron is also a commonly used solution; we use it in
multiple locations. I do not know what your workflows look like, but
workflows usually tend to depend on each other. This means you can create
multiple parent workflows, each combining the child workflows that have to
be executed in sequence.
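
If you would rather keep the sequencing in the shell script than in a
parent workflow, a minimal sketch could look like the one below. It reuses
the hop-run options from the original script. The HOP_RUN and PROJECT
variables, the run_in_sequence function and the file names in the example
call are made up for illustration, and the sketch assumes that hop-run.sh
returns a non-zero exit code when a workflow fails.

```shell
#!/bin/sh
# Sketch: run several workflows one after another from a single cron entry.
# HOP_RUN, PROJECT and the workflow file names are illustrative assumptions.

HOP_RUN="${HOP_RUN:-$HOME/hop/hop-run.sh}"
PROJECT="${PROJECT:-hop_jobroom_candidates}"

# run_in_sequence DIR FILE...
# Runs each workflow file found in DIR in the given order and stops at the
# first one that fails (assumes a non-zero exit code signals failure).
run_in_sequence() {
    dir="$1"
    shift
    for workflow in "$@"; do
        echo "starting $workflow"
        if ! "$HOP_RUN" --project="$PROJECT" \
                        --file="$dir/$workflow" \
                        --runconfig="local"; then
            echo "failed: $workflow" >&2
            return 1
        fi
    done
}

# Example call (hypothetical file names), one cron entry for the whole chain:
# run_in_sequence "$(dirname "$(realpath "$0")")" workflow_01.hwf workflow_02.hwf
```

With this, one cron entry replaces one script per workflow, and a failing
workflow prevents the ones that depend on it from starting.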

Another solution would be to have 1 pipeline in your cron that executes
every x minutes and have a schedule stored somewhere else, a database table
or csv file. This file contains which workflow should be executed when, and
the pipeline can then trigger the start and do some additional logging.
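
To illustrate the schedule-file idea, here is a rough sketch written as a
shell script instead of a Hop pipeline. The file format (one
"HH:MM,workflow-file" entry per line), the file name schedule.csv and the
due_workflows function are assumptions for this example, not something Hop
provides.

```shell
#!/bin/sh
# Sketch: look up which workflows are due according to a schedule file.
# Assumed (hypothetical) file format: one "HH:MM,workflow-file" per line.

# due_workflows SCHEDULE_FILE HH:MM
# Prints the workflow file of every entry whose time equals HH:MM.
due_workflows() {
    awk -F',' -v now="$2" '$1 == now { print $2 }' "$1"
}

# Example use in a script that cron starts every minute
# (schedule.csv, the log file and the hop-run.sh location are assumptions):
# for workflow in $(due_workflows schedule.csv "$(date +%H:%M)"); do
#     "$HOME/hop/hop-run.sh" --project="hop_jobroom_candidates" \
#         --file="$workflow" --runconfig="local" >> hop-schedule.log 2>&1
# done
```

The exact time match only works if cron starts the script every minute. For
a longer interval the comparison would have to cover the whole interval.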

As there's more than one way to skin a cat, we have not (yet?) created our
own scheduler. We believe other projects are doing a great job on this part,
and we should provide solutions to leverage those tools.

I hope this helps. If you have further questions, don't hesitate to ask.

Cheers,
Hans
