Posted to users@hop.apache.org by Mikhail Khludnev <mk...@apache.org> on 2023/05/13 20:02:28 UTC

Scheduling Pipeline/Workflow

Hello,
Thank you for the nice project. This is what I need. I want to confirm my
understanding of scheduling. If I need to run a pipeline/workflow on a
regular schedule, is there any other option besides Airflow (kicking off a
Hop pipeline) and scheduling it as a Dataflow job?

-- 
Sincerely yours
Mikhail Khludnev

Re: Scheduling Pipeline/Workflow

Posted by ha...@gmail.com.
Hi All,

For Dataflow you can also take advantage of the scheduler in Dataflow itself.
There is a how-to on using the jobs tab in Dataflow [1].

To use this, you only need to place your pipeline and the required metadata in a Google Cloud Storage bucket; you can then fire off the pipeline on a schedule.

We try to be as versatile as possible so that Hop fits into your current data processing ecosystem. We recently updated the documentation on how to use Airflow [2], and in the coming months we will create more how-to guides focused on scheduling in different environments.

If you can share more about what your architecture looks like, we can offer more insight into how we would tackle the problem in your situation.

Kind regards,
Hans

[1] https://hop.apache.org//manual/latest/pipeline/beam/dataflowPipeline/google-dataflow-pipeline.html
[2] https://hop.apache.org//manual/next/how-to-guides/run-hop-in-apache-airflow.html

Re: Scheduling Pipeline/Workflow

Posted by Mikhail Khludnev <mk...@apache.org>.
Got it. Thank you, Thad!

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Re: Scheduling Pipeline/Workflow

Posted by Thad Guidry <th...@gmail.com>.
You can use any scheduling tool (cron, Rundeck, etc.)! Isn't that great?

In your tool of choice, you just need to ensure the tool can perform
an HTTP POST request with the correct parameters.
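
As a rough illustration of the kind of HTTP POST a scheduler could fire, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption for illustration, not taken from this thread: the `/hop/webService/` path, the `service` query parameter, passing pipeline parameters in the query string, HTTP basic authentication, and the example service name and credentials. Check the web service documentation linked below for what your Hop Server actually expects.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_request(base_url, service, params, user, password):
    """Build (but do not send) a POST request that asks a Hop Server
    web service to run.

    Assumptions (hypothetical, verify against your server):
      - the endpoint path is /hop/webService/
      - the service name and parameters travel in the query string
      - the server uses HTTP basic authentication
    """
    query = urllib.parse.urlencode({"service": service, **params})
    url = f"{base_url.rstrip('/')}/hop/webService/?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # An empty body with method="POST" still produces a POST request.
    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


def send(request, timeout=30):
    """Send the request and return (HTTP status, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

A scheduler such as cron would then run a small script that calls `send(build_trigger_request(...))` with your own server URL, service name, and credentials.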

I personally use the async web service for long-running batch jobs and use
Python Flask to build a simple dashboard for monitoring, but you could do the
same with Rundeck, Nagios, or other tools.
https://hop.apache.org/manual/latest/hop-server/async-web-service.html
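
For long-running jobs, a monitoring script or dashboard typically polls for status until the run completes. The sketch below shows only that polling loop, in plain Python. The `fetch_status` callable and the state names ("Finished", "Stopped") are hypothetical placeholders: how you actually query the async web service for status, and which states it reports, should be taken from the page above.

```python
import time


def wait_until_finished(fetch_status,
                        done_states=("Finished", "Stopped"),
                        interval_seconds=10.0,
                        max_polls=360,
                        sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns a terminal state.

    fetch_status is a caller-supplied function (hypothetical here) that
    returns the current state of the run as a string. The state names in
    done_states are placeholders, not confirmed Hop values.
    """
    for attempt in range(max_polls):
        state = fetch_status()
        if state in done_states:
            return state
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"run did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting; in real use the default `time.sleep` applies.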

The Execution service, which is what most of us use, is detailed here:
https://hop.apache.org/manual/latest/hop-rest/index.html#_execution_services
*That doc page needs to be improved, and the Workflows and Pipelines pages
should link to it.*  At the very least, a tip or note admonition should be
added on this page:
https://hop.apache.org/manual/latest/pipeline/pipelines.html
We're not very good at linking to other parts of the manual
that are directly relevant to running a Pipeline or Workflow.  PRs welcome!

Anyways, here's the web service metadata directly.
https://hop.apache.org/manual/latest/hop-server/web-service.html

Also, you can even control the Hop Server itself through scripts, which
could also be scheduled with your tool of choice.
https://hop.apache.org/manual/latest/hop-server/index.html

Luckily, you can use the "Search the docs" input box in the top right of
the manual in order to find some of the other pages that you might be
interested in.

Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/

