Posted to user@beam.apache.org by Marco Mistroni <mm...@gmail.com> on 2020/04/05 11:02:28 UTC

Scheduling dataflow pipelines

Hi all,
sorry for this partially OT question, but has anyone been successful in
scheduling a Dataflow job on GCP?
I have tried the Cloud Function approach (following a few examples on the web)
but it didn't work out for me - the cloud function keeps giving me an
INVALID ARGUMENT error, which I could not debug.

So I was wondering if anyone has been successful and can provide me with an
example.

kind regards
 Marco

Re: Scheduling dataflow pipelines

Posted by Marco Mistroni <mm...@gmail.com>.
Thanks André. I have skipped Pub/Sub in favour of a simple Cloud Function
invoked via Cloud Scheduler.
Thanks for the assist.



Re: Scheduling dataflow pipelines

Posted by "Joshua B. Harrison" <jo...@gmail.com>.
I agree with André. However, if you want to do anything more complex,
you’ll need synchronization. This is why we went with Luigi. Composer is
pretty heavyweight. Running Luigi on a shared VM costs us about .30 cents a
day and allows much more control over how we schedule and execute tasks.
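
One simple form of synchronization is waiting for a launched Dataflow job to reach a terminal state before starting whatever depends on it. The sketch below is only an illustration of that idea, not the Luigi setup described above. `wait_for_job` and `dataflow_state_getter` are hypothetical helper names, and the project, region and job ID are values you would supply yourself. The job states are those reported in the `currentState` field of the Dataflow `v1b3` REST API.

```python
import time

# States after which a Dataflow job will not change any further.
TERMINAL_STATES = {
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
}


def wait_for_job(get_state, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in state {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def dataflow_state_getter(project_id, region, job_id):
    """Return a zero-argument function that reads the job's current state."""
    # Imported lazily so wait_for_job can be used and tested without the client library.
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3", cache_discovery=False)

    def get_state():
        job = (
            service.projects()
            .locations()
            .jobs()
            .get(projectId=project_id, location=region, jobId=job_id)
            .execute()
        )
        return job.get("currentState", "JOB_STATE_UNKNOWN")

    return get_state
```

A downstream step could then call `wait_for_job(dataflow_state_getter(project, region, job_id))` and proceed only if the result is `JOB_STATE_DONE`.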

Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
<jo...@google.com> |  404-433-0242

Re: Scheduling dataflow pipelines

Posted by André Rocha Silva <a....@portaltelemedicina.com.br>.
Marco

If I were to give a step-by-step, I'd go:
1) test the template on Dataflow
2) test the Cloud Function
3) call the Cloud Function from Pub/Sub
4) send a message to Pub/Sub from Cloud Scheduler

Take a look at this tutorial about Cloud Scheduler:
https://www.youtube.com/watch?v=WUPEUjvSBW8

I think Cloud Composer is way too expensive if you only want to call the
template twice a day, for example.
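
Steps 2 and 3 above might look roughly like the following: a minimal sketch of a Pub/Sub-triggered Cloud Function that launches a stored template through the Dataflow `v1b3` REST API. The project, region, template path and function names are hypothetical placeholders, and the sketch assumes the Pub/Sub message body is a JSON object of template parameters. The thread never establishes what caused the INVALID ARGUMENT error. Normalising the job name (lowercase letters, digits and hyphens only) and sending parameter values as strings are just two things worth ruling out.

```python
import base64
import json
import re
import time

# Hypothetical placeholders: replace with your own project, region and template path.
PROJECT_ID = "my-project"
REGION = "us-central1"
TEMPLATE_GCS_PATH = "gs://my-bucket/templates/my_template"


def make_job_name(prefix, now=None):
    """Return a timestamped job name using only lowercase letters, digits and hyphens."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    name = re.sub(r"[^a-z0-9-]", "-", f"{prefix}-{stamp}".lower()).strip("-")
    # Job names have to start with a letter.
    return name if name[:1].isalpha() else "job-" + name


def build_launch_body(job_name, parameters):
    """Build the launch request body; template parameter values are sent as strings."""
    return {
        "jobName": job_name,
        "parameters": {key: str(value) for key, value in parameters.items()},
    }


def decode_pubsub_event(event):
    """Decode the JSON payload of a Pub/Sub message; a message without data gives {}."""
    data = event.get("data")
    if not data:
        return {}
    return json.loads(base64.b64decode(data).decode("utf-8"))


def launch_template(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function."""
    # Imported lazily so the helpers above can be tested without the client library.
    from googleapiclient.discovery import build

    parameters = decode_pubsub_event(event)
    body = build_launch_body(make_job_name("scheduled-job"), parameters)
    service = build("dataflow", "v1b3", cache_discovery=False)
    request = service.projects().locations().templates().launch(
        projectId=PROJECT_ID, location=REGION, gcsPath=TEMPLATE_GCS_PATH, body=body
    )
    response = request.execute()
    print("Launched Dataflow job:", response.get("job", {}).get("id"))
```

Testing the pure helpers locally first (step 2) keeps payload problems separate from permission or API problems.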

kind regards


-- 

   *ANDRÉ ROCHA SILVA*
  * DATA ENGINEER*
  (48) 3181-0611

  <https://www.linkedin.com/in/andre-rocha-silva/> /andre-rocha-silva/
<http://portaltelemedicina.com.br/>
<https://www.youtube.com/channel/UC0KH36-OXHFIKjlRY2GyAtQ>
<https://pt-br.facebook.com/PortalTelemedicina/>
<https://www.linkedin.com/company/9426084/>

Re: Scheduling dataflow pipelines

Posted by Marco Mistroni <mm...@gmail.com>.
Thanks will give it a go


Re: Scheduling dataflow pipelines

Posted by Soliman ElSaber <so...@mindvalley.com>.
We are using Composer (Airflow) to schedule and run the Dataflow jobs...
Using the Python SDK, with small changes to the Composer (Airflow)
DataFlowPythonOperator, to force it to use Python 3...
It is working fine and creating a new Dataflow job every 30 minutes...
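
A DAG along these lines might look like the sketch below. It uses the Airflow 1.10-era contrib import path for `DataFlowPythonOperator`; newer Airflow releases ship differently named operators in the Google provider package. The project, bucket, pipeline path and DAG name are hypothetical placeholders, and the Python 3 patch mentioned above is not reproduced here.

```python
from datetime import datetime, timedelta

# Hypothetical placeholders: none of these values come from the thread.
DATAFLOW_DEFAULTS = {
    "project": "my-project",
    "region": "us-central1",
    "temp_location": "gs://my-bucket/tmp",
}
DEFAULT_ARGS = {
    "start_date": datetime(2020, 4, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # Picked up by the Dataflow operator as its default pipeline options.
    "dataflow_default_options": DATAFLOW_DEFAULTS,
}
SCHEDULE = timedelta(minutes=30)  # start a new Dataflow job every 30 minutes


def build_dag():
    """Create the DAG with a single task that submits the Beam Python pipeline."""
    # Airflow 1.10-era import paths; imported here so the settings load without Airflow.
    from airflow import DAG
    from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

    dag = DAG(
        "dataflow_every_30_minutes",
        default_args=DEFAULT_ARGS,
        schedule_interval=SCHEDULE,
        catchup=False,  # do not backfill runs between start_date and today
    )
    DataFlowPythonOperator(
        task_id="run_pipeline",
        py_file="/home/airflow/gcs/dags/my_pipeline.py",
        options={"input": "gs://my-bucket/input/*"},
        dag=dag,
    )
    return dag


try:
    dag = build_dag()  # Airflow discovers DAG objects defined at module level
except ImportError:
    dag = None  # Airflow (or the contrib operator) is not available here
```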


-- 
Soliman ElSaber
Data Engineer
www.mindvalley.com

Re: Scheduling dataflow pipelines

Posted by Marco Mistroni <mm...@gmail.com>.
Right, thanks André. So presumably the flow of action will be:
- create a Dataflow template
- create a Cloud Function that invokes it
- create a Cloud Scheduler job that invokes the function?
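
The last step of that flow could be sketched as follows, using the `google-cloud-scheduler` Python client to create a job that POSTs to an HTTP-triggered function twice a day. The project, region, function URL and service account are hypothetical placeholders, and the sketch assumes an HTTP-triggered function already exists at that URL and that the service account is allowed to invoke it.

```python
import json

# Hypothetical placeholders, not taken from the thread.
PROJECT_ID = "my-project"
REGION = "us-central1"
FUNCTION_URL = "https://us-central1-my-project.cloudfunctions.net/launch_template"
SERVICE_ACCOUNT = "scheduler-invoker@my-project.iam.gserviceaccount.com"


def build_scheduler_job(job_id, cron, payload):
    """Describe a Cloud Scheduler job that sends a JSON payload to the function."""
    parent = f"projects/{PROJECT_ID}/locations/{REGION}"
    return {
        "name": f"{parent}/jobs/{job_id}",
        "schedule": cron,  # unix-cron format
        "time_zone": "Etc/UTC",
        "http_target": {
            "uri": FUNCTION_URL,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload).encode("utf-8"),
            # Lets Cloud Scheduler authenticate to a function that is not public.
            "oidc_token": {"service_account_email": SERVICE_ACCOUNT},
        },
    }


def create_scheduler_job(job):
    """Create the job through the API; requires google-cloud-scheduler and credentials."""
    # Imported lazily so build_scheduler_job can be tested without the client library.
    from google.cloud import scheduler_v1

    http_target = dict(job["http_target"], http_method=scheduler_v1.HttpMethod.POST)
    job = dict(job, http_target=http_target)
    parent = job["name"].rsplit("/jobs/", 1)[0]
    client = scheduler_v1.CloudSchedulerClient()
    return client.create_job(parent=parent, job=job)
```

For example, `create_scheduler_job(build_scheduler_job("dataflow-twice-daily", "0 */12 * * *", {"input": "gs://my-bucket/input/*"}))` would schedule a call at 00:00 and 12:00 UTC.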

Kind regards


Re: Scheduling dataflow pipelines

Posted by André Rocha Silva <a....@portaltelemedicina.com.br>.
Marco

If you are already using GCP, I suggest you use Cloud Scheduler. It is
like a cron job, but completely serverless.

If you need some extra help, let me know.



Re: Scheduling dataflow pipelines

Posted by deepak kumar <kd...@gmail.com>.
We have used Composer (Airflow) successfully to schedule Dataflow jobs.
Please let me know if you need details about it.

Thanks
Deepak


Re: Scheduling dataflow pipelines

Posted by "Joshua B. Harrison" <jo...@gmail.com>.
Hi Marco,

I've ended up using a VM running Luigi to schedule jobs. I use the Dataflow
Python API to execute stored templates.

I can give you more details if you’re interested.

Best,
Joshua
