Posted to user@beam.apache.org by Marco Mistroni <mm...@gmail.com> on 2020/04/08 08:50:49 UTC

Re: Apache Dataflow Template (Python)/ partially OT

Hi all
 I was wondering if anyone has experienced something similar.
I kicked off 3 Dataflow templates via a Cloud Function. It created 3 VMs
which are still alive after the jobs completed, and I cannot delete them.
Could anyone assist with this?
Kind regards

On Mon, Apr 6, 2020, 3:00 PM Marco Mistroni <mm...@gmail.com> wrote:

> Hey
>  Thanks, I created the template from the command line... I was having issues
> with the Cloud Function, but I think I was not using auth correctly.
> Will try your sample and report back if I am stuck.
> Thanks a lot!
>
> On Mon, Apr 6, 2020, 2:20 PM André Rocha Silva <
> a.silva@portaltelemedicina.com.br> wrote:
>
>> Were you able to create the template already?
>>
>> Have you read the article? There I wrote the cloud function in JavaScript.
>> Here is an example of a cloud function in Python:
>>
>> import google.auth
>> import logging
>> import random
>>
>> from googleapiclient.discovery import build
>>
>> GCLOUD_PROJECT = 'project-id-123'
>>
>>
>> def RunDataflow(event, context):
>>     # Use the Cloud Function's default service-account credentials.
>>     credentials, _ = google.auth.default()
>>
>>     service = build('dataflow', 'v1b3', credentials=credentials)
>>
>>     uri = 'gs://bucket/input/file'
>>     output_file = 'gs://bucket/output/file'
>>
>>     template_path = 'gs://bucket/Dataflow_templates/template'
>>     template_body = {
>>         # Job names must be unique, so append a random suffix.
>>         'jobName': 'cf-job-' + str(random.randint(1, 101000)),
>>         'parameters': {
>>             'input_file': uri,
>>             'output_file': output_file,
>>         },
>>     }
>>
>>     # Launch the staged template with the given runtime parameters.
>>     request = service.projects().templates().launch(
>>         projectId=GCLOUD_PROJECT,
>>         gcsPath=template_path,
>>         body=template_body)
>>     response = request.execute()
>>
>>     logging.info(f'RunDataflow: got this response {response}')
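If it helps, the construction of the launch body can be factored out and checked locally without touching the API. This is only a sketch; the helper function name is mine, not from the thread:

```python
import random

def make_template_body(input_file, output_file, job_prefix='cf-job-'):
    # Build the body that templates.launch expects: a unique jobName
    # plus the template's runtime parameters.
    return {
        'jobName': job_prefix + str(random.randint(1, 101000)),
        'parameters': {
            'input_file': input_file,
            'output_file': output_file,
        },
    }

body = make_template_body('gs://bucket/input/file', 'gs://bucket/output/file')
print(body['parameters']['input_file'])  # → gs://bucket/input/file
```

Keeping the body construction separate from the API call makes it easy to assert on the parameters in a unit test before wiring it into the function.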
>>
>>
>> On Mon, Apr 6, 2020 at 10:13 AM Marco Mistroni <mm...@gmail.com>
>> wrote:
>>
>>> @andre sorry to hijack this. Are you able to send a working example of
>>> kicking off a Dataflow template via a Cloud Function?
>>>
>>> Kind regards
>>>
>>> On Mon, Apr 6, 2020, 1:51 PM André Rocha Silva <
>>> a.silva@portaltelemedicina.com.br> wrote:
>>>
>>>> Hey!
>>>>
>>>> Could you make it work? You can take a look at this post; it is a
>>>> single-file pipeline, easy to create a template from:
>>>>
>>>> https://towardsdatascience.com/my-first-etl-job-google-cloud-dataflow-1fd773afa955
>>>>
>>>> If you want, we can schedule a Google Hangout and I'll help you step by
>>>> step.
>>>> It is the least I can do after having had so much help from the
>>>> community :)
>>>>
>>>> On Sat, Apr 4, 2020 at 4:52 PM Marco Mistroni <mm...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey
>>>>>  sure... it's a rough script :) ... just an ordinary Dataflow script
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/tree/master/dataflow/edgar_flow
>>>>>
>>>>>
>>>>> What I meant to say, for your template question, is for you to write
>>>>> a basic script which runs on Beam... something as simple as this
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/beam_test.py
>>>>>
>>>>> and then you can create a template out of it by just running this
>>>>>
>>>>> python -m edgar_main  --runner=dataflow --project=datascience-projets
>>>>> --template_location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template
>>>>> --temp_location=gs://mm_dataflow_bucket/temp
>>>>> --staging_location=gs://mm_dataflow_bucket/staging
>>>>>
>>>>> That will create a template 'edgar_dataflow_template' which you can
>>>>> use in the GCP Dataflow console to create your job.
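As a rough illustration of what each flag in that command is for (this parsing sketch is mine and is not the Beam SDK itself): when --template_location is set, the pipeline is staged to GCS as a template rather than executed, and the runtime parameters are supplied later by the launch request.

```python
import argparse

# Parse the exact flags from the template-creation command in the thread,
# purely to document what each one means.
parser = argparse.ArgumentParser()
parser.add_argument('--runner',
                    help='dataflow = stage/run on Google Cloud Dataflow')
parser.add_argument('--project',
                    help='GCP project that owns the job')
parser.add_argument('--template_location',
                    help='GCS path to write the template to; when set, the '
                         'pipeline is staged as a template instead of run')
parser.add_argument('--temp_location',
                    help='GCS path for temporary files')
parser.add_argument('--staging_location',
                    help='GCS path for staged SDK packages')

args = parser.parse_args([
    '--runner=dataflow',
    '--project=datascience-projets',
    '--template_location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template',
    '--temp_location=gs://mm_dataflow_bucket/temp',
    '--staging_location=gs://mm_dataflow_bucket/staging',
])
print(args.template_location)
```

The same flags are passed verbatim to the real pipeline script; Beam's own option parser picks them up from the command line.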
>>>>>
>>>>> hth, I'm sort of a noob to Beam, having started writing code just over
>>>>> a month ago. Feel free to ping me if you get stuck.
>>>>>
>>>>> kind regards
>>>>>  Marco
>>>>>
>>>>>
>>>>> On Sat, Apr 4, 2020 at 6:01 PM Xander Song <ia...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> Thanks for your response. Would you mind sending the edgar_main
>>>>>> script so I can take a look?
>>>>>>
>>>>>> On Sat, Apr 4, 2020 at 2:25 AM Marco Mistroni <mm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey
>>>>>>>  As far as I know, you can generate a Dataflow template out of your
>>>>>>> Beam code by specifying an option on the command line.
>>>>>>> I run this command, and once the template is generated I kick off a
>>>>>>> Dataflow job via the console by pointing at it:
>>>>>>>
>>>>>>> python -m edgar_main --runner=dataflow --project=datascience-projets
>>>>>>> --template_location=gs://<your bucket>
>>>>>>>
>>>>>>> Hth
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Apr 4, 2020, 9:52 AM Xander Song <ia...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am attempting to write a custom Dataflow Template using the
>>>>>>>> Apache Beam Python SDK, but am finding the documentation difficult to
>>>>>>>> follow. Does anyone have a minimal working example of how to write and
>>>>>>>> deploy such a template?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>>
>>>>    *ANDRÉ ROCHA SILVA*
>>>>   * DATA ENGINEER*
>>>>   (48) 3181-0611
>>>>
>>>>   <https://www.linkedin.com/in/andre-rocha-silva/> /andre-rocha-silva/
>>>>
>>>>
>>
>> --
>>
>>    *ANDRÉ ROCHA SILVA*
>>   * DATA ENGINEER*
>>   (48) 3181-0611
>>
>>   <https://www.linkedin.com/in/andre-rocha-silva/> /andre-rocha-silva/
>>
>>

Re: Apache Dataflow Template (Python)/ partially OT

Posted by Luke Cwik <lc...@google.com>.
Reach out to Google Cloud support.

On Wed, Apr 8, 2020 at 1:51 AM Marco Mistroni <mm...@gmail.com> wrote:

> Hi all
>  I was wondering if anyone has experienced something similar.
> I kicked off 3 Dataflow templates via a Cloud Function. It created 3 VMs
> which are still alive after the jobs completed, and I cannot delete them.
> Could anyone assist with this?
> Kind regards