Posted to user@beam.apache.org by "Morand, Sebastien" <se...@veolia.com> on 2017/06/04 17:22:23 UTC

Error in Python SDK 2.0 dataflow

Hi,

I'm stuck on an error in Stackdriver that makes my container restart over
and over again:

19:05:25.000 Installing collected packages: MyProject
19:05:25.000 Running setup.py install for MyProject: started
19:05:25.000 Running setup.py install for MyProject: finished with status
'done'
19:05:25.000 Successfully installed MyProject-1.0
19:05:25.000 Shuffle library target install path:
/usr/local/lib/python2.7/dist-packages/dataflow_worker
19:05:25.000 Shuffle client library installed.
19:05:25.000 Starting 1 python sub-processes
19:05:25.000 Executing: /usr/bin/python -m dataflow_worker.start
-Djob_id=2017-06-04_09_46_51-11252670683207760193 -Dproject_id=<projectid>
-Dreporting_enabled=True -Droot_url=https://dataflow.googleapis.com
-Dservice_path=https://dataflow.googleapis.com/
-Dtemp_gcs_directory=gs://unused
-Dworker_id=beamapp-sebastien-0604164-06040946-fb45-harness-h8hl
-Ddataflow.worker.logging.location=/var/log/dataflow/python-dataflow-0-json.log
-Dlocal_staging_directory=/var/opt/google/dataflow
-Dsdk_pipeline_options={"display_data":[{"key":"requirements_file","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"conf/requirements.txt"},{"key":"runner","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"DataflowRunner"},{"key":"staging_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://dataflow-run/staging/beamapp-sebastien-0604164610-664715.1496594770.664848"},{"key":"project","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"<project_id>"},{"key":"temp_location","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"gs://dataflow-run/temp/beamapp-sebastien-0604164610-664715.1496594770.664848"},{"key":"setup_file","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"./setup.py"},{"key":"job_name","namespace":"apache_beam.options.pipeline_options.PipelineOptions","type":"STRING","value":"beamapp-sebastien-0604164610-664715"}],"options":{"autoscalingAlgorithm":"NONE","dataflowJobId":"2017-06-04_09_46_51-11252670683207760193","dataflow_endpoint":"https://dataflow.googleapis.com","direct_runner_use_stacked_bundle":true,"gcpTempLocation":"gs://dataflow-run/temp/beamapp-sebastien-0604164610-664715.1496594770.664848","job_name":"beamapp-sebastien-0604164610-664715","maxNumWorkers":0,"no_auth":false,"numWorkers":3,"pipeline_type_check":true,"profile_cpu":false,"profile_memory":false,"project":"<project_id>","region":"us-central1","requirements_file":"conf/requirements.txt","runner":"DataflowRunner","runtime_type_check":false,"save_main_session":false,"sdk_location":"default","setup_file":"./setup.py","staging_location":"gs://dataflow-run/staging/beamapp-sebastien-0604164610-664715.1496594770.664848","streaming":false,"temp_location":"gs://dataflow-run/temp/beamapp-sebastien-0604164610-664715.1496594770.664848","type_check_strictness":"DEFAULT_TO_ANY"}}
19:05:25.000 Traceback (most recent call last):
19:05:25.000   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
19:05:25.000     "__main__", fname, loader, pkg_name)
19:05:25.000   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
19:05:25.000     exec code in run_globals
19:05:25.000   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 26, in <module>
19:05:25.000     from dataflow_worker import batchworker
19:05:25.000   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 61, in <module>
19:05:25.000     from apitools.base.py.exceptions import HttpError
19:05:25.000   File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/__init__.py", line 23, in <module>
19:05:25.000     from apitools.base.py.credentials_lib import *
19:05:25.000   File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/credentials_lib.py", line 34, in <module>
19:05:25.000     from apitools.base.py import exceptions
19:05:25.000 ImportError: cannot import name exceptions
19:05:25.000 /usr/bin/python failed with exit status 1

My code works fine on the local runner, but on the Dataflow runner it runs
for 2 hours and then crashes. What is this error?

Thanks in advance,
Sébastien

-- 

--------------------------------------------------------------------------------------------
This e-mail transmission (message and any attached files) may contain 
information that is proprietary, privileged and/or confidential to Veolia 
Environnement and/or its affiliates and is intended exclusively for the 
person(s) to whom it is addressed. If you are not the intended recipient, 
please notify the sender by return e-mail and delete all copies of this 
e-mail, including all attachments. Unless expressly authorized, any use, 
disclosure, publication, retransmission or dissemination of this e-mail 
and/or of its attachments is strictly prohibited. 

Ce message electronique et ses fichiers attaches sont strictement 
confidentiels et peuvent contenir des elements dont Veolia Environnement 
et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc 
destines a l'usage de leurs seuls destinataires. Si vous avez recu ce 
message par erreur, merci de le retourner a son emetteur et de le detruire 
ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la 
publication, la distribution, ou la reproduction non expressement 
autorisees de ce message et de ses pieces attachees sont interdites.
--------------------------------------------------------------------------------------------

Re: Error in Python SDK 2.0 dataflow

Posted by Ahmet Altay <al...@google.com>.
The original error reported is unlikely to be related to a C-dependent
library. Please let us know if you get more information.

On Mon, Jun 5, 2017 at 5:09 PM, Dmitry Demeshchuk <dm...@postmates.com>
wrote:

> Interesting, I'm running into a very similar issue. My current wild
> guesses are:
>
> 1. I'm somehow messing up nested packages management in my project (from
> your setup.py code you mentioned in the other email it looks like you are
> also using sub-packages).
>
> 2. I'm having a C-dependent library (specifically, psycopg2).
>
> The latter seems to be more likely, hopefully I will confirm it soon.
>

Re: Error in Python SDK 2.0 dataflow

Posted by Dmitry Demeshchuk <dm...@postmates.com>.
Interesting, I'm running into a very similar issue. My current wild guesses
are:

1. I'm somehow messing up nested package management in my project (from
the setup.py code you mentioned in the other email, it looks like you are
also using sub-packages).

2. I have a C-dependent library (specifically, psycopg2).

The latter seems more likely; hopefully I'll confirm it soon.
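Guess 1 (a project module shadowing an installed package) is easy to
demonstrate in isolation. Here is a self-contained sketch that uses the
stdlib `platform` module as a stand-in for a real dependency such as
apitools; the directory and file names are invented for the demo:

```python
import os
import sys
import tempfile

# Create a throwaway directory containing a module that shares its name
# with the stdlib 'platform' module (standing in for a real dependency).
shadow_dir = tempfile.mkdtemp()
with open(os.path.join(shadow_dir, 'platform.py'), 'w') as f:
    f.write('shadowed = True\n')

# Putting the project directory first on sys.path is what a broken
# package layout effectively does.
sys.path.insert(0, shadow_dir)
sys.modules.pop('platform', None)  # force a fresh import

import platform

# The stub won the lookup; the real module is no longer reachable.
print(getattr(platform, 'shadowed', False))  # prints True
```

The worker-side equivalent would be importing apitools.base.py and checking
its __file__ attribute; a path inside one's own package would confirm
shadowing.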

On Sun, Jun 4, 2017 at 10:22 AM, Morand, Sebastien <
sebastien.morand@veolia.com> wrote:




-- 
Best regards,
Dmitry Demeshchuk.

Re: Error in Python SDK 2.0 dataflow

Posted by "Morand, Sebastien" <se...@veolia.com>.
Hi,

Actually, I solved this problem by removing install_requires from setup.py.

But I have made a lot of changes since then. I'll try removing
requirements.txt, adding the requirements to setup.py instead, and let you
know.
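For what it's worth, the variant with the requirements moved into setup.py
would look roughly like this (a sketch only; the package name and the
dependency list are placeholders, not the real project layout):

```python
# Hypothetical setup.py: dependencies declared via install_requires
# instead of a separate conf/requirements.txt file.
import setuptools

setuptools.setup(
    name='MyProject',
    version='1.0',
    packages=setuptools.find_packages(),
    install_requires=[
        'psycopg2',  # placeholder for the project's real dependencies
    ],
)
```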

Regards,

*Sébastien MORAND*
Team Lead Solution Architect
Technology & Operations / Digital Factory
Veolia - Group Information Systems & Technology (IS&T)
Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
Bureau 0144C (Ouest)
30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
www.veolia.com

On 6 June 2017 at 02:04, Ahmet Altay <al...@google.com> wrote:

> Hi Sébastien,
>
> Could you explain more on what you are doing? Is it possible that you are
> overwriting/removing google-apitools package somehow? Also, for dataflow
> service issues feel free to use one of the methods mentioned in Dataflow
> support page [1].
>
> Thank you,
> Ahmet
>
> [1] https://cloud.google.com/dataflow/support
>


Re: Error in Python SDK 2.0 dataflow

Posted by Ahmet Altay <al...@google.com>.
Hi Sébastien,

Could you explain more about what you are doing? Is it possible that you
are overwriting/removing the google-apitools package somehow? Also, for
Dataflow service issues, feel free to use one of the methods mentioned on
the Dataflow support page [1].

Thank you,
Ahmet

[1] https://cloud.google.com/dataflow/support
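One quick way to check whether the package is being overwritten is to look
at where Python actually loads it from; a path inside your own project's
package would indicate shadowing. A sketch (it uses 'json' so it runs
anywhere; on a worker you would pass 'apitools.base.py' instead):

```python
import importlib

def locate(name):
    """Return the file a module is actually loaded from."""
    module = importlib.import_module(name)
    return getattr(module, '__file__', '<built-in>')

# On the Dataflow worker, check 'apitools.base.py' and
# 'apitools.base.py.exceptions' here.
print(locate('json'))
```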

On Sun, Jun 4, 2017 at 10:22 AM, Morand, Sebastien <
sebastien.morand@veolia.com> wrote:
