Posted to user@beam.apache.org by Ahmet Altay <al...@google.com> on 2018/03/23 06:08:43 UTC

Re: Apache beam DataFlow runner throwing setup error

Hi Rajesh,

Have you looked at the worker-startup logs [1]? You should be able to see
the setup error there. It is possible that something in your requirements
file is failing to install on the workers. If that is the case,
see Managing Python Pipeline Dependencies [2] for alternative options. You
could also reach out to Google Cloud Dataflow support for additional
help [3].
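If the requirements file is the culprit, the failure can often be reproduced
locally before submitting the job. Below is a minimal sketch, assuming the SDK
version in use stages --requirements_file by downloading source distributions
only (pip's --no-binary :all:); please verify that against the stager of your
Beam release. The helper names build_download_cmd and check_requirements are
hypothetical.

```python
import subprocess
import sys
import tempfile


def build_download_cmd(requirements_file, dest_dir):
    """Build a pip command that fetches source distributions only.

    Assumption: this approximates how the Beam Python SDK stages
    --requirements_file (sdists, no wheels); check your SDK's stager.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", dest_dir,
        "-r", requirements_file,
        "--no-binary", ":all:",
    ]


def check_requirements(requirements_file):
    """Run the download into a throwaway directory.

    Returns (ok, output) where output is pip's combined stdout/stderr.
    """
    with tempfile.TemporaryDirectory() as dest_dir:
        result = subprocess.run(
            build_download_cmd(requirements_file, dest_dir),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    ok, output = check_requirements("requirements.txt")
    print(output)
    sys.exit(0 if ok else 1)
```

A successful download does not prove the packages will also build on the
workers, since source distributions are compiled at install time there; a
failure here, however, usually points at the offending requirement.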

Thank you,
Ahmet

[1] https://cloud.google.com/dataflow/pipelines/logging#monitoring-pipeline-logs
[2] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
[3] https://cloud.google.com/dataflow/support

On Thu, Mar 22, 2018 at 10:08 PM, Rajesh Hegde <rh...@datalicious.com>
wrote:

> Hi,
> We are building a data pipeline using the Beam Python SDK and trying to run
> it on Dataflow, but we are getting the following error:
>
> A setup error was detected in
> beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the
> worker-startup log for detailed information.
>
> However, we could not find the detailed worker-startup logs.
>
> We tried increasing the memory size, the worker count, etc., but we still
> get the same error.
>
> Here is the command we use:
>
>     python run.py \
>         --project=xyz \
>         --runner=DataflowRunner \
>         --staging_location=gs://xyz/staging \
>         --temp_location=gs://xyz/temp \
>         --requirements_file=requirements.txt \
>         --worker_machine_type n1-standard-8 \
>         --num_workers 2
>
>
> Pipeline snippet:
>
>     data = pipeline | "load data" >> beam.io.Read(
>         beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
>     )
>
>     data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)
>
>
> The pipeline above just loads data from BigQuery and filters it on a
> column value. It works like a charm on the DirectRunner but fails on
> Dataflow.
>
> Are we making an obvious setup mistake? Is anyone else getting the same
> error? We could use some help resolving the issue.
>
>
> --
>
> Rajesh Hegde | Lead Product Developer | Datalicious
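The lambda in the quoted snippet depends on a variable, value, defined
elsewhere in the script. One way to make that dependency explicit, and to test
the predicate without any runner, is a named function that receives the value
as an argument. This relies on BigQuerySource yielding each row as a dict keyed
by column name. In the pipeline, beam.Filter passes extra positional arguments
through to the callable, so the same function could be used as
beam.Filter(matches_column, 'column_name', value). A minimal sketch;
matches_column and the sample rows are hypothetical.

```python
def matches_column(row, column, expected):
    """Return True when row[column] equals expected.

    Rows are dicts keyed by column name; a row without the column
    compares as None.
    """
    return row.get(column) == expected


# Hypothetical sample rows standing in for the BigQuery result.
rows = [
    {"column_name": "a", "n": 1},
    {"column_name": "b", "n": 2},
    {"n": 3},  # column missing entirely
]

# Same filtering the pipeline would do, run on plain Python data.
kept = [r for r in rows if matches_column(r, "column_name", "a")]
print(kept)
```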