Posted to user@beam.apache.org by Rajnil Guha <ra...@gmail.com> on 2021/03/29 19:31:00 UTC

[Question] -- Getting error while writing data into Big Query from Dataflow -- "Clients have non-trivial state that is local and unpickleable.", _pickle.PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable.

Hi Beam Community,

I am running a Dataflow pipeline using the Python SDK. I do some ETL processing on my data and then write the output into BigQuery. When I try to write into BigQuery I get the below error in the Dataflow job. However, when I run this pipeline locally on the DirectRunner, the same code runs successfully and the data is written into BigQuery.

 "Clients have non-trivial state that is local and unpickleable.",
_pickle.PicklingError: Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.

I have added the full traceback in the attached file.
I am trying to write data into BigQuery as below:

write_delivered_orders = (delivered_orders
                             | "ConvertDeliveredToJSON" >> beam.Map(to_json) 
                             | "WriteDeliveredOrders" >> beam.io.WriteToBigQuery(
                                 delivered_order_table_spec,
                                 schema=table_schema,
                                 create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                                 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                                 additional_bq_parameters={'timePartitioning': {'type': 'DAY'}}
                             )
                             )
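For readers searching for this message: the string "Pickling client objects is explicitly not supported" is raised by the google-cloud client classes themselves, which refuse to be pickled. A minimal stand-in (the class below is hypothetical, mimicking the real client's behaviour) reproduces the exact error without submitting any Dataflow job:

```python
import pickle

# Hypothetical stand-in mimicking google-cloud's base Client, whose
# __getstate__ raises exactly this error when anything tries to pickle it.
class Client:
    def __getstate__(self):
        raise pickle.PicklingError(
            "Pickling client objects is explicitly not supported. "
            "Clients have non-trivial state that is local and unpickleable.")

client = Client()  # e.g. a client created at module level

try:
    # Submitting with --save_main_session pickles such globals,
    # which is where the traceback comes from.
    pickle.dumps(client)
except pickle.PicklingError as err:
    print(err)
```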

Has anyone encountered this error? If so, can you please help me understand and resolve it.
Thanks in advance.

Thanks & Regards
Rajnil Guha


Re: [Question] -- Getting error while writing data into Big Query from Dataflow -- "Clients have non-trivial state that is local and unpickleable.", _pickle.PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable.

Posted by Rajnil Guha <ra...@gmail.com>.
I tried both as mentioned below:-

1. Tried importing the modules locally inside DoFns and functions, and then ran the pipeline with “save_main_session” set to True.
2. Imported the modules globally, created a requirements.txt file to declare the dependencies, and ran the pipeline as below with “save_main_session” set to False.

	python batch_pipeline.py --project=$PROJECT --region=us-central1 --runner=DataflowRunner --staging_location=gs://dataflow-code-bucket/test --temp_location=gs://dataflow-code-bucket/test --input=gs://dataflow-code-bucket/input-files/food_daily.csv --requirements_file=requirements.txt --save_main_session False

Both times the pipeline fails with the same pickling error.
The requirements.txt file is attached.



Thanks & Regards
Rajnil Guha

> On 30-Mar-2021, at 1:18 AM, Chamikara Jayalath <ch...@google.com> wrote:
> 
> This might be relevant: https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors <https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors>
> 
> If "save_main_session" is set Dataflow tries to pickle the main session. So you might have to define such objects locally (for example, within functions, DoFn classes, etc.) or update the pipeline to not set "save_main_session" (and set dependencies according to this <https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/> guide if needed).
> 
> Thanks,
> Cham
> 
> On Mon, Mar 29, 2021 at 12:31 PM Rajnil Guha <rajnil94.guha@gmail.com <ma...@gmail.com>> wrote:


Re: [Question] -- Getting error while writing data into Big Query from Dataflow -- "Clients have non-trivial state that is local and unpickleable.", _pickle.PicklingError: Pickling client objects is explicitly not supported. Clients have non-trivial state that is local and unpickleable.

Posted by Chamikara Jayalath <ch...@google.com>.
This might be relevant:
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors

If "save_main_session" is set, Dataflow tries to pickle the main session. So
you might have to define such objects locally (for example, within
functions, DoFn classes, etc.) or update the pipeline to not set
"save_main_session" (and set dependencies according to the guide at
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
if needed).

Thanks,
Cham
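
The suggestion above, in sketch form (the class and names below are hypothetical): move client construction out of the DoFn's pickled state and into DoFn.setup(), which Beam calls on the worker after the DoFn has been deserialized, so the client is never pickled at all:

```python
import pickle

class UnpicklableClient:
    # Stand-in for a real google-cloud client, which refuses pickling.
    def __getstate__(self):
        raise pickle.PicklingError(
            "Pickling client objects is explicitly not supported.")

# Anti-pattern: the client is created at construction time, so it travels
# with the DoFn when the pipeline is pickled for submission.
class WriteRowsBad:
    def __init__(self):
        self.client = UnpicklableClient()

# Fix: create the client lazily in setup(); the pickled DoFn carries None.
class WriteRowsGood:
    def __init__(self):
        self.client = None

    def setup(self):
        # Beam runs setup() once per DoFn instance on the worker.
        self.client = UnpicklableClient()

pickle.dumps(WriteRowsGood())  # succeeds: no client on board yet
try:
    pickle.dumps(WriteRowsBad())
except pickle.PicklingError:
    print("WriteRowsBad fails to pickle, just like on job submission")
```

In a real pipeline the same idea applies to module-level clients captured by "save_main_session": construct them inside the function or DoFn that uses them instead of at module scope.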

On Mon, Mar 29, 2021 at 12:31 PM Rajnil Guha <ra...@gmail.com>
wrote:
