Posted to dev@beam.apache.org by Dhiraj Sardana <Dh...@netent.com> on 2019/07/09 09:53:49 UTC

Apache Beam issue | Reading Avro files and pushing to Bigquery

Hello,

We're using Apache Beam with Google Dataflow. We have a pipeline that reads Avro files from Google Cloud Storage, transforms the records, and writes them to BigQuery.

The error we're getting: the pipeline processes data for some time, then gets stuck and eventually fails (and no data reaches BigQuery):

Here is the error snippet:
Workflow failed. Causes: S92:AvroIO.ReadAll/Read all via FileBasedSource/Reshuffle/Reshuffle/GroupByKey/Read+AvroIO.ReadAll/Read all via FileBasedSource/Reshuffle/Reshuffle/GroupByKey/GroupByWindow+AvroIO.ReadAll/Read all via FileBasedSource/Reshuffle/Reshuffle/ExpandIterable+AvroIO.ReadAll/Read all via FileBasedSource/Reshuffle/Values/Values/Map+AvroIO.ReadAll/Read all via FileBasedSource/Read ….
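As a reading aid for the message above: it appears to name a single fused stage (S92) followed by the step names that were fused into it, joined with "+". The following is only a minimal sketch for breaking such a line into its parts. It assumes that "Causes: <stage>:<step>+<step>..." layout, and the class name FusedStageSteps and the shortened sample message are hypothetical, not part of Beam or Dataflow.

```java
import java.util.ArrayList;
import java.util.List;

public class FusedStageSteps {

    /** Holds the stage id (e.g. "S92") and the step names listed after it. */
    static final class Stage {
        final String id;
        final List<String> steps;

        Stage(String id, List<String> steps) {
            this.id = id;
            this.steps = steps;
        }
    }

    /**
     * Splits a "Workflow failed. Causes: <stage>:<step>+<step>..." line.
     * Assumption: the stage id ends at the first ':' after "Causes: " and
     * the fused step names are separated by '+'.
     */
    static Stage parse(String message) {
        String prefix = "Causes: ";
        int start = message.indexOf(prefix);
        String body = start >= 0 ? message.substring(start + prefix.length()) : message;

        int colon = body.indexOf(':');
        String id = colon >= 0 ? body.substring(0, colon).trim() : "";
        String rest = colon >= 0 ? body.substring(colon + 1) : body;

        List<String> steps = new ArrayList<>();
        for (String step : rest.split("\\+")) {
            String trimmed = step.trim();
            if (!trimmed.isEmpty()) {
                steps.add(trimmed);
            }
        }
        return new Stage(id, steps);
    }

    public static void main(String[] args) {
        // Shortened, illustrative message; paste the real log line here instead.
        String message = "Workflow failed. Causes: S92:AvroIO.ReadAll/Reshuffle/GroupByKey/Read"
                + "+AvroIO.ReadAll/Reshuffle/GroupByKey/GroupByWindow"
                + "+AvroIO.ReadAll/Read";

        Stage stage = parse(message);
        System.out.println("Stage: " + stage.id);
        for (String step : stage.steps) {
            System.out.println("  step: " + step);
        }
    }
}
```

Listing the steps one per line shows which transforms ran together in the failing stage. The message itself carries no root cause, so the worker logs for that stage are still needed.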

Code snippet:
PCollection<String> records = pipeline.apply(Create.of(fileList)).setCoder(StringUtf8Coder.of());
PCollection<GenericRecord> events = records.apply(AvroIO.readAllGenericRecords(userDefinedSchema));

Apache Beam version: we tried both 2.12.0 and 2.8.0 and got the same error.

It would be really helpful if we could get some hints or a solution to this problem. Let us know if you need more information.


Regards,
Dhiraj


Dhiraj Sardana
JEE Developer
────────────────
NetEnt | Better Gaming™
T: +46 760 024 812, M: +46 760 024 812
Dhiraj.Sardana@netent.com, www.netent.com
Address: NetEnt AB (publ), Vasagatan 16, 111 20, Stockholm, SE
This email and any attachments are confidential and may be legally privileged and protected by copyright. If you are not the intended recipient of this email you should not copy it or disclose its contents to anyone. If you have received this email in error, please notify the sender immediately and delete the email. Views or opinions in this email are solely those of the author. Unencrypted Internet communications are not secure and the sender does not accept responsibility for interception of this message by third parties. This communication is not intended to form a binding contract unless expressly indicated to the contrary and properly authorized. The recipient should scan this email and any attachments for the presence of viruses. The sender accepts no liability for any viruses transmitted in this email.

Re: Apache Beam issue | Reading Avro files and pushing to Bigquery

Posted by Lukasz Cwik <lc...@google.com>.
+user <us...@beam.apache.org> (please use user@ for questions about using
the product and restrict to dev@ for questions related to developing the
product).

Can you provide the rest of the failure message (and any stack traces from
the workers related to the failures)?

