Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 19:57:41 UTC

[GitHub] [beam] damccorm opened a new issue, #20854: Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro

damccorm opened a new issue, #20854:
URL: https://github.com/apache/beam/issues/20854

   I am processing up to 1,000 files (.......xml.gz).
   When I run a sample of 128, 256, or 512 files, it works, but not always.
   I have used between 8 and 512 workers. It seems that any time the job runs for longer than 30 minutes, it fails with a FileNotFoundError related to fastavro.
   ```
   lines = (
       p1
       | "Get name" >> beam.Create(
           names[(no_of_files * (i - 1)) // no_of_jobs:(no_of_files * i) // no_of_jobs])
       | "Read from cloud" >> beam.ParDo(ReadGCS())
       | "Parse into JSON" >> beam.ParDo(ParseXML())
       | "Get Medline" >> beam.ParDo(GetMedline())
       | "Build Json" >> beam.ParDo(JsonBuilder())
       | "Write elements" >> beam.io.WriteToBigQuery(
           table=table_ref,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           schema="SCHEMA_AUTODETECT",
           insert_retry_strategy=RetryStrategy.RETRY_ALWAYS,
           ignore_insert_ids=True,
           validate=False,
       )
   )
   ```
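
The slice passed to `beam.Create` in the snippet above appears to split `names` into `no_of_jobs` contiguous batches, one per job index `i`. The following is a minimal sketch of that arithmetic only, assuming `i` runs from 1 to `no_of_jobs` and that `no_of_files` equals `len(names)`; neither is stated in the report. The helper name `job_slice` and the file names are hypothetical.

```python
def job_slice(names, no_of_jobs, i):
    """Return the names handled by job i (1-based).

    Uses the same integer arithmetic as the slice in the reported pipeline.
    """
    no_of_files = len(names)  # assumption: no_of_files is the list length
    start = (no_of_files * (i - 1)) // no_of_jobs
    stop = (no_of_files * i) // no_of_jobs
    return names[start:stop]


# Hypothetical file names, for illustration only.
names = [f"file-{n}.xml.gz" for n in range(10)]

# Split the 10 names across 3 jobs, indexed 1..3.
batches = [job_slice(names, 3, i) for i in range(1, 4)]
```

Under those assumptions the batches do not overlap and together cover every name, so each file would be read by exactly one job.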
   
   
   
   Imported from Jira [BEAM-12101](https://issues.apache.org/jira/browse/BEAM-12101). Original Jira may contain additional context.
   Reported by: xct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn closed issue #20854: Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro

Posted by GitBox <gi...@apache.org>.
tvalentyn closed issue #20854: Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro
URL: https://github.com/apache/beam/issues/20854




[GitHub] [beam] tvalentyn commented on issue #20854: Dataflow Jobs keep failing with FileNotFoundError: [Errno 2] Not found: gs://tmp.../beamapp..../tmp-27400e24c0c31bc1-00000-of-00001.avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #20854:
URL: https://github.com/apache/beam/issues/20854#issuecomment-1205360273

   Sounds like something that could be investigated through Dataflow customer support as it would require additional details and customer involvement.

