Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 19:30:54 UTC

[GitHub] [beam] kennknowles opened a new issue, #18625: BigQueryIO - Must specify numFileShards when using FILE_LOADS with unbounded PCollection

kennknowles opened a new issue, #18625:
URL: https://github.com/apache/beam/issues/18625

   Since Beam v2.2 it's possible to use [FILE_LOADS](https://beam.apache.org/documentation/sdks/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#FILE_LOADS) with an unbounded PCollection. The documentation states that we have to specify _withTriggeringFrequency_ when using it, but doesn't mention _withNumFileShards_. Yet if we don't specify it, we get the exception below:
   
   ```
   Exception in thread "main" java.lang.IllegalArgumentException
           at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
           at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expandTriggered(BatchLoads.java:209)
           at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:546)
           at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:79)
           at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
           at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
           at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
           at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expandTyped(BigQueryIO.java:1550)
           at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:1497)
           at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:980)
           at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
           at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
           at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
           at com.travelaudience.data.job.rtbtobigquery.Main$.main(Main.scala:74)
           at com.travelaudience.data.job.rtbtobigquery.Main.main(Main.scala)
   ```
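
   The trace shows the exception is thrown by Guava's `Preconditions.checkArgument` inside `BatchLoads.expandTriggered` and carries no message, so nothing in it points at the missing setting. Below is a minimal plain-Java sketch of that failure mode with no Beam dependency. It assumes, from the trace alone, that the check requires a positive shard count and that an unset count is stored as 0; `checkArgument` is a local stand-in for Guava's method of the same name, and `expandTriggered` here is a hypothetical simplification, not Beam's code. In a real pipeline, the workaround the report implies is to call `withNumFileShards(...)` alongside `withTriggeringFrequency(...)`.

   ```java
   public class TriggeredLoadCheck {
       // Local stand-in for Guava's Preconditions.checkArgument(boolean): throws an
       // IllegalArgumentException that carries no message when the condition is false.
       static void checkArgument(boolean expression) {
           if (!expression) {
               throw new IllegalArgumentException();
           }
       }

       // Hypothetical simplification of the triggered (unbounded) FILE_LOADS path.
       // Assumption: a shard count of 0 means withNumFileShards was never called.
       static String expandTriggered(int numFileShards) {
           checkArgument(numFileShards > 0);
           return "writing with " + numFileShards + " file shards";
       }

       public static void main(String[] args) {
           try {
               expandTriggered(0); // shard count left unset
           } catch (IllegalArgumentException e) {
               // As in the report, the exception does not say which setting is missing.
               System.out.println("IllegalArgumentException, message = " + e.getMessage());
           }
           System.out.println(expandTriggered(10)); // explicit shard count passes the check
       }
   }
   ```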
   
   
   Either a default _numFileShards_ should be used, or the documentation should state that it must be set.
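
   The first option could be sketched as follows: an unset shard count falls back to a default instead of failing the precondition. This is a plain-Java illustration of the suggestion under stated assumptions, not Beam's implementation; `effectiveNumFileShards` and `DEFAULT_NUM_FILE_SHARDS` (set to 100) are hypothetical names and values, and treating 0 as "unset" is likewise an assumption.

   ```java
   public class ShardDefaulting {
       // Hypothetical fallback value, chosen only for illustration.
       static final int DEFAULT_NUM_FILE_SHARDS = 100;

       // Returns the shard count a triggered FILE_LOADS write would use.
       // Assumption: 0 means the user never called withNumFileShards.
       static int effectiveNumFileShards(int requested) {
           if (requested < 0) {
               // A negative count is still a user error, so report it with a message.
               throw new IllegalArgumentException("numFileShards must be >= 0, got " + requested);
           }
           return requested > 0 ? requested : DEFAULT_NUM_FILE_SHARDS;
       }

       public static void main(String[] args) {
           System.out.println(effectiveNumFileShards(0));  // unset: prints 100
           System.out.println(effectiveNumFileShards(25)); // explicit: prints 25
       }
   }
   ```

   In this sketch an explicit shard count still takes precedence, so only the previously failing unset case changes behavior.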
   
   Imported from Jira [BEAM-3766](https://issues.apache.org/jira/browse/BEAM-3766). Original Jira may contain additional context.
   Reported by: benjben.

