You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/30 01:35:52 UTC

[GitHub] [beam] nbali opened a new issue, #22951: [Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding()

nbali opened a new issue, #22951:
URL: https://github.com/apache/beam/issues/22951

   ### What happened?
   
   BigQueryIO.Write.withAutoSharding() uses GroupIntoBatches.withShardedKey(), which uses 'workerUuid' and 'threadId' as the sharding key. According to my understanding the problem is that the Kafka consumer read in KafkaIO.Read for a single partition most likely happens without any parallelism on the same worker on the same thread as it's being read in a FIFO manner due to the offset. This essentially means that .withShardedKey() has no effect whatsoever.
   
   Although there is a 'FILE_TRIGGERING_BATCHING_DURATION' with '1s' duration, and a 'FILE_TRIGGERING_RECORD_COUNT' with '500000' count - and both triggers grouping, it still means if we are under 500k elements, and under 1s it will try to fire them at once. It is totally possible - with sufficiently high throughput, or 'outputWithTimestamp' that we have 500k elements in a single sec). This could result in OOME.
   
   We should also have a size limit, not only time and count.
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-gcp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] nbali commented on issue #22951: [Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding()

Posted by GitBox <gi...@apache.org>.
nbali commented on issue #22951:
URL: https://github.com/apache/beam/issues/22951#issuecomment-1231042640

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] nbali commented on issue #22951: [Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding()

Posted by GitBox <gi...@apache.org>.
nbali commented on issue #22951:
URL: https://github.com/apache/beam/issues/22951#issuecomment-1243712270

   Note to self: the same happens with https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik closed issue #22951: [Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding()

Posted by GitBox <gi...@apache.org>.
lukecwik closed issue #22951: [Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding()
URL: https://github.com/apache/beam/issues/22951


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org