Posted to github@beam.apache.org by "Abacn (via GitHub)" <gi...@apache.org> on 2023/04/19 18:43:47 UTC

[GitHub] [beam] Abacn opened a new issue, #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Abacn opened a new issue, #26354:
URL: https://github.com/apache/beam/issues/26354

   ### What happened?
   
   The feature was introduced in #25392. When the `--setEnableBundling=true` pipeline option is set, BigQueryIO reads only a small fraction of the rows in a large table. Reproduced by reading the `tpcds_1T.web_sales` table.
   
   Number of rows: 720,000,376
   `--setEnableBundling=true`: 44,550,489 rows read
   `--setEnableBundling=false`: 720,000,376 rows read
   
   Reading the smaller `tpcds_1G.web_sales` table does not trigger the issue: 18,000 rows are read.
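
For scale, the two counts reported for `tpcds_1T.web_sales` can be turned into a read fraction. This is a minimal sketch in plain Python with no Beam dependency. The variable names are illustrative and not part of the original report; the numbers are the ones listed above.

```python
# Row counts reported in this issue for tpcds_1T.web_sales.
expected_rows = 720_000_376      # table size; also the count read with bundling disabled
rows_with_bundling = 44_550_489  # count read with bundling enabled

# How many rows were silently dropped, and what share of the table was read.
missing_rows = expected_rows - rows_with_bundling
read_fraction = rows_with_bundling / expected_rows

print(f"rows missing: {missing_rows:,}")
print(f"fraction read: {read_fraction:.2%}")
```

With the reported numbers, about 6.19% of the table is read and 675,449,887 rows are missing.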
   
   
   ### Issue Priority
   
   Priority: 1 (data loss / total loss of function)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true [beam]

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
URL: https://github.com/apache/beam/issues/26354


[GitHub] [beam] Abacn commented on issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #26354:
URL: https://github.com/apache/beam/issues/26354#issuecomment-1526158083

   The feature was introduced in Beam v2.46.0 and is activated only when this currently undocumented pipeline option is set. No production user is using it. @vachan-shetty is working on a fix.


[GitHub] [beam] kennknowles commented on issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #26354:
URL: https://github.com/apache/beam/issues/26354#issuecomment-1533507451

   Is this related to #26521 or no?


[GitHub] [beam] Abacn commented on issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #26354:
URL: https://github.com/apache/beam/issues/26354#issuecomment-1533514900

   @kennknowles thanks, will test if #26503 fixed the issue.


[GitHub] [beam] Abacn commented on issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #26354:
URL: https://github.com/apache/beam/issues/26354#issuecomment-1515242986

   Affecting >=v2.46.0


[GitHub] [beam] kennknowles commented on issue #26354: [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #26354:
URL: https://github.com/apache/beam/issues/26354#issuecomment-1526088345

   This seems like a pretty severe issue. Any progress?

