Posted to github@beam.apache.org by "bastewart (via GitHub)" <gi...@apache.org> on 2023/01/31 16:30:24 UTC

[GitHub] [beam] bastewart opened a new issue, #25233: [Feature Request]: Log failed rows in BigQuery Storage Write

bastewart opened a new issue, #25233:
URL: https://github.com/apache/beam/issues/25233

   ### What would you like to happen?
   
   Currently, rows that fail to write with the BigQuery Storage API are written to an output `PCollection` (via [`WriteResult`](https://github.com/apache/beam/blob/634b0453469b66ee4c135aca48b02d2425916f36/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java)), but are otherwise silent. It'd be useful if these could be logged automatically, so that it is obvious when failures occur without having to consume the `WriteResult`.
   
   I think the Streaming API currently logs failures and retries*; it'd be helpful to match this behaviour!
   
   *I think [this is it, in `BigQueryServicesImpl`](https://github.com/apache/beam/blob/634b0453469b66ee4c135aca48b02d2425916f36/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L994-L997), but it may be somewhere else.
   
   ### Issue Priority
   
   Priority: 3 (nice-to-have improvement)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on issue #25233: [Feature Request]: Log failed rows in BigQuery Storage Write

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on issue #25233:
URL: https://github.com/apache/beam/issues/25233#issuecomment-1420112452

   Do you mean textually logging these rows? I don't think this is doable: some users have large numbers of such rows, and textual logging is designed for low throughput.
   
   We also try to avoid textually logging pipeline data, as some Beam users are concerned about private information appearing in textual logs.
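   To make the two concerns above concrete, here is a minimal sketch in plain Java of what bounded failure logging could look like. It is not Beam's API: the class `CappedFailureLogger`, its parameters, and the explicitly passed clock value are hypothetical. Every failure is counted, but at most `maxLinesPerWindow` lines are logged per time window, and only the error message is logged, never the row contents. In a real pipeline this logic would presumably run in a `DoFn` over the failed-rows `PCollection` from `WriteResult`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch, not part of Beam: counts every failed row but logs
   // at most maxLinesPerWindow lines per window, and never logs row contents.
   final class CappedFailureLogger {
       private final int maxLinesPerWindow;
       private final long windowMillis;
       private long windowStart;
       private int linesThisWindow;
       private long suppressed;
       private long totalFailures;
       // Collected lines stand in for a real logger so the behaviour is testable.
       private final List<String> emitted = new ArrayList<>();

       CappedFailureLogger(int maxLinesPerWindow, long windowMillis, long nowMillis) {
           this.maxLinesPerWindow = maxLinesPerWindow;
           this.windowMillis = windowMillis;
           this.windowStart = nowMillis;
       }

       // Records one failure. The row itself is deliberately not a parameter,
       // so private data cannot end up in the log.
       void onFailure(String errorMessage, long nowMillis) {
           totalFailures++;
           if (nowMillis - windowStart >= windowMillis) {
               // The suppressed count is reported lazily, on the first
               // failure that arrives after the previous window has expired.
               if (suppressed > 0) {
                   emitted.add("suppressed " + suppressed + " further failure log lines");
               }
               windowStart = nowMillis;
               linesThisWindow = 0;
               suppressed = 0;
           }
           if (linesThisWindow < maxLinesPerWindow) {
               emitted.add("failed to write row: " + errorMessage);
               linesThisWindow++;
           } else {
               suppressed++;
           }
       }

       long totalFailures() { return totalFailures; }

       List<String> emitted() { return emitted; }

       public static void main(String[] args) {
           CappedFailureLogger log = new CappedFailureLogger(2, 1000, 0);
           for (int i = 0; i < 5; i++) {
               log.onFailure("schema mismatch", i); // five failures in the first window
           }
           log.onFailure("schema mismatch", 1500); // first failure of a new window
           log.emitted().forEach(System.out::println);
           System.out.println("total failures: " + log.totalFailures());
       }
   }
   ```

   Running `main` prints two failure lines, then a summary of the three suppressed lines once the next window starts, then one more failure line, and finally reports 6 total failures. Passing the time in explicitly keeps the sketch deterministic; real code would read the clock instead.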
