Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/09 17:37:47 UTC

[GitHub] [beam] lostluck opened a new issue, #24949: [Task][Go SDK]: Add String UTF8 check to vet runner for serialization.

lostluck opened a new issue, #24949:
URL: https://github.com/apache/beam/issues/24949

   ### What needs to happen?
   
   A bug was found where a user converts arbitrary byte sequences to strings to get around being unable to use `[]byte` as a map key. The resulting strings are sometimes not valid UTF-8, which breaks on encoding/decoding.
   
   E.g., byte sequences like [2 208 15] or [2 239 191 189 15], once converted to strings, simply can't be round-tripped correctly as JSON, so the encoded and decoded values do not match.
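   
   The mismatch can be reproduced with a minimal sketch using only the standard `encoding/json` package. The helper name `roundTrip` is hypothetical, and this only illustrates the behaviour for a plain string, not the SDK's actual serialization path:
   
   ```go
   package main
   
   import (
   	"encoding/json"
   	"fmt"
   )
   
   // roundTrip (hypothetical helper) encodes s as JSON and decodes it back.
   func roundTrip(s string) (string, error) {
   	enc, err := json.Marshal(s)
   	if err != nil {
   		return "", err
   	}
   	var dec string
   	if err := json.Unmarshal(enc, &dec); err != nil {
   		return "", err
   	}
   	return dec, nil
   }
   
   func main() {
   	// 208 starts a multi-byte sequence that 15 cannot continue,
   	// so this string is not valid UTF-8.
   	key := string([]byte{2, 208, 15})
   	got, err := roundTrip(key)
   	if err != nil {
   		fmt.Println("round trip failed:", err)
   		return
   	}
   	// encoding/json replaces invalid bytes with U+FFFD when marshaling,
   	// so the decoded string differs from the original.
   	fmt.Printf("before: %v\nafter:  %v\nequal:  %t\n", []byte(key), []byte(got), got == key)
   }
   ```
   
   No error is returned at any point; the value is changed silently, which is why a separate up-front check is useful.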
   
   The check would recursively examine every exported field in a structural DoFn for use of `string`, and verify that each string value is valid UTF-8. The check could be skipped for subtypes that implement the MarshalJSON and UnmarshalJSON interface methods.
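   
   A rough sketch of such a recursive check, assuming plain `reflect` and `unicode/utf8` from the standard library. The names `invalidStrings`, `customJSON`, and `exampleFn` are hypothetical and not part of the Beam SDK; the sketch needs Go 1.18 or later and does not guard against cyclic pointers:
   
   ```go
   package main
   
   import (
   	"encoding/json"
   	"fmt"
   	"reflect"
   	"unicode/utf8"
   )
   
   var (
   	marshalerType   = reflect.TypeOf((*json.Marshaler)(nil)).Elem()
   	unmarshalerType = reflect.TypeOf((*json.Unmarshaler)(nil)).Elem()
   )
   
   // customJSON reports whether t or *t implements both JSON interfaces.
   func customJSON(t reflect.Type) bool {
   	impl := func(i reflect.Type) bool {
   		return t.Implements(i) || reflect.PointerTo(t).Implements(i)
   	}
   	return impl(marshalerType) && impl(unmarshalerType)
   }
   
   // invalidStrings walks v and returns the paths of every string value
   // reachable through exported fields that is not valid UTF-8.
   func invalidStrings(v reflect.Value, path string) []string {
   	if !v.IsValid() || customJSON(v.Type()) {
   		return nil // nothing to check, or the type handles its own encoding
   	}
   	var bad []string
   	switch v.Kind() {
   	case reflect.String:
   		if !utf8.ValidString(v.String()) {
   			bad = append(bad, path)
   		}
   	case reflect.Pointer, reflect.Interface:
   		if !v.IsNil() {
   			bad = append(bad, invalidStrings(v.Elem(), path)...)
   		}
   	case reflect.Struct:
   		for i := 0; i < v.NumField(); i++ {
   			if f := v.Type().Field(i); f.IsExported() {
   				bad = append(bad, invalidStrings(v.Field(i), path+"."+f.Name)...)
   			}
   		}
   	case reflect.Slice, reflect.Array:
   		for i := 0; i < v.Len(); i++ {
   			bad = append(bad, invalidStrings(v.Index(i), fmt.Sprintf("%s[%d]", path, i))...)
   		}
   	case reflect.Map:
   		for _, k := range v.MapKeys() {
   			bad = append(bad, invalidStrings(k, path+"[key]")...)
   			bad = append(bad, invalidStrings(v.MapIndex(k), path+"[value]")...)
   		}
   	}
   	return bad
   }
   
   // exampleFn is a hypothetical structural DoFn used only for illustration.
   type exampleFn struct {
   	Name  string
   	Seen  map[string]int
   	inner string // unexported, so the walk ignores it
   }
   
   func main() {
   	fn := &exampleFn{Name: "ok", Seen: map[string]int{string([]byte{2, 208, 15}): 1}}
   	for _, p := range invalidStrings(reflect.ValueOf(fn), "exampleFn") {
   		fmt.Println("non-UTF-8 string at", p)
   	}
   }
   ```
   
   Returning field paths rather than a single boolean lets the caller report exactly which field holds the offending string.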
   
   The [vet runner](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/vet/vet.go#L61), which can optionally be run before any given pipeline with the `--beam_strict` flag, would be the appropriate place to add this sort of checking, since it avoids paying for more expensive checks on every run.
   
   ### Issue Priority
   
   Priority: 3 (nice-to-have improvement)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [X] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tobehardest commented on issue #24949: [Task][Go SDK]: Add String UTF8 check to vet runner for serialization.

Posted by "tobehardest (via GitHub)" <gi...@apache.org>.
tobehardest commented on issue #24949:
URL: https://github.com/apache/beam/issues/24949#issuecomment-1491755986

   Can you assign this task to me? I want to try.




[GitHub] [beam] lostluck commented on issue #24949: [Task][Go SDK]: Add String UTF8 check to vet runner for serialization.

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #24949:
URL: https://github.com/apache/beam/issues/24949#issuecomment-1492129995

   @tobehardest Done! In the future, you can self-assign an issue by commenting `.take-issue`, and a bot will handle it. See the Beam contribution guide for more! https://beam.apache.org/contribute

