Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/11 22:39:22 UTC

[GitHub] [beam] lostluck opened a new issue, #22704: [Feature Request]: Encode Keys once for combiner lifting.

lostluck opened a new issue, #22704:
URL: https://github.com/apache/beam/issues/22704

   ### What would you like to happen?
   
   The Go SDK's combiner lifting currently encodes each key twice for caching. First it encodes the key into a hash function, and then it encodes the key again to get the []byte used for byte-by-byte comparisons in the cache.
   
   https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/combine.go#L469
   
   https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/combine.go#L483
   
   For keys that are expensive to encode, this can become very costly without additional caching schemes.
   
   Ideally this encoding happens once, instead of twice for each key being looked up.
   
   Note: Windowed byte equality is important to use for final lookups because that's the only way to ensure equivalence to runners. Beam Model GBKs use byte equality for keys as their basis.
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: sdk-go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #22704: [Feature Request]: [Go SDK] Encode Keys once for combiner lifting.

Posted by GitBox <gi...@apache.org>.
lostluck commented on issue #22704:
URL: https://github.com/apache/beam/issues/22704#issuecomment-1212568330

   A solution would be to encode to a []byte, wrap it in a bytes.Reader, use that for the hash, and then use the raw bytes for the byte-by-byte lookup comparison (as we do now). I'm not sure whether we can change enough of the hash APIs to use bytes.Reader directly and keep everything stack based, though.
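
A minimal sketch of the encode-once idea, not the actual code in combine.go. The names `encodeKey`, `cache`, `entry`, and `encodeCalls` are hypothetical, the key coder is a stand-in (a length-prefixed string), and the hash is `hash/fnv` fed with the raw bytes directly rather than through a bytes.Reader, which is a simplification of the approach described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"io"
)

// encodeCalls counts coder invocations so the sketch can show one encode per lookup.
var encodeCalls int

// encodeKey is a hypothetical stand-in for a key coder (length-prefixed string).
// Write errors are ignored because bytes.Buffer never returns one.
func encodeKey(key string, w io.Writer) {
	encodeCalls++
	var prefix [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(prefix[:], uint64(len(key)))
	w.Write(prefix[:n])
	io.WriteString(w, key)
}

// entry keeps the encoded key next to the accumulator so that hash
// collisions are resolved by byte equality.
type entry struct {
	encoded []byte
	sum     int
}

// cache is a hypothetical combiner cache indexed by the hash of the encoded key.
type cache struct {
	buf     bytes.Buffer // scratch buffer reused across lookups
	entries map[uint64][]*entry
}

func newCache() *cache { return &cache{entries: make(map[uint64][]*entry)} }

// lookup encodes key exactly once and reuses the bytes for both the hash
// and the byte-by-byte comparison.
func (c *cache) lookup(key string) (enc []byte, hash uint64, e *entry) {
	c.buf.Reset()
	encodeKey(key, &c.buf)
	enc = c.buf.Bytes()

	h := fnv.New64a()
	h.Write(enc)
	hash = h.Sum64()

	for _, cand := range c.entries[hash] {
		if bytes.Equal(cand.encoded, enc) {
			return enc, hash, cand
		}
	}
	return enc, hash, nil
}

// add merges v into the accumulator for key, creating the entry if needed.
func (c *cache) add(key string, v int) {
	enc, hash, e := c.lookup(key)
	if e != nil {
		e.sum += v
		return
	}
	// enc aliases the scratch buffer, so store a copy.
	stored := append([]byte(nil), enc...)
	c.entries[hash] = append(c.entries[hash], &entry{encoded: stored, sum: v})
}

func main() {
	c := newCache()
	c.add("a", 1)
	c.add("b", 2)
	c.add("a", 3)
	_, _, e := c.lookup("a")
	fmt.Println("sum for a:", e.sum, "encode calls:", encodeCalls)
}
```

The scratch buffer is reused between calls, so the encoded bytes are only copied when a new entry is stored. Whether this stays stack based in the real code depends on the hash APIs, as noted above.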
   
   It would be good to have a benchmark written, followed by a before/after comparison of the change (even if the benchmark is new with the PR), to validate it.
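
One possible shape for such a before/after comparison, sketched as a standalone program using `testing.Benchmark` rather than a real benchmark in the Beam repository. Everything here is an assumption for illustration: `encodeKey`, `scratch`, `lookupTwice`, and `lookupOnce` are hypothetical names, and the coder is a cheap placeholder, so the numbers it prints say nothing about the actual combiner.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"io"
	"strings"
	"testing"
)

// encodeKey is a hypothetical stand-in for a key coder (length-prefixed string).
// Write errors are ignored: bytes.Buffer and the fnv hash never return one.
func encodeKey(key string, w io.Writer) {
	var prefix [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(prefix[:], uint64(len(key)))
	w.Write(prefix[:n])
	io.WriteString(w, key)
}

// scratch holds a reusable buffer for encoded keys.
type scratch struct{ buf bytes.Buffer }

// encode returns the encoded key; the slice is only valid until the next call.
func (s *scratch) encode(key string) []byte {
	s.buf.Reset()
	encodeKey(key, &s.buf)
	return s.buf.Bytes()
}

// lookupTwice models the "before" shape: encode into the hash, then encode
// again to get the bytes for comparison.
func lookupTwice(s *scratch, key string, stored []byte) (uint64, bool) {
	h := fnv.New64a()
	encodeKey(key, h)
	sum := h.Sum64()
	return sum, bytes.Equal(s.encode(key), stored)
}

// lookupOnce models the "after" shape: encode once, then hash and compare
// the same bytes.
func lookupOnce(s *scratch, key string, stored []byte) (uint64, bool) {
	enc := s.encode(key)
	h := fnv.New64a()
	h.Write(enc)
	return h.Sum64(), bytes.Equal(enc, stored)
}

func main() {
	key := strings.Repeat("k", 1024)
	var s scratch
	stored := append([]byte(nil), s.encode(key)...)

	before := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lookupTwice(&s, key, stored)
		}
	})
	after := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lookupOnce(&s, key, stored)
		}
	})
	fmt.Println("before:", before)
	fmt.Println("after: ", after)
}
```

Both variants must produce the same hash and the same comparison result, which is worth asserting alongside the timing so the benchmark cannot pass on a behavior change.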

