Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/21 21:19:45 UTC

[GitHub] [beam] lostluck opened a new issue, #22404: [Task][Go SDK] Define behavior of timestamp reading iterables for GBK value streams

lostluck opened a new issue, #22404:
URL: https://github.com/apache/beam/issues/22404

   ### What needs to happen?
   
   Timestamp metadata for individual elements is lost after sending values through a GroupByKey. This is not a bug. It is working as intended (WAI), as a consequence of the post-GBK coder (WindowedValueHeader+KV<K, Iter<V>>), since the runner has already combined the timestamps.
   
   However, the Go SDK permits iterating over the grouped values with a `func(*beam.EventTime, *V) bool`, and such an iterator produces zero-valued event times. This is confusing behavior.
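   To make the confusing case concrete, here is a minimal, stdlib-only Go model of a post-GBK element. It is a sketch under stated assumptions, not SDK code: `EventTime` is a hypothetical stand-in for `beam.EventTime` (modeled as milliseconds), and `groupedValues`, `iterWithTime`, and `observedTimes` are hypothetical names. Because the post-GBK encoding carries one timestamp in the windowed value header and none per value, the `func(*EventTime, *V) bool` iterator has nothing to report and yields the zero value.

```go
package main

import "fmt"

// EventTime is a hypothetical stand-in for beam.EventTime
// (milliseconds since the Unix epoch); it is not the SDK's type.
type EventTime int64

// groupedValues models one post-GBK element: the windowed value header
// carries a single timestamp, and the KV<K, Iter<V>> payload carries only
// the values, with no per-value timestamps.
type groupedValues struct {
	headerTimestamp EventTime // combined timestamp for the whole key
	values          []int
}

// iterWithTime returns an iterator of the form func(*EventTime, *V) bool.
// The encoding holds no per-value timestamp, so the zero value is written.
func (g groupedValues) iterWithTime() func(*EventTime, *int) bool {
	i := 0
	return func(et *EventTime, v *int) bool {
		if i >= len(g.values) {
			return false
		}
		*et = 0 // nothing to read: per-value timestamps were not encoded
		*v = g.values[i]
		i++
		return true
	}
}

// observedTimes drains the iterator and records every timestamp it yields.
func observedTimes(g groupedValues) []EventTime {
	var out []EventTime
	var et EventTime
	var v int
	next := g.iterWithTime()
	for next(&et, &v) {
		out = append(out, et)
	}
	return out
}

func main() {
	g := groupedValues{headerTimestamp: 1000, values: []int{3, 5, 8}}
	fmt.Println(observedTimes(g)) // [0 0 0]
}
```

   Note that the header timestamp (1000 here) is available on the element but never reaches the per-value iterator in this model.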
   
   The task has two components:
   1. Document in the programming guide and the GroupByKey godoc that the timestamps for individual values will not be available after the GBK.
   2. Make the current behavior more sensible: populate such iterators with the combined timestamp for the key, instead of leaving the zero value.
   
   That would happen here for reflective iterators: https://github.com/apache/beam/blob/66ffc0b9fe99ba7d305d00c2f93e29979b1b2123/sdks/go/pkg/beam/core/runtime/exec/input.go#L173
   
   This form of iterator isn't yet supported with the `register` package, but it would also need to be supported there once implemented.
   https://github.com/apache/beam/blob/66ffc0b9fe99ba7d305d00c2f93e29979b1b2123/sdks/go/pkg/beam/register/iter.go
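   The proposal to populate such iterators with the key's combined timestamp can be sketched with a similar model. This assumes hypothetical `EventTime` and `groupedValues` types and an `iterWithHeaderTime` helper; it is not the SDK's reflective or `register` iterator code. The iterator copies the timestamp from the element's windowed value header into every `*EventTime` it fills, rather than leaving the zero value.

```go
package main

import "fmt"

// EventTime is a hypothetical stand-in for beam.EventTime (milliseconds).
type EventTime int64

// groupedValues models one post-GBK element (hypothetical type).
type groupedValues struct {
	headerTimestamp EventTime // combined timestamp from the windowed value header
	values          []int
}

// iterWithHeaderTime returns a func(*EventTime, *V) bool iterator that
// reports the key's combined timestamp for every value instead of zero.
func (g groupedValues) iterWithHeaderTime() func(*EventTime, *int) bool {
	i := 0
	return func(et *EventTime, v *int) bool {
		if i >= len(g.values) {
			return false
		}
		*et = g.headerTimestamp // same timestamp for every value of the key
		*v = g.values[i]
		i++
		return true
	}
}

func main() {
	g := groupedValues{headerTimestamp: 1000, values: []int{3, 5, 8}}
	next := g.iterWithHeaderTime()
	var et EventTime
	var v int
	for next(&et, &v) {
		fmt.Println(et, v) // prints 1000 with each of 3, 5, 8
	}
}
```

   Every value of a key reports the same timestamp, which is all the post-GBK coder can supply.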
   
   This is separate from whether the SDK supports specifying a timestamp combine strategy for a window. Presently it does not, and it should default to "End Of Window" for a given post-GBK element.
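   A rough sketch of what a timestamp combine strategy decides, assuming hypothetical `IntervalWindow`, `TimestampCombiner`, and `combine` names rather than any SDK API. Under "End Of Window" the post-GBK timestamp is the window's maximum timestamp, which for Beam's half-open interval windows is one millisecond before `End`, and the per-value timestamps are not consulted at all. The `Earliest` and `Latest` cases are included only for contrast.

```go
package main

import "fmt"

// EventTime is a hypothetical stand-in for beam.EventTime (milliseconds).
type EventTime int64

// IntervalWindow is a hypothetical half-open window [Start, End).
type IntervalWindow struct {
	Start, End EventTime
}

// MaxTimestamp is the last timestamp inside the half-open window.
func (w IntervalWindow) MaxTimestamp() EventTime { return w.End - 1 }

// TimestampCombiner names a hypothetical combine strategy.
type TimestampCombiner int

const (
	EndOfWindow TimestampCombiner = iota // the default suggested above
	Earliest
	Latest
)

// combine picks the post-GBK timestamp for one key in one window.
// Earliest and Latest assume ts is non-empty.
func combine(c TimestampCombiner, w IntervalWindow, ts []EventTime) EventTime {
	switch c {
	case Earliest:
		out := ts[0]
		for _, t := range ts[1:] {
			if t < out {
				out = t
			}
		}
		return out
	case Latest:
		out := ts[0]
		for _, t := range ts[1:] {
			if t > out {
				out = t
			}
		}
		return out
	default: // EndOfWindow ignores per-value timestamps entirely
		return w.MaxTimestamp()
	}
}

func main() {
	w := IntervalWindow{Start: 0, End: 60000}
	ts := []EventTime{1500, 200, 42000}
	fmt.Println(combine(EndOfWindow, w, ts)) // 59999
	fmt.Println(combine(Earliest, w, ts))    // 200
	fmt.Println(combine(Latest, w, ts))      // 42000
}
```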
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: sdk-go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #22404: [Task][Go SDK] Define behavior of timestamp reading iterables for GBK value streams

Posted by GitBox <gi...@apache.org>.
lostluck commented on issue #22404:
URL: https://github.com/apache/beam/issues/22404#issuecomment-1192052183

   Discussion with @robertwb and @reuvenlax indicates that this iterator form is incorrect in the Go SDK and shouldn't be supported.
   
   By the Model and the Coders, the timestamps for individual values are either combined (in the GBK case) or dropped after being used for window assignment (for side inputs).
   
   So ultimately, we instead need to fail these pipelines at construction time, with an error that refers to this issue and explains this part of the model.
   
   There are also some bad tests & documentation that need correction.
   
   https://github.com/apache/beam/blob/8b213c617ef8cf3a077bb0002b6b0fec8e85cb05/sdks/go/pkg/beam/core/funcx/sideinput.go#L59
   
   https://github.com/apache/beam/blob/273613dbdf2566a3975d5f47f581a861a91128b8/sdks/go/pkg/beam/core/funcx/sideinput_test.go#L34
   
   legacy template:
   https://github.com/apache/beam/blob/410ad7699621e28433d81809f6b9c42fe7bd6a60/sdks/go/pkg/beam/core/runtime/exec/optimized/inputs.tmpl#L33




[GitHub] [beam] jrmccluskey closed issue #22404: [Task][Go SDK] Disallow iterators with initial timestamp values (ET,V) and (ET, K,V) -> not in the beam model.

Posted by GitBox <gi...@apache.org>.
jrmccluskey closed issue #22404: [Task][Go SDK] Disallow iterators with initial timestamp values (ET,V) and (ET, K,V) -> not in the beam model.
URL: https://github.com/apache/beam/issues/22404

