Posted to github@beam.apache.org by "ee07dazn (via GitHub)" <gi...@apache.org> on 2023/05/08 09:44:51 UTC
[GitHub] [beam] ee07dazn opened a new issue, #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window
ee07dazn opened a new issue, #26583:
URL: https://github.com/apache/beam/issues/26583
### What happened?
1. Take an unbounded PCollection.
2. Attach event timestamps and apply a sliding window.
3. For each element, count the elements with the same id over the last X hours/minutes/seconds. This requires CombinePerKey and CoGroupByKey at different stages.
4. Write the output of CoGroupByKey to GCS.
Writing the output to BigQuery works fine, but writing it to GCS fails with the following error:
`ValueError: GroupByKey cannot be applied to an unbounded PCollection with global windowing and a default trigger`
I am using a sliding window, not a global window, so this looks like a bug.
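For readers unfamiliar with sliding windows, the per-id count described above can be modeled in plain Python. This is only a sketch of the semantics, not Beam code: the function names, the integer-second timestamps, and the 60-second window with a 30-second period are all assumptions made for illustration.

```python
from collections import Counter


def sliding_windows(timestamp, size, period):
    """Return every [start, end) window that contains `timestamp`.

    Windows are `size` seconds long and start at multiples of `period`.
    Integer seconds are assumed so the arithmetic stays exact.
    """
    last_start = timestamp - (timestamp % period)
    return [(start, start + size)
            for start in range(last_start, timestamp - size, -period)]


def count_per_id_and_window(events, size, period):
    """Count events per (id, window); `events` holds (id, timestamp) pairs."""
    counts = Counter()
    for event_id, timestamp in events:
        # With size > period, one event lands in several overlapping windows.
        for win in sliding_windows(timestamp, size, period):
            counts[(event_id, win)] += 1
    return counts


# Hypothetical sample data: (id, event timestamp in seconds).
events = [("a", 0), ("a", 45), ("b", 50)]
counts = count_per_id_and_window(events, size=60, period=30)
```

Because the windows overlap, each element contributes to more than one count. A pipeline that then groups or joins those counts needs every stage to keep the same non-global windowing.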
### Issue Priority
Priority: 3 (minor)
### Issue Components
- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [X] Component: Google Cloud Dataflow Runner
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866714680
cc @robertwb
--
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865274366
This is where the window is set to global: https://github.com/apache/beam/blob/140dd1458eb465c634f0bcd50622e5e807d183ad/sdks/python/apache_beam/io/iobase.py#L1149
--
[GitHub] [beam] ee07dazn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window
Posted by "ee07dazn (via GitHub)" <gi...@apache.org>.
ee07dazn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1538073156
.add-labels python,dataflow
--
[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540292544
@johnjcasey do you know if we document anywhere which sources and sinks support streaming and whether we plan to extend support to more sources/sinks in the future?
--
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866265677
Thanks for the additional input. If you have a complete pipeline handy, feel free to post it; that might save some cycles in putting the repro together.
cc: @johnjcasey who might have feedback on IO internals.
--
[GitHub] [beam] ee07dazn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window
Posted by "ee07dazn (via GitHub)" <gi...@apache.org>.
ee07dazn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1538074910
I can attach code, but I think the problem is fairly fundamental: it is not recognizing that the attached window is `sliding` and not `global`.
--
[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540797871
GCS IO should work in streaming context. @ee07dazn could you please share a (preferably minimal) example pipeline that reproduces the issue?
--
[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540800505
Taking a closer look, the issue is most likely not with GCS IO itself, but that at the time the aggregation happens, the elements are assigned to the global window rather than the sliding window.
--
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866480255
cc @udim Can you check this?
--
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865199137
I'm having the same issue. Setting `--allow_unsafe_triggers` allowed me to bypass it for now.
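To make the behavior discussed in this thread concrete, here is a simplified plain-Python model of the kind of check that produces the error and of how an "allow unsafe triggers" switch would bypass it. It is not Beam's implementation: `Windowing`, `check_group_by_key`, and the string labels are hypothetical names, and the assumption that the flag only relaxes the check (rather than changing the windowing) is mine.

```python
from dataclasses import dataclass

ERROR = ("GroupByKey cannot be applied to an unbounded PCollection "
         "with global windowing and a default trigger")


@dataclass
class Windowing:
    """Hypothetical stand-in for a collection's windowing strategy."""
    window_fn: str            # e.g. "global" or "sliding"
    trigger: str = "default"


def check_group_by_key(is_bounded, windowing, allow_unsafe_triggers=False):
    """Return True if grouping is safe; raise unless the check is relaxed."""
    unsafe = (not is_bounded
              and windowing.window_fn == "global"
              and windowing.trigger == "default")
    if unsafe and not allow_unsafe_triggers:
        raise ValueError(ERROR)
    return not unsafe


# The user's sliding window passes the check...
upstream = Windowing("sliding")
assert check_group_by_key(False, upstream)

# ...but if a later step re-windows into the global window, a grouping
# after that step sees "global" + "default" and is rejected.
rewindowed = Windowing("global")
try:
    check_group_by_key(False, rewindowed)
    rejected = False
except ValueError:
    rejected = True

# Relaxing the check lets it through, but the grouping is still flagged
# as not safe.
bypassed = check_group_by_key(False, rewindowed, allow_unsafe_triggers=True)
```

Under this model the flag silences the error without restoring the sliding window, which is why it reads as a workaround rather than a fix.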
--
Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]
Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865280901
Modifying the code to `core.WindowInto(pcoll.windowing)` fixed it, but I'm not really sure what side effects that would have.