Posted to github@beam.apache.org by "ee07dazn (via GitHub)" <gi...@apache.org> on 2023/05/08 09:44:51 UTC

[GitHub] [beam] ee07dazn opened a new issue, #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window

ee07dazn opened a new issue, #26583:
URL: https://github.com/apache/beam/issues/26583

   ### What happened?
   
   Take an unbounded PCollection.
   Attach event timestamps and apply a sliding window.
   For each element, find the count of elements with the same id over the last X hours/minutes/seconds. This requires CombinePerKey and CoGroupByKey at different stages.
   Write the output of CoGroupByKey to GCS.
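
The steps above can be modelled in plain Python to make the intended computation concrete. This is only a sketch of the semantics, not Beam code and not the reporter's pipeline: the helper names (`sliding_windows`, `count_per_key_and_window`, `join_counts`), the integer-second timestamps, and the 10-second window with a 5-second period are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def sliding_windows(ts, size, period):
    """Return every [start, end) window of length `size`, starting at a
    multiple of `period`, that contains timestamp `ts`."""
    windows = []
    start = ts - (ts % period)  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return windows


def count_per_key_and_window(events, size, period):
    """Models the CombinePerKey stage: count elements per (id, window)."""
    counts = Counter()
    for key, ts in events:
        for win in sliding_windows(ts, size, period):
            counts[(key, win)] += 1
    return counts


def join_counts(events, size, period):
    """Models the CoGroupByKey stage: pair each window's elements with
    the count computed for the same (id, window)."""
    counts = count_per_key_and_window(events, size, period)
    grouped = defaultdict(list)
    for key, ts in events:
        for win in sliding_windows(ts, size, period):
            grouped[(key, win)].append(ts)
    return {kw: (sorted(tss), counts[kw]) for kw, tss in grouped.items()}


if __name__ == "__main__":
    # Hypothetical (id, event_time_in_seconds) pairs.
    events = [("a", 1), ("a", 7), ("b", 7), ("a", 12)]
    for (key, win), (tss, n) in sorted(join_counts(events, 10, 5).items()):
        print(key, win, tss, n)
```

With a window size that is twice the period, every element falls into two overlapping windows, which is why the count and the join both have to be keyed by id and window together.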
   
   If I write the output to BigQuery (BQ), it works fine, but with GCS it fails with the following error:
   
   `ValueError: GroupByKey cannot be applied to an unbounded PCollection with global windowing and a default trigger` 
   
   I am not using a global window at all, only a sliding window, so this seems like a bug.
   
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [X] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866714680

   cc @robertwb 




Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865274366

   This is where the window is set to global: https://github.com/apache/beam/blob/140dd1458eb465c634f0bcd50622e5e807d183ad/sdks/python/apache_beam/io/iobase.py#L1149




[GitHub] [beam] ee07dazn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window

Posted by "ee07dazn (via GitHub)" <gi...@apache.org>.
ee07dazn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1538073156

   .add-labels python,dataflow




[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540292544

   @johnjcasey do you know if we document anywhere which sources and sinks support streaming and whether we plan to extend support to more sources/sinks in the future?
   




Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866265677

   Thanks for the additional input. If you have a complete pipeline handy, feel free to post it; that might save some cycles when putting the repro together.
   
   cc: @johnjcasey who might have feedback on IO internals.




[GitHub] [beam] ee07dazn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection and Sliding window

Posted by "ee07dazn (via GitHub)" <gi...@apache.org>.
ee07dazn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1538074910

   I can attach the code, but I think the problem is fairly fundamental: the sink does not recognize that the attached window is `sliding` and not `global`.




[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540797871

   GCS IO should work in a streaming context. @ee07dazn could you please share a (preferably minimal) example pipeline that reproduces the issue?




[GitHub] [beam] tvalentyn commented on issue #26583: [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1540800505

   Taking a closer look, the issue is most likely not with GCS IO itself, but with the fact that, at the time the aggregation happens, the elements are assigned to the global window and not the sliding window.




Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1866480255

   cc @udim Can you check this?




Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865199137

   I'm having the same issue. Setting --allow_unsafe_triggers allowed me to bypass it for now.




Re: [I] [Bug]: Google Cloud Storage Sink not working with unbounded pcollection using Sliding window and GrouByKey [beam]

Posted by "dermasmid (via GitHub)" <gi...@apache.org>.
dermasmid commented on issue #26583:
URL: https://github.com/apache/beam/issues/26583#issuecomment-1865280901

   Modifying the code to `core.WindowInto(pcoll.windowing)` fixed it, but I'm not really sure what side effects that would have.
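
The observations in this thread (the sink re-windows into the global window before grouping, the flag bypasses the check, and keeping the input windowing avoids the error) can be tied together with a toy model. The sketch below is not Beam code and does not reproduce `iobase.py`. `ToyPCollection`, `window_into`, `group_by_key`, and `toy_write` are hypothetical names, and windowing is reduced to a plain string. It only assumes that the check fires for an unbounded input with global windowing and a default trigger, as the error message states.

```python
from dataclasses import dataclass

ERROR = ("GroupByKey cannot be applied to an unbounded PCollection "
         "with global windowing and a default trigger")


@dataclass(frozen=True)
class ToyPCollection:
    """Hypothetical stand-in for a PCollection's windowing metadata."""
    windowing: str            # "global" or "sliding"
    trigger: str = "default"
    bounded: bool = False     # False models an unbounded (streaming) input


def window_into(pcoll, windowing):
    """Toy re-windowing: replaces only the windowing strategy."""
    return ToyPCollection(windowing, pcoll.trigger, pcoll.bounded)


def group_by_key(pcoll, allow_unsafe_triggers=False):
    """Toy version of the safety check described by the error message."""
    unsafe = (not pcoll.bounded
              and pcoll.windowing == "global"
              and pcoll.trigger == "default")
    if unsafe and not allow_unsafe_triggers:
        raise ValueError(ERROR)
    return pcoll


def toy_write(pcoll, keep_input_windowing, allow_unsafe_triggers=False):
    """Toy sink: re-window, then group, as the linked line is said to do."""
    target = pcoll.windowing if keep_input_windowing else "global"
    return group_by_key(window_into(pcoll, target), allow_unsafe_triggers)


if __name__ == "__main__":
    source = ToyPCollection("sliding")
    try:
        toy_write(source, keep_input_windowing=False)
    except ValueError as err:
        print("rejected:", err)
    print(toy_write(source, keep_input_windowing=True))
    print(toy_write(source, False, allow_unsafe_triggers=True))
```

In this model the upstream sliding window never reaches the grouping step unless the sink preserves it, which matches the report that the error appears even though no global window was applied by the user. Whether preserving the input windowing is safe for the real sink is the open question raised above, and the toy does not answer it.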

