Posted to github@beam.apache.org by "andreigurau (via GitHub)" <gi...@apache.org> on 2023/04/10 14:31:54 UTC

[GitHub] [beam] andreigurau opened a new issue, #26196: [Feature Request]: Refresh side input from BigQuery

andreigurau opened a new issue, #26196:
URL: https://github.com/apache/beam/issues/26196

   ### What would you like to happen?
   
   When you join a Pub/Sub stream with BigQuery using a side input, the side input data is loaded once and kept for the lifetime of the Dataflow job. There is currently no way to refresh this cached data. Please add the ability to refresh it.
   
   ### Issue Priority
   
   Priority: 3 (nice-to-have improvement)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [X] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

Posted by "BostjanBozic (via GitHub)" <gi...@apache.org>.
BostjanBozic commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-2006766022

   Just a question: if you use the global window (since Pub/Sub is an unbounded source), would `PeriodicImpulse` come into play, or would you need to use `GenerateSequence` for that?




Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-2016664608

   https://github.com/apache/beam/blob/4208b86fdd117ed875973f4639293dc574ab15bb/sdks/python/apache_beam/io/gcp/bigquery.py#L99 should work. Have you tried this?




Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad closed issue #26196: [Feature Request]: Refresh side input from BigQuery 
URL: https://github.com/apache/beam/issues/26196




Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-2017972706

   This is supported by #13170 with `ReadAllFromBigQuery`.




[GitHub] [beam] lostluck commented on issue #26196: [Feature Request]: Refresh side input from BigQuery

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-1505629546

   A solution for this is coming in 2.47.0 (the in-progress release).
   
   The solution is to use the [Slowly Updating Side Inputs pattern](https://beam.apache.org/documentation/patterns/side-inputs/), using [PeriodicImpulse](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/transforms/periodic/periodic.go#L118).
   
   After the release, the examples in the linked doc will be updated accordingly.
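
As a rough illustration of what the pattern achieves, here is a plain-Python model of a lookup that is reloaded on a fixed interval. It does not use the Beam API at all: `impulse_times`, `build_snapshots`, `lookup_at`, and `fake_load_table` are hypothetical names made up for this sketch, and the real behavior in Beam depends on the windowing and triggering configured in the pipeline.

```python
from bisect import bisect_right


def impulse_times(start, stop, interval):
    """Yield refresh timestamps, similar in spirit to a periodic impulse."""
    t = start
    while t <= stop:
        yield t
        t += interval


def build_snapshots(load_table, start, stop, interval):
    """Reload the lookup table at every impulse and keep (time, snapshot) pairs."""
    return [(t, load_table(t)) for t in impulse_times(start, stop, interval)]


def lookup_at(snapshots, event_time, key):
    """Use the most recent snapshot taken at or before event_time."""
    times = [t for t, _ in snapshots]
    i = bisect_right(times, event_time) - 1
    if i < 0:
        return None  # no snapshot has been loaded yet
    return snapshots[i][1].get(key)


# Hypothetical stand-in for a BigQuery query: the value changes at t=600.
def fake_load_table(t):
    return {"user-1": "bronze" if t < 600 else "gold"}


snapshots = build_snapshots(fake_load_table, start=0, stop=1200, interval=600)
before = lookup_at(snapshots, event_time=30, key="user-1")
after = lookup_at(snapshots, event_time=650, key="user-1")
```

In this model, an event arriving before the second refresh sees the old value and an event arriving after it sees the new one, which is the behavior the original request asks for.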




Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

Posted by "BostjanBozic (via GitHub)" <gi...@apache.org>.
BostjanBozic commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-2017498974

   @liferoad thanks for the help here. I first tried to fetch the data using plain `ReadFromBigQuery`, but it seems that `ReadFromBigQueryRequest` together with `ReadAllFromBigQuery` is required. This does fetch data (it does not work locally though, only when running on Dataflow, which is also stated in the code), but I guess the output is a bit different than if we just used `ReadFromBigQuery`, so I will have to troubleshoot this part. But at least the read works for now :)
   
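   ```
   import apache_beam as beam
   from apache_beam.io.gcp.bigquery import ReadAllFromBigQuery, ReadFromBigQueryRequest
   from apache_beam.transforms.periodicsequence import PeriodicImpulse

   # `pipeline` is an existing streaming beam.Pipeline.
   lookup_table = (
       pipeline
       # Emit an element every 600 seconds to retrigger the read.
       | "Retrigger BigQuery Data Read" >> PeriodicImpulse(fire_interval=600)
       # Turn each impulse into a BigQuery read request.
       | "Prepare BigQuery Data Read"
       >> beam.Map(
           lambda x: ReadFromBigQueryRequest(
               query="SELECT col1, col2, col3 FROM test.test_table;",
               use_standard_sql=True,
               flatten_results=False,
           )
       )
       # Run one read per incoming request.
       | "Reading BigQuery Data"
       >> ReadAllFromBigQuery(
           temp_dataset="temp"
       )
       | "Reshuffling BigQuery Data" >> beam.Reshuffle()
       # Key each row by col1 so it can be used as a lookup.
       | "Keying BigQuery Data" >> beam.Map(lambda row: (row.get("col1"), row))
   )
   ```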
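
The last step above produces `(col1, row)` pairs. In a Beam pipeline, a keyed collection like this is commonly handed to the main transform as a dict-like side input (for example through `beam.pvalue.AsDict`). The plain-Python sketch below only models how such a dict might be consumed. `build_lookup`, `enrich`, and the sample rows are hypothetical, and the "last row wins" handling of duplicate keys is a choice made for this sketch, not something Beam guarantees.

```python
def build_lookup(keyed_rows):
    """Collapse (key, row) pairs into a dict; later rows for a key overwrite earlier ones."""
    lookup = {}
    for key, row in keyed_rows:
        lookup[key] = row
    return lookup


def enrich(event, lookup, key_field="col1"):
    """Attach the matching lookup row to an event, or None when there is no match."""
    row = lookup.get(event.get(key_field))
    return {**event, "lookup": row}


# Hypothetical rows shaped like the output of the "Keying BigQuery Data" step.
keyed_rows = [
    ("a", {"col1": "a", "col2": 1, "col3": "x"}),
    ("b", {"col1": "b", "col2": 2, "col3": "y"}),
]
lookup = build_lookup(keyed_rows)
matched = enrich({"col1": "b", "payload": "hello"}, lookup)
unmatched = enrich({"col1": "z", "payload": "bye"}, lookup)
```

Returning `None` for unmatched events keeps the main stream flowing when the lookup table has no entry for a key; whether that is the right policy depends on the job.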

