You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "reuvenlax (via GitHub)" <gi...@apache.org> on 2023/01/22 06:20:44 UTC

[GitHub] [beam] reuvenlax opened a new pull request, #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

reuvenlax opened a new pull request, #25113:
URL: https://github.com/apache/beam/pull/25113

   https://github.com/apache/beam/issues/25112


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1558075106

   run java precommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399568660

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399568608

   Run Spotless PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1406969820

   @reuvenlax does BQ have an estimate for their fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] prodriguezdefino commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "prodriguezdefino (via GitHub)" <gi...@apache.org>.
prodriguezdefino commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399623014

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1431481005

   @reuvenlax FYI, the 2.46 cut is a week from today - do you think this will make the cut? If not, I don't think it seems like a release blocker (for the same reasons mentioned above) - does that seem right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yirutang commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "yirutang (via GitHub)" <gi...@apache.org>.
yirutang commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1452531012

   > FYI - testing in production identified a bug in BigQuery that prevent merging of this PR. The BigQuery team is working on fixing this bug, and once this bug is fixed this PR can be merged.
   
   What is the bug?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1403732435

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax merged pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax merged PR #25113:
URL: https://github.com/apache/beam/pull/25113


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399413425

   R: @prodriguezdefino 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1556313340

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1557443865

   Run Java_GCP_IO_Direct PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] prodriguezdefino commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "prodriguezdefino (via GitHub)" <gi...@apache.org>.
prodriguezdefino commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399614846

   > One thing to note: CreateTableHelpers keeps a process-wide static cache of tables known to exist, so it was never done on every bundle. The problem was. that when using at-least-once we had to check every table at least once on every worker, and the quota was too low for that.
   
   Maybe I misunderstood, but weren't we clearing the destination's map on every bundle start exec? 
   
   ```
   @StartBundle
        public void startBundle() {
          destinations = Maps.newHashMap();
        }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399613804

   One thing to note: CreateTableHelpers keeps a process-wide static cache of tables known to exist, so it was never done on every bundle. The problem was. that when using at-least-once we had to check every table at least once on every worker, and the quota was too low for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1399618157

   The static map is in CreateTableHelpers.java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1403847507

   FYI - testing in production identified a bug in BigQuery that prevent merging of this PR. The BigQuery team is working on fixing this bug, and once this bug is fixed this PR can be merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1403818398

   
   Run Java_GCP_IO_Direct PreCommit
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] prodriguezdefino commented on a diff in pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "prodriguezdefino (via GitHub)" <gi...@apache.org>.
prodriguezdefino commented on code in PR #25113:
URL: https://github.com/apache/beam/pull/25113#discussion_r1083545037


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTableHelpers.java:
##########
@@ -46,8 +48,28 @@ public class CreateTableHelpers {
    */
   private static Set<String> createdTables = Collections.newSetFromMap(new ConcurrentHashMap<>());
 
+  // When CREATE_IF_NEEDED is specified, BQ tables should be created if they do not exist. This
+  // method detects
+  // errors on table operations, and attempts to create the table if necessary.
+  static void createTableWrapper(Callable<Void> action, Callable<Boolean> tryCreateTable)
+      throws Exception {
+    for (int retry = 0; retry <= 1; ++retry) {
+      try {
+        action.call();
+      } catch (ApiException | StatusRuntimeException e) {
+        // TODO: Once BigQuery reliably returns a consistent error on table not found, we should

Review Comment:
   Maybe an issues can be created in the client to track the feature request / improvement. 



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/CreateTableHelpers.java:
##########
@@ -46,8 +48,28 @@ public class CreateTableHelpers {
    */
   private static Set<String> createdTables = Collections.newSetFromMap(new ConcurrentHashMap<>());
 
+  // When CREATE_IF_NEEDED is specified, BQ tables should be created if they do not exist. This
+  // method detects
+  // errors on table operations, and attempts to create the table if necessary.
+  static void createTableWrapper(Callable<Void> action, Callable<Boolean> tryCreateTable)
+      throws Exception {
+    for (int retry = 0; retry <= 1; ++retry) {
+      try {
+        action.call();
+      } catch (ApiException | StatusRuntimeException e) {
+        // TODO: Once BigQuery reliably returns a consistent error on table not found, we should
+        // only try creating
+        // the table on that error.
+        boolean created = tryCreateTable.call();

Review Comment:
   not sure about this but, should this backoff before retrying? more considering that we are requesting the table info from BQ before sending the create request.
   
   I think we would want to resolve the table creation as fast as possible but as many bundle instances will try to do the same, almost at the same time, maybe backoff could save some create request if we wait some amount of time on some bundles (not sure, maybe randomly?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "johnjcasey (via GitHub)" <gi...@apache.org>.
johnjcasey commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1409095596

   It looks like the delay here is prohibitive to include in the release, and it has been an issue for several versions already, so this will be pushed to 2.46


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] reuvenlax commented on pull request #25113: #25112 No longer use GetTable to implement CREATE_IF_NEEDED to avoid low quotas

Posted by "reuvenlax (via GitHub)" <gi...@apache.org>.
reuvenlax commented on PR #25113:
URL: https://github.com/apache/beam/pull/25113#issuecomment-1556428286

   R: @yirutang BigQuery is still failing create stream after the table has been created - presumably still hitting caches. I added some extra retry/backoff here, which seems sufficient to make the test pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org