Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/06/24 21:37:53 UTC

[GitHub] [beam] pabloem opened a new pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

pabloem opened a new pull request #12082:
URL: https://github.com/apache/beam/pull/12082


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-663658537


   Run Python PreCommit


----------------------------------------------------------------



[GitHub] [beam] pabloem merged pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem merged pull request #12082:
URL: https://github.com/apache/beam/pull/12082


   


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r459756841



##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -1465,3 +1465,23 @@ def get_avro_schema_from_table_schema(schema):
   dict_table_schema = get_dict_table_schema(schema)
   return bigquery_avro_tools.get_record_schema_from_dict_table_schema(
       "root", dict_table_schema)
+
+
+class BigQueryJobTypes:
+  EXPORT = 'EXPORT'
+  COPY = 'COPY'
+  LOAD = 'LOAD'
+  QUERY = 'QUERY'
+
+
+_BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"

Review comment:
       Done.

##########
File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNamingTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static org.junit.Assert.assertEquals;
+
+import org.apache.beam.sdk.io.gcp.bigquery.BigQueryResourceNaming.JobType;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class BigQueryResourceNamingTest {
+
+  @Test
+  public void testJobTypesInNames() {
+    assertEquals(
+        "beam_bq_job_EXPORT_beamappjobtest_abcd",
+        BigQueryResourceNaming.createJobIdPrefix("beamapp-job-test", "abcd", JobType.EXPORT));
+

Review comment:
       Done.




----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-649880288


   retest this please


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-663279975


   Run JavaPortabilityApi PreCommit


----------------------------------------------------------------



[GitHub] [beam] chamikaramj commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r457124122



##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -1465,3 +1465,23 @@ def get_avro_schema_from_table_schema(schema):
   dict_table_schema = get_dict_table_schema(schema)
   return bigquery_avro_tools.get_record_schema_from_dict_table_schema(
       "root", dict_table_schema)
+
+
+class BigQueryJobTypes:
+  EXPORT = 'EXPORT'
+  COPY = 'COPY'
+  LOAD = 'LOAD'
+  QUERY = 'QUERY'
+
+
+_BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"

Review comment:
       Please add unit tests for Python as well.

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableReference;
+import java.util.Optional;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
+
+/**
+ * This class contains utilities to standardize how resources are named by BigQueryIO.
+ *
+ * <p>Resources can be: - BigQuery jobs - Export jobs - Query jobs - Load jobs - Copy jobs -

Review comment:
       
   "Resources can be any type of BigQuery job started by BigQueryIO and temporary BigQuery resources created by BigQueryIO." ?

##########
File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNamingTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static org.junit.Assert.assertEquals;
+
+import org.apache.beam.sdk.io.gcp.bigquery.BigQueryResourceNaming.JobType;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class BigQueryResourceNamingTest {
+
+  @Test
+  public void testJobTypesInNames() {
+    assertEquals(
+        "beam_bq_job_EXPORT_beamappjobtest_abcd",
+        BigQueryResourceNaming.createJobIdPrefix("beamapp-job-test", "abcd", JobType.EXPORT));
+

Review comment:
       Can we add tests to make sure that generated names are consistent with BIGQUERY_JOB_TEMPLATE?
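
One way to check that a generated name is consistent with the template is to derive a regular expression from the template itself. The following is a minimal Python sketch, not code from this pull request: the helper `template_to_regex` is hypothetical, and it assumes that every component is purely alphanumeric, so that underscores act only as separators.

```python
import re

# Template string quoted from the Java diff above.
BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}"


def template_to_regex(template):
    """Hypothetical helper: build a regex with one named group per {NAME}."""
    pattern = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", template):
        # Copy the literal text before the placeholder, escaped.
        pattern += re.escape(template[pos:m.start()])
        # Assumption: each component contains only letters and digits.
        pattern += "(?P<%s>[A-Za-z0-9]+)" % m.group(1)
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile("^" + pattern + "$")


job_name_regex = template_to_regex(BIGQUERY_JOB_TEMPLATE)
match = job_name_regex.match("beam_bq_job_EXPORT_beamappjobtest_abcd_r1")
print(match.groupdict())
# {'TYPE': 'EXPORT', 'JOB_ID': 'beamappjobtest', 'STEP': 'abcd', 'RANDOM': 'r1'}
```

Note that the value asserted in `testJobTypesInNames` is a prefix without the `RANDOM` part, so it would not match the full pattern; a test along these lines would apply only to complete job names.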

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableReference;
+import java.util.Optional;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
+
+/**
+ * This class contains utilities to standardize how resources are named by BigQueryIO.
+ *
+ * <p>Resources can be: - BigQuery jobs - Export jobs - Query jobs - Load jobs - Copy jobs -
+ * Temporary datasets - Temporary tables
+ *
+ * <p>This class has no backwards compatibility guarantees. It is considered internal.
+ */
+class BigQueryResourceNaming {
+
+  /**
+   * Generate a BigQuery job ID based on a prefix from {@link
+   * BigQueryResourceNaming::createJobIdPrefix}, with destination information added to it.
+   *
+   * @param prefix A prefix generated in {@link BigQueryResourceNaming::createJobIdPrefix}.
+   * @param tableDestination A descriptor of the destination table.
+   * @param partition A partition number in the destination table.
+   * @param index
+   * @return
+   */
+  static String createJobIdWithDestination(
+      String prefix, TableDestination tableDestination, int partition, long index) {
+    // Job ID must be different for each partition of each table.
+    String destinationHash =
+        Hashing.murmur3_128().hashUnencodedChars(tableDestination.toString()).toString();
+    String jobId = String.format("%s_%s", prefix, destinationHash);
+    if (partition >= 0) {
+      jobId += String.format("_%05d", partition);
+    }
+    if (index >= 0) {
+      jobId += String.format("_%05d", index);
+    }
+    return jobId;
+  }
+
+  public enum JobType {
+    LOAD,
+    COPY,
+    EXPORT,
+    QUERY,
+  }
+
+  static final String BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}";

Review comment:
       Can we standardize and define this format in BigQueryIO (probably with a backwards-compatibility warning) so that users can rely on it?

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -1465,3 +1465,23 @@ def get_avro_schema_from_table_schema(schema):
   dict_table_schema = get_dict_table_schema(schema)
   return bigquery_avro_tools.get_record_schema_from_dict_table_schema(
       "root", dict_table_schema)
+
+
+class BigQueryJobTypes:
+  EXPORT = 'EXPORT'
+  COPY = 'COPY'
+  LOAD = 'LOAD'
+  QUERY = 'QUERY'
+
+
+_BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"
+
+
+def generate_bq_job_name(job_name, step_id, job_type, random=None):

Review comment:
       Can we standardize and define this format in bigquery.py (probably with a backwards-compatibility warning) so that users can rely on it?
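
The diff shows only the signature of `generate_bq_job_name`, not its body. As an illustration of how such a function might fill the template, here is a hedged sketch. The body is an assumption: removing dashes from the job name mirrors the expectation in the Java test (`"beamapp-job-test"` becomes `beamappjobtest`), and the fallback to a short random hex string when `random` is `None` is invented for this example.

```python
import uuid

# Template string quoted from the diff above.
_BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"


def generate_bq_job_name(job_name, step_id, job_type, random=None):
    """Hypothetical body; only the signature appears in the quoted diff."""
    if random is None:
        # Assumption: fall back to 8 random hex characters.
        random = uuid.uuid4().hex[:8]
    return _BQ_JOB_NAME_TEMPLATE.format(
        job_type=job_type,
        # Assumption: dashes are dropped, as in the Java test expectation.
        job_id=job_name.replace("-", ""),
        step_id=step_id,
        random=random)


print(generate_bq_job_name("beamapp-job-test", "abcd", "EXPORT", random="r1"))
# beam_bq_job_EXPORT_beamappjobtest_abcd_r1
```

Keeping the template in a single module-level constant means documentation and tests can refer to one definition of the format.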

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableReference;
+import java.util.Optional;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
+
+/**
+ * This class contains utilities to standardize how resources are named by BigQueryIO.
+ *
+ * <p>Resources can be: - BigQuery jobs - Export jobs - Query jobs - Load jobs - Copy jobs -
+ * Temporary datasets - Temporary tables
+ *
+ * <p>This class has no backwards compatibility guarantees. It is considered internal.
+ */
+class BigQueryResourceNaming {
+
+  /**
+   * Generate a BigQuery job ID based on a prefix from {@link
+   * BigQueryResourceNaming::createJobIdPrefix}, with destination information added to it.
+   *
+   * @param prefix A prefix generated in {@link BigQueryResourceNaming::createJobIdPrefix}.
+   * @param tableDestination A descriptor of the destination table.
+   * @param partition A partition number in the destination table.
+   * @param index
+   * @return
+   */
+  static String createJobIdWithDestination(
+      String prefix, TableDestination tableDestination, int partition, long index) {
+    // Job ID must be different for each partition of each table.
+    String destinationHash =
+        Hashing.murmur3_128().hashUnencodedChars(tableDestination.toString()).toString();
+    String jobId = String.format("%s_%s", prefix, destinationHash);
+    if (partition >= 0) {
+      jobId += String.format("_%05d", partition);
+    }
+    if (index >= 0) {
+      jobId += String.format("_%05d", index);
+    }
+    return jobId;
+  }
+
+  public enum JobType {
+    LOAD,
+    COPY,
+    EXPORT,
+    QUERY,
+  }
+
+  static final String BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}";

Review comment:
       Probably cleaner to just use String.format() to do the replacement?
   Also, mention the standard format in the top-level Javadoc for easy reference.
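
For readers following the Python side, the `createJobIdWithDestination` logic in the hunk above can be sketched in Python as follows. This is an illustrative analogue, not part of the pull request: the function name and the use of a plain string for the destination are assumptions, and MD5 is swapped in for Guava's `murmur3_128` only because it is in the standard library and also produces a 128-bit digest. The resulting hash values will therefore differ from those of the Java code.

```python
import hashlib


def create_job_id_with_destination(prefix, table_destination, partition, index):
    """Hypothetical Python analogue of the Java method quoted above."""
    # Stand-in hash: the Java code uses murmur3_128, not MD5.
    destination_hash = hashlib.md5(
        table_destination.encode("utf-8")).hexdigest()
    # The job ID must differ for each partition of each table.
    job_id = "%s_%s" % (prefix, destination_hash)
    if partition >= 0:
        job_id += "_%05d" % partition  # zero-padded to five digits
    if index >= 0:
        job_id += "_%05d" % index
    return job_id


job_id = create_job_id_with_destination(
    "beam_bq_job_LOAD_beamappjobtest_abcd", "project:dataset.table", 3, -1)
print(job_id.endswith("_00003"))
# True
```

As in the Java version, a negative `partition` or `index` means that component is omitted from the job ID.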




----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-663269119


   Run Python PreCommit


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-649140269


   retest this please


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-663708646


   @chamikaramj PTAL


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-652000051


   Run Python 3.7 PostCommit


----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-666774552


   Run Python PreCommit


----------------------------------------------------------------



[GitHub] [beam] chamikaramj commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r461967928



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##########
@@ -457,6 +456,22 @@
  * </a> for security and permission related information specific to BigQuery.
  */
 public class BigQueryIO {
+
+  /**
+   * Template for BigQuery jobs created by BigQueryIO. This template is: {@code
+   * "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}"}, where:
+   *
+   * <ul>
+   *   <li>{@code TYPE} represents the BigQuery job type (e.g. extract / copy / load / query)
+   *   <li>{@code JOB_ID} is the Dataflow job name.

Review comment:
       Nit: what if the runner is not Dataflow?

##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -290,6 +290,20 @@ def compute_table_name(row):
 ]
 
 _LOGGER = logging.getLogger(__name__)
+"""
+Template for BigQuery jobs created by BigQueryIO. This template is:
+`"beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"`, where:
+
+- `job_type` represents the BigQuery job type (e.g. extract / copy / load /
+    query).
+- `job_id` is the Dataflow job name.

Review comment:
       Ditto.




----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-651394186






----------------------------------------------------------------



[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r459717944



##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -1465,3 +1465,23 @@ def get_avro_schema_from_table_schema(schema):
   dict_table_schema = get_dict_table_schema(schema)
   return bigquery_avro_tools.get_record_schema_from_dict_table_schema(
       "root", dict_table_schema)
+
+
+class BigQueryJobTypes:
+  EXPORT = 'EXPORT'
+  COPY = 'COPY'
+  LOAD = 'LOAD'
+  QUERY = 'QUERY'
+
+
+_BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}_{random}"
+
+
+def generate_bq_job_name(job_name, step_id, job_type, random=None):

Review comment:
       Done.




----------------------------------------------------------------



[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-666732227


   Run Java PostCommit





[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r459699026



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableReference;
+import java.util.Optional;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
+
+/**
+ * This class contains utilities to standardize how resources are named by BigQueryIO.
+ *
+ * <p>Resources can be: - BigQuery jobs - Export jobs - Query jobs - Load jobs - Copy jobs -

Review comment:
       Done.







[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r459700149



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableReference;
+import java.util.Optional;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.hash.Hashing;
+
+/**
+ * This class contains utilities to standardize how resources are named by BigQueryIO.
+ *
+ * <p>Resources can be: - BigQuery jobs - Export jobs - Query jobs - Load jobs - Copy jobs -
+ * Temporary datasets - Temporary tables
+ *
+ * <p>This class has no backwards compatibility guarantees. It is considered internal.
+ */
+class BigQueryResourceNaming {
+
+  /**
+   * Generate a BigQuery job ID based on a prefix from {@link
+   * BigQueryResourceNaming#createJobIdPrefix}, with destination information added to it.
+   *
+   * @param prefix A prefix generated in {@link BigQueryResourceNaming#createJobIdPrefix}.
+   * @param tableDestination A descriptor of the destination table.
+   * @param partition A partition number in the destination table.
+   * @param index
+   * @return
+   */
+  static String createJobIdWithDestination(
+      String prefix, TableDestination tableDestination, int partition, long index) {
+    // Job ID must be different for each partition of each table.
+    String destinationHash =
+        Hashing.murmur3_128().hashUnencodedChars(tableDestination.toString()).toString();
+    String jobId = String.format("%s_%s", prefix, destinationHash);
+    if (partition >= 0) {
+      jobId += String.format("_%05d", partition);
+    }
+    if (index >= 0) {
+      jobId += String.format("_%05d", index);
+    }
+    return jobId;
+  }
+
+  public enum JobType {
+    LOAD,
+    COPY,
+    EXPORT,
+    QUERY,
+  }
+
+  static final String BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}";

Review comment:
       I used this because I'd like to have named slots for replacement. String.format only supports replacement based on position. Is it okay if I keep this?
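
The named-slot idea can be sketched as follows, written in Python (the PR's other language) for brevity. It fills each `{NAME}` token by explicit string replacement, much as a Java version might chain `String.replace` calls. The helper name `fill_named_slots` and the sample values are hypothetical and do not come from the PR.

```python
# Template string copied from the diff hunk above.
BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}"


def fill_named_slots(template, values):
    """Hypothetical helper: replace each {NAME} token with its value."""
    result = template
    for name, value in values.items():
        # Each slot is looked up by name, so argument order does not matter.
        result = result.replace("{" + name + "}", str(value))
    return result


job_id = fill_named_slots(
    BIGQUERY_JOB_TEMPLATE,
    {"RANDOM": "a1b2", "STEP": "step3", "JOB_ID": "myjob", "TYPE": "LOAD"})
```

Because slots are matched by name, reordering the template does not require touching the call site, which is the property the comment above asks to keep.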







[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-651999947









[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-651404146


   Run Python2_PVR_Flink PreCommit





[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r463295587



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##########
@@ -457,6 +456,22 @@
  * </a> for security and permission related information specific to BigQuery.
  */
 public class BigQueryIO {
+
+  /**
+   * Template for BigQuery jobs created by BigQueryIO. This template is: {@code
+   * "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}"}, where:
+   *
+   * <ul>
+   *   <li>{@code TYPE} represents the BigQuery job type (e.g. extract / copy / load / query)
+   *   <li>{@code JOB_ID} is the Dataflow job name.

Review comment:
       Ah, the job name comes from Beam and is autogenerated if one is not provided. I changed the documentation to say that.







[GitHub] [beam] pabloem commented on a change in pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #12082:
URL: https://github.com/apache/beam/pull/12082#discussion_r459713758



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java
##########
@@ -0,0 +1,107 @@
+  static final String BIGQUERY_JOB_TEMPLATE = "beam_bq_job_{TYPE}_{JOB_ID}_{STEP}_{RANDOM}";

Review comment:
       Done.







[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-651411593


   Run Python 3.7 PostCommit





[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-651256113


   Run Python2_PVR_Flink PreCommit





[GitHub] [beam] pabloem commented on pull request #12082: Standardizing BigQuery job names in Beam Python and Java SDKs

Posted by GitBox <gi...@apache.org>.
pabloem commented on pull request #12082:
URL: https://github.com/apache/beam/pull/12082#issuecomment-649873640





