You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:15:31 UTC

[GitHub] [beam] rworley-monster opened a new pull request #12389: [BEAM-10587] Support Maps in BigQuery

rworley-monster opened a new pull request #12389:
URL: https://github.com/apache/beam/pull/12389


   Maps are currently not supported when converting a Beam Schema to a BigQuery TableFieldSchema.  This improvement will convert Maps to repeated records with 'key' and 'value' fields.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/b
 eam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-706094339


   I have written to the dev list and we will see how the conversation goes about the implicit conversion.  And I will attempt to add the BigQuery integration test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r471559168



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -443,6 +467,17 @@ public static TableRow toTableRow(Row row) {
         }
         return convertedItems;
 
+      case MAP:
+        Map<?, ?> pairs = (Map<?, ?>) fieldValue;
+        convertedItems = Lists.newArrayListWithCapacity(pairs.size());
+        for (Map.Entry<?, ?> pair : pairs.entrySet()) {
+          convertedItems.add(
+              new TableRow()
+                  .set(BIGQUERY_MAP_KEY_FIELD_NAME, pair.getKey())

Review comment:
       Nice catch.  I've updated it with that and tried to copy the above ITERABLE conversion as closely as possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r470756034



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -263,6 +267,18 @@ private static FieldType fromTableFieldSchemaType(
                 "SqlTimestampWithLocalTzType", FieldType.STRING, "", FieldType.DATETIME) {});
       case "STRUCT":
       case "RECORD":
+        // check if record represents a map entry

Review comment:
       This is something that is going to need more discussion. I have two concerns:
   1. The ZetaSQL dialect doesn't support map types, so this will break that use case.
   2. The user might have a row matching this struct that isn't suppose to be a map.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-695080557


   I have added tests for the conversion of Beam schema maps to BigQuery and back.  Can you please let me know if there is anything else that I can do to help prepare this for approval?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-696452519


   Still looks good, here are the remaining items:
   1. We need to figure out [implicitly converting RECORD/STRUCT type to MAP](https://github.com/apache/beam/pull/12389#discussion_r470756034). This has some far reaching impacts, so about sending a email to dev@beam.apache.org explaining what you are doing?
   2. Add this to an integration test with BigQuery. This file would probably be an easy place to add tests (it also includes Beam SQL):
   https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-696452519


   Still looks good, here are the remaining items:
   1. We need to figure out [implicitly converting RECORD/STRUCT type to MAP](https://github.com/apache/beam/pull/12389#discussion_r470756034). This has some far reaching impacts, so about sending a email to dev@beam.apache.org explaining what you are doing?
   2. Add this to an integration test with BigQuery. This file would probably be an easy place to add tests (it also includes Beam SQL):
   https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-708978442


   Thanks, feel free to squash, I think all of the changes are a logical unit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r471560146



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -679,6 +714,20 @@ private static Object convertAvroArray(
     return ret;
   }
 
+  private static Object convertAvroMap(

Review comment:
       Agreed, that's a more accurate name and I have updated it to that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-674139020


   I have added support for reading BigQuery map records back into Beam Schema and Row.  Can you please confirm that I am on the right track and let me know if there are any other locations that would need this new support for maps?  Tests still need to be added and I hope to get to that next week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
robertwb commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-665276141


   R: @apilloud


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-696452519


   Still looks good, here are the remaining items:
   1. We need to figure out [implicitly converting RECORD/STRUCT type to MAP](https://github.com/apache/beam/pull/12389#discussion_r470756034). This has some far reaching impacts, so about sending a email to dev@beam.apache.org explaining what you are doing?
   2. Add this to an integration test with BigQuery. This file would probably be an easy place to add tests (it also includes Beam SQL):
   https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryReadWriteIT.java


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-670381949


   Yes, it's in progress, but temporarily preempted by a higher priority task.  I hope to get back to it and add the follow-up commit next week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r462906573



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -442,6 +450,17 @@ private static Object fromBeamField(FieldType fieldType, Object fieldValue) {
         }
         return convertedItems;
 
+      case MAP:
+        Map<?, ?> pairs = (Map<?, ?>) fieldValue;
+        convertedItems = Lists.newArrayListWithCapacity(pairs.size());
+        for (Map.Entry<?, ?> pair : pairs.entrySet()) {
+          HashMap<String, Object> item = new HashMap<>(2);

Review comment:
       Ok, I will return a List of TableRows instead of Maps.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster removed a comment on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster removed a comment on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-664974133


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-670159620


   Hey @rworley-monster - any progress on this? Do you need help?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r470749928



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -679,6 +714,20 @@ private static Object convertAvroArray(
     return ret;
   }
 
+  private static Object convertAvroMap(

Review comment:
       nit: The name might be a little confusing, this isn't actually converting an Avro Map type. Something like `convertAvroRecordToMap` might reduce that risk.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-706094339


   I have written to the dev list and we will see how the conversation goes about the implicit conversion.  And I will attempt to add the BigQuery integration test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r470743118



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -443,6 +467,17 @@ public static TableRow toTableRow(Row row) {
         }
         return convertedItems;
 
+      case MAP:
+        Map<?, ?> pairs = (Map<?, ?>) fieldValue;
+        convertedItems = Lists.newArrayListWithCapacity(pairs.size());
+        for (Map.Entry<?, ?> pair : pairs.entrySet()) {
+          convertedItems.add(
+              new TableRow()
+                  .set(BIGQUERY_MAP_KEY_FIELD_NAME, pair.getKey())

Review comment:
       You'll need to convert the types stored in the map as well. Something like `fromBeamField(fieldType.getMapKeyType(), pair.getKey())`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r471560897



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -679,6 +714,20 @@ private static Object convertAvroArray(
     return ret;
   }
 
+  private static Object convertAvroMap(
+      FieldType beamField, Object value, BigQueryUtils.ConversionOptions options) {
+    List<GenericData.Record> records = (List<GenericData.Record>) value;
+    HashMap<Object, Object> ret = new HashMap<>();

Review comment:
       Sure, I updated it to use ImmutableMap instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-664974133






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r461975618



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -442,6 +450,17 @@ private static Object fromBeamField(FieldType fieldType, Object fieldValue) {
         }
         return convertedItems;
 
+      case MAP:
+        Map<?, ?> pairs = (Map<?, ?>) fieldValue;
+        convertedItems = Lists.newArrayListWithCapacity(pairs.size());
+        for (Map.Entry<?, ?> pair : pairs.entrySet()) {
+          HashMap<String, Object> item = new HashMap<>(2);

Review comment:
       I believe the type you want here is `TableRow`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r470752028



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -679,6 +714,20 @@ private static Object convertAvroArray(
     return ret;
   }
 
+  private static Object convertAvroMap(
+      FieldType beamField, Object value, BigQueryUtils.ConversionOptions options) {
+    List<GenericData.Record> records = (List<GenericData.Record>) value;
+    HashMap<Object, Object> ret = new HashMap<>();

Review comment:
       We prefer to use ImmutableMap when possible, there are examples in this file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-705755708


   @rworley-monster - What is the next step on this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-708359843


   Based on feedback from the dev list, I have added a `SchemaConversionOptions` class with the option to infer maps when converting a BigQuery `TableSchema` to a Beam `Schema`.  The existing and default behaviour is unchanged, users will continue to get an array of rows.  If they enable this new option, they will get a map when the TableSchema contains a field with the expected `array<struct{key, value}>` schema.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-707245413


   /cc relevant folks: @chamikaramj @pabloem 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-666280160


   > This needs tests!
   
   Indeed, I will add appropriate tests in `BigQueryUtilsTest`.
   
   > BigQuery doesn't fundamentally support maps. This change modifies the IO to implicitly run unnest() on maps when writing to BigQuery. It seems like this could create confusion for users.
   
   The motivation for this change is to allow Avro maps (or anything that can become a Beam Schema Map type) to flow directly into BigQuery.  I agree that a map becoming an array of key/value records is not completely intuitive, but it seemed the best solution that I was able to figure out within the current features of BigQuery schemas.
   
   > The BigQuery IO is bidirectional, users will expect to be able to read their maps back. Is that something you are considering adding?
   
   Yes, that makes sense.  It appears this would be done in `fromTableFieldSchemaType`, we would detect if a record has only 'key' and 'value' fields, and in that case return a Map FieldType.  A possible concern is that this would intercept key/value records that did not necessarily originate as Beam Maps.  Perhaps this would be a desired feature in many cases, but it can be debated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] aaltay commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
aaltay commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-705755708


   @rworley-monster - What is the next step on this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rworley-monster commented on a change in pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
rworley-monster commented on a change in pull request #12389:
URL: https://github.com/apache/beam/pull/12389#discussion_r471599270



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
##########
@@ -263,6 +267,18 @@ private static FieldType fromTableFieldSchemaType(
                 "SqlTimestampWithLocalTzType", FieldType.STRING, "", FieldType.DATETIME) {});
       case "STRUCT":
       case "RECORD":
+        // check if record represents a map entry

Review comment:
       Regarding the second concern, I wonder if a user with a { "key", "value" } struct would always be satisfied to work with the field as a Map in Beam.  If not, then I suppose we would need to somehow tag the field schema with a marker that a user would not reasonably collide with.  The options I see are:
   1. Field names (could be something like "beam_key" or "map_key").  Personally, I'm not a fan of affecting schema names like this.
   2. Extra field (in addition to "key" and "value"), could be something like "beam_map" with all true or null values.  Similarly messy like above.
   3. Tag in field description(s) with a warning for the user to not delete it.  Might be something like #beam-map-do-not-delete#.
   
   Though these options require that the fields are created via Beam and not by the user beforehand.  Or the user would need to know to use these special markers to enable the Beam map functionality.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud merged pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud merged pull request #12389:
URL: https://github.com/apache/beam/pull/12389


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] apilloud commented on pull request #12389: [BEAM-10587] Support Maps in BigQuery

Posted by GitBox <gi...@apache.org>.
apilloud commented on pull request #12389:
URL: https://github.com/apache/beam/pull/12389#issuecomment-674179998


   I left some comments, I think you have everything needed to make this work for the Beam SQL calcite dialect.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org