You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/27 17:08:49 UTC

[GitHub] [beam] slilichenko opened a new pull request, #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

slilichenko opened a new pull request, #24366:
URL: https://github.com/apache/beam/pull/24366

   Improve error handling when converting TableRows to a schema containing BigQuery's INTEGER type:
   
   - Allow values to be of BigDecimal and BigInteger types
   - Capture the failed value and the field name and type to help with troubleshooting
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346371413

   Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @lukecwik for label java.
   R: @pabloem for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046443234


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -52,7 +57,7 @@
 
 @RunWith(JUnit4.class)
 @SuppressWarnings({
-  "nullness" // TODO(https://github.com/apache/beam/issues/20497)
+    "nullness" // TODO(https://github.com/apache/beam/issues/20497)

Review Comment:
   ```suggestion
     "nullness" // TODO(https://github.com/apache/beam/issues/20497)
   ```



##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -823,6 +828,78 @@ public void testNullRepeatedDescriptorFromTableSchema() throws Exception {
     assertTrue(repeatednof2.isEmpty());
   }
 
+  @Test
+  public void testIntegerTypeConversion() throws DescriptorValidationException {
+    String intFieldName = "int_field";
+    TableSchema tableSchema = new TableSchema()
+        .setFields(
+            ImmutableList.<TableFieldSchema>builder()
+                .add(
+                    new TableFieldSchema()
+                        .setType("INTEGER")
+                        .setName(intFieldName)
+                        .setMode("REQUIRED")
+                )
+                .build());
+    TableRowToStorageApiProto.SchemaInformation schemaInformation =
+        TableRowToStorageApiProto.SchemaInformation.fromTableSchema(tableSchema
+        );
+    SchemaInformation fieldSchema = schemaInformation.getSchemaForField(intFieldName);
+    Descriptor schemaDescriptor =
+        TableRowToStorageApiProto.getDescriptorFromTableSchema(tableSchema, true);
+    FieldDescriptor fieldDescriptor = schemaDescriptor.findFieldByName(intFieldName);
+
+    Object[][] validIntValues = new Object[][]{
+        // Source and expected converted values.
+        {"123", 123L},
+        {123L, 123L},
+        {123, 123L},
+        {new BigDecimal("123"), 123L},
+        {new BigInteger("123"), 123L}
+    };
+    for (Object[] validValue : validIntValues) {
+      Object sourceValue = validValue[0];
+      Long expectedConvertedValue = (Long) validValue[1];
+      try {
+        Object converted = TableRowToStorageApiProto.singularFieldToProtoValue(fieldSchema,
+            fieldDescriptor, sourceValue, false);
+        assertEquals(expectedConvertedValue, converted);
+      } catch (SchemaConversionException e) {
+        fail("Failed to convert value " + sourceValue + " of type " + validValue.getClass()
+            + " to INTEGER: " + e);
+      }
+    }
+
+    Object[][] invalidIntValues = new Object[][] {
+        // Value and expected error message
+        {"12.123", "Column: "
+            + intFieldName
+            + " (INT64). Value: 12.123 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"12.123\""},
+        {Long.toString(Long.MAX_VALUE) + '0', "Column: "
+            + intFieldName
+            + " (INT64). Value: 92233720368547758070 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"92233720368547758070\""},
+        {new BigDecimal("12.123"), "Column: "
+            + intFieldName
+            + " (INT64). Value: 12.123 (java.math.BigDecimal). Reason: java.lang.ArithmeticException: Rounding necessary"},
+        {new BigInteger(String.valueOf(Long.MAX_VALUE)).add(new BigInteger("10")), "Column: "
+            + intFieldName
+            + " (INT64). Value: 9223372036854775817 (java.math.BigInteger). Reason: java.lang.ArithmeticException: BigInteger out of long range"}
+
+    };
+    for (Object[] invalidValue : invalidIntValues) {
+      Object sourceValue = invalidValue[0];
+      String expectedError = (String) invalidValue[1];
+      try {
+        TableRowToStorageApiProto.singularFieldToProtoValue(fieldSchema, fieldDescriptor,
+            sourceValue, false);
+        fail("Expected to throw an exception converting " + sourceValue + " of type "
+            + invalidValue.getClass() + " to INTEGER");
+      } catch (SchemaConversionException e) {
+        assertEquals("Exception message", expectedError, e.getMessage());
+      }
+    }
+  }

Review Comment:
   ```suggestion
     @Test
     public void testIntegerTypeConversion() throws DescriptorValidationException {
       String intFieldName = "int_field";
       TableSchema tableSchema =
           new TableSchema()
               .setFields(
                   ImmutableList.<TableFieldSchema>builder()
                       .add(
                           new TableFieldSchema()
                               .setType("INTEGER")
                               .setName(intFieldName)
                               .setMode("REQUIRED"))
                       .build());
       TableRowToStorageApiProto.SchemaInformation schemaInformation =
           TableRowToStorageApiProto.SchemaInformation.fromTableSchema(tableSchema);
       SchemaInformation fieldSchema = schemaInformation.getSchemaForField(intFieldName);
       Descriptor schemaDescriptor =
           TableRowToStorageApiProto.getDescriptorFromTableSchema(tableSchema, true);
       FieldDescriptor fieldDescriptor = schemaDescriptor.findFieldByName(intFieldName);
   
       Object[][] validIntValues =
           new Object[][] {
             // Source and expected converted values.
             {"123", 123L},
             {123L, 123L},
             {123, 123L},
             {new BigDecimal("123"), 123L},
             {new BigInteger("123"), 123L}
           };
       for (Object[] validValue : validIntValues) {
         Object sourceValue = validValue[0];
         Long expectedConvertedValue = (Long) validValue[1];
         try {
           Object converted =
               TableRowToStorageApiProto.singularFieldToProtoValue(
                   fieldSchema, fieldDescriptor, sourceValue, false);
           assertEquals(expectedConvertedValue, converted);
         } catch (SchemaConversionException e) {
           fail(
               "Failed to convert value "
                   + sourceValue
                   + " of type "
                   + validValue.getClass()
                   + " to INTEGER: "
                   + e);
         }
       }
   
       Object[][] invalidIntValues =
           new Object[][] {
             // Value and expected error message
             {
               "12.123",
               "Column: "
                   + intFieldName
                   + " (INT64). Value: 12.123 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"12.123\""
             },
             {
               Long.toString(Long.MAX_VALUE) + '0',
               "Column: "
                   + intFieldName
                   + " (INT64). Value: 92233720368547758070 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"92233720368547758070\""
             },
             {
               new BigDecimal("12.123"),
               "Column: "
                   + intFieldName
                   + " (INT64). Value: 12.123 (java.math.BigDecimal). Reason: java.lang.ArithmeticException: Rounding necessary"
             },
             {
               new BigInteger(String.valueOf(Long.MAX_VALUE)).add(new BigInteger("10")),
               "Column: "
                   + intFieldName
                   + " (INT64). Value: 9223372036854775817 (java.math.BigInteger). Reason: java.lang.ArithmeticException: BigInteger out of long range"
             }
           };
       for (Object[] invalidValue : invalidIntValues) {
         Object sourceValue = invalidValue[0];
         String expectedError = (String) invalidValue[1];
         try {
           TableRowToStorageApiProto.singularFieldToProtoValue(
               fieldSchema, fieldDescriptor, sourceValue, false);
           fail(
               "Expected to throw an exception converting "
                   + sourceValue
                   + " of type "
                   + invalidValue.getClass()
                   + " to INTEGER");
         } catch (SchemaConversionException e) {
           assertEquals("Exception message", expectedError, e.getMessage());
         }
       }
     }
   ```



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -108,6 +109,20 @@ public static class SchemaDoesntMatchException extends SchemaConversionException
     }
   }
 
+  public static class SingleValueConversionException extends SchemaConversionException {
+    SingleValueConversionException(Object sourceValue, SchemaInformation schema, Exception e) {
+      super("Column: " + getPrettyFieldName(schema) + " (" + schema.getType() + "). "

Review Comment:
   ```suggestion
         super(
             "Column: "
                 + getPrettyFieldName(schema)
                 + " ("
                 + schema.getType()
                 + "). "
                 + "Value: "
                 + sourceValue
                 + " ("
                 + sourceValue.getClass().getName()
                 + "). Reason: "
                 + e);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046170338


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -590,9 +605,25 @@ private static void fieldDescriptorFromTableField(
     switch (schemaInformation.getType()) {
       case INT64:
         if (value instanceof String) {
-          return Long.valueOf((String) value);
+          try {
+            return Long.valueOf((String) value);
+          } catch (NumberFormatException e) {
+            throw new SingleValueConversionException(value, schemaInformation, e);
+          }
         } else if (value instanceof Integer || value instanceof Long) {
           return ((Number) value).longValue();
+        } else if (value instanceof BigDecimal) {
+          try {
+            return ((BigDecimal) value).longValueExact();
+          } catch (ArithmeticException e) {
+            throw new SingleValueConversionException(value, schemaInformation, e);
+          }
+        } else if( value instanceof BigInteger) {

Review Comment:
   ```suggestion
           } else if (value instanceof BigInteger) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346951329

   remind me after tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] slilichenko commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
slilichenko commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1034280378


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -408,9 +409,25 @@ private static void fieldDescriptorFromTableField(
       case "INT64":
       case "INTEGER":
         if (value instanceof String) {
-          return Long.valueOf((String) value);
+          try {
+            return Long.valueOf((String) value);
+          } catch (NumberFormatException e) {
+            // Expected. Element will be added to the failed element PCollection

Review Comment:
   Addressed in the latest commit. Only implemented for the INTEGER type at the moment. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046531630


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -108,6 +109,31 @@ public static class SchemaDoesntMatchException extends SchemaConversionException
     }
   }
 
+  public static class SingleValueConversionException extends SchemaConversionException {
+    SingleValueConversionException(Object sourceValue, SchemaInformation schema, Exception e) {
+      super(
+          "Column: "
+              + getPrettyFieldName(schema)
+              + " ("
+              + schema.getType()
+              + "). "
+              + "Value: "
+              + sourceValue
+              + " ("
+              + sourceValue.getClass().getName()
+              + "). Reason: "
+              + e);
+          + "Value: " + sourceValue + " (" + sourceValue.getClass().getName()
+          + "). Reason: " + e);

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1347543797

   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1342637381

   Reminder, please take a look at this pr: @apilloud @ahmedabu98 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] apilloud commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
apilloud commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1033924832


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -408,9 +409,25 @@ private static void fieldDescriptorFromTableField(
       case "INT64":
       case "INTEGER":
         if (value instanceof String) {
-          return Long.valueOf((String) value);
+          try {
+            return Long.valueOf((String) value);
+          } catch (NumberFormatException e) {
+            // Expected. Element will be added to the failed element PCollection

Review Comment:
   These are going to result in a generic `SchemaDoesntMatchException("Unexpected value :"...)` being thrown which will make this potentially difficult to debug when it happens. If we can't just let the exception bubble up here, you should wrap it with a new exception rather than just dropping it. (Same thing for the exceptions below.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346952548

   Ok - I'll remind @lukecwik after tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik merged pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
lukecwik merged PR #24366:
URL: https://github.com/apache/beam/pull/24366


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1329443352

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @apilloud for label java.
   R: @ahmedabu98 for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org