You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/27 17:08:49 UTC
[GitHub] [beam] slilichenko opened a new pull request, #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
slilichenko opened a new pull request, #24366:
URL: https://github.com/apache/beam/pull/24366
Improve error handling when converting TableRows to a schema containing BigQuery's INTEGER type:
- Allow values to be of BigDecimal and BigInteger types
- Capture the failed value and the field name and type to help with troubleshooting
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346371413
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @lukecwik for label java.
R: @pabloem for label io.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046443234
##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -52,7 +57,7 @@
@RunWith(JUnit4.class)
@SuppressWarnings({
- "nullness" // TODO(https://github.com/apache/beam/issues/20497)
+ "nullness" // TODO(https://github.com/apache/beam/issues/20497)
Review Comment:
```suggestion
"nullness" // TODO(https://github.com/apache/beam/issues/20497)
```
##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -823,6 +828,78 @@ public void testNullRepeatedDescriptorFromTableSchema() throws Exception {
assertTrue(repeatednof2.isEmpty());
}
+ @Test
+ public void testIntegerTypeConversion() throws DescriptorValidationException {
+ String intFieldName = "int_field";
+ TableSchema tableSchema = new TableSchema()
+ .setFields(
+ ImmutableList.<TableFieldSchema>builder()
+ .add(
+ new TableFieldSchema()
+ .setType("INTEGER")
+ .setName(intFieldName)
+ .setMode("REQUIRED")
+ )
+ .build());
+ TableRowToStorageApiProto.SchemaInformation schemaInformation =
+ TableRowToStorageApiProto.SchemaInformation.fromTableSchema(tableSchema
+ );
+ SchemaInformation fieldSchema = schemaInformation.getSchemaForField(intFieldName);
+ Descriptor schemaDescriptor =
+ TableRowToStorageApiProto.getDescriptorFromTableSchema(tableSchema, true);
+ FieldDescriptor fieldDescriptor = schemaDescriptor.findFieldByName(intFieldName);
+
+ Object[][] validIntValues = new Object[][]{
+ // Source and expected converted values.
+ {"123", 123L},
+ {123L, 123L},
+ {123, 123L},
+ {new BigDecimal("123"), 123L},
+ {new BigInteger("123"), 123L}
+ };
+ for (Object[] validValue : validIntValues) {
+ Object sourceValue = validValue[0];
+ Long expectedConvertedValue = (Long) validValue[1];
+ try {
+ Object converted = TableRowToStorageApiProto.singularFieldToProtoValue(fieldSchema,
+ fieldDescriptor, sourceValue, false);
+ assertEquals(expectedConvertedValue, converted);
+ } catch (SchemaConversionException e) {
+ fail("Failed to convert value " + sourceValue + " of type " + validValue.getClass()
+ + " to INTEGER: " + e);
+ }
+ }
+
+ Object[][] invalidIntValues = new Object[][] {
+ // Value and expected error message
+ {"12.123", "Column: "
+ + intFieldName
+ + " (INT64). Value: 12.123 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"12.123\""},
+ {Long.toString(Long.MAX_VALUE) + '0', "Column: "
+ + intFieldName
+ + " (INT64). Value: 92233720368547758070 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"92233720368547758070\""},
+ {new BigDecimal("12.123"), "Column: "
+ + intFieldName
+ + " (INT64). Value: 12.123 (java.math.BigDecimal). Reason: java.lang.ArithmeticException: Rounding necessary"},
+ {new BigInteger(String.valueOf(Long.MAX_VALUE)).add(new BigInteger("10")), "Column: "
+ + intFieldName
+ + " (INT64). Value: 9223372036854775817 (java.math.BigInteger). Reason: java.lang.ArithmeticException: BigInteger out of long range"}
+
+ };
+ for (Object[] invalidValue : invalidIntValues) {
+ Object sourceValue = invalidValue[0];
+ String expectedError = (String) invalidValue[1];
+ try {
+ TableRowToStorageApiProto.singularFieldToProtoValue(fieldSchema, fieldDescriptor,
+ sourceValue, false);
+ fail("Expected to throw an exception converting " + sourceValue + " of type "
+ + invalidValue.getClass() + " to INTEGER");
+ } catch (SchemaConversionException e) {
+ assertEquals("Exception message", expectedError, e.getMessage());
+ }
+ }
+ }
Review Comment:
```suggestion
@Test
public void testIntegerTypeConversion() throws DescriptorValidationException {
String intFieldName = "int_field";
TableSchema tableSchema =
new TableSchema()
.setFields(
ImmutableList.<TableFieldSchema>builder()
.add(
new TableFieldSchema()
.setType("INTEGER")
.setName(intFieldName)
.setMode("REQUIRED"))
.build());
TableRowToStorageApiProto.SchemaInformation schemaInformation =
TableRowToStorageApiProto.SchemaInformation.fromTableSchema(tableSchema);
SchemaInformation fieldSchema = schemaInformation.getSchemaForField(intFieldName);
Descriptor schemaDescriptor =
TableRowToStorageApiProto.getDescriptorFromTableSchema(tableSchema, true);
FieldDescriptor fieldDescriptor = schemaDescriptor.findFieldByName(intFieldName);
Object[][] validIntValues =
new Object[][] {
// Source and expected converted values.
{"123", 123L},
{123L, 123L},
{123, 123L},
{new BigDecimal("123"), 123L},
{new BigInteger("123"), 123L}
};
for (Object[] validValue : validIntValues) {
Object sourceValue = validValue[0];
Long expectedConvertedValue = (Long) validValue[1];
try {
Object converted =
TableRowToStorageApiProto.singularFieldToProtoValue(
fieldSchema, fieldDescriptor, sourceValue, false);
assertEquals(expectedConvertedValue, converted);
} catch (SchemaConversionException e) {
fail(
"Failed to convert value "
+ sourceValue
+ " of type "
+ validValue.getClass()
+ " to INTEGER: "
+ e);
}
}
Object[][] invalidIntValues =
new Object[][] {
// Value and expected error message
{
"12.123",
"Column: "
+ intFieldName
+ " (INT64). Value: 12.123 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"12.123\""
},
{
Long.toString(Long.MAX_VALUE) + '0',
"Column: "
+ intFieldName
+ " (INT64). Value: 92233720368547758070 (java.lang.String). Reason: java.lang.NumberFormatException: For input string: \"92233720368547758070\""
},
{
new BigDecimal("12.123"),
"Column: "
+ intFieldName
+ " (INT64). Value: 12.123 (java.math.BigDecimal). Reason: java.lang.ArithmeticException: Rounding necessary"
},
{
new BigInteger(String.valueOf(Long.MAX_VALUE)).add(new BigInteger("10")),
"Column: "
+ intFieldName
+ " (INT64). Value: 9223372036854775817 (java.math.BigInteger). Reason: java.lang.ArithmeticException: BigInteger out of long range"
}
};
for (Object[] invalidValue : invalidIntValues) {
Object sourceValue = invalidValue[0];
String expectedError = (String) invalidValue[1];
try {
TableRowToStorageApiProto.singularFieldToProtoValue(
fieldSchema, fieldDescriptor, sourceValue, false);
fail(
"Expected to throw an exception converting "
+ sourceValue
+ " of type "
+ invalidValue.getClass()
+ " to INTEGER");
} catch (SchemaConversionException e) {
assertEquals("Exception message", expectedError, e.getMessage());
}
}
}
```
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -108,6 +109,20 @@ public static class SchemaDoesntMatchException extends SchemaConversionException
}
}
+ public static class SingleValueConversionException extends SchemaConversionException {
+ SingleValueConversionException(Object sourceValue, SchemaInformation schema, Exception e) {
+ super("Column: " + getPrettyFieldName(schema) + " (" + schema.getType() + "). "
Review Comment:
```suggestion
super(
"Column: "
+ getPrettyFieldName(schema)
+ " ("
+ schema.getType()
+ "). "
+ "Value: "
+ sourceValue
+ " ("
+ sourceValue.getClass().getName()
+ "). Reason: "
+ e);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046170338
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -590,9 +605,25 @@ private static void fieldDescriptorFromTableField(
switch (schemaInformation.getType()) {
case INT64:
if (value instanceof String) {
- return Long.valueOf((String) value);
+ try {
+ return Long.valueOf((String) value);
+ } catch (NumberFormatException e) {
+ throw new SingleValueConversionException(value, schemaInformation, e);
+ }
} else if (value instanceof Integer || value instanceof Long) {
return ((Number) value).longValue();
+ } else if (value instanceof BigDecimal) {
+ try {
+ return ((BigDecimal) value).longValueExact();
+ } catch (ArithmeticException e) {
+ throw new SingleValueConversionException(value, schemaInformation, e);
+ }
+ } else if( value instanceof BigInteger) {
Review Comment:
```suggestion
} else if (value instanceof BigInteger) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346951329
remind me after tests pass
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] slilichenko commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
slilichenko commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1034280378
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -408,9 +409,25 @@ private static void fieldDescriptorFromTableField(
case "INT64":
case "INTEGER":
if (value instanceof String) {
- return Long.valueOf((String) value);
+ try {
+ return Long.valueOf((String) value);
+ } catch (NumberFormatException e) {
+ // Expected. Element will be added to the failed element PCollection
Review Comment:
Addressed in the latest commit. Only implemented for the INTEGER type at the moment.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1046531630
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -108,6 +109,31 @@ public static class SchemaDoesntMatchException extends SchemaConversionException
}
}
+ public static class SingleValueConversionException extends SchemaConversionException {
+ SingleValueConversionException(Object sourceValue, SchemaInformation schema, Exception e) {
+ super(
+ "Column: "
+ + getPrettyFieldName(schema)
+ + " ("
+ + schema.getType()
+ + "). "
+ + "Value: "
+ + sourceValue
+ + " ("
+ + sourceValue.getClass().getName()
+ + "). Reason: "
+ + e);
+ + "Value: " + sourceValue + " (" + sourceValue.getClass().getName()
+ + "). Reason: " + e);
Review Comment:
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1347543797
retest this please
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1342637381
Reminder, please take a look at this pr: @apilloud @ahmedabu98
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] apilloud commented on a diff in pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
apilloud commented on code in PR #24366:
URL: https://github.com/apache/beam/pull/24366#discussion_r1033924832
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java:
##########
@@ -408,9 +409,25 @@ private static void fieldDescriptorFromTableField(
case "INT64":
case "INTEGER":
if (value instanceof String) {
- return Long.valueOf((String) value);
+ try {
+ return Long.valueOf((String) value);
+ } catch (NumberFormatException e) {
+ // Expected. Element will be added to the failed element PCollection
Review Comment:
These are going to result in a generic `SchemaDoesntMatchException("Unexpected value :"...)` being thrown which will make this potentially difficult to debug when it happens. If we can't just let the exception bubble up here, you should wrap it with a new exception rather than just dropping it. (Same thing for the exceptions below.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1346952548
Ok - I'll remind @lukecwik after tests pass
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik merged pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
lukecwik merged PR #24366:
URL: https://github.com/apache/beam/pull/24366
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #24366: Improve Integer conversion in BigQueryIO's Storage Write API method.
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24366:
URL: https://github.com/apache/beam/pull/24366#issuecomment-1329443352
Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @apilloud for label java.
R: @ahmedabu98 for label io.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org