You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "RyuSA (via GitHub)" <gi...@apache.org> on 2023/07/09 07:49:15 UTC

[GitHub] [beam] RyuSA opened a new pull request, #27413: [BigQueryIO] Support StorageWriteAPI

RyuSA opened a new pull request, #27413:
URL: https://github.com/apache/beam/pull/27413

   
   **Current issue:**
   
   As per the specifications, BigQuery is capable of receiving timestamps up to `9999-12-31T23:59:59`. 
   https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type
   
   There is a bug in the `java.time.Instant.until`. https://bugs.java.com/bugdatabase/view_bug?bug_id=8199095
   
   This causes there is an overflow issue with timestamps beyond `2262-04-11T23:47:16Z`. This prevents correct processing of timestamps beyond this point.
   https://github.com/apache/beam/blob/v2.48.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java#L764
   
   <details>
   <summary>Details</summary>
   
   `Instant.until` calculates the period using `ChronoUnit.UNIT.between`. `ChronoUnit.MICROS.between` will multiply 1000_000_000(=Instant.NANOS_PER_SECOND) to the micro-second based Instant object and convert it to nano-second based one. 
   https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/java/time/Instant.java#L1149 
   
   This means, the maximum timestamp which `ChronoUnit.MICROS.between` can handle is
   
   - long.max = 9223372036854775807
   - Timestamp(long.max, UNIT.NANO) ~ 2262-04-11T23:47:16Z
   
   `TableRowToStorageApiProto.singularFieldToProtoValue` will use `ChronoUnit.MICROS.between(Instant.EPOCH, Instant.from(timestamp)`. This will be overflowed if timestamp > `2262-04-11T23:47:16Z` and it throw `java.lang.ArithmeticException: long overflow`.
   
   </details>
   
   **Proposed Fix:**
   
   This pull request applies a workaround to the section of code where the overflow occurs. With this fix, the StorageWriteAPI can correctly transmit timestamps up to `9999-12-31 23:59:59.999999 UTC` as originally intended by BigQuery's specifications.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1260370611


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   I think the test needed for this fix is the Timestamp input value bounds test.
   
   Since they were originally designed to properly parse input values of various "formats", I thought it would be a little too much to include a boundary test there as well.🤔
   
   But, yes I can add the test case to the general test instead of testTimestampTypeConversion. Do you think I should add the test case to both of with and without a field named "f"? (= e.g BASE_TABLE_SCHEMA and BASE_TABLE_SCHEMA_NO_F) Or one of them?



##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   I think the test needed for this fix is the Timestamp input value bounds test.
   
   Since they were originally designed to properly parse input values of various "formats", I thought it would be a little too much to include a boundary test there as well.🤔
   
   But, yes I can add the test case to the general test instead of testTimestampTypeConversion. Do you think I should add the test case to both of with and without a field named "f"? (= e.g BASE_TABLE_SCHEMA and BASE_TABLE_SCHEMA_NO_F) Or one of them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1627665045

   assign set of reviewers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1263891289


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   @ahmedabu98 I have pushed a new commit. 👍 I used both of `BASE_TABLE_SCHEMA` and `BASE_TABLE_SCHEMA_NO_F `. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1627665224

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @kennknowles for label java.
   R: @bvolpato for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1263908171


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   sorry got busy with other things
   > I think the test needed for this fix is the Timestamp input value bounds test.
   
    you're right this case is specific to an upper bound, but i think it still fits in a parsing test. thanks for adding it :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1263891289


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   @ahmedabu98 I have pushed a new commit. 👍



##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   @ahmedabu98 I have pushed a new commit. 👍



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1630984905

   FYI failing tests look irrelevant, you can ignore them


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1629114694

   R: @ahmedabu98 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1629116711

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1627647932

   Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1259863280


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   This is not a blocker, but for completeness can you add this case (`"9999-12-31...'") as a new field in the general test cases in this file (e.g. `BASE_TABLE_SCHEMA_PROTO`)?
   
   You can look at https://github.com/apache/beam/pull/26879/files for an example



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on a diff in pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on code in PR #27413:
URL: https://github.com/apache/beam/pull/27413#discussion_r1260370611


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoTest.java:
##########
@@ -1049,6 +1050,58 @@ public void testIntegerTypeConversion() throws DescriptorValidationException {
     }
   }
 
+  @Test
+  public void testTimestampTypeConversion() throws DescriptorValidationException {

Review Comment:
   I think the test needed for this fix is the Timestamp input value bounds test.
   
   Since they were originally designed to properly parse input values of various "formats", I thought it would be a little too much to include a boundary test there as well.🤔
   
   But, yes I can add the test case to the general test instead of testTimestampTypeConversion. Do you think I should add the test case to both of with and without a field named "f"? (= e.g `BASE_TABLE_SCHEMA` and `BASE_TABLE_SCHEMA_NO_F` ) Or one of them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn merged pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn merged PR #27413:
URL: https://github.com/apache/beam/pull/27413


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RyuSA commented on pull request #27413: Bugfix for Timestamp Overflow in BigQueryIO's StorageWriteAPI

Posted by "RyuSA (via GitHub)" <gi...@apache.org>.
RyuSA commented on PR #27413:
URL: https://github.com/apache/beam/pull/27413#issuecomment-1627666854

   It looks 2 workflows failed due to another reason.🤔
   
   - PreCommit Java Examples Dataflow -> GCP API error?
   - Java_GCP_IO_Direct -> Spanner ChangeStream got error
   
   Does anyone know how to fix it? or xan I ignore them? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org