Posted to github@beam.apache.org by "Abacn (via GitHub)" <gi...@apache.org> on 2023/01/27 18:56:17 UTC

[GitHub] [beam] Abacn opened a new pull request, #25209: Add timeout to some gcp IO unit tests

Abacn opened a new pull request, #25209:
URL: https://github.com/apache/beam/pull/25209

   Investigating #25207
   
   The flakiness did not reproduce locally, so this adds a test class timeout and runs the tests on Jenkins.
   
   Related: I already found one scenario where a mock does not work and extends the running time of the tests (https://github.com/apache/beam/issues/21533#issuecomment-1406929345). This most likely happens in error-handling tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn merged pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn merged PR #25209:
URL: https://github.com/apache/beam/pull/25209




[GitHub] [beam] Abacn commented on a diff in pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on code in PR #25209:
URL: https://github.com/apache/beam/pull/25209#discussion_r1089781245


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteGroupedRecordsToFiles.java:
##########
@@ -58,11 +57,11 @@ public void processElement(
       for (ElementT tableRow : element.getValue()) {
         if (writer.getByteSize() > maxFileSize) {
           writer.close();
-          writer = rowWriterFactory.createRowWriter(tempFilePrefix, element.getKey());

Review Comment:
   This bug was caught by the modified writeDynamics test. The new writer should be created after the `writer.getResult()` call; otherwise `java.lang.IllegalStateException: Not yet closed` is thrown.
   
   In production the bundle gets retried, so this is less severe, which is likely why it has been overlooked.
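
The ordering issue described above can be sketched as follows. This is a minimal illustration, not Beam's code: `FakeWriter`, `buggyRollOver`, and `fixedRollOver` are hypothetical names, and the sketch assumes (as the exception message suggests) that `getResult()` throws `IllegalStateException` unless the writer it is called on has been closed.

```java
import java.util.ArrayList;
import java.util.List;

public class WriterRollover {
  /** Hypothetical stand-in for a row writer; not Beam's actual class. */
  static final class FakeWriter {
    private final String name;
    private boolean closed;

    FakeWriter(String name) {
      this.name = name;
    }

    void close() {
      closed = true;
    }

    String getResult() {
      if (!closed) {
        throw new IllegalStateException("Not yet closed");
      }
      return name;
    }
  }

  /** Buggy order: the reference is replaced before the closed writer's result is read. */
  static String buggyRollOver(FakeWriter writer) {
    writer.close();
    writer = new FakeWriter("file-2"); // fresh writer, still open
    return writer.getResult(); // throws IllegalStateException
  }

  /** Fixed order: read the closed writer's result, then create the replacement. */
  static List<String> fixedRollOver(FakeWriter writer) {
    List<String> results = new ArrayList<>();
    writer.close();
    results.add(writer.getResult()); // result of the writer that was just closed
    writer = new FakeWriter("file-2"); // only now start the next file
    writer.close();
    results.add(writer.getResult());
    return results;
  }

  public static void main(String[] args) {
    try {
      buggyRollOver(new FakeWriter("file-1"));
    } catch (IllegalStateException e) {
      System.out.println("buggy order: " + e.getMessage());
    }
    System.out.println("fixed order: " + fixedRollOver(new FakeWriter("file-1")));
  }
}
```

In the buggy order the result of the first file is also lost, since the only reference to the closed writer has been overwritten.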





[GitHub] [beam] Abacn commented on pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1410586408

   > Also, were these the tests that were causing GCPIO_Direct to time out? Just wondering why you chose these to make a timeout for.
   
   These tests were chosen by checking the most time-consuming tests reported in the Jenkins UI. Yes, it is a heuristic. By triggering the job repeatedly I saw one case where `org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOWriteTest.testTriggeredFileLoadsWithTempTablesToExistingNullSchemaTable` failed (https://github.com/apache/beam/pull/25209#issuecomment-1407051477). I did not find anything strange about that single test itself, but I found other tests that write to many files, which I suspected might cause problems. Changing the test parameters revealed a bug, and that is how I got here.
   




[GitHub] [beam] Abacn commented on pull request #25209: Add timeout to some gcp IO unit tests

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1407086893

   This change makes tests that time out surface as flaky failures rather than running indefinitely, as they do now. This enables Java_GCP_IO_Direct to provide a full test report.
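
The idea of bounding a hanging test can be sketched with the JDK alone. This thread does not show the mechanism the PR actually uses (in a JUnit 4 test class, the `org.junit.rules.Timeout` rule would be the usual choice), so `BoundedRun` and `runWithTimeout` below are hypothetical names for illustration only.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRun {
  /** Runs the task on a worker thread and reports its outcome within the given limit. */
  static String runWithTimeout(Runnable task, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> future = executor.submit(task);
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return "passed";
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck task instead of waiting forever
      return "timed out";
    } catch (ExecutionException e) {
      return "failed"; // the task itself threw
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Runnable quick = () -> {};
    Runnable stuck = () -> {
      try {
        Thread.sleep(60_000); // stands in for a test that hangs
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    System.out.println(runWithTimeout(quick, 1_000));
    System.out.println(runWithTimeout(stuck, 200));
  }
}
```

With a bound in place, the stuck task is reported as a distinct outcome and the remaining tasks can still run, which is what allows a full report to be produced.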




[GitHub] [beam] Abacn commented on pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1410666787

   Run Java_GCP_IO_Direct PreCommit




[GitHub] [beam] Abacn commented on a diff in pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on code in PR #25209:
URL: https://github.com/apache/beam/pull/25209#discussion_r1089781681


##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java:
##########
@@ -315,16 +324,19 @@ public void writeDynamicDestinations(boolean schemas, boolean autoSharding) thro
 
     final List<String> allUsernames = ImmutableList.of("bill", "bob", "randolph");
     List<String> userList = Lists.newArrayList();
-    // Make sure that we generate enough users so that WriteBundlesToFiles is forced to spill to
-    // WriteGroupedRecordsToFiles.
-    for (int i = 0; i < BatchLoads.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE * 10; ++i) {
-      // Every user has 10 nicknames.
-      for (int j = 0; j < 10; ++j) {
+    // i controls the number of destinations
+    for (int i = 0; i < maxNumWritersPerBundle * 2; ++i) {

Review Comment:
   With these parameters the number of files is reduced roughly tenfold (2000 -> 200+). The large number of local file operations is suspected to contribute to the timeout. The target code is still tested consistently (previously some code was not covered, see above).







[GitHub] [beam] ahmedabu98 commented on pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1410504996

   Besides that, this LGTM thanks!




[GitHub] [beam] ahmedabu98 commented on pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1410504734

   You can reduce the timeout for `BigQueryIOReadTest` and `PubsubIOTest` even further, maybe to 60s. In the tests' history they don't go over 10s.
   
   Also, were these the tests that were causing GCPIO_Direct to time out? Just wondering why you chose these to make a timeout for.




[GitHub] [beam] Abacn commented on pull request #25209: Attempt fix GCPIO_Direct tests timeout

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1408759038

   Ready for review now. R:@ahmedabu98




[GitHub] [beam] Abacn commented on pull request #25209: Add timeout to some gcp IO unit tests

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25209:
URL: https://github.com/apache/beam/pull/25209#issuecomment-1407051477

   Run Java_GCP_IO_Direct PreCommit



