Posted to commits@hudi.apache.org by "jonvex (via GitHub)" <gi...@apache.org> on 2023/02/02 21:15:48 UTC

[GitHub] [hudi] jonvex opened a new pull request, #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

jonvex opened a new pull request, #7831:
URL: https://github.com/apache/hudi/pull/7831

   ### Change Logs
   
   The test was timing out intermittently because the datagen source was not being reset between tests. Once the source has created 3000 unique keys, it only gives out updates. After the first run, each data fetch therefore gets 1000 of those 3000 records. The first few fetches cover nearly all of the keys, but covering every single one takes many fetches.
   
   <details>
     <summary>Here is an example of this effect</summary>
   ```
   jon@Jonathans-MBP testtable_MERGE_ON_READ % grep numInserts .hoodie/* | grep -v inflight | grep -v requested
   
   grep: .hoodie/archived: Is a directory
   grep: .hoodie/metadata: Is a directory
   .hoodie/20230202130912581.deltacommit:      "numInserts" : 328,
   .hoodie/20230202130912581.deltacommit:      "numInserts" : 344,
   .hoodie/20230202130912581.deltacommit:      "numInserts" : 328,
   .hoodie/20230202130920274.deltacommit:      "numInserts" : 235,
   .hoodie/20230202130920274.deltacommit:      "numInserts" : 219,
   .hoodie/20230202130920274.deltacommit:      "numInserts" : 222,
   .hoodie/20230202130925101.deltacommit:      "numInserts" : 157,
   .hoodie/20230202130925101.deltacommit:      "numInserts" : 148,
   .hoodie/20230202130925101.deltacommit:      "numInserts" : 143,
   .hoodie/20230202130928902.deltacommit:      "numInserts" : 93,
   .hoodie/20230202130928902.deltacommit:      "numInserts" : 102,
   .hoodie/20230202130928902.deltacommit:      "numInserts" : 118,
   .hoodie/20230202130932465.deltacommit:      "numInserts" : 73,
   .hoodie/20230202130932465.deltacommit:      "numInserts" : 63,
   .hoodie/20230202130932465.deltacommit:      "numInserts" : 54,
   .hoodie/20230202130937296.deltacommit:      "numInserts" : 35,
   .hoodie/20230202130937296.deltacommit:      "numInserts" : 39,
   .hoodie/20230202130937296.deltacommit:      "numInserts" : 44,
   .hoodie/20230202130945028.deltacommit:      "numInserts" : 26,
   .hoodie/20230202130945028.deltacommit:      "numInserts" : 26,
   .hoodie/20230202130945028.deltacommit:      "numInserts" : 36,
   .hoodie/20230202130949980.deltacommit:      "numInserts" : 15,
   .hoodie/20230202130949980.deltacommit:      "numInserts" : 26,
   .hoodie/20230202130949980.deltacommit:      "numInserts" : 16,
   .hoodie/20230202130955462.deltacommit:      "numInserts" : 13,
   .hoodie/20230202130955462.deltacommit:      "numInserts" : 13,
   .hoodie/20230202130955462.deltacommit:      "numInserts" : 11,
   .hoodie/20230202131000533.deltacommit:      "numInserts" : 11,
   .hoodie/20230202131000533.deltacommit:      "numInserts" : 4,
   .hoodie/20230202131000533.deltacommit:      "numInserts" : 11,
   .hoodie/20230202131007097.deltacommit:      "numInserts" : 6,
   .hoodie/20230202131007097.deltacommit:      "numInserts" : 4,
   .hoodie/20230202131007097.deltacommit:      "numInserts" : 7,
   .hoodie/20230202131013141.deltacommit:      "numInserts" : 3,
   .hoodie/20230202131013141.deltacommit:      "numInserts" : 2,
   .hoodie/20230202131013141.deltacommit:      "numInserts" : 2,
   .hoodie/20230202131018138.deltacommit:      "numInserts" : 2,
   .hoodie/20230202131018138.deltacommit:      "numInserts" : 2,
   .hoodie/20230202131018138.deltacommit:      "numInserts" : 4,
   .hoodie/20230202131023542.deltacommit:      "numInserts" : 2,
   .hoodie/20230202131023542.deltacommit:      "numInserts" : 0,
   .hoodie/20230202131023542.deltacommit:      "numInserts" : 1,
   .hoodie/20230202131028359.commit:      "numInserts" : 0,
   .hoodie/20230202131029163.deltacommit:      "numInserts" : 3,
   .hoodie/20230202131029163.deltacommit:      "numInserts" : 0,
   .hoodie/20230202131029163.deltacommit:      "numInserts" : 0,
   ```
   </details>
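
The shrinking `numInserts` counts above are roughly what one would expect if each fetch draws 1000 distinct keys uniformly at random from a fixed pool of 3000. Under that assumption the expected number of new keys in fetch k (counting from 0) is about 1000 * (2/3)^k, so the last few keys take many fetches to appear. The following is a minimal simulation sketch of that effect, not Hudi code: the class name, the method name, the uniform sampling model, and the seed are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical simulation; it does not use or model any real Hudi class.
class DiminishingInsertsSketch {

    /**
     * Simulates repeated fetches against a fixed key pool and returns how many
     * never-before-seen keys (inserts) each fetch produced, stopping once every
     * key has been seen. Requires batchSize <= totalKeys.
     */
    static List<Integer> insertsPerFetch(int totalKeys, int batchSize, long seed) {
        Random random = new Random(seed);
        List<Integer> pool = new ArrayList<>();
        for (int key = 0; key < totalKeys; key++) {
            pool.add(key);
        }

        Set<Integer> seen = new HashSet<>();
        List<Integer> inserts = new ArrayList<>();
        while (seen.size() < totalKeys) {
            // Assumption: each fetch is a uniform sample of distinct keys.
            Collections.shuffle(pool, random);
            int newKeys = 0;
            for (int key : pool.subList(0, batchSize)) {
                if (seen.add(key)) {
                    newKeys++; // first time this key shows up: an insert
                }
            }
            inserts.add(newKeys);
        }
        return inserts;
    }

    public static void main(String[] args) {
        List<Integer> inserts = insertsPerFetch(3000, 1000, 42L);
        for (int i = 0; i < inserts.size(); i++) {
            System.out.println("fetch " + (i + 1) + ": numInserts = " + inserts.get(i));
        }
        System.out.println("fetches needed to see every key: " + inserts.size());
    }
}
```

A test that waits until all 3000 keys have been inserted is therefore sensitive to how many fetches fit into its timeout, which is consistent with the intermittent failures described above.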
   
   ### Impact
   
   Fewer random CI failures.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7831:
URL: https://github.com/apache/hudi/pull/7831#discussion_r1096035181


##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamerWithMultiWriter.java:
##########
@@ -72,8 +73,12 @@ public class TestHoodieDeltaStreamerWithMultiWriter extends SparkClientFunctiona
   String basePath;
   String propsFilePath;
   String tableBasePath;
+  
+  @AfterEach
+  public void teardown() throws Exception {
+    TestDataSource.resetDataGen();

Review Comment:
   Shouldn't it be done in `org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase#resetTestDataSource()`?





[GitHub] [hudi] jonvex commented on a diff in pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #7831:
URL: https://github.com/apache/hudi/pull/7831#discussion_r1095247805


##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamerWithMultiWriter.java:
##########
@@ -72,8 +73,12 @@ public class TestHoodieDeltaStreamerWithMultiWriter extends SparkClientFunctiona
   String basePath;
   String propsFilePath;
   String tableBasePath;
+  
+  @AfterEach
+  public void teardown() throws Exception {
+    TestDataSource.resetDataGen();

Review Comment:
   `TestHoodieDeltaStreamer` extends `HoodieDeltaStreamerTestBase`, which extends `UtilitiesTestBase`, which already has this method. All the other usages of `TestDataSource` inherit from `HoodieDeltaStreamerTestBase`, so I don't think it needs to be added anywhere else.
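
The point about inheritance can be illustrated with a small plain-Java sketch. It deliberately avoids JUnit so that it runs without dependencies. Every name in it (`FakeDataSource`, `BaseTest`, `ConcreteTest`, `insertsAcrossTwoTests`) is hypothetical and only stands in for the real classes. In JUnit 5 the `teardown()` method would carry `@AfterEach` and be invoked by the framework rather than by hand.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these classes exist in Hudi.
class SharedSourceResetSketch {

    /** Stand-in for a data source whose generator state is static. */
    static class FakeDataSource {
        private static final int MAX_UNIQUE_KEYS = 3000;
        private static final Set<Integer> ISSUED_KEYS = new HashSet<>();

        /** Returns how many records in this batch were brand-new keys (inserts). */
        static int fetch(int batchSize) {
            int inserts = 0;
            for (int i = 0; i < batchSize && ISSUED_KEYS.size() < MAX_UNIQUE_KEYS; i++) {
                ISSUED_KEYS.add(ISSUED_KEYS.size());
                inserts++;
            }
            return inserts; // the rest of the batch would be updates
        }

        static void reset() {
            ISSUED_KEYS.clear();
        }
    }

    /** Base class that owns the cleanup, so every subclass inherits it. */
    static class BaseTest {
        void teardown() {
            FakeDataSource.reset();
        }
    }

    /** A test class that only inherits teardown() and never redefines it. */
    static class ConcreteTest extends BaseTest {
        int run() {
            int inserts = 0;
            for (int fetch = 0; fetch < 3; fetch++) {
                inserts += FakeDataSource.fetch(1000);
            }
            return inserts;
        }
    }

    /** Runs two "tests" back to back and reports the inserts each one saw. */
    static int[] insertsAcrossTwoTests(boolean callTeardown) {
        FakeDataSource.reset(); // start from a clean source
        int[] result = new int[2];
        for (int i = 0; i < 2; i++) {
            ConcreteTest test = new ConcreteTest();
            result[i] = test.run();
            if (callTeardown) {
                test.teardown(); // inherited from BaseTest
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] leaky = insertsAcrossTwoTests(false);
        int[] clean = insertsAcrossTwoTests(true);
        System.out.println("no reset:   " + leaky[0] + ", " + leaky[1]);
        System.out.println("with reset: " + clean[0] + ", " + clean[1]);
    }
}
```

Without the reset, the second run sees no inserts because the static key set is already full. With the inherited teardown, both runs start from an empty source. A class outside that hierarchy has to do the reset itself.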





[GitHub] [hudi] hudi-bot commented on pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7831:
URL: https://github.com/apache/hudi/pull/7831#issuecomment-1414644929

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14882",
       "triggerID" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0bf6f186a6610febcf282604cb4bfa5b6c158290 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14882) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7831:
URL: https://github.com/apache/hudi/pull/7831#issuecomment-1414418215

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14882",
       "triggerID" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0bf6f186a6610febcf282604cb4bfa5b6c158290 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14882) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7831:
URL: https://github.com/apache/hudi/pull/7831#discussion_r1095240438


##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamerWithMultiWriter.java:
##########
@@ -72,8 +73,12 @@ public class TestHoodieDeltaStreamerWithMultiWriter extends SparkClientFunctiona
   String basePath;
   String propsFilePath;
   String tableBasePath;
+  
+  @AfterEach
+  public void teardown() throws Exception {
+    TestDataSource.resetDataGen();

Review Comment:
   Can we apply this to `TestHoodieDeltaStreamer` too, and to any other usages of this?





[GitHub] [hudi] hudi-bot commented on pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7831:
URL: https://github.com/apache/hudi/pull/7831#issuecomment-1414408915

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0bf6f186a6610febcf282604cb4bfa5b6c158290",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0bf6f186a6610febcf282604cb4bfa5b6c158290 UNKNOWN
   




[GitHub] [hudi] xushiyan merged pull request #7831: [HUDI-5653] Add @AfterEach in TestHoodieDeltaStreamerWithMultiWriter to reset the datasource between tests

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan merged PR #7831:
URL: https://github.com/apache/hudi/pull/7831

