Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/31 05:31:19 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #7092: [WIP] Adding presto query validation to integ tests

nsivabalan opened a new pull request, #7092:
URL: https://github.com/apache/hudi/pull/7092

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308212813

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892",
       "triggerID" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cfdf9292f6d5d702b2c71b440de6667b77cb4c40 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838) 
   * 2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1016598642


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieTestSuiteJob.java:
##########
@@ -340,5 +343,11 @@ public static class HoodieTestSuiteConfig extends HoodieDeltaStreamer.Config {
 
     @Parameter(names = {"--trino-jdbc-password"}, description = "Password corresponding to the username to use for authentication")
     public String trinoPassword;
+
+    @Parameter(names = {"--index-type"}, description = "Index type to use for writes")
+    public String indexType = "SIMPLE";

Review Comment:
   ok it's fine to overwrite the default value for tests





[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111] Improve integration test coverage

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1309056558

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892",
       "triggerID" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12902",
       "triggerID" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5e4cada7e725120d7cf8a9fb17ce419436bc25ac Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12902) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1015106220


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
                                            StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))

Review Comment:
   There is a difference here: `this.config` refers to the per-node config, whereas `context.getHoodieTestSuiteWriter().getCfg()` refers to the job-level `HoodieTestSuiteJob.HoodieTestSuiteConfig`. They are not the same.





[GitHub] [hudi] xushiyan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1016597764


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
                                            StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))

Review Comment:
   OK, so should we make the job config take precedence and fall back to the node config if it is not present? It looks like some merge logic is needed here.
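
The precedence rule raised in this comment could be sketched roughly as follows. This is only an illustration, not code from the PR: `ConfigPrecedenceSketch` and `resolve` are hypothetical names, and the sketch assumes that an unset value is represented as `null` at both levels (which would require the job-level field to have no hard-coded default).

```java
// Hypothetical helper, not part of Hudi: resolves a boolean setting that may be
// set on the job config, on the per-node config, or on neither.
final class ConfigPrecedenceSketch {
  private ConfigPrecedenceSketch() {
  }

  // A null argument means "not set" at that level (an assumption of this sketch).
  static boolean resolve(Boolean jobLevel, Boolean nodeLevel, boolean defaultValue) {
    if (jobLevel != null) {
      return jobLevel;   // job config takes precedence
    }
    if (nodeLevel != null) {
      return nodeLevel;  // fall back to the node config
    }
    return defaultValue; // neither level set the value
  }
}
```

With this shape, a node would call something like `ConfigPrecedenceSketch.resolve(jobValue, nodeValue, false)` instead of reading one of the two configs directly.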





[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111] Improve integration test coverage

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308623131

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892",
       "triggerID" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892) 
   * 5e4cada7e725120d7cf8a9fb17ce419436bc25ac UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308176881

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cfdf9292f6d5d702b2c71b440de6667b77cb4c40 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838) 
   * 2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20 UNKNOWN
   


[GitHub] [hudi] xushiyan merged pull request #7092: [HUDI-5111] Improve integration test coverage

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #7092:
URL: https://github.com/apache/hudi/pull/7092




[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1305303958

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5562570bc49eaab4facba31473e9f560e50deb34 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697) 
   * cfdf9292f6d5d702b2c71b440de6667b77cb4c40 UNKNOWN
   


[GitHub] [hudi] xushiyan commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308508872

   @nsivabalan please look into the CI failure. thanks




[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1297789633

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5562570bc49eaab4facba31473e9f560e50deb34 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111] Improve integration test coverage

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308629354

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892",
       "triggerID" : "2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12902",
       "triggerID" : "5e4cada7e725120d7cf8a9fb17ce419436bc25ac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892) 
   * 5e4cada7e725120d7cf8a9fb17ce419436bc25ac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12902) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1015108193


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieTestSuiteJob.java:
##########
@@ -340,5 +343,11 @@ public static class HoodieTestSuiteConfig extends HoodieDeltaStreamer.Config {
 
     @Parameter(names = {"--trino-jdbc-password"}, description = "Password corresponding to the username to use for authentication")
     public String trinoPassword;
+
+    @Parameter(names = {"--index-type"}, description = "Index type to use for writes")
+    public String indexType = "SIMPLE";

Review Comment:
   That's what I also thought initially, but each engine has a different default value and there is no constant that we can reference directly.





[GitHub] [hudi] nsivabalan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1017399907


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
                                            StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))

Review Comment:
   Here is how I am deciding where each config belongs.
   Some are top-level configs, such as the table base path, the index type, and whether metadata is enabled, since we want to apply them to all nodes. Others are node-level configs, e.g. `num_records_to_insert` or the query to use for validation.





[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1297896889

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5562570bc49eaab4facba31473e9f560e50deb34 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697) 
   


[GitHub] [hudi] xushiyan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1014056424


##########
hudi-integ-test/src/main/scala/org/apache/hudi/integ/testsuite/dag/nodes/SparkInsertNode.scala:
##########
@@ -71,6 +71,7 @@ class SparkInsertNode(dagNodeConfig: Config) extends DagNode[RDD[WriteStatus]] {
       .option(DataSourceWriteOptions.TABLE_NAME.key, context.getHoodieTestSuiteWriter.getCfg.targetTableName)
       .option(DataSourceWriteOptions.TABLE_TYPE.key, context.getHoodieTestSuiteWriter.getCfg.tableType)
       .option(DataSourceWriteOptions.OPERATION.key, getOperation())
+      .option("hoodie.index.type", context.getHoodieTestSuiteWriter.getCfg.indexType)

Review Comment:
   use constant for the key



##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
                                            StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))

Review Comment:
   So we prefer `context.getHoodieTestSuiteWriter().getCfg()` over `this.config`? How would people know this is the preferred way? We should either make `this.config` usable or remove it, so that people only use the context to retrieve config.



##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieTestSuiteJob.java:
##########
@@ -340,5 +343,11 @@ public static class HoodieTestSuiteConfig extends HoodieDeltaStreamer.Config {
 
     @Parameter(names = {"--trino-jdbc-password"}, description = "Password corresponding to the username to use for authentication")
     public String trinoPassword;
+
+    @Parameter(names = {"--index-type"}, description = "Index type to use for writes")
+    public String indexType = "SIMPLE";
+
+    @Parameter(names = {"--enable-metadata-on-read"}, description = "Enable's metadata for queries")
+    public Boolean enableMetadataOnRead = false;

Review Comment:
   ditto; and applies to other manually assigned default values



##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/PrestoQueryNode.java:
##########
@@ -35,25 +35,32 @@ public PrestoQueryNode(DeltaConfig.Config config) {
 
   @Override
   public void execute(ExecutionContext context, int curItrCount) throws Exception {
-    log.info("Executing presto query node {}", this.getName());
-    String url = context.getHoodieTestSuiteWriter().getCfg().prestoJdbcUrl;
-    if (StringUtils.isNullOrEmpty(url)) {
-      throw new IllegalArgumentException("Presto JDBC connection url not provided. Please set --presto-jdbc-url.");
-    }
-    String user = context.getHoodieTestSuiteWriter().getCfg().prestoUsername;
-    String pass = context.getHoodieTestSuiteWriter().getCfg().prestoPassword;
-    try {
-      Class.forName("com.facebook.presto.jdbc.PrestoDriver");
-    } catch (ClassNotFoundException e) {
-      throw new HoodieValidationException("Presto query validation failed due to " + e.getMessage(), e);
-    }
-    try (Connection connection = DriverManager.getConnection(url, user, pass)) {
-      Statement stmt = connection.createStatement();
-      setSessionProperties(this.config.getPrestoProperties(), stmt);
-      executeAndValidateQueries(this.config.getPrestoQueries(), stmt);
-      stmt.close();
-    } catch (Exception e) {
-      throw new HoodieValidationException("Presto query validation failed due to " + e.getMessage(), e);
+    if (context.getHoodieTestSuiteWriter().getCfg().enablePrestoValidation) {

Review Comment:
   Please follow the early-return style.
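
For readers unfamiliar with the term, early-return style means checking the disabling condition first and leaving the method, rather than wrapping the whole body in an `if` block. A minimal sketch, assuming a simplified stand-in for the node (the class `PrestoValidationSketch`, its `executed` list, and the method signature are hypothetical and only illustrate the control flow):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query validation node; it records what it would
// run instead of opening a real JDBC connection.
final class PrestoValidationSketch {
  final List<String> executed = new ArrayList<>();

  void execute(boolean enablePrestoValidation, String jdbcUrl) {
    // Guard clause: return immediately when validation is disabled, so the
    // rest of the method stays at one indentation level.
    if (!enablePrestoValidation) {
      return;
    }
    if (jdbcUrl == null || jdbcUrl.isEmpty()) {
      throw new IllegalArgumentException(
          "Presto JDBC connection url not provided. Please set --presto-jdbc-url.");
    }
    // Placeholder for connecting, setting session properties, and running queries.
    executed.add(jdbcUrl);
  }
}
```

The behaviour is the same as the nested version in the diff; only the nesting depth changes.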



##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieTestSuiteJob.java:
##########
@@ -340,5 +343,11 @@ public static class HoodieTestSuiteConfig extends HoodieDeltaStreamer.Config {
 
     @Parameter(names = {"--trino-jdbc-password"}, description = "Password corresponding to the username to use for authentication")
     public String trinoPassword;
+
+    @Parameter(names = {"--index-type"}, description = "Index type to use for writes")
+    public String indexType = "SIMPLE";

Review Comment:
   use the default value from ConfigProperty
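
What this suggestion amounts to can be sketched as below, assuming a property object that carries both its key and a single default. `ConfigPropertySketch` and `TestSuiteDefaultsSketch` are hypothetical stand-ins written for this illustration, not Hudi's own classes; the key `hoodie.index.type` and the value `SIMPLE` are taken from the diffs in this thread. As noted elsewhere in the thread, the real index type default differs per engine, so this pattern only applies where a property actually has one default.

```java
// Hypothetical minimal stand-in for a typed config property (key plus default).
final class ConfigPropertySketch<T> {
  private final String key;
  private final T defaultValue;

  ConfigPropertySketch(String key, T defaultValue) {
    this.key = key;
    this.defaultValue = defaultValue;
  }

  String key() {
    return key;
  }

  T defaultValue() {
    return defaultValue;
  }
}

// Hypothetical config holder: the field default is read from the property
// instead of being hard-coded a second time.
final class TestSuiteDefaultsSketch {
  // Assumption of this sketch: the property declares "SIMPLE" as its default.
  static final ConfigPropertySketch<String> INDEX_TYPE =
      new ConfigPropertySketch<>("hoodie.index.type", "SIMPLE");

  String indexType = INDEX_TYPE.defaultValue();
}
```

The same property object can also supply the option key, which avoids repeating the string literal at call sites.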





[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1305940615

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697",
       "triggerID" : "5562570bc49eaab4facba31473e9f560e50deb34",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838",
       "triggerID" : "cfdf9292f6d5d702b2c71b440de6667b77cb4c40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cfdf9292f6d5d702b2c71b440de6667b77cb4c40 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1297793607

   ## CI report:
   
   * 5562570bc49eaab4facba31473e9f560e50deb34 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1305313430

   ## CI report:
   
   * 5562570bc49eaab4facba31473e9f560e50deb34 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12697) 
   * cfdf9292f6d5d702b2c71b440de6667b77cb4c40 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12838) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1017755036


##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
                                            StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))

Review Comment:
   @nsivabalan OK, I took a closer look at the code. The node-level config is `DeltaConfig.Config`, which is a confusing class name: referring to it as just `Config` in the code makes it hard to tell what it contains. It's actually a bag of configs for the test suite, so it should be called something like a test suite config.
   
   Having `getHoodieTestSuiteWriter().getCfg()` return `HoodieTestSuiteJob.HoodieTestSuiteConfig` makes it worse, because that class holds the job-level parameters passed to deltastreamer. We should call it `HoodieTestSuiteJobParams` to distinguish it from a config object, which is typically a `HoodieConfig` with a builder pattern throughout the codebase.
   
   This is not nitpicking: test classes have a lot to do with config-pumping logic, so we need to be careful to design good APIs and boundaries. We should make these code improvements in a separate PR.
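
The distinction drawn above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TestSuiteJobParamsSketch`, `TestSuiteConfigSketch`, and the key `metadata.enable.on.read` are hypothetical names, not classes or configs from the codebase.

```java
import java.util.Properties;

// Hypothetical: a plain holder for job-level command-line parameters.
final class TestSuiteJobParamsSketch {
  String targetBasePath = "/tmp/hudi"; // assumed placeholder path
  boolean enableMetadataOnRead = false;
}

// Hypothetical: a config object backed by key/value properties and created via a builder.
final class TestSuiteConfigSketch {
  // Assumed key name, used only in this sketch.
  static final String METADATA_ON_READ_KEY = "metadata.enable.on.read";

  private final Properties props;

  private TestSuiteConfigSketch(Properties props) {
    this.props = props;
  }

  boolean isMetadataEnabledOnRead() {
    return Boolean.parseBoolean(props.getProperty(METADATA_ON_READ_KEY, "false"));
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private final Properties props = new Properties();

    // Copies the relevant job parameter into the config in one explicit place.
    Builder fromJobParams(TestSuiteJobParamsSketch params) {
      props.setProperty(METADATA_ON_READ_KEY, String.valueOf(params.enableMetadataOnRead));
      return this;
    }

    TestSuiteConfigSketch build() {
      // Defensive copy so later builder changes do not leak into the built config.
      Properties copy = new Properties();
      copy.putAll(props);
      return new TestSuiteConfigSketch(copy);
    }
  }
}
```

Keeping the parameter holder and the config object as separate types would make it visible at each call site whether a value comes from the job invocation or from a config.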





[GitHub] [hudi] hudi-bot commented on pull request #7092: [HUDI-5111][HUDI-5112][HUDI-5113] Enchancing integ test support

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7092:
URL: https://github.com/apache/hudi/pull/7092#issuecomment-1308456043

   ## CI report:
   
   * 2432ff3f1bd0c4fa9e35cd4ff2a78cb60653ea20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12892) 
   

