Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/20 16:21:15 UTC

[GitHub] [hudi] rahil-c opened a new pull request, #6151: Rahil c/spark3.1 profile clone

rahil-c opened a new pull request, #6151:
URL: https://github.com/apache/hudi/pull/6151

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds a quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193047984

   ## CI report:
   
   * a4c2f0b51c8bd1bbc2759ef017e01baaa033d975 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10233) 
   * 0d733135839a55cc6d51cd8806446c4f802e6e63 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10237) 
   
   Bot commands: @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1191835447

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190953855

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927095741


##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java:
##########
@@ -76,6 +78,7 @@
  * NOTE: This class is invariant of the underlying file-format of the files being read
  */
 public class HoodieCopyOnWriteTableInputFormat extends HoodieTableInputFormat {
+  private static final Logger LOG = LogManager.getLogger(HoodieCopyOnWriteTableInputFormat.class);

Review Comment:
   This actually got merged now (https://github.com/apache/hudi/pull/6161), so if I rebase, this will effectively be the same change, and it is needed in general.





[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190615910

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10100) 
   * b37c8dfa6205bf24b83b4a84816690921d45226a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10103) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193039486

   ## CI report:
   
   * c25f0b8643603faaf6d9ddb480240741b1590b78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10224) 
   * a4c2f0b51c8bd1bbc2759ef017e01baaa033d975 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10233) 
   * 0d733135839a55cc6d51cd8806446c4f802e6e63 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10237) 
   




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927098017


##########
docker/demo/config/log4j.properties:
##########
@@ -25,6 +25,8 @@ log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}:
 # log level for this class is used to overwrite the root logger's log level, so that
 # the user can have different defaults for the shell and regular Spark apps.
 log4j.logger.org.apache.spark.repl.Main=WARN
+# Adjust Hudi internal logging levels
+log4j.logger.org.apache.hudi=DEBUG

Review Comment:
   Sure, I can remove this. It was mainly helpful for seeing the logging in the docker IT tests, and I think it would help people debug in general. If the concern is that it will generate too many logs for the Azure CI IT section, then we can remove it.





[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1191831718

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f UNKNOWN
   




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927094265


##########
azure-pipelines.yml:
##########
@@ -200,27 +223,22 @@ stages:
               mavenOptions: '-Xmx4g'
       - job: IT
         displayName: IT modules
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '180'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 2
             inputs:
               mavenPomFile: 'pom.xml'
               goals: 'clean install'
               options: $(MVN_OPTS_INSTALL) -Pintegration-tests
               publishJUnitResults: false
               jdkVersionOption: '1.8'
-          - task: Maven@3

Review Comment:
   I believe that if we add a `condition` and set it to `false`, it should disable this section (see https://docs.microsoft.com/en-us/azure/devops/pipelines/process/tasks?view=azure-devops&tabs=yaml). I will give it a try, roughly as sketched below.
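   
   As a rough, untested sketch (the display name and goals below are placeholders, not the actual values from this pipeline), the task could be kept in place but skipped via a literal `condition`:
   
   ```yaml
   - task: Maven@3
     displayName: maven verify        # placeholder name
     condition: false                 # task is skipped entirely; restore to succeeded() to re-enable
     inputs:
       mavenPomFile: 'pom.xml'
       goals: 'verify'                # placeholder goals
       publishJUnitResults: false
       jdkVersionOption: '1.8'
   ```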





[GitHub] [hudi] bvaradar commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1467286004

   @yihua : should this PR be closed in light of https://github.com/apache/hudi/pull/6117 ? 




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927113728


##########
hudi-examples/hudi-examples-flink/src/test/java/org/apache/hudi/examples/quickstart/TestHoodieFlinkQuickstart.java:
##########
@@ -34,6 +34,7 @@
 /**
  * IT cases for Hoodie table source and sink.
  */
+

Review Comment:
   Will fix this.





[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190731283

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10100) 
   * b37c8dfa6205bf24b83b4a84816690921d45226a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10103) 
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f UNKNOWN
   




[GitHub] [hudi] rahil-c commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1192993331

   Not sure why the Java CI is complaining about the logger, since this got merged: https://github.com/apache/hudi/pull/6161/checks




[GitHub] [hudi] yihua commented on a diff in pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r928066027


##########
hudi-spark-datasource/hudi-spark/pom.xml:
##########
@@ -316,6 +332,12 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-hive_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>*</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   Check whether this affects the Spark bundle.



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestOrcBootstrap.java:
##########
@@ -168,11 +169,13 @@ public Schema generateNewDataSetAndReturnSchema(long timestamp, int numRecords,
     return AvroOrcUtils.createAvroSchemaWithDefaultValue(orcSchema, "test_orc_record", null, true);
   }
 
+  @Disabled("Disable due to hive's orc conflict.")

Review Comment:
   Maybe we can add a `@Tag` like `Spark2_4only` for this class.



##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java:
##########
@@ -75,6 +76,7 @@
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
+@Disabled

Review Comment:
   If this is due to the multiple-Spark-context exception, we should rewrite this test using `SparkClientFunctionalTestHarness`, which avoids initializing the Spark context again, to fix the tests.



##########
packaging/hudi-spark-bundle/pom.xml:
##########
@@ -95,6 +95,12 @@
                   <include>org.antlr:stringtemplate</include>
                   <include>org.apache.parquet:parquet-avro</include>
 
+                  <include>com.fasterxml.jackson.core:jackson-annotations</include>
+                  <include>com.fasterxml.jackson.core:jackson-core</include>
+                  <include>com.fasterxml.jackson.core:jackson-databind</include>
+                  <include>com.fasterxml.jackson.dataformat:jackson-dataformat-yaml</include>
+                  <include>com.fasterxml.jackson.module:jackson-module-scala_${scala.binary.version}</include>

Review Comment:
   Is this for fixing tests only?  We should avoid introducing new changes to production code and bundling.  If really necessary, could you add these to test scope only, or to the integ-test-bundle?



##########
hudi-utilities/pom.xml:
##########
@@ -227,6 +227,10 @@
           <groupId>org.slf4j</groupId>
           <artifactId>slf4j-api</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>

Review Comment:
   Could you clarify whether this is needed?  Any implication for the Spark bundle (e.g., missing Hadoop-related classes)?  Is this for tests only?



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestOrcBootstrap.java:
##########
@@ -168,11 +169,13 @@ public Schema generateNewDataSetAndReturnSchema(long timestamp, int numRecords,
     return AvroOrcUtils.createAvroSchemaWithDefaultValue(orcSchema, "test_orc_record", null, true);
   }
 
+  @Disabled("Disable due to hive's orc conflict.")

Review Comment:
   Could we re-enable these tests for Spark 2.4 in GitHub CI?





[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190540165

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10100) 
   




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927085106


##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'

Review Comment:
   I think in general I've been seeing the Azure CI go over the 120-minute timeout (outside of this PR). I can revert these changes, but would it be safer to keep them? Or is this more a concern about resource usage for the Azure CI in general?





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927121257


##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -48,10 +48,22 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.orc</groupId>
+          <artifactId>orc-core</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   I'm not sure whether this is more of a test issue or a production issue. For example, I've seen ORC-related tests fail with this dependency conflict:
   
   ```
   java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
   ```
   
   My understanding is that this comes down to the Hive 2 ORC (https://github.com/apache/hive/blob/rel/release-2.3.1/pom.xml) and the Spark 3 ORC (https://github.com/apache/spark/blob/v3.1.3/pom.xml#L139) being different versions that don't work well together.
   
   In the original Spark 3.2 PR (https://github.com/apache/hudi/pull/4752) the same ORC issues were present, and we made the call then to disable the ORC-related tests.





[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193028506

   ## CI report:
   
   * c25f0b8643603faaf6d9ddb480240741b1590b78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10224) 
   * a4c2f0b51c8bd1bbc2759ef017e01baaa033d975 UNKNOWN
   




[GitHub] [hudi] xushiyan commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1491005519

   The latest status of this work is in https://github.com/apache/hudi/pull/7327.
   
   I'll close this one in favor of that.




[GitHub] [hudi] yihua commented on a diff in pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r928050091


##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -48,10 +48,22 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.orc</groupId>
+          <artifactId>orc-core</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   Got it. Could you manually verify whether the ORC format still works with the Spark bundle?



##########
azure-pipelines.yml:
##########
@@ -200,27 +223,22 @@ stages:
               mavenOptions: '-Xmx4g'
       - job: IT
         displayName: IT modules
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '180'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 2
             inputs:
               mavenPomFile: 'pom.xml'
               goals: 'clean install'
               options: $(MVN_OPTS_INSTALL) -Pintegration-tests
               publishJUnitResults: false
               jdkVersionOption: '1.8'
-          - task: Maven@3

Review Comment:
   Then let's comment these lines out instead of deleting them, as a reminder to re-enable them later, along the lines of the sketch below.
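   
   For example (illustrative only; the exact task content would come from this pipeline file, and the names below are placeholders), the block could stay as a commented-out reminder:
   
   ```yaml
   # TODO: re-enable once IT runs are stable on the Spark 3.1 default profile
   # - task: Maven@3
   #   displayName: UT integ-test     # placeholder name
   #   inputs:
   #     mavenPomFile: 'pom.xml'
   #     goals: 'test'
   #     options: -Pintegration-tests
   ```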



##########
docker/compose/docker-compose_hadoop284_hive233_spark313.yml:
##########
@@ -0,0 +1,309 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: "3.3"
+
+services:
+
+  namenode:
+    image: rchertara/hudi-hadoop_2.8.4-namenode:image
+    hostname: namenode
+    container_name: namenode
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    ports:
+      - "50070:50070"
+      - "8020:8020"
+    env_file:
+      - ./hadoop.env
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://namenode:50070"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  datanode1:
+    image: rchertara/hudi-hadoop_2.8.4-datanode:image
+    container_name: datanode1
+    hostname: datanode1
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "50075:50075"
+      - "50010:50010"
+    links:
+      - "namenode"
+      - "historyserver"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://datanode1:50075"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - namenode
+
+  historyserver:
+    image: rchertara/hudi-hadoop_2.8.4-history:image
+    hostname: historyserver
+    container_name: historyserver
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    depends_on:
+      - "namenode"
+    links:
+      - "namenode"
+    ports:
+      - "58188:8188"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://historyserver:8188"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    env_file:
+      - ./hadoop.env
+    volumes:
+      - historyserver:/hadoop/yarn/timeline
+
+  hive-metastore-postgresql:
+    image: bde2020/hive-metastore-postgresql:2.3.0
+    volumes:
+      - hive-metastore-postgresql:/var/lib/postgresql
+    hostname: hive-metastore-postgresql
+    container_name: hive-metastore-postgresql
+
+  hivemetastore:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hivemetastore
+    container_name: hivemetastore
+    links:
+      - "hive-metastore-postgresql"
+      - "namenode"
+    env_file:
+      - ./hadoop.env
+    command: /opt/hive/bin/hive --service metastore
+    environment:
+      SERVICE_PRECONDITION: "namenode:50070 hive-metastore-postgresql:5432"
+    ports:
+      - "9083:9083"
+    healthcheck:
+      test: ["CMD", "nc", "-z", "hivemetastore", "9083"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  hiveserver:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hiveserver
+    container_name: hiveserver
+    env_file:
+      - ./hadoop.env
+    environment:
+      SERVICE_PRECONDITION: "hivemetastore:9083"
+    ports:
+      - "10000:10000"
+    depends_on:
+      - "hivemetastore"
+    links:
+      - "hivemetastore"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  sparkmaster:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_3.1.3:image
+    hostname: sparkmaster
+    container_name: sparkmaster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8080:8080"
+      - "7077:7077"
+    environment:
+      - INIT_DAEMON_STEP=setup_spark
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  spark-worker-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_3.1.3:image
+    hostname: spark-worker-1
+    container_name: spark-worker-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - "8081:8081"
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  zookeeper:
+    image: 'bitnami/zookeeper:3.4.12-r68'
+    hostname: zookeeper
+    container_name: zookeeper
+    ports:
+      - "2181:2181"
+    environment:
+      - ALLOW_ANONYMOUS_LOGIN=yes
+
+  kafka:
+    image: 'bitnami/kafka:2.0.0'
+    hostname: kafkabroker
+    container_name: kafkabroker
+    ports:
+      - "9092:9092"
+    environment:
+      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+      - ALLOW_PLAINTEXT_LISTENER=yes
+
+  presto-coordinator-1:
+    container_name: presto-coordinator-1
+    hostname: presto-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    ports:
+      - "8090:8090"
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: coordinator
+
+  presto-worker-1:
+    container_name: presto-worker-1
+    hostname: presto-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    depends_on: [ "presto-coordinator-1" ]
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: worker
+
+  trino-coordinator-1:
+    container_name: trino-coordinator-1
+    hostname: trino-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-trinocoordinator_368:image
+    ports:
+      - "8091:8091"
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-coordinator-1
+
+  trino-worker-1:
+    container_name: trino-worker-1
+    hostname: trino-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-trinoworker_368:image
+    depends_on: [ "trino-coordinator-1" ]
+    ports:
+      - "8092:8092"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-worker-1
+
+  graphite:
+    container_name: graphite
+    hostname: graphite
+    image: graphiteapp/graphite-statsd
+    ports:
+      - 80:80
+      - 2003-2004:2003-2004
+      - 8126:8126
+
+  adhoc-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image
+    hostname: adhoc-1
+    container_name: adhoc-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - '4040:4040'
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+      - "presto-coordinator-1"
+      - "trino-coordinator-1"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  adhoc-2:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image

Review Comment:
   I have the permission and I'll upload the images myself.  





[GitHub] [hudi] yihua commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927012448


##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'

Review Comment:
   Could you revert the unnecessary timeout change?



##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 1

Review Comment:
   Remove this and all similar changes?



##########
docker/compose/docker-compose_hadoop284_hive233_spark313.yml:
##########
@@ -0,0 +1,309 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: "3.3"
+
+services:
+
+  namenode:
+    image: rchertara/hudi-hadoop_2.8.4-namenode:image
+    hostname: namenode
+    container_name: namenode
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    ports:
+      - "50070:50070"
+      - "8020:8020"
+    env_file:
+      - ./hadoop.env
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://namenode:50070"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  datanode1:
+    image: rchertara/hudi-hadoop_2.8.4-datanode:image
+    container_name: datanode1
+    hostname: datanode1
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "50075:50075"
+      - "50010:50010"
+    links:
+      - "namenode"
+      - "historyserver"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://datanode1:50075"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - namenode
+
+  historyserver:
+    image: rchertara/hudi-hadoop_2.8.4-history:image
+    hostname: historyserver
+    container_name: historyserver
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    depends_on:
+      - "namenode"
+    links:
+      - "namenode"
+    ports:
+      - "58188:8188"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://historyserver:8188"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    env_file:
+      - ./hadoop.env
+    volumes:
+      - historyserver:/hadoop/yarn/timeline
+
+  hive-metastore-postgresql:
+    image: bde2020/hive-metastore-postgresql:2.3.0
+    volumes:
+      - hive-metastore-postgresql:/var/lib/postgresql
+    hostname: hive-metastore-postgresql
+    container_name: hive-metastore-postgresql
+
+  hivemetastore:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hivemetastore
+    container_name: hivemetastore
+    links:
+      - "hive-metastore-postgresql"
+      - "namenode"
+    env_file:
+      - ./hadoop.env
+    command: /opt/hive/bin/hive --service metastore
+    environment:
+      SERVICE_PRECONDITION: "namenode:50070 hive-metastore-postgresql:5432"
+    ports:
+      - "9083:9083"
+    healthcheck:
+      test: ["CMD", "nc", "-z", "hivemetastore", "9083"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  hiveserver:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hiveserver
+    container_name: hiveserver
+    env_file:
+      - ./hadoop.env
+    environment:
+      SERVICE_PRECONDITION: "hivemetastore:9083"
+    ports:
+      - "10000:10000"
+    depends_on:
+      - "hivemetastore"
+    links:
+      - "hivemetastore"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  sparkmaster:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_3.1.3:image
+    hostname: sparkmaster
+    container_name: sparkmaster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8080:8080"
+      - "7077:7077"
+    environment:
+      - INIT_DAEMON_STEP=setup_spark
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  spark-worker-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_3.1.3:image
+    hostname: spark-worker-1
+    container_name: spark-worker-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - "8081:8081"
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  zookeeper:
+    image: 'bitnami/zookeeper:3.4.12-r68'
+    hostname: zookeeper
+    container_name: zookeeper
+    ports:
+      - "2181:2181"
+    environment:
+      - ALLOW_ANONYMOUS_LOGIN=yes
+
+  kafka:
+    image: 'bitnami/kafka:2.0.0'
+    hostname: kafkabroker
+    container_name: kafkabroker
+    ports:
+      - "9092:9092"
+    environment:
+      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+      - ALLOW_PLAINTEXT_LISTENER=yes
+
+  presto-coordinator-1:
+    container_name: presto-coordinator-1
+    hostname: presto-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    ports:
+      - "8090:8090"
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: coordinator
+
+  presto-worker-1:
+    container_name: presto-worker-1
+    hostname: presto-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    depends_on: [ "presto-coordinator-1" ]
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: worker
+
+  trino-coordinator-1:
+    container_name: trino-coordinator-1
+    hostname: trino-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-trinocoordinator_368:image
+    ports:
+      - "8091:8091"
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-coordinator-1
+
+  trino-worker-1:
+    container_name: trino-worker-1
+    hostname: trino-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-trinoworker_368:image
+    depends_on: [ "trino-coordinator-1" ]
+    ports:
+      - "8092:8092"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-worker-1
+
+  graphite:
+    container_name: graphite
+    hostname: graphite
+    image: graphiteapp/graphite-statsd
+    ports:
+      - 80:80
+      - 2003-2004:2003-2004
+      - 8126:8126
+
+  adhoc-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image
+    hostname: adhoc-1
+    container_name: adhoc-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - '4040:4040'
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+      - "presto-coordinator-1"
+      - "trino-coordinator-1"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  adhoc-2:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image

Review Comment:
   If the images are finalized, let's upload them to the apachehudi Docker account and change the references here.
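   
   For instance, a sketch of what the final reference might look like, assuming the image is republished under the apachehudi Docker Hub account (the exact repository name and tag are assumptions):
   
   ```yaml
   adhoc-2:
     # hypothetical final reference once the image is published under apachehudi
     image: apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:latest
   ```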



##########
docker/demo/config/log4j.properties:
##########
@@ -25,6 +25,8 @@ log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}:
 # log level for this class is used to overwrite the root logger's log level, so that
 # the user can have different defaults for the shell and regular Spark apps.
 log4j.logger.org.apache.spark.repl.Main=WARN
+# Adjust Hudi internal logging levels
+log4j.logger.org.apache.hudi=DEBUG

Review Comment:
   nit: remove this?



##########
hudi-examples/hudi-examples-flink/src/test/java/org/apache/hudi/examples/quickstart/TestHoodieFlinkQuickstart.java:
##########
@@ -34,6 +34,7 @@
 /**
  * IT cases for Hoodie table source and sink.
  */
+

Review Comment:
   nit: revert empty line?



##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -174,6 +194,12 @@
       <artifactId>awaitility</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>com.thoughtworks.paranamer</groupId>
+      <artifactId>paranamer</artifactId>
+      <version>2.8</version>
+      <scope>test</scope>
+    </dependency>

Review Comment:
   How is this introduced?  Does it have a compatible OSS license?



##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java:
##########
@@ -76,6 +78,7 @@
  * NOTE: This class is invariant of the underlying file-format of the files being read
  */
 public class HoodieCopyOnWriteTableInputFormat extends HoodieTableInputFormat {
+  private static final Logger LOG = LogManager.getLogger(HoodieCopyOnWriteTableInputFormat.class);

Review Comment:
   Is this still needed?



##########
packaging/hudi-spark-bundle/pom.xml:
##########
@@ -95,6 +95,12 @@
                   <include>org.antlr:stringtemplate</include>
                   <include>org.apache.parquet:parquet-avro</include>
 
+                  <include>com.fasterxml.jackson.core:jackson-annotations</include>
+                  <include>com.fasterxml.jackson.core:jackson-core</include>
+                  <include>com.fasterxml.jackson.core:jackson-databind</include>
+                  <include>com.fasterxml.jackson.dataformat:jackson-dataformat-yaml</include>
+                  <include>com.fasterxml.jackson.module:jackson-module-scala_${scala.binary.version}</include>

Review Comment:
   Wondering why we add these?



##########
azure-pipelines.yml:
##########
@@ -200,27 +223,22 @@ stages:
               mavenOptions: '-Xmx4g'
       - job: IT
         displayName: IT modules
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '180'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 2
             inputs:
               mavenPomFile: 'pom.xml'
               goals: 'clean install'
               options: $(MVN_OPTS_INSTALL) -Pintegration-tests
               publishJUnitResults: false
               jdkVersionOption: '1.8'
-          - task: Maven@3

Review Comment:
   Instead of deleting this, could you add a property to disable this task?  cc @xushiyan for help.



##########
azure-pipelines.yml:
##########
@@ -119,10 +126,12 @@ stages:
               mavenOptions: '-Xmx4g'
       - job: UT_FT_2
         displayName: FT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'

Review Comment:
   similar here and below.



##########
hudi-integ-test/prepare_integration_suite.sh:
##########
@@ -42,7 +42,7 @@ get_spark_command() {
   else
     scala=$scala
   fi
-  echo "spark-submit --packages org.apache.spark:spark-avro_${scala}:2.4.4 \
+  echo "spark-submit --packages org.apache.spark:spark-avro_${scala}:3.1.3 \

Review Comment:
   `--packages org.apache.spark:spark-avro_${scala}:3.1.3 \` is no longer needed.  We should delete that.



##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -48,10 +48,22 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.orc</groupId>
+          <artifactId>orc-core</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   Will this break ORC support in Spark and Hudi?



##########
hudi-utilities/pom.xml:
##########
@@ -241,6 +245,17 @@
       </exclusions>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-hive_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>*</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>

Review Comment:
   No point adding this since all artifacts are excluded?





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927278345


##########
hudi-integ-test/prepare_integration_suite.sh:
##########
@@ -42,7 +42,7 @@ get_spark_command() {
   else
     scala=$scala
   fi
-  echo "spark-submit --packages org.apache.spark:spark-avro_${scala}:2.4.4 \
+  echo "spark-submit --packages org.apache.spark:spark-avro_${scala}:3.1.3 \

Review Comment:
   will remove 





[GitHub] [hudi] xushiyan closed pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan closed pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile
URL: https://github.com/apache/hudi/pull/6151




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193057004

   ## CI report:
   
   * 0d733135839a55cc6d51cd8806446c4f802e6e63 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10237) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r928048532


##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 1

Review Comment:
   Understood.  What you state only applies to your PR, which affects most tests.  For other PRs, it's good to fail early on legitimate test errors so that the CI resources can be freed up for other PRs.





[GitHub] [hudi] yihua commented on a diff in pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r928048220


##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'

Review Comment:
   I see the successful CI runs finished within 2 hours, so there is no need to increase the timeout.  We can always retry failed jobs.





[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190883248

   ## CI report:
   
   * b37c8dfa6205bf24b83b4a84816690921d45226a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10103) 
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1490645166

   > @yihua : should this PR be closed in light of #6117 ?
   
   There were blockers to making *Spark 3.2* the default profile, while making *Spark 3.1* the default was more tangible.  @rahil-c is that still true?  If so, we can close this.




[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190611467

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10100) 
   * b37c8dfa6205bf24b83b4a84816690921d45226a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190735521

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10100) 
   * b37c8dfa6205bf24b83b4a84816690921d45226a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10103) 
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1191936281

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927112047


##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -174,6 +194,12 @@
       <artifactId>awaitility</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>com.thoughtworks.paranamer</groupId>
+      <artifactId>paranamer</artifactId>
+      <version>2.8</version>
+      <scope>test</scope>
+    </dependency>

Review Comment:
   My assumption is you are referring to this dependency https://github.com/paul-hammant/paranamer?
   
       <dependency>
         <groupId>com.thoughtworks.paranamer</groupId>
         <artifactId>paranamer</artifactId>
         <version>2.8</version>
         <scope>test</scope>
       </dependency>
   
   
   This change was made by @xushiyan in https://issues.apache.org/jira/browse/HUDI-3088 / #4752, but I think we need the newer 2.8 for Spark 3 if it is the default profile. In master we have this: https://github.com/apache/hudi/search?q=paranamer; in the licenses we seem to be referring to paranamer 2.7.
   
   Here are some JIRAs I've seen discussing the issue: https://github.com/apache/beam/pull/17424 and https://issues.apache.org/jira/browse/BEAM-14345





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927276458


##########
hudi-client/hudi-spark-client/pom.xml:
##########
@@ -48,10 +48,22 @@
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.orc</groupId>
+          <artifactId>orc-core</artifactId>
+        </exclusion>
+      </exclusions>

Review Comment:
   cc @xushiyan are you familiar with the specifics of this conflict? 





[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193011596

   ## CI report:
   
   * c25f0b8643603faaf6d9ddb480240741b1590b78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10224) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] rahil-c commented on pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190782837

   Refer to this green Azure CI run: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10078&view=logs&j=47cf0f2a-901e-5ca1-f652-e53b6abbf660&t=35a68570-76b2-5f68-d601-1bf50f7fbd97
   
   All sections passed.




[GitHub] [hudi] hudi-bot commented on pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1190493239

   ## CI report:
   
   * 492ffdcca3f5e2351a106d09fb83a3debd3bb672 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: Rahil c/spark3.1 profile clone

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r925871993


##########
docker/hoodie/hadoop/build_docker_images.sh:
##########
@@ -0,0 +1,19 @@
+docker build base -t apachehudi/hudi-hadoop_2.8.4-base

Review Comment:
   Have to upload the final Docker images to the apachehudi Docker Hub.





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927114817


##########
hudi-utilities/pom.xml:
##########
@@ -241,6 +245,17 @@
       </exclusions>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-hive_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>*</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>

Review Comment:
   I can try removing it again and testing without it, but I think this helped resolve some test failures in this module.





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927086712


##########
azure-pipelines.yml:
##########
@@ -89,10 +90,12 @@ stages:
     jobs:
       - job: UT_FT_1
         displayName: UT FT common & flink & UT client/spark-client
-        timeoutInMinutes: '120'
+        timeoutInMinutes: '150'
         steps:
           - task: Maven@3
             displayName: maven install
+            continueOnError: true
+            retryCountOnTaskFailure: 1

Review Comment:
   I still think that having `continueOnError` and `retryCount` is useful. Otherwise people still have to keep re-triggering Azure CI to see the next set of failures, and if there's an Azure agent connection issue they have to rerun anyway, which also queues up many builds.





[GitHub] [hudi] yihua commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193065573

   As we discussed, there is a risk in landing this if there are any changes to the bundles at this point.  Before landing the PR:
   (1) we should try to avoid any dependency change for production code and bundling.  Adjusting dependencies for tests is OK and should be limited to tests only.  We shouldn't change the compile pom merely to fix tests.
   (2) for any disabled tests in Azure CI, try to find a way to run them in GitHub CI to maintain the coverage.
   (3) make sure root pom changes for switching profiles do not change any behavior for building all bundles (see the sketch below).
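   
   For point (3), a quick way to compare bundle contents before and after the profile switch is to diff the jar listings.  This is a generic sketch using standard Maven and JDK tooling; the paths and file names below are illustrative:
   
   ```
   # Build the spark bundle with the new default profile and list its entries.
   mvn clean package -DskipTests -pl packaging/hudi-spark-bundle -am
   jar tf packaging/hudi-spark-bundle/target/hudi-spark*-bundle*.jar | sort > after.txt
   # Repeat on the base branch to produce before.txt, then compare:
   diff before.txt after.txt
   ```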




[GitHub] [hudi] xushiyan commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1221424985

   @rahil-c close this? We are going to use Spark 3.2 or 3.3 as the default?




[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927272053


##########
packaging/hudi-spark-bundle/pom.xml:
##########
@@ -95,6 +95,12 @@
                   <include>org.antlr:stringtemplate</include>
                   <include>org.apache.parquet:parquet-avro</include>
 
+                  <include>com.fasterxml.jackson.core:jackson-annotations</include>
+                  <include>com.fasterxml.jackson.core:jackson-core</include>
+                  <include>com.fasterxml.jackson.core:jackson-databind</include>
+                  <include>com.fasterxml.jackson.dataformat:jackson-dataformat-yaml</include>
+                  <include>com.fasterxml.jackson.module:jackson-module-scala_${scala.binary.version}</include>

Review Comment:
   When running the IT tests with Spark 3, I was running into the dependency conflict below:
   ```
   Exception in thread "main" java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JsonMappingException.<init>(Ljava/io/Closeable;Ljava/lang/String;)V
   	at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
   	at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
   	at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
   	at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:718)
   	at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
   	at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
   	at org.apache.spark.SparkContext.withScope(SparkContext.scala:792)
   	at org.apache.spark.SparkContext.parallelize(SparkContext.scala:809)
   	at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:136)
   	at HoodieJavaApp.run(HoodieJavaApp.java:141)
   	at HoodieJavaApp.main(HoodieJavaApp.java:111)
   ```
   So in the hudi-spark pom we define the following dependency, which should ideally provide the class and not result in this error:
   ```
      <dependency>
         <groupId>com.fasterxml.jackson.module</groupId>
         <artifactId>jackson-module-scala_${scala.binary.version}</artifactId>
         <version>${fasterxml.jackson.module.scala.version}</version>
       </dependency>
   
   ```
   This jackson-module-scala artifact pulls in several Jackson dependencies like jackson-databind.
   From the Maven logs, however, it seems it was not getting included in the bundle and was being excluded, so in order to get past this conflict I added it to the bundle.
   
   ```
   [INFO] Excluding com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.10.0 from the shaded jar.
   
   ```
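   
   One way to sanity-check which Jackson artifacts each module actually resolves is a dependency-tree query.  This is generic Maven tooling shown as a sketch; the module and filter below are examples:
   
   ```
   # List the Jackson artifacts resolved for the spark bundle; a jackson-module-scala
   # built against a newer jackson-databind than the one on the classpath produces
   # NoSuchMethodError failures like the one above.
   mvn dependency:tree -pl packaging/hudi-spark-bundle -am \
     -Dincludes='com.fasterxml.jackson.*'
   ```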
   
   





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927266490


##########
hudi-utilities/pom.xml:
##########
@@ -241,6 +245,17 @@
       </exclusions>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-hive_${scala.binary.version}</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>*</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>

Review Comment:
   Removing this dependency seems to cause failures:
   ```
   [ERROR] testBuildHiveSyncConfig{boolean}[1]  Time elapsed: 0.017 s  <<< ERROR!
   java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveExternalCatalog
   	at org.apache.hudi.DataSourceUtils.buildHiveSyncConfig(DataSourceUtils.java:322)
   	at org.apache.hudi.TestDataSourceUtils.testBuildHiveSyncConfig(TestDataSourceUtils.java:261)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   
   ```
   For now I'm opting to keep this dependency.
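   
   Since the failure surfaces only in a test, one hedged alternative (a sketch, not a verified fix) would be to pull spark-hive in with test scope, so the compile classpath and the bundles stay untouched:
   
   ```
   <!-- Sketch: provide org.apache.spark.sql.hive classes to tests only.
        Whether test scope is sufficient here has not been verified. -->
   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-hive_${scala.binary.version}</artifactId>
     <scope>test</scope>
   </dependency>
   ```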
   
   





[GitHub] [hudi] rahil-c commented on a diff in pull request #6151: [HUDI-4429] Make Spark3.1 the default profile

Posted by GitBox <gi...@apache.org>.
rahil-c commented on code in PR #6151:
URL: https://github.com/apache/hudi/pull/6151#discussion_r927094990


##########
docker/compose/docker-compose_hadoop284_hive233_spark313.yml:
##########
@@ -0,0 +1,309 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: "3.3"
+
+services:
+
+  namenode:
+    image: rchertara/hudi-hadoop_2.8.4-namenode:image
+    hostname: namenode
+    container_name: namenode
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    ports:
+      - "50070:50070"
+      - "8020:8020"
+    env_file:
+      - ./hadoop.env
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://namenode:50070"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  datanode1:
+    image: rchertara/hudi-hadoop_2.8.4-datanode:image
+    container_name: datanode1
+    hostname: datanode1
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "50075:50075"
+      - "50010:50010"
+    links:
+      - "namenode"
+      - "historyserver"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://datanode1:50075"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - namenode
+
+  historyserver:
+    image: rchertara/hudi-hadoop_2.8.4-history:image
+    hostname: historyserver
+    container_name: historyserver
+    environment:
+      - CLUSTER_NAME=hudi_hadoop284_hive232_spark313
+    depends_on:
+      - "namenode"
+    links:
+      - "namenode"
+    ports:
+      - "58188:8188"
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://historyserver:8188"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    env_file:
+      - ./hadoop.env
+    volumes:
+      - historyserver:/hadoop/yarn/timeline
+
+  hive-metastore-postgresql:
+    image: bde2020/hive-metastore-postgresql:2.3.0
+    volumes:
+      - hive-metastore-postgresql:/var/lib/postgresql
+    hostname: hive-metastore-postgresql
+    container_name: hive-metastore-postgresql
+
+  hivemetastore:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hivemetastore
+    container_name: hivemetastore
+    links:
+      - "hive-metastore-postgresql"
+      - "namenode"
+    env_file:
+      - ./hadoop.env
+    command: /opt/hive/bin/hive --service metastore
+    environment:
+      SERVICE_PRECONDITION: "namenode:50070 hive-metastore-postgresql:5432"
+    ports:
+      - "9083:9083"
+    healthcheck:
+      test: ["CMD", "nc", "-z", "hivemetastore", "9083"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    depends_on:
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  hiveserver:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3:image
+    hostname: hiveserver
+    container_name: hiveserver
+    env_file:
+      - ./hadoop.env
+    environment:
+      SERVICE_PRECONDITION: "hivemetastore:9083"
+    ports:
+      - "10000:10000"
+    depends_on:
+      - "hivemetastore"
+    links:
+      - "hivemetastore"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  sparkmaster:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_3.1.3:image
+    hostname: sparkmaster
+    container_name: sparkmaster
+    env_file:
+      - ./hadoop.env
+    ports:
+      - "8080:8080"
+      - "7077:7077"
+    environment:
+      - INIT_DAEMON_STEP=setup_spark
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  spark-worker-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_3.1.3:image
+    hostname: spark-worker-1
+    container_name: spark-worker-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - "8081:8081"
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+
+  zookeeper:
+    image: 'bitnami/zookeeper:3.4.12-r68'
+    hostname: zookeeper
+    container_name: zookeeper
+    ports:
+      - "2181:2181"
+    environment:
+      - ALLOW_ANONYMOUS_LOGIN=yes
+
+  kafka:
+    image: 'bitnami/kafka:2.0.0'
+    hostname: kafkabroker
+    container_name: kafkabroker
+    ports:
+      - "9092:9092"
+    environment:
+      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+      - ALLOW_PLAINTEXT_LISTENER=yes
+
+  presto-coordinator-1:
+    container_name: presto-coordinator-1
+    hostname: presto-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    ports:
+      - "8090:8090"
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: coordinator
+
+  presto-worker-1:
+    container_name: presto-worker-1
+    hostname: presto-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-prestobase_0.271:image
+    depends_on: [ "presto-coordinator-1" ]
+    environment:
+      - PRESTO_JVM_MAX_HEAP=512M
+      - PRESTO_QUERY_MAX_MEMORY=1GB
+      - PRESTO_QUERY_MAX_MEMORY_PER_NODE=256MB
+      - PRESTO_QUERY_MAX_TOTAL_MEMORY_PER_NODE=384MB
+      - PRESTO_MEMORY_HEAP_HEADROOM_PER_NODE=100MB
+      - TERM=xterm
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: worker
+
+  trino-coordinator-1:
+    container_name: trino-coordinator-1
+    hostname: trino-coordinator-1
+    image: rchertara/hudi-hadoop_2.8.4-trinocoordinator_368:image
+    ports:
+      - "8091:8091"
+    links:
+      - "hivemetastore"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-coordinator-1
+
+  trino-worker-1:
+    container_name: trino-worker-1
+    hostname: trino-worker-1
+    image: rchertara/hudi-hadoop_2.8.4-trinoworker_368:image
+    depends_on: [ "trino-coordinator-1" ]
+    ports:
+      - "8092:8092"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+    command: http://trino-coordinator-1:8091 trino-worker-1
+
+  graphite:
+    container_name: graphite
+    hostname: graphite
+    image: graphiteapp/graphite-statsd
+    ports:
+      - 80:80
+      - 2003-2004:2003-2004
+      - 8126:8126
+
+  adhoc-1:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image
+    hostname: adhoc-1
+    container_name: adhoc-1
+    env_file:
+      - ./hadoop.env
+    depends_on:
+      - sparkmaster
+    ports:
+      - '4040:4040'
+    environment:
+      - "SPARK_MASTER=spark://sparkmaster:7077"
+    links:
+      - "hivemetastore"
+      - "hiveserver"
+      - "hive-metastore-postgresql"
+      - "namenode"
+      - "presto-coordinator-1"
+      - "trino-coordinator-1"
+    volumes:
+      - ${HUDI_WS}:/var/hoodie/ws
+
+  adhoc-2:
+    image: rchertara/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_3.1.3:image

Review Comment:
   For the `apachehudi` account, do I need special access to upload images? Or is there a simple way to transfer the images from my account to the apachehudi account?
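   
   For reference, re-tagging a local image under another Docker Hub namespace is the standard flow (generic Docker commands; pushing still requires write access to the apachehudi org):
   
   ```
   # Retag one of the images referenced above under the target namespace, then push.
   docker pull rchertara/hudi-hadoop_2.8.4-namenode:image
   docker tag rchertara/hudi-hadoop_2.8.4-namenode:image apachehudi/hudi-hadoop_2.8.4-namenode:latest
   docker push apachehudi/hudi-hadoop_2.8.4-namenode:latest   # needs push rights on apachehudi
   ```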





[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1192990862

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   * c25f0b8643603faaf6d9ddb480240741b1590b78 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10224) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1192989064

   ## CI report:
   
   * 37f1e305758ef1827720dd74c90b52abc1ffa67f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10105) 
   * c25f0b8643603faaf6d9ddb480240741b1590b78 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6151:
URL: https://github.com/apache/hudi/pull/6151#issuecomment-1193029452

   ## CI report:
   
   * c25f0b8643603faaf6d9ddb480240741b1590b78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10224) 
   * a4c2f0b51c8bd1bbc2759ef017e01baaa033d975 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10233) 
   * 0d733135839a55cc6d51cd8806446c4f802e6e63 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

