Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/21 15:27:07 UTC

[GitHub] [hudi] dongkelun opened a new pull request #3517: Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

dongkelun opened a new pull request #3517:
URL: https://github.com/apache/hudi/pull/3517


   ## What is the purpose of the pull request
   Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only
   
   ## Verify this pull request
   
   Unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * df7b426190ea6c583d99fb67ca8adad93edf25f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot edited a comment on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1856",
       "triggerID" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * df7b426190ea6c583d99fb67ca8adad93edf25f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851) 
   * 240b94a01bc9605a34871827c555dc7c26b1b4d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1856) 
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#discussion_r712645466



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala
##########
@@ -196,9 +196,11 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Runnab
   }
 
   private def isEqualToTarget(targetColumnName: String, sourceExpression: Expression): Boolean = {
+    val sourceColNameMap = sourceDFOutput.map(attr => (attr.name.toLowerCase, attr.name)).toMap
+
     sourceExpression match {
-      case attr: AttributeReference if attr.name.equalsIgnoreCase(targetColumnName) => true
-      case Cast(attr: AttributeReference, _, _) if attr.name.equalsIgnoreCase(targetColumnName) => true
+      case attr: AttributeReference if sourceColNameMap(attr.name.toLowerCase).equals(targetColumnName) => true

Review comment:
       Can we use `sparkSession.sessionState.conf.resolver` to compare the column names?
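For context, a minimal sketch of what a resolver-based comparison amounts to. In Spark a resolver is a `(String, String) => Boolean` function, and the session resolver is chosen according to `spark.sql.caseSensitive`. The snippet uses plain Scala stand-ins rather than Spark classes; `ResolverSketch` and `resolverFor` are hypothetical names introduced only for illustration.

```scala
object ResolverSketch {
  // Same shape as a Spark resolver: decides whether two names refer to the same column.
  type Resolver = (String, String) => Boolean

  // Stand-ins for the two behaviours a session resolver can have.
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitiveResolution: Resolver = (a, b) => a == b

  // Hypothetical helper: pick a resolver from a case-sensitivity flag,
  // the way a session setting would.
  def resolverFor(caseSensitive: Boolean): Resolver =
    if (caseSensitive) caseSensitiveResolution else caseInsensitiveResolution
}
```

With the flag off, `ResolverSketch.resolverFor(false)("ID", "id")` is `true`; with it on, the same call returns `false`.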







[GitHub] [hudi] hudi-bot edited a comment on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * df7b426190ea6c583d99fb67ca8adad93edf25f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851) 
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#discussion_r713082521



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala
##########
@@ -196,9 +196,11 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Runnab
   }
 
   private def isEqualToTarget(targetColumnName: String, sourceExpression: Expression): Boolean = {
+    val sourceColNameMap = sourceDFOutput.map(attr => (attr.name.toLowerCase, attr.name)).toMap
+
     sourceExpression match {
-      case attr: AttributeReference if attr.name.equalsIgnoreCase(targetColumnName) => true
-      case Cast(attr: AttributeReference, _, _) if attr.name.equalsIgnoreCase(targetColumnName) => true
+      case attr: AttributeReference if sourceColNameMap(attr.name.toLowerCase).equals(targetColumnName) => true

Review comment:
       Makes sense to me. +1 for this.







[GitHub] [hudi] hudi-bot commented on pull request #3517: Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * df7b426190ea6c583d99fb67ca8adad93edf25f0 UNKNOWN
   





[GitHub] [hudi] hudi-bot edited a comment on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1856",
       "triggerID" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 240b94a01bc9605a34871827c555dc7c26b1b4d8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1856) 
   





[GitHub] [hudi] dongkelun commented on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
dongkelun commented on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-907607706


   @pengzhiwei2018 Hi, can you please take a look?





[GitHub] [hudi] pengzhiwei2018 merged pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 merged pull request #3517:
URL: https://github.com/apache/hudi/pull/3517


   





[GitHub] [hudi] dongkelun commented on a change in pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
dongkelun commented on a change in pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#discussion_r713041105



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala
##########
@@ -196,9 +196,11 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Runnab
   }
 
   private def isEqualToTarget(targetColumnName: String, sourceExpression: Expression): Boolean = {
+    val sourceColNameMap = sourceDFOutput.map(attr => (attr.name.toLowerCase, attr.name)).toMap
+
     sourceExpression match {
-      case attr: AttributeReference if attr.name.equalsIgnoreCase(targetColumnName) => true
-      case Cast(attr: AttributeReference, _, _) if attr.name.equalsIgnoreCase(targetColumnName) => true
+      case attr: AttributeReference if sourceColNameMap(attr.name.toLowerCase).equals(targetColumnName) => true

Review comment:
       Hi, is it like this?
   ```scala
   val resolver = sparkSession.sessionState.conf.resolver
   case attr: AttributeReference if resolver(attr.name, targetColumnName) => true
   ```
   I'm not sure I understand. The resolver is not case sensitive when comparing for equality, but the comparison here must be case sensitive. Therefore, I use `sourceColNameMap(attr.name.toLowerCase)` to obtain the original column name of the source table, without case conversion, and then compare it with `targetColumnName` for equality. If they are not equal, the corresponding column is added later with `withColumn`. The comparison is case sensitive because sourceDF is case sensitive when writing data.
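A minimal sketch of the comparison described above, using plain strings instead of Spark attributes. `CaseSensitiveMatch` and its parameter names are hypothetical; only `sourceColNameMap`, `isEqualToTarget`, and `targetColumnName` come from the diff. Unlike the diff, the sketch looks the name up with `get`, so an unknown column yields `false` instead of throwing.

```scala
object CaseSensitiveMatch {
  // Maps each lower-cased source column name to its original spelling.
  def sourceColNameMap(sourceColumns: Seq[String]): Map[String, String] =
    sourceColumns.map(name => (name.toLowerCase, name)).toMap

  // True only when the original spelling of the referenced source column
  // equals the target column name exactly (case-sensitive).
  def isEqualToTarget(
      targetColumnName: String,
      referencedName: String,
      sourceColumns: Seq[String]): Boolean =
    sourceColNameMap(sourceColumns)
      .get(referencedName.toLowerCase)
      .contains(targetColumnName)
}
```

For a source with columns `Seq("ID", "name", "TS")`, `isEqualToTarget("id", "id", ...)` is `false` because the source spells the column `ID`, so a column named `id` would still have to be added; `isEqualToTarget("name", "NAME", ...)` is `true`.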







[GitHub] [hudi] hudi-bot edited a comment on pull request #3517: [HUDI-2343]Fix the exception for mergeInto when the primaryKey and preCombineField of source table and target table differ in case only

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3517:
URL: https://github.com/apache/hudi/pull/3517#issuecomment-903133286


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851",
       "triggerID" : "df7b426190ea6c583d99fb67ca8adad93edf25f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "240b94a01bc9605a34871827c555dc7c26b1b4d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * df7b426190ea6c583d99fb67ca8adad93edf25f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1851) 
   * 240b94a01bc9605a34871827c555dc7c26b1b4d8 UNKNOWN
   




