Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/05 15:11:10 UTC

[GitHub] [hudi] dongkelun opened a new pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

dongkelun opened a new pull request #3415:
URL: https://github.com/apache/hudi/pull/3415


   
   ## What is the purpose of the pull request
   
   Support column-name matching for `insert *` and `update set *` in `MERGE INTO` when the source table's columns include all of the target table's columns.
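To illustrate the difference, here is a minimal plain-Scala sketch with no Spark dependency. `Attr` and `Assignment` are hypothetical stand-ins for Spark's resolved attributes and MERGE INTO assignments, and only column names are modelled. `byPosition` mirrors the previous `zip`-based behaviour shown in the diff; `byName` is a simplified version of name matching that assumes every target column name appears in the source.

```scala
// Hypothetical stand-ins for Spark's resolved attributes and MERGE INTO
// assignments; only the column name is modelled here.
final case class Attr(name: String)
final case class Assignment(target: Attr, source: Attr)

object StarAssignments {
  // Previous behaviour: pair target and source columns purely by position.
  // If the source lists its columns in a different order, values land in the
  // wrong target columns; zip also silently drops any unmatched extra columns.
  def byPosition(target: Seq[Attr], source: Seq[Attr]): Seq[Assignment] =
    target.zip(source).map { case (t, s) => Assignment(t, s) }

  // Name matching: look up each target column in the source by name.
  // Assumes every target column name is present in the source.
  def byName(target: Seq[Attr], source: Seq[Attr]): Seq[Assignment] = {
    val sourceByName = source.map(a => a.name -> a).toMap
    target.map(t => Assignment(t, sourceByName(t.name)))
  }
}
```

For example, with target columns `id, name, price` and a source that selects `price, id, name`, `byPosition` assigns the source `price` to the target `id`, while `byName` pairs each target column with the source column of the same name.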
   
   ## How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] vinothchandar commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

vinothchandar commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-896330648


   @pengzhiwei2018 can you please take a look?



[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r686836427



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length

Review comment:
       Like this, is that OK?
   
   ```scala
   val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name)
   if(targetColumnNamesWithoutMetaFields.toSet.subsetOf(resolvedSourceColumnNamesWithoutMetaFields.toSet)){
   ```
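The two guards are equivalent: `xs.filter(p).length == xs.length` holds exactly when `p` is true for every element, and for name membership that is the same as set containment. The plain-Scala sketch below (column names as strings; object and function names are hypothetical) compares the draft check, the `subsetOf` check proposed in this comment, and a `forall` variant that is not part of the PR but stops at the first missing name.

```scala
object ColumnCoverage {
  // Draft check: count the target names found in the source and compare
  // the count with the number of target columns.
  def coveredByFilter(target: Seq[String], source: Seq[String]): Boolean =
    target.filter(name => source.contains(name)).length == target.length

  // Check proposed in this comment: containment of the name sets.
  def coveredBySubset(target: Seq[String], source: Seq[String]): Boolean =
    target.toSet.subsetOf(source.toSet)

  // Alternative not taken from the PR: short-circuits on the first missing name.
  def coveredByForall(target: Seq[String], source: Seq[String]): Boolean = {
    val sourceNames = source.toSet
    target.forall(sourceNames.contains)
  }
}
```

All three compare names with exact string equality. Spark resolves identifiers case-insensitively by default (`spark.sql.caseSensitive=false`), so a source column `ID` would not count as a match for a target column `id` under any of these checks.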





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

pengzhiwei2018 commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r687612451



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,25 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(_.name)
+          val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name)
+
+          if(targetColumnNamesWithoutMetaFields.toSet.subsetOf(resolvedSourceColumnNamesWithoutMetaFields.toSet)){
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap

Review comment:
       Need a space after the ",".

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,25 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(_.name)
+          val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name)
+
+          if(targetColumnNamesWithoutMetaFields.toSet.subsetOf(resolvedSourceColumnNamesWithoutMetaFields.toSet)){
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = sourceColNameAttrMap.get(targetAttr.name).get
+              Assignment(targetAttr, sourceAttr)

Review comment:
       Using `sourceColNameAttrMap(targetAttr.name)` instead of `.get(targetAttr.name).get` may be better.
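Both forms return the same value when the key is present, and both throw `NoSuchElementException` when it is missing, but `Map.apply` reports which key was missing (`key not found: <key>`) while `.get(key).get` only says `None.get`. In the PR the lookup is guarded by the subset check, so a miss should not occur; the clearer message mainly helps if that invariant is ever broken. A small sketch with a hypothetical string-valued map:

```scala
import scala.util.Try

object LookupStyles {
  // Hypothetical map from column name to a description of the source column.
  val sourceColNameAttrMap: Map[String, String] =
    Map("id" -> "source.id", "name" -> "source.name")

  // Pattern from the draft: Option.get fails with the message "None.get".
  def viaGetGet(key: String): String = sourceColNameAttrMap.get(key).get

  // Reviewer's suggestion: Map.apply fails with "key not found: <key>".
  def viaApply(key: String): String = sourceColNameAttrMap(key)

  // Returns the exception message if the lookup throws, None otherwise.
  def failureMessage(lookup: => String): Option[String] =
    Try(lookup).failed.toOption.map(_.getMessage)
}
```

For instance, `LookupStyles.failureMessage(LookupStyles.viaApply("price"))` yields `Some("key not found: price")`, while the `viaGetGet` variant yields `Some("None.get")`.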





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * f8fb54e42c50bd8cfa3dd8be161f0d924b581081 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570) 
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * f8fb54e42c50bd8cfa3dd8be161f0d924b581081 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570) 
   * df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5 UNKNOWN
   

[GitHub] [hudi] pengzhiwei2018 commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

pengzhiwei2018 commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-896454219


   Hi @dongkelun, thanks for the contribution. Overall LGTM apart from some minor optimizations. You can also run the test cases on Spark 3 with the following commands:
   
   > mvn clean install -DskipTests -Pspark3
   > mvn test -Punit-tests -Pspark3 -pl hudi-spark-datasource/hudi-spark



[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659) 
   * e89deea5859150123d84b248affdd5355ddeb82a UNKNOWN
   

[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r687624856



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,25 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(_.name)
+          val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name)
+
+          if(targetColumnNamesWithoutMetaFields.toSet.subsetOf(resolvedSourceColumnNamesWithoutMetaFields.toSet)){
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = sourceColNameAttrMap.get(targetAttr.name).get
+              Assignment(targetAttr, sourceAttr)

Review comment:
       Okay, thanks for the guidance. It's much better.
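Taken together, the review suggestions in this thread (the `subsetOf` guard and the `Map.apply` lookup) give roughly the following shape. This is a plain-Scala sketch, not the merged code: `Column` is a hypothetical stand-in for a resolved attribute, `isMetaField` is a hypothetical stand-in for `HoodieSqlUtils.isMetaField` that checks Hudi's five standard meta column names, and the positional `zip` fallback when the source does not cover every target column is an assumption about the branch not shown in the diff.

```scala
// Hypothetical stand-in for a resolved attribute: a name plus the table alias it came from.
final case class Column(name: String, table: String)

object StarResolution {
  // Hypothetical stand-in for HoodieSqlUtils.isMetaField, using Hudi's standard meta columns.
  private val metaFields = Set(
    "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
    "_hoodie_partition_path", "_hoodie_file_name")

  def isMetaField(name: String): Boolean = metaFields.contains(name)

  // Builds (target, source) pairs for `insert *` / `update set *`.
  def resolve(target: Seq[Column], source: Seq[Column]): Seq[(Column, Column)] = {
    val targetCols = target.filterNot(col => isMetaField(col.name))
    val sourceCols = source.filterNot(col => isMetaField(col.name))
    if (targetCols.map(_.name).toSet.subsetOf(sourceCols.map(_.name).toSet)) {
      // The source covers every target column: match by name.
      val sourceColNameAttrMap = sourceCols.map(col => (col.name, col)).toMap
      targetCols.map(col => (col, sourceColNameAttrMap(col.name)))
    } else {
      // Assumed fallback: pair by position, as before this PR.
      targetCols.zip(sourceCols)
    }
  }
}
```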





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659) 
   * e89deea5859150123d84b248affdd5355ddeb82a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1683) 
   

[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r686883225



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length
+            == targetOutputWithoutMetaFields.length) {
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = resolvedSourceOutputWithoutMetaFields.filter(attr => attr.name.equals(targetAttr.name)).head

Review comment:
       Hello, is that right?
   
   ```scala
   val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap
   val sourceAttr = sourceColNameAttrMap.get(targetAttr.name).get
   ```
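Building the map once replaces a scan of all source columns for every target column (roughly O(n*m)) with a single pass plus constant-time lookups (O(n + m)). One subtle difference appears only if the source ever contained two columns with the same name: `filter(...).head` picks the first match, whereas `toMap` keeps the last pair for a duplicated key. The sketch below uses a hypothetical `SourceCol` type whose `origin` field exists only to show which duplicate was picked.

```scala
// Hypothetical stand-in for a resolved source attribute; `origin` only
// records where the column came from so duplicates can be told apart.
final case class SourceCol(name: String, origin: String)

object SourceLookup {
  // Draft version: scan all source columns for every target column.
  def byLinearScan(target: Seq[String], source: Seq[SourceCol]): Seq[SourceCol] =
    target.map(name => source.filter(col => col.name.equals(name)).head)

  // Suggested version: index the source columns by name once, then look up.
  def byMap(target: Seq[String], source: Seq[SourceCol]): Seq[SourceCol] = {
    val sourceColNameAttrMap = source.map(col => (col.name, col)).toMap
    target.map(name => sourceColNameAttrMap(name))
  }
}
```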





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   ## CI report:
   
   * df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r686883225



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length
+            == targetOutputWithoutMetaFields.length) {
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = resolvedSourceOutputWithoutMetaFields.filter(attr => attr.name.equals(targetAttr.name)).head

Review comment:
       Hello, is that right?
   
   `val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap
    val sourceAttr = sourceColNameAttrMap.get(targetAttr.name).get
   `




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r686883225



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length
+            == targetOutputWithoutMetaFields.length) {
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = resolvedSourceOutputWithoutMetaFields.filter(attr => attr.name.equals(targetAttr.name)).head

Review comment:
       Hello, is that right?
   
   ```scala
   val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name, attr)).toMap
   val sourceAttr = sourceColNameAttrMap.get(targetAttr.name).get
   ```







[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570",
       "triggerID" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659",
       "triggerID" : "df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f8fb54e42c50bd8cfa3dd8be161f0d924b581081 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570) 
   * df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r686447091



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length

Review comment:
       Can we test the equality using a Set?
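The check in the diff counts how many target columns also appear in the source and compares that count with the number of target columns. A `Set` answers the same question more directly with `subsetOf`, which is what the later revision of the patch uses. The sketch below is simplified to plain `Seq[String]` column names rather than the attribute lists in the diff.

```scala
object SubsetCheck {
  // Diff's approach: count target names that also appear in the source list
  // (count(p) is equivalent to filter(p).length). Seq.contains is a linear scan.
  def byCount(target: Seq[String], source: Seq[String]): Boolean =
    target.count(name => source.contains(name)) == target.length

  // Set-based approach: every target name must be a member of the source name set.
  def bySet(target: Seq[String], source: Seq[String]): Boolean =
    target.toSet.subsetOf(source.toSet)

  def main(args: Array[String]): Unit = {
    val target = Seq("id", "name", "price")
    val source = Seq("price", "id", "name", "extra")
    println(byCount(target, source))          // true
    println(bySet(target, source))            // true
    println(bySet(target, Seq("id", "name"))) // false: "price" is missing from the source
  }
}
```

Extra source columns do not affect the result. Only a target column that is missing from the source makes the check fail.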

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,24 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(attr => attr.name)
+
+          if(targetOutputWithoutMetaFields.filter(attr => resolvedSourceColumnNamesWithoutMetaFields.contains(attr.name)).length
+            == targetOutputWithoutMetaFields.length) {
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            targetOutputWithoutMetaFields.map(targetAttr => {
+              val sourceAttr = resolvedSourceOutputWithoutMetaFields.filter(attr => attr.name.equals(targetAttr.name)).head

Review comment:
       Can we fetch the sourceAttr from a Map?







[GitHub] [hudi] pengzhiwei2018 commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-898215349


   LGTM. Thanks for the contribution, @dongkelun.





[GitHub] [hudi] pengzhiwei2018 commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-898216727


   > > Hi @dongkelun, thanks for the contribution. Overall LGTM except for some minor optimizations. You can also run the test cases on Spark 3 with the following commands:
   > > > mvn clean install -DskipTests -Pspark3
   > > > mvn test -Punit-tests -Pspark3 -pl hudi-spark-datasource/hudi-spark
   > 
   > Hi @pengzhiwei2018, the result is: 'Tests: succeeded 56, failed 6, canceled 0, ignored 0, pending 0'. Two of the failures are ORC exceptions, three are, I think, caused by time zone differences (I don't know how to resolve those), and the remaining one is an exception-message mismatch. The detailed results are as follows:
   > 
   > `
   > 1. Test Different Type of Partition Column *** FAILED *** Expected Array([1,a1,10,2021-05-20 00:00:00], [2,a2,10,2021-05-20 00:00:00]), but got Array([1,a1,10.0,2021-05-20 15:00:00], [2,a2,10.0,2021-05-20 15:00:00])
   > 2. - Test MergeInto Exception *** FAILED *** Expected "... for target field: '[id]' in merge into upda...", but got "... for target field: '[_ts]' in merge into upda..." (TestHoodieSqlBase.scala:86)
   > 3. test basic HoodieSparkSqlWriter functionality with datasource insert for COPY_ON_WRITE with ORC as the base file format with populate meta fields true *** FAILED ***
   > 4. test basic HoodieSparkSqlWriter functionality with datasource insert for MERGE_ON_READ with ORC as the base file format with populate meta fields true *** FAILED ***
   > 5. Test Sql Statements *** FAILED *** java.lang.IllegalArgumentException: UnExpect result for: select id, name, price, cast(dt as string) from h0_p Expect: 1 a1 10 2021-05-07 00:00:00, Actual: 1 a1 10 2021-05-07 15:00:00
   > 6. Test Create Table As Select *** FAILED *** Expected Array([1,a1,10,2021-05-06 00:00:00]), but got Array([1,a1,10,2021-05-06 15:00:00]) (TestHoodieSqlBase.scala:78)
   > `
   
   I have rebased the code onto master and run the tests on Spark 3. Apart from the ORC tests, all the others pass.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] dongkelun commented on a change in pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
dongkelun commented on a change in pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#discussion_r687647053



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
##########
@@ -142,11 +142,25 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi
         val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
         val resolvedAssignments = if (isInsertOrUpdateStar(assignments)) {
           // assignments is empty means insert * or update set *
-          // we fill assign all the source fields to the target fields
-          target.output
-            .filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
-            .zip(resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name)))
-            .map { case (targetAttr, sourceAttr) => Assignment(targetAttr, sourceAttr) }
+          val resolvedSourceOutputWithoutMetaFields = resolvedSource.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val targetOutputWithoutMetaFields = target.output.filter(attr => !HoodieSqlUtils.isMetaField(attr.name))
+          val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(_.name)
+          val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name)
+
+          if(targetColumnNamesWithoutMetaFields.toSet.subsetOf(resolvedSourceColumnNamesWithoutMetaFields.toSet)){
+            //If sourceTable's columns contains all targetTable's columns,
+            //We fill assign all the source fields to the target fields by column name matching.
+            val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name,attr)).toMap

Review comment:
       Okay, added
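Here is a sketch that combines the two review suggestions (the `Set` subset check and the `Map` lookup) into the `insert *` / `update set *` resolution. It makes several assumptions. `Attr` and `Assign` are hypothetical stand-ins for Spark's `Attribute` and `Assignment`, and meta fields are assumed to be filtered out already. The `else` branch assumes the previous positional `zip` behaviour is kept when the source lacks some target column; the diff excerpt above does not show that branch.

```scala
// Hypothetical stand-ins for Spark's Attribute and Assignment.
final case class Attr(name: String)
final case class Assign(target: Attr, source: Attr)

object StarAssignments {
  def resolve(target: Seq[Attr], source: Seq[Attr]): Seq[Assign] = {
    val sourceNames = source.map(_.name)
    val targetNames = target.map(_.name)
    if (targetNames.toSet.subsetOf(sourceNames.toSet)) {
      // The source has every target column: pair columns by name, regardless of order.
      val sourceColNameAttrMap = source.map(attr => (attr.name, attr)).toMap
      target.map(t => Assign(t, sourceColNameAttrMap(t.name)))
    } else {
      // Assumed fallback: pair columns by position, as the code did before this change.
      target.zip(source).map { case (t, s) => Assign(t, s) }
    }
  }

  def main(args: Array[String]): Unit = {
    val target = Seq(Attr("id"), Attr("name"), Attr("price"))
    val reordered = Seq(Attr("price"), Attr("name"), Attr("id"))
    // Prints "id <- id", "name <- name", "price <- price" on separate lines.
    resolve(target, reordered).foreach(a => println(s"${a.target.name} <- ${a.source.name}"))
  }
}
```

With the source columns in a different order, the old positional `zip` would assign `price` to `id`. Name matching avoids this whenever the source has all of the target's columns.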







[GitHub] [hudi] dongkelun commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
dongkelun commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-898207251


   > Hi @dongkelun, thanks for the contribution. Overall LGTM except for some minor optimizations. You can also run the test cases on Spark 3 with the following commands:
   > 
   > > mvn clean install -DskipTests -Pspark3
   > > mvn test -Punit-tests -Pspark3 -pl hudi-spark-datasource/hudi-spark
   
   Hi @pengzhiwei2018, the result is: 'Tests: succeeded 56, failed 6, canceled 0, ignored 0, pending 0'. Two of the failures are ORC exceptions, three are, I think, caused by time zone differences (I don't know how to resolve those), and the remaining one is an exception-message mismatch. The detailed results are as follows:
   
   `
   1. Test Different Type of Partition Column *** FAILED ***
     Expected Array([1,a1,10,2021-05-20 00:00:00], [2,a2,10,2021-05-20 00:00:00]), but got Array([1,a1,10.0,2021-05-20 15:00:00], [2,a2,10.0,2021-05-20 15:00:00])
   2. - Test MergeInto Exception *** FAILED ***
     Expected "... for target field: '[id]' in merge into upda...", but got "... for target field: '[_ts]' in merge into upda..." (TestHoodieSqlBase.scala:86)
   3. test basic HoodieSparkSqlWriter functionality with datasource insert for COPY_ON_WRITE with ORC as the base file format with populate meta fields true *** FAILED ***
   4. test basic HoodieSparkSqlWriter functionality with datasource insert for MERGE_ON_READ with ORC as the base file format with populate meta fields true *** FAILED ***
   5. Test Sql Statements *** FAILED ***
     java.lang.IllegalArgumentException: UnExpect result for: select id, name, price, cast(dt as string) from h0_p
   Expect:
    1 a1 10 2021-05-07 00:00:00, Actual:
    1 a1 10 2021-05-07 15:00:00
   6. Test Create Table As Select *** FAILED ***
     Expected Array([1,a1,10,2021-05-06 00:00:00]), but got Array([1,a1,10,2021-05-06 15:00:00]) (TestHoodieSqlBase.scala:78)
   `





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570",
       "triggerID" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1659",
       "triggerID" : "df3e3baa6d1f4c2d637164da2eb9d54385c5f9f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e89deea5859150123d84b248affdd5355ddeb82a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1683",
       "triggerID" : "e89deea5859150123d84b248affdd5355ddeb82a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e89deea5859150123d84b248affdd5355ddeb82a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1683) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570",
       "triggerID" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411) 
   * f8fb54e42c50bd8cfa3dd8be161f0d924b581081 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1570) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] pengzhiwei2018 merged pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 merged pull request #3415:
URL: https://github.com/apache/hudi/pull/3415


   





[GitHub] [hudi] hudi-bot commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot edited a comment on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-893539789


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411",
       "triggerID" : "10b2dc9c80f373a90f0140507fb20b85dfcf30d5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f8fb54e42c50bd8cfa3dd8be161f0d924b581081",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10b2dc9c80f373a90f0140507fb20b85dfcf30d5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1411) 
   * f8fb54e42c50bd8cfa3dd8be161f0d924b581081 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] dongkelun commented on pull request #3415: [HUDI-2279]Support column name matching for insert * and update set *

Posted by GitBox <gi...@apache.org>.
dongkelun commented on pull request #3415:
URL: https://github.com/apache/hudi/pull/3415#issuecomment-903237562


   @pengzhiwei2018 Hello, could we ignore case when matching column names? If so, I would change the code as follows:
   
   ```scala
   val resolvedSourceColumnNamesWithoutMetaFields = resolvedSourceOutputWithoutMetaFields.map(_.name.toLowerCase)
   val targetColumnNamesWithoutMetaFields = targetOutputWithoutMetaFields.map(_.name.toLowerCase)
   
   val sourceColNameAttrMap = resolvedSourceOutputWithoutMetaFields.map(attr => (attr.name.toLowerCase, attr)).toMap
   val sourceAttr = sourceColNameAttrMap(targetAttr.name.toLowerCase)
   ```
   This works once [PR #3517](https://github.com/apache/hudi/pull/3517) is merged. Should I submit this change together with that PR, or as a separate PR?
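One edge case in a lowercased map: `toMap` keeps only the last entry for a duplicate key, so a source with both `id` and `ID` would silently drop one of them. Below is a sketch, again using a hypothetical `Attr` stand-in rather than Spark's `Attribute`, that reports such clashes and missing columns instead. It lowercases with `Locale.ROOT` so the result does not depend on the JVM's default locale (for example, the Turkish dotted and dotless i).

```scala
import java.util.Locale

// Hypothetical stand-in for Spark's Attribute.
final case class Attr(name: String, dataType: String)

object CaseInsensitiveMatching {
  private def key(name: String): String = name.toLowerCase(Locale.ROOT)

  // Lowercased name -> source column, or an error if two source columns
  // differ only by case (toMap would silently keep just one of them).
  def buildIndex(source: Seq[Attr]): Either[String, Map[String, Attr]] = {
    val grouped = source.groupBy(a => key(a.name))
    val clashes = grouped.collect { case (_, attrs) if attrs.size > 1 => attrs.map(_.name).mkString("/") }
    if (clashes.nonEmpty) Left(s"ambiguous source columns: ${clashes.mkString(", ")}")
    else Right(grouped.map { case (k, attrs) => k -> attrs.head })
  }

  // Pairs each target column with its case-insensitively matching source column.
  def assignAll(target: Seq[Attr], source: Seq[Attr]): Either[String, Seq[(Attr, Attr)]] =
    buildIndex(source).flatMap { index =>
      val missing = target.filterNot(t => index.contains(key(t.name)))
      if (missing.nonEmpty) Left(s"missing source columns: ${missing.map(_.name).mkString(", ")}")
      else Right(target.map(t => t -> index(key(t.name))))
    }

  def main(args: Array[String]): Unit = {
    val target = Seq(Attr("id", "int"), Attr("Name", "string"))
    println(assignAll(target, Seq(Attr("NAME", "string"), Attr("ID", "int")))
      .map(_.map { case (t, s) => t.name -> s.name }))
    // prints Right(List((id,ID), (Name,NAME)))
    println(assignAll(target, Seq(Attr("id", "int"), Attr("ID", "int"), Attr("name", "string"))))
    // prints Left(ambiguous source columns: id/ID)
  }
}
```

In Spark itself, identifier case sensitivity is governed by `spark.sql.caseSensitive` (false by default), so a real change would likely want to follow that setting rather than always ignoring case.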
   

