Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/04 10:15:12 UTC

[GitHub] [hudi] KnightChess opened a new issue, #6037: [QUESTION] about the implementation of spark merge into

KnightChess opened a new issue, #6037:
URL: https://github.com/apache/hudi/issues/6037

   Hi team, I have some questions about using `merge into` with Spark SQL.
   1.
   For example:
   ```sql
   merge into delete_error_test target 
   using (select 'wlq_new3' as name, 1 as id, 29 as age, '20210101' as dt) source 
   on target.id = source.id
   when matched and target.age = 28 then update set age = source.age, dt = source.dt 
   when not matched  then insert (id, age, name, dt) 
   values (source.id, source.age, source.name, '20210102')
   ```
   The matched condition references a column of the target table (`target.age`), and this throws a "cannot resolve" error. In the code below, the condition is resolved against the source plan only. I'm not sure whether that is done because of `mor` tables. If so, `condition.map(resolveExpressionFrom(resolvedSource, Some(target))(_))` may be fine for `cow` tables.
   Are there any other considerations?
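   ```scala
   import org.apache.spark.sql.AnalysisException
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
   import org.apache.spark.sql.catalyst.plans.Inner
   import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

   // Call site: the matched condition is resolved against the source plan only,
   // so a reference such as `target.age` cannot be bound.
   val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
   // ...
   // `analyzer` and `sparkAdapter` come from the enclosing Hudi analysis rule (not shown).
   private def resolveExpressionFrom(left: LogicalPlan, right: Option[LogicalPlan] = None)
                                    (expression: Expression): Expression = {
     // Fake a project for the expression based on the source plan,
     // joined with the `right` plan when one is given.
     val fakeProject = if (right.isDefined) {
       Project(Seq(Alias(expression, "_c0")()),
         sparkAdapter.createJoin(left, right.get, Inner))
     } else {
       Project(Seq(Alias(expression, "_c0")()), left)
     }
     // Resolve the fake project.
     val resolvedProject =
       analyzer.ResolveReferences.apply(fakeProject).asInstanceOf[Project]
     // Any attribute that is still unresolved does not exist in the input plan(s).
     val unResolvedAttrs = resolvedProject.projectList.head.collect {
       case attr: UnresolvedAttribute => attr
     }
     if (unResolvedAttrs.nonEmpty) {
       throw new AnalysisException(s"Cannot resolve ${unResolvedAttrs.mkString(",")} in " +
         s"${expression.sql}, the input " + s"columns is: [${fakeProject.child.output.mkString(", ")}]")
     }
     // Fetch the resolved expression from the fake project.
     resolvedProject.projectList.head.asInstanceOf[Alias].child
   }
   ```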
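   To make the failure mode concrete, here is a simplified model (not Spark's analyzer) of why `target.age` cannot be resolved: the condition is checked against a scope built from the source plan alone, so a target-qualified column has nothing to bind to. `Relation`, `resolve`, and the column lists are hypothetical names chosen for illustration; the two calls mirror resolving against `resolvedSource` only versus against the source joined with the target.

   ```scala
   // Hypothetical, simplified model of qualified column resolution.
   // It only illustrates which relations are in scope; it is NOT Catalyst.
   final case class Relation(alias: String, columns: Seq[String])

   // Resolve "alias.column" against the relations currently in scope.
   def resolve(qualified: String, scope: Seq[Relation]): Either[String, String] =
     qualified.split('.') match {
       case Array(alias, column) if scope.exists(r => r.alias == alias && r.columns.contains(column)) =>
         Right(qualified)
       case _ =>
         val input = scope.flatMap(r => r.columns.map(c => s"${r.alias}.$c")).mkString(", ")
         Left(s"Cannot resolve $qualified, the input columns are: [$input]")
     }

   val source = Relation("source", Seq("name", "id", "age", "dt"))
   val target = Relation("target", Seq("id", "name", "age", "dt"))

   // Matched condition resolved against the source plan only.
   val sourceOnly = resolve("target.age", Seq(source))
   // Matched condition resolved against both the source and the target plans.
   val joined = resolve("target.age", Seq(source, target))
   ```

   `sourceOnly` is a `Left` carrying a "Cannot resolve" message, while `joined` resolves, which is the difference between the current call and one that also passes the target plan.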
   
   2.
   I need to use multiple update actions. Is there a particular reason for the limit below?
   ```scala
   assert(updateActions.size <= 1, s"Only support one updateAction currently, current update action count is: ${updateActions.size}")
   ```
   
   Hoping someone can clear up my confusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] yihua commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

yihua commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174654191

   Thanks @fengjian428 for looking into this. Let's create a Jira ticket to track that.

[GitHub] [hudi] codope closed issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

codope closed issue #6037: [QUESTION] about the implementation of spark merge into
URL: https://github.com/apache/hudi/issues/6037

[GitHub] [hudi] voonhous commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

voonhous commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174752193

   I'd like to ask: with multiple update actions, how would overlapping matched actions be evaluated?
   
   For example:
   ```sql
   when matched and target.age = -1 then update set name = source.name, age = source.age
   when matched and target.age > -10 then update set name = source.name
   ```
   
   I am not really sure how traditional SQL parsers/engines handle these situations.
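   For reference, engines that allow several WHEN MATCHED clauses, such as PostgreSQL (15 and later) and Delta Lake, evaluate them in the order they are written and apply only the first clause whose condition holds for a matched row. The sketch below models that first-match-wins rule for the two clauses above. `Row`, `MatchedClause`, and `applyFirstMatch` are hypothetical names, and whether Hudi adopts the same ordering is an assumption, not something decided here.

   ```scala
   // Hypothetical model of ordered WHEN MATCHED evaluation (first match wins).
   final case class Row(id: Int, name: String, age: Int)

   final case class MatchedClause(
       condition: (Row, Row) => Boolean, // (target, source) => predicate
       update: (Row, Row) => Row         // (target, source) => updated target row
   )

   // Apply the first clause whose condition holds; otherwise keep the target row.
   def applyFirstMatch(target: Row, source: Row, clauses: Seq[MatchedClause]): Row =
     clauses.find(_.condition(target, source)) match {
       case Some(clause) => clause.update(target, source)
       case None         => target
     }

   val clauses = Seq(
     // when matched and target.age = -1 then update set name = source.name, age = source.age
     MatchedClause((t, _) => t.age == -1, (t, s) => t.copy(name = s.name, age = s.age)),
     // when matched and target.age > -10 then update set name = source.name
     MatchedClause((t, _) => t.age > -10, (t, s) => t.copy(name = s.name))
   )

   val source = Row(1, "new", 29)
   // age = -1 satisfies both conditions, but only the first clause is applied.
   val overlapping = applyFirstMatch(Row(1, "old", -1), source, clauses)
   // age = -5 satisfies only the second condition.
   val secondOnly = applyFirstMatch(Row(1, "old", -5), source, clauses)
   // age = -20 satisfies neither condition, so the target row is left unchanged.
   val untouched = applyFirstMatch(Row(1, "old", -20), source, clauses)
   ```

   Under this rule the overlap is harmless: a row with `age = -1` gets both `name` and `age` updated, never just `name`.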

[GitHub] [hudi] fengjian428 commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

fengjian428 commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174052124

   Are you using the MERGE INTO syntax on a MergeOnRead table? What error do you get? I know there are some limitations when using this syntax on MOR tables.

[GitHub] [hudi] KnightChess commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

KnightChess commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174476029

   @fengjian428 No, it's just a COW table.
   In case one, it throws AnalysisException `Cannot resolve target.age in ...., the input columns is: [id#x, name#y, age#z, dt#w]`.
   In case two, I want to use more than one update action, for example:
   ```sql
   merge into delete_error_test target 
   using (select 'wlq_new3' as name, 1 as id, 29 as age, '20210101' as dt) source 
   on target.id = source.id
   when matched and target.age = -1 then update set name = source.name, age = source.age
   when matched and target.age <> -1 then update set name = source.name
   when not matched  then insert (id, age, name, dt) 
   values (source.id, source.age, source.name, '20210102')
   ```
   I just want to know why these limits were added; then I can write code for these scenarios.
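   One detail in the example above: `target.age = -1` and `target.age <> -1` cover every non-NULL age, but under SQL three-valued logic both predicates evaluate to UNKNOWN when `target.age` is NULL. In engines that follow standard MERGE semantics, neither WHEN MATCHED clause fires in that case and the matched row is left unchanged. The sketch below models a nullable column with `Option[Int]`; `sqlEq`, `sqlNotEq`, `fires`, and `clauseFor` are hypothetical helpers.

   ```scala
   // Hypothetical model of SQL three-valued logic for a nullable INT column.
   // Some(true) = TRUE, Some(false) = FALSE, None = UNKNOWN (NULL input).
   def sqlEq(a: Option[Int], b: Int): Option[Boolean] = a.map(_ == b)
   def sqlNotEq(a: Option[Int], b: Int): Option[Boolean] = a.map(_ != b)

   // A WHEN clause fires only when its condition is TRUE; UNKNOWN behaves like FALSE.
   def fires(condition: Option[Boolean]): Boolean = condition.contains(true)

   // Which of the two matched clauses applies for a given target.age.
   def clauseFor(targetAge: Option[Int]): String =
     if (fires(sqlEq(targetAge, -1))) "update name, age"
     else if (fires(sqlNotEq(targetAge, -1))) "update name"
     else "no action"
   ```

   So `clauseFor(None)` yields `"no action"`, which is worth covering in tests once multiple update actions are supported.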

[GitHub] [hudi] fengjian428 commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

fengjian428 commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174550510

   For the first question, I think this restriction exists only for backward compatibility with MOR tables. Data should be updated correctly if the unResolvedAttrs check is removed when using COW mode.
   
   For the second one, I went through the source code and found that the `ExpressionPayload` used by MERGE INTO cannot support more than one UPDATE condition and assignment. I think I can create a PR for this.
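   If the `updateActions.size <= 1` assert is relaxed, some validation is probably still needed. Delta Lake, for example, requires every WHEN MATCHED clause except the last one to have a condition, because an unconditional clause earlier in the list would make all later clauses unreachable. Below is a sketch of such a check as a possible replacement for the assert; `UpdateAction` here is a hypothetical stand-in case class, not Hudi's or Spark's type, and the rule itself is an assumption borrowed from Delta Lake.

   ```scala
   // Hypothetical stand-in for a parsed WHEN MATCHED ... THEN UPDATE clause.
   final case class UpdateAction(condition: Option[String], assignments: Map[String, String])

   // Every update action except the last must carry a condition; otherwise the
   // actions after an unconditional one could never be reached (first match wins).
   def validateUpdateActions(actions: Seq[UpdateAction]): Either[String, Seq[UpdateAction]] = {
     val firstUnconditional = actions.dropRight(1).indexWhere(_.condition.isEmpty)
     if (firstUnconditional >= 0)
       Left(s"Only the last update action may omit its condition, " +
         s"but action #${firstUnconditional + 1} of ${actions.size} has no condition")
     else Right(actions)
   }

   // The two clauses from the example in this issue: both conditional, so accepted.
   val ok = validateUpdateActions(Seq(
     UpdateAction(Some("target.age = -1"), Map("name" -> "source.name", "age" -> "source.age")),
     UpdateAction(Some("target.age <> -1"), Map("name" -> "source.name"))
   ))
   // An unconditional first action shadows the second one, so it is rejected.
   val bad = validateUpdateActions(Seq(
     UpdateAction(None, Map("name" -> "source.name")),
     UpdateAction(Some("target.age = -1"), Map("age" -> "source.age"))
   ))
   ```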

[GitHub] [hudi] KnightChess closed issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

KnightChess closed issue #6037: [QUESTION] about the implementation of spark merge into
URL: https://github.com/apache/hudi/issues/6037

[GitHub] [hudi] yihua commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by "yihua (via GitHub)" <gi...@apache.org>.

yihua commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1471014100

   I created HUDI-5940 for supporting predicates with target table fields in matched conditions in Spark SQL MERGE INTO.

[GitHub] [hudi] KnightChess commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

KnightChess commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174558095

   @fengjian428 Thanks for answering. For the second one, looking forward to your PR :)

[GitHub] [hudi] fengjian428 commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

fengjian428 commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1174683659

   > ```scala
   > updateActions
   > ```
   
   @yihua I created https://issues.apache.org/jira/browse/HUDI-4361 for multiple update actions.

[GitHub] [hudi] codope commented on issue #6037: [QUESTION] about the implementation of spark merge into

Posted by GitBox <gi...@apache.org>.

codope commented on issue #6037:
URL: https://github.com/apache/hudi/issues/6037#issuecomment-1202059274

   > Would like to enquire about some special cases. For multiple update actions, how would overlapping matched actions be evaluated?
   > 
   > For example:
   > 
   > ```sql
   > when matched and target.age = -1 then update set name = source.name, age = source.age
   > when matched and target.age > -10 then update set name = source.name
   > ```
   > 
   > I am not really sure how traditional SQL parsers/engines handle these situations.
   
   @voonhous @fengjian428 Let's take this implementation discussion to the JIRA HUDI-4361. We should prioritize this improvement in the next release (0.13.0). Since the issue itself has been triaged, I am going to close it.