Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/10 09:51:11 UTC

[GitHub] [hudi] pankajc007 opened a new issue #4550: [QUESTION] Hudi Partial Update on COW

pankajc007 opened a new issue #4550:
URL: https://github.com/apache/hudi/issues/4550


   Hi Team,
   I have a use case for partial updates on a Hudi table. I saw that the https://issues.apache.org/jira/browse/HUDI-1884 ticket ([PR](https://github.com/apache/hudi/pull/3154/files)) was already merged in version 0.9.0. I tried it out but could not get it to work: I get a schema error while writing.
   I am wondering whether I need to set some Hudi options explicitly to enable partial updates.
   I would really appreciate it if anyone could give me an example, or point me in the right direction in case I've missed anything.
   Thanks.
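   To make "partial update" concrete, here is a plain-Scala model (no Spark or Hudi involved) of what a `MERGE INTO ... WHEN MATCHED THEN UPDATE SET <some columns>` is meant to do: for matched keys only the listed columns change, and unmatched source rows are inserted whole. `PartialUpdateSketch`, `mergePartial` and the sample columns are hypothetical illustrations, not Hudi APIs, and resolving duplicate source keys as "last one wins" is a simplification of this sketch only.

   ```scala
   // Plain-Scala model of MERGE INTO with a partial UPDATE SET (no Spark/Hudi).
   // All names here are hypothetical illustrations, not Hudi APIs.
   object PartialUpdateSketch {
     // A row is modeled as column name -> value.
     type Row = Map[String, Any]

     // Models:
     //   MERGE INTO target USING source ON <keyCols equal>
     //   WHEN MATCHED THEN UPDATE SET <only updateCols>
     //   WHEN NOT MATCHED THEN INSERT *
     def mergePartial(
         target: Seq[Row],
         source: Seq[Row],
         keyCols: Seq[String],
         updateCols: Seq[String]
     ): Seq[Row] = {
       def key(r: Row): Seq[Any] = keyCols.map(r)
       // Simplification: if the source repeats a key, the last row wins.
       val sourceByKey: Map[Seq[Any], Row] = source.map(r => key(r) -> r).toMap
       val targetKeys: Set[Seq[Any]] = target.map(key).toSet

       // Matched target rows: overwrite only updateCols, keep all other columns.
       val updated = target.map { t =>
         sourceByKey.get(key(t)) match {
           case Some(s) => t ++ updateCols.map(c => c -> s(c))
           case None    => t
         }
       }
       // Source rows with no matching target key: insert them whole.
       val inserted = source.filterNot(s => targetKeys.contains(key(s)))
       updated ++ inserted
     }

     val sampleTarget: Seq[Row] = Seq(
       Map("id" -> 1, "name" -> "a", "price" -> 10.0),
       Map("id" -> 2, "name" -> "b", "price" -> 20.0)
     )
     val sampleSource: Seq[Row] = Seq(
       Map("id" -> 1, "name" -> "ignored", "price" -> 11.0),
       Map("id" -> 3, "name" -> "c", "price" -> 30.0)
     )

     def main(args: Array[String]): Unit =
       mergePartial(sampleTarget, sampleSource, Seq("id"), Seq("price")).foreach(println)
   }
   ```

   With the sample data, `main` prints three rows: id 1 keeps `name = a` but takes the new `price`, id 2 is unchanged, and id 3 is inserted.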


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] pankajc007 closed issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
pankajc007 closed issue #4550:
URL: https://github.com/apache/hudi/issues/4550


   





[GitHub] [hudi] xushiyan commented on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1011827801


   @pankajc007 there are some MERGE INTO examples in the quick start guide:
   https://hudi.apache.org/docs/quick-start-guide#mergeinto
   
   For the options, refer to the setup section:
   https://hudi.apache.org/docs/quick-start-guide#setup





[GitHub] [hudi] Guanpx commented on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
Guanpx commented on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1011901170


   > @pankajc007 there are some examples in the quick start guide for merge into https://hudi.apache.org/docs/quick-start-guide#mergeinto
   > 
   > options you can refer to the setup section https://hudi.apache.org/docs/quick-start-guide#setup
   > 
   > please close if this answers your question
   
   ![image](https://user-images.githubusercontent.com/29246713/149292152-6feb8364-a6ee-4f31-9075-bd9315709e0d.png)
   
   





[GitHub] [hudi] nsivabalan commented on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1009430640


   @YannByron: could you please follow up here?





[GitHub] [hudi] xushiyan edited a comment on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
xushiyan edited a comment on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1011827801


   @pankajc007 there are some MERGE INTO examples in the quick start guide:
   https://hudi.apache.org/docs/quick-start-guide#mergeinto
   
   For the options, refer to the setup section:
   https://hudi.apache.org/docs/quick-start-guide#setup
   
   Please close the issue if this answers your question.





[GitHub] [hudi] xushiyan commented on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1012311002


   @Guanpx you need to switch to the "Spark SQL" tab next to the "Scala" and "Python" tabs.





[GitHub] [hudi] LucassLin edited a comment on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
LucassLin edited a comment on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1016807182


   Hi Team,
   
   I followed https://hudi.apache.org/docs/quick-start-guide#mergeinto to do a partial update on a table, but I am getting this error:
   ```
   22/01/18 23:22:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.lang.AssertionError: assertion failed: No plan for MergeIntoTable (((col1#4217 = col1#258) && (col2#4230 = col2#305)) && ((col3#4241 = col3#316) && (col4#4287 = col4#1441))), [updateaction(None, assignment(col5#4242, col5#317), assignment(col6#4244, col6#319))], [insertaction(None)]
   ```
   
   Following is the code I have:
   ```
   val historicalDF = spark.read.format("org.apache.hudi").load(basePath)
   historicalDF.createOrReplaceTempView("historical_data")
   incrementalDF.createOrReplaceTempView("incremental_data")
   val sqlPartialUpdate =
     s"""
        | merge into historical_data as target
        | using (
        |   select * from incremental_data
        | ) source
        | on  target.col1 = source.col1
        | and target.col2 = source.col2
        | and target.col3 = source.col3
        | and target.col4 = source.col4
        | when matched then
        |   update set target.col5 = source.col5, target.col6 = source.col6
        | when not matched then insert *
        """.stripMargin
   spark.sql(sqlPartialUpdate)
   ```
   
   I would really appreciate it if anyone could give me an example, or point me in the right direction in case I've missed anything.
   Thanks.
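   A minimal sketch of the quick start pattern follows, assuming Hudi 0.10.x on Spark 3.1 with the Hudi Spark bundle on the classpath (the thread does not state versions; other releases may need further settings from that version's setup section). The table `hudi_partial_demo`, its columns and the `/tmp/hudi/hudi_partial_demo` path are hypothetical. One plausible cause of the `No plan for MergeIntoTable` error above, not confirmed in this thread, is that the merge target `historical_data` is a temporary view over a DataFrame rather than a Hudi table known to the catalog, or that `HoodieSparkSessionExtension` is not enabled, so no planner rule handles the `MERGE INTO`. The sketch therefore enables the extension, creates a COW table through Spark SQL, and merges into that table directly.

   ```scala
   import org.apache.spark.sql.SparkSession

   // Sketch assuming Hudi 0.10.x + Spark 3.1; table name, columns and path are hypothetical.
   object HudiPartialMergeSketch {

     // Settings from the quick start "Setup" section; the extension is what
     // lets Spark SQL plan MERGE INTO against Hudi tables.
     def buildSession(): SparkSession =
       SparkSession.builder()
         .appName("hudi-partial-merge-sketch")
         .master("local[*]")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
         .getOrCreate()

     def runDemo(spark: SparkSession): Seq[(Int, String, Double, Long)] = {
       // Register a COW table in the catalog via Spark SQL, instead of a
       // temp view over spark.read.format("hudi").load(path).
       spark.sql(
         """create table if not exists hudi_partial_demo (
           |  id int, name string, price double, ts bigint
           |) using hudi
           |tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
           |location '/tmp/hudi/hudi_partial_demo'""".stripMargin)

       spark.sql("insert into hudi_partial_demo select 1, 'a', cast(10.0 as double), 1000L")

       // Partial update: the matched branch assigns only price and ts,
       // so name should keep its stored value 'a'.
       spark.sql(
         """merge into hudi_partial_demo as target
           |using (
           |  select 1 as id, 'ignored' as name, cast(11.0 as double) as price, 2000L as ts
           |) source
           |on target.id = source.id
           |when matched then update set target.price = source.price, target.ts = source.ts
           |when not matched then insert *""".stripMargin)

       spark.sql("select id, name, price, ts from hudi_partial_demo")
         .collect()
         .map(r => (r.getInt(0), r.getString(1), r.getDouble(2), r.getLong(3)))
         .toSeq
     }

     def main(args: Array[String]): Unit = {
       val spark = buildSession()
       try runDemo(spark).foreach(println)
       finally spark.stop()
     }
   }
   ```

   Under those assumptions, `main` should print a single row whose `name` is still `a` while `price` and `ts` come from the source; columns left out of `update set` are expected to keep their stored values on a COW table.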
   





[GitHub] [hudi] LucassLin removed a comment on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
LucassLin removed a comment on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1016807182


   Hi Team,
   
   I followed https://hudi.apache.org/docs/quick-start-guide#mergeinto to do a partial update on a table, but I am getting the following error:
   ```
   22/01/18 23:22:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.lang.AssertionError: assertion failed: No plan for MergeIntoTable (((col1#4217 = col1#258) && (col2#4230 = col2#305)) && ((col3#4241 = col3#316) && (col4#4287 = col4#1441))), [updateaction(None, assignment(col5#4242, col5#317), assignment(col6#4244, col6#319))], [insertaction(None)]
   ```
   
   Following is the code I have:
   ```
   val historicalDF = spark.read.format("org.apache.hudi").load(basePath)
   historicalDF.createOrReplaceTempView("historical_data")
   incrementalDF.createOrReplaceTempView("incremental_data")
   val sqlPartialUpdate =
         s"""
          | merge into historical_data as target
          | using (
          |   select * from incremental_data
          | ) source
          | on  target.col1 = source.col1
          | and target.col2 = source.col2
          | and target.col3 = source.col3
          | and target.col4 = source.col4
          | when matched then
          |   update set target.col5 = source.col5, target.col6 = source.col6
          | when not matched then insert *
          """.stripMargin
   spark.sql(sqlPartialUpdate)
   ```
   
   I would really appreciate it if anyone could help with this issue, or point me in the right direction in case I've missed anything.
   Thanks.
   





[GitHub] [hudi] xushiyan edited a comment on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
xushiyan edited a comment on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1012311002


   @Guanpx you need to switch to the "Spark SQL" tab next to the "Scala" and "Python" tabs.
   @kywe665 FYI, this is a website usability problem: the navigation link jumps to the anchor, but the Spark SQL tab there is hidden by default.





[GitHub] [hudi] LucassLin edited a comment on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
LucassLin edited a comment on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1016807182


   Hi Team,
   
   I followed https://hudi.apache.org/docs/quick-start-guide#mergeinto to do a partial update on a table, but I am getting this error:
   ```
   22/01/18 23:22:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.lang.AssertionError: assertion failed: No plan for MergeIntoTable (((col1#4217 = col1#258) && (col2#4230 = col2#305)) && ((col3#4241 = col3#316) && (col4#4287 = col4#1441))), [updateaction(None, assignment(col5#4242, col5#317), assignment(col6#4244, col6#319))], [insertaction(None)]
   ```
   
   Following is the code I have:
   ```
   val historicalDF = spark.read.format("org.apache.hudi").load(basePath)
   historicalDF.createOrReplaceTempView("historical_data")
   incrementalDF.createOrReplaceTempView("incremental_data")
   val sqlPartialUpdate =
         s"""
          | merge into historical_data as target
          | using (
          |   select * from incremental_data
          | ) source
          | on  target.col1 = source.col1
          | and target.col2 = source.col2
          | and target.col3 = source.col3
          | and target.col4 = source.col4
          | when matched then
          |   update set target.col5 = source.col5, target.col6 = source.col6
          | when not matched then insert *
          """.stripMargin
   spark.sql(sqlPartialUpdate)
   ```
   
   I would really appreciate it if anyone could help with this issue, or point me in the right direction in case I've missed anything.
   Thanks.
   





[GitHub] [hudi] LucassLin commented on issue #4550: [QUESTION] Hudi Partial Update on COW

Posted by GitBox <gi...@apache.org>.
LucassLin commented on issue #4550:
URL: https://github.com/apache/hudi/issues/4550#issuecomment-1016807182


   Hi Team,
   
   I followed https://hudi.apache.org/docs/quick-start-guide#mergeinto to do a partial update on a table, but I am getting this error:
   ```
   22/01/18 23:22:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.lang.AssertionError: assertion failed: No plan for MergeIntoTable (((col1#4217 = col1#258) && (col2#4230 = col2#305)) && ((col3#4241 = col3#316) && (col4#4287 = col4#1441))), [updateaction(None, assignment(col5#4242, col5#317), assignment(col6#4244, col6#319))], [insertaction(None)]
   ```
   
   Following is the code I have:
   ```
   val historicalDF = spark.read.format("org.apache.hudi").load(basePath)
   historicalDF.createOrReplaceTempView("historical_data")
   dataFrameMap(incremental).createOrReplaceTempView("incremental_data")
   val sqlPartialUpdate =
     s"""
        | merge into historical_data as target
        | using (
        |   select * from incremental_data
        | ) source
        | on  target.col1 = source.col1
        | and target.col2 = source.col2
        | and target.col3 = source.col3
        | and target.col4 = source.col4
        | when matched then
        |   update set target.col5 = source.col5, target.col6 = source.col6
        | when not matched then insert *
        """.stripMargin
   spark.sql(sqlPartialUpdate)
   ```
   
   I would really appreciate it if anyone could give me an example, or point me in the right direction in case I've missed anything.
   Thanks.
   




