Posted to commits@hudi.apache.org by "rubenssoto (via GitHub)" <gi...@apache.org> on 2023/03/30 01:18:12 UTC

[GitHub] [hudi] rubenssoto opened a new issue, #8321: [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command

rubenssoto opened a new issue, #8321:
URL: https://github.com/apache/hudi/issues/8321

   Hello everyone,
   We are trying Hudi with dbt and have run into a problem.
   
   The first execution of the model runs fine, but on the second execution we hit the following error.
   
   ```
   23/03/29 17:39:55 ERROR SparkExecuteStatementOperation: Error executing query with 5825552f-7f5f-440a-b441-cc6196ac6038, currentState RUNNING, 
   org.apache.spark.sql.AnalysisException: cannot resolve _hoodie_commit_time in MERGE command given columns [DBT_INTERNAL_SOURCE.data_lake_ts, DBT_INTERNAL_SOURCE.id]; line 15 pos 2
   	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.$anonfun$resolveMergeExprOrFail$2(Analyzer.scala:1617)
   	at scala.collection.mutable.LinkedHashSet.foreach(LinkedHashSet.scala:95)
   	at org.apache.spark.sql.catalyst.expressions.AttributeSet.foreach(AttributeSet.scala:137)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.resolveMergeExprOrFail(Analyzer.scala:1613)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.$anonfun$resolveAssignments$1(Analyzer.scala:1601)
   	at scala.collection.immutable.List.map(List.scala:293)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.resolveAssignments(Analyzer.scala:1591)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$84(Analyzer.scala:1548)
   	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
   	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
   	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.applyOrElse(Analyzer.scala:1529)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.applyOrElse(Analyzer.scala:1447)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:1447)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:1427)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
   	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
   	at scala.collection.immutable.List.foldLeft(List.scala:91)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
   	at scala.collection.immutable.List.foreach(List.scala:431)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:222)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:218)
   	at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:167)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:218)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:182)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:203)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:202)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
   	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
   	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
   	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:80)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:291)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:230)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
   	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:230)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:225)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
   	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:239)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:750)
   ```
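   
   For context, here is a hedged sketch of the shape of MERGE statement dbt's merge strategy generates for this model (not the literal SQL dbt emits; the DBT_INTERNAL_SOURCE alias appears in the error above, while DBT_INTERNAL_DEST and the ON clause are assumed from dbt's conventions and the model's unique_key='id'). With `update set *` / `insert *`, Spark expands the target table's full column list, which for a Hudi table includes metadata columns such as `_hoodie_commit_time`; the source subquery only exposes `id` and `data_lake_ts`, so analysis fails:
   
   ```
   -- Hedged sketch of the generated MERGE, not the exact dbt output.
   merge into my_first_dbt_model as DBT_INTERNAL_DEST
   using (
       -- simplified stand-in for the compiled model SQL
       select format_number(rand()*1000, 0) as id,
              current_timestamp() as data_lake_ts
   ) as DBT_INTERNAL_SOURCE
   on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
   -- `update set *` / `insert *` expand to the target's columns, including
   -- Hudi's _hoodie_commit_time, which DBT_INTERNAL_SOURCE cannot resolve
   when matched then update set *
   when not matched then insert *
   ```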
   
   
   Spark 3.2.1
   Hudi 0.12.1
   dbt-spark 1.4.1
   
   Our model:
   
   ```
   {{
       config(
           materialized='incremental',
           file_format='hudi',
           incremental_strategy='merge',
           options={
               "hoodie.table.name": "my_first_dbt_model",
               "hoodie.datasource.write.recordkey.field": "id",
               "hoodie.datasource.write.precombine.field": "data_lake_ts"
           },
           unique_key='id',
           location_root='s3://datalake/default_test/'
       )
   }}
   
   with source_data as (
   
       select format_number(rand()*1000, 0) as id
       union all
       select null as id
   
   )
   
   select *, current_timestamp() as data_lake_ts
   from source_data
   where id is not null
   ```
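   
   For reference, after the first (successful) run the target table already carries Hudi's metadata columns, which is what the MERGE later fails to resolve on the source side. A quick way to see them (a sketch; table name taken from the config above, database qualifier omitted):
   
   ```
   -- Hudi tables expose metadata columns such as _hoodie_commit_time and
   -- _hoodie_record_key ahead of the model's own columns
   describe table my_first_dbt_model;
   
   -- Or query the metadata columns directly
   select _hoodie_commit_time, _hoodie_record_key, id, data_lake_ts
   from my_first_dbt_model
   limit 5;
   ```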




Re: [I] [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command [hudi]

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8321:
URL: https://github.com/apache/hudi/issues/8321#issuecomment-1815172972

   @rubenssoto Closing this issue as it works; this was also confirmed with the latest Hudi version. Please reopen if you see this again. Thanks.




[GitHub] [hudi] ad1happy2go commented on issue #8321: [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8321:
URL: https://github.com/apache/hudi/issues/8321#issuecomment-1596104437

   @rubenssoto I tried reproducing this issue, but it worked fine for me even with multiple runs. I was using a Spark connection over Thrift and ran exactly the same model you had.
   
   Were you able to resolve this issue? If yes, can you let us know what the problem was?
   
   ```
   {{
       config(
           materialized='incremental',
           file_format='hudi',
           incremental_strategy='merge',
           options={
               "hoodie.table.name": "my_first_dbt_model",
               "hoodie.datasource.write.recordkey.field": "id",
               "hoodie.datasource.write.precombine.field": "data_lake_ts"
           },
           unique_key='id',
           location_root='file:///tmp/dbt/default_test/'
       )
   }}
   
   with source_data as (
   
       select format_number(rand()*1000, 0) as id
       union all
       select null as id
   
   )
   
   select *, current_timestamp() as data_lake_ts
   from source_data
   where id is not null
   ```
   




[GitHub] [hudi] codope commented on issue #8321: [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8321:
URL: https://github.com/apache/hudi/issues/8321#issuecomment-1505012774

   cc @vingov, can you please help here?




Re: [I] [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command [hudi]

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #8321: [SUPPORT] Hudi And DBT: cannot resolve _hoodie_commit_time in MERGE command
URL: https://github.com/apache/hudi/issues/8321

