
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/07 15:46:40 UTC

[GitHub] [hudi] Guanpx opened a new issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Guanpx opened a new issue #4539:
URL: https://github.com/apache/hudi/issues/4539


   
   **Describe the problem you faced**
   
   Writing to a Hudi 0.10.0 table from Spark 2.4.0 fails with:
   Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.DataSourceUtils$.PARTITIONING_COLUMNS_KEY()Ljava/lang/String;
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write a DataFrame to a Hudi table with Spark 2.4.0 (see the sample code below).
   
   **Expected behavior**
   The write should succeed. Sample code that reproduces the error:
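   ```scala
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.QuickstartUtils._
   import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

   object ReadOrWriteHudi {
     def main(args: Array[String]): Unit = {
       val spark: SparkSession = SparkSession.builder()
         .appName("test")
         .master("local[3]")
         // Presumably what SPARK_SERIALIZER / SPARK_SERIALIZER_VALUE held: Hudi expects Kryo
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate()

       val basePath = "hdfs://pre-cdh01.com:8020/hudi/rd/app_columns"

       // Read two existing records from the Hudi table
       val df: DataFrame = spark.read
         .format("hudi")
         .load(basePath)
         .where("id in (6373, 6382)")

       df.show()

       // Write them back; save() throws the NoSuchMethodError on Spark 2.4.0
       df.write
         .format("hudi")
         .options(getQuickstartWriteConfigs)
         .option(PRECOMBINE_FIELD.key(), "create_time")
         .option(RECORDKEY_FIELD.key(), "id")
         .option(TABLE_NAME.key(), "app_columns")
         .mode(SaveMode.Append)
         .save(basePath)
     }
   }
   ```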
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Spark version : 2.4.0-cdh6
   
   * Hive version : 2.1.1-cdh6
   
   * Hadoop version : 3.0.0-cdh6
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Related issue: [#3191](https://github.com/apache/hudi/issues/3191)
   
   Hudi calls `org.apache.spark.sql.execution.datasources.DataSourceUtils.PARTITIONING_COLUMNS_KEY` in [this code](https://github.com/apache/hudi/blob/release-0.10.0/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala#L214), and that accessor does not exist in our Spark 2.4.0 build.
   
   We cannot upgrade Spark. **If we replace `SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY` in the Hudi source with the string `"__partition_columns"`, or delete that code, will other functionality be affected?**
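   
   A minimal sketch of the workaround proposed above, assuming (as the question states) that Spark's constant has the value `"__partition_columns"`: look the key up reflectively and fall back to the literal when the accessor is missing, which is the situation that surfaces as `NoSuchMethodError` on Spark 2.4.0. `PartitionKeyCompat` and `resolvePartitioningColumnsKey` are hypothetical names, not Hudi APIs, and the sketch says nothing about whether other Hudi code paths depend on the key.
   
   ```scala
   object PartitionKeyCompat {
     // Literal proposed in the question; assumed to equal Spark's constant where it exists
     val FallbackKey: String = "__partition_columns"

     /** Reads DataSourceUtils.PARTITIONING_COLUMNS_KEY via reflection, or returns FallbackKey. */
     def resolvePartitioningColumnsKey(
         className: String = "org.apache.spark.sql.execution.datasources.DataSourceUtils$"): String =
       try {
         val clazz = Class.forName(className)
         // A Scala object compiles to a class whose singleton instance is the static MODULE$ field
         val module = clazz.getField("MODULE$").get(null)
         clazz.getMethod("PARTITIONING_COLUMNS_KEY").invoke(module).asInstanceOf[String]
       } catch {
         // Class or accessor absent (for example on Spark 2.4.0): use the literal instead
         case _: ClassNotFoundException | _: NoSuchFieldException | _: NoSuchMethodException =>
           FallbackKey
       }

     def main(args: Array[String]): Unit =
       println(resolvePartitioningColumnsKey())
   }
   ```
   
   Called with no argument, it returns `__partition_columns` when Spark is absent from the classpath or lacks the accessor, and Spark's own value otherwise. Reflection avoids linking against a method that may not exist at runtime, at the cost of compile-time checking.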
   
   
   **Stacktrace**
   
   ```
   Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.DataSourceUtils$.PARTITIONING_COLUMNS_KEY()Ljava/lang/String;
   	at org.apache.hudi.DataSourceWriteOptions$.translateSqlOptions(DataSourceOptions.scala:214)
   	at org.apache.hudi.HoodieSparkSqlWriter$.mergeParamsAndGetHoodieConfig(HoodieSparkSqlWriter.scala:721)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:91)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
   	at juzishuke.spark.ReadOrWriteHudi$.main(ReadOrWriteHudi.scala:42)
   	at juzishuke.spark.ReadOrWriteHudi.main(ReadOrWriteHudi.scala)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] Guanpx commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

Guanpx commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1007514186


   #3191



[GitHub] [hudi] nsivabalan commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1012254261


   Closing the issue out, as there is not much of an action item (AI) on our side.



[GitHub] [hudi] nsivabalan closed issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

nsivabalan closed issue #4539:
URL: https://github.com/apache/hudi/issues/4539


   



[GitHub] [hudi] nsivabalan commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1008230027


   Spark 2.4.0 is not supported. Can you try Spark 2.4.3 or a later version?
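   
   A minimal fail-fast guard, assuming 2.4.3 is the floor as stated above: parse the numeric prefix of the running Spark version (vendor builds such as `2.4.0-cdh6` carry a suffix) and compare it with 2.4.3 before writing. `SparkVersionGuard` is a hypothetical helper, not part of Hudi or Spark.
   
   ```scala
   object SparkVersionGuard {
     // Minimum Spark version suggested in the reply above
     val MinSupported: (Int, Int, Int) = (2, 4, 3)

     /** Extracts (major, minor, patch) from strings such as "2.4.0-cdh6"; missing parts become 0. */
     def parse(version: String): (Int, Int, Int) = {
       val nums = version
         .takeWhile(c => c.isDigit || c == '.') // drop vendor suffixes like "-cdh6"
         .split('.')
         .filter(_.nonEmpty)
         .map(_.toInt)
       def at(i: Int): Int = if (i < nums.length) nums(i) else 0
       (at(0), at(1), at(2))
     }

     /** True when the version is at least MinSupported, compared field by field. */
     def isSupported(version: String): Boolean =
       Ordering[(Int, Int, Int)].gteq(parse(version), MinSupported)

     def main(args: Array[String]): Unit =
       for (v <- Seq("2.4.0-cdh6", "2.4.3", "3.1.2"))
         println(s"$v supported: ${isSupported(v)}")
   }
   ```
   
   In a job, checking `isSupported(spark.version)` (or `org.apache.spark.SPARK_VERSION`) before `df.write` turns the late `NoSuchMethodError` into an explicit, early error.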
   



[GitHub] [hudi] Guanpx commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

Guanpx commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1008493909


   > 2.4.0 is not supported. Can you try with 2.4.3 or higher spark versions.
   
   We cannot upgrade our Spark. If we replace `SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY` in the Hudi source with the string `"__partition_columns"`, or delete that code, will other functionality be affected?



[GitHub] [hudi] nsivabalan commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1009419865


   I am not sure about that; I don't think anyone has tried it. In general, I would not recommend such custom changes, but if you want to, do give it a try and let us know how it goes.


