Posted to commits@hudi.apache.org by "wolf8334 (via GitHub)" <gi...@apache.org> on 2023/02/27 11:01:59 UTC

[GitHub] [hudi] wolf8334 opened a new issue, #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

wolf8334 opened a new issue, #8061:
URL: https://github.com/apache/hudi/issues/8061

   **Describe the problem you faced**
   
   I use Java and Spark 3.3 to read a Hudi 0.13.0 table, following the guide on the official website.
   The guide says this will work, but I got an IllegalArgumentException: For input string: "null".
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Generate a Hudi COW table from a MySQL table (a hedged write sketch follows this list).
   2. Query the COW table through Spark SQL.
   3. The IllegalArgumentException: For input string: "null" appears.
   4. I have already changed the data source and the table structure; the problem is unrelated to either.
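   
   For step 1, a minimal write sketch (hedged: the JDBC connection details, record key, precombine field, and output path below are hypothetical placeholders around Hudi's documented Spark datasource write options):
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   import org.apache.spark.sql.SparkSession;
   
   // Hedged sketch: load a MySQL table over JDBC and write it out as a Hudi
   // COPY_ON_WRITE table. Connection details and field names are hypothetical.
   SparkSession spark = SparkSession.builder().appName("mysql-to-hudi").getOrCreate();
   
   Dataset<Row> src = spark.read()
       .format("jdbc")
       .option("url", "jdbc:mysql://localhost:3306/demo") // hypothetical URL
       .option("dbtable", "t_yklc_info")
       .option("user", "demo")                            // hypothetical credentials
       .option("password", "demo")
       .load();
   
   Map<String, String> writeOpts = new HashMap<>();
   writeOpts.put("hoodie.table.name", "t_yklc_info");
   writeOpts.put("hoodie.datasource.write.table.type", "COPY_ON_WRITE");
   writeOpts.put("hoodie.datasource.write.recordkey.field", "APP_NO");      // hypothetical key field
   writeOpts.put("hoodie.datasource.write.precombine.field", "STAT_CYCLE"); // hypothetical precombine field
   
   src.write()
       .format("org.apache.hudi")
       .options(writeOpts)
       .mode(SaveMode.Overwrite)
       .save("/user/spark/hudi/t_yklc_info"); // hypothetical base path
   ```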
   
   **Expected behavior**
   
   The data is shown.
   
   **Environment Description**
   
   * Hudi version : 0.12.2, 0.13.0
   
   * Spark version : 3.3.2
   
   * Hive version : none
   
   * Hadoop version : 3.3.4
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no (my local laptop)
   
   **Additional context**
   JDK 1.8
   
   ```
   // Needed imports: java.util.HashMap, java.util.Map,
   // org.apache.hudi.config.HoodieWriteConfig,
   // org.apache.spark.sql.Dataset, org.apache.spark.sql.Row,
   // org.apache.spark.sql.internal.SQLConf.
   // getActiveSession() and logger are helpers from the surrounding class.
   Map<String, String> hudiConf = new HashMap<>();
   hudiConf.put(HoodieWriteConfig.TBL_NAME.key(), "t_yklc_info");
   
   Dataset<Row> demods = getActiveSession().read()
       .options(hudiConf)
       .format("org.apache.hudi")
       .load("/user/spark/hudi/*/*");
   
   demods.createOrReplaceTempView("lcinfo");
   demods.printSchema();
   
   // Log the effective values of the configs the Parquet schema converter reads.
   logger.info(getActiveSession().conf().get(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG().key()));
   logger.info(getActiveSession().conf().get(SQLConf.PARQUET_BINARY_AS_STRING().key()));
   logger.info(getActiveSession().conf().get(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key()));
   logger.info(getActiveSession().conf().get(SQLConf.CASE_SENSITIVE().key()));
   
   Dataset<Row> ds = getActiveSession().sql(
       "select APP_NO from lcinfo where APP_NO = '1' and STAT_CYCLE = '2'");
   ds.printSchema();
   ds.show();
   ```
   
   **Stacktrace**
   ```
   INFO  18:45:03.183 | org.apache.spark.sql.execution.datasources.FileScanRDD | Reading File path: hdfs://192.168.5.128:9000/user/spark/hudi/2/1.parquet, range: 0-3964741, partition values: [empty row]
   ERROR 18:45:03.420 | org.apache.spark.executor.Executor | Exception in task 3.0 in stage 1.0 (TID 60)
   java.lang.IllegalArgumentException: For input string: "null"
       at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330) ~[scala-library-2.12.15.jar:?]
       at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
       at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
       at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33) ~[scala-library-2.12.15.jar:?]
       at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.<init>(ParquetSchemaConverter.scala:70) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormatHelper$.buildImplicitSchemaChangeInfo(HoodieParquetFileFormatHelper.scala:30) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
       at org.apache.spark.sql.execution.datasources.parquet.Spark32PlusHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32PlusHoodieParquetFileFormat.scala:231) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
       at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:209) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:270) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:561) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) ~[?:?]
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.scheduler.Task.run(Task.scala:136) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.2.jar:3.3.2]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
       at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
   ```
   
   




[GitHub] [hudi] bigdata-spec commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "bigdata-spec (via GitHub)" <gi...@apache.org>.
bigdata-spec commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1517286711

> @wolf8334
   
   Does this mean master can fix this problem?




[GitHub] [hudi] bigdata-spec commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "bigdata-spec (via GitHub)" <gi...@apache.org>.
bigdata-spec commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1517326256

> spark.hadoop.spark.sql.parquet.binaryAsString
   
   Hi, I have a doubt. I see the suggested flags:
   ```
   --conf 'spark.hadoop.spark.sql.legacy.parquet.nanosAsLong=false' \
   --conf 'spark.hadoop.spark.sql.parquet.binaryAsString=false' \
   --conf 'spark.hadoop.spark.sql.parquet.int96AsTimestamp=true' \
   --conf 'spark.hadoop.spark.sql.caseSensitive=false'
   ```
   
   but in Apache Spark 3.3.2, **spark.sql.legacy.parquet.nanosAsLong already defaults to false**, and so on for the others. So why do these need to be set explicitly?
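   
   A quick way to confirm what a given session actually resolves these to (a hedged Java sketch; assumes an existing SparkSession named spark):
   
   ```
   // Print the effective value of each config; RuntimeConfig.get(key, default)
   // falls back to "<unset>" only when neither a set value nor a built-in
   // default exists for the key.
   String[] keys = {
       "spark.sql.legacy.parquet.nanosAsLong",
       "spark.sql.parquet.binaryAsString",
       "spark.sql.parquet.int96AsTimestamp",
       "spark.sql.caseSensitive"
   };
   for (String key : keys) {
       System.out.println(key + " = " + spark.conf().get(key, "<unset>"));
   }
   ```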




[GitHub] [hudi] caokaizhi commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "caokaizhi (via GitHub)" <gi...@apache.org>.
caokaizhi commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1459350020

I also encountered this problem when using Hudi 0.13.0 on Spark 3.3.2, and found that the exception was thrown when querying a MOR table with merge type "REALTIME_PAYLOAD_COMBINE". The reason is that Spark 3.3.2 is not compatible with the ParquetToSparkSchemaConverter class of Spark 3.3.1: the constructor of ParquetToSparkSchemaConverter in Spark 3.3.2 requires the "LEGACY_PARQUET_NANOS_AS_LONG" configuration parameter, whereas the buildReaderWithPartitionValues method of the Spark32PlusHoodieParquetFileFormat class does not initialize that parameter. So my conclusion is that Hudi 0.13.0 is currently not compatible with Spark 3.3.2.
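   
   That analysis matches the stack trace: Hadoop's Configuration.get returns null for a key that was never set, and Scala's StringLike.toBoolean throws IllegalArgumentException with the literal text "null" when the underlying string is null (StringLike.parseBoolean, the top frame above). A minimal sketch of the failure mode, assuming the key is simply absent from the Configuration handed to the converter:
   
   ```
   import org.apache.hadoop.conf.Configuration;
   
   // Stand-in for the Hadoop conf that Spark32PlusHoodieParquetFileFormat hands
   // to ParquetToSparkSchemaConverter without initializing this key.
   Configuration hadoopConf = new Configuration(false);
   String key = "spark.sql.legacy.parquet.nanosAsLong";
   
   // An unset key reads back as null ...
   String raw = hadoopConf.get(key);
   System.out.println(raw); // prints: null
   
   // ... and the converter's Scala-side raw.toBoolean on that null string throws
   // IllegalArgumentException: For input string: "null", as in the stack trace.
   
   // Setting the key up front (the workaround in this thread) avoids the failure:
   hadoopConf.set(key, "false");
   System.out.println(hadoopConf.get(key)); // prints: false
   ```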




[GitHub] [hudi] yihua commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1646189442

Hi @bigdata-spec, have you tried the Hudi 0.13.1 release, which is compatible with Spark 3.3.2, without adding the additional Spark configs above?




[GitHub] [hudi] wolf8334 commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "wolf8334 (via GitHub)" <gi...@apache.org>.
wolf8334 commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1447657892

I added the following code and it works, but I still wonder why.
   ```
   // Set the configs the Parquet schema converter expects before reading
   // (sc is the session configuration object used in my setup).
   sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
   sc.set("spark.sql.parquet.binaryAsString", "false");
   sc.set("spark.sql.parquet.int96AsTimestamp", "true");
   sc.set("spark.sql.caseSensitive", "false");
   ```




[GitHub] [hudi] danny0405 commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1459673658

Thanks for the feedback. It looks like there is already a PR with a fix: https://github.com/apache/hudi/pull/8082/files.
   Let's move the discussion there; it would be great if you could help review it.




[GitHub] [hudi] codope commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1505032357

Closing, as we have the PR; we will follow up there.




[GitHub] [hudi] cmanning-arcadia commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "cmanning-arcadia (via GitHub)" <gi...@apache.org>.
cmanning-arcadia commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1501775602

If anyone finds this and hits the issue from spark-shell, try passing the parameters as command-line args. The spark.hadoop. prefix makes Spark copy each entry into the Hadoop Configuration, which is what the Parquet schema converter reads. Example:
   
   ```
   spark-shell --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.2 \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
   --conf 'spark.hadoop.spark.sql.legacy.parquet.nanosAsLong=false' \
   --conf 'spark.hadoop.spark.sql.parquet.binaryAsString=false' \
   --conf 'spark.hadoop.spark.sql.parquet.int96AsTimestamp=true' \
   --conf 'spark.hadoop.spark.sql.caseSensitive=false'
   ```




[GitHub] [hudi] codope closed issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"
URL: https://github.com/apache/hudi/issues/8061




[GitHub] [hudi] bigdata-spec commented on issue #8061: [SUPPORT]Unable to read hudi table and got an IllegalArgumentException: For input string: "null"

Posted by "bigdata-spec (via GitHub)" <gi...@apache.org>.
bigdata-spec commented on issue #8061:
URL: https://github.com/apache/hudi/issues/8061#issuecomment-1517287409

   > Thanks for the feedback, found that there seems already a fixing PR: #8082, let's move the discussions there and it's great if you guys can help the review.
   
Hi, I will try Spark 3.3.2 and Hudi 0.13. Does this mean master can fix this problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org