Posted to commits@hudi.apache.org by "lw-lin (via GitHub)" <gi...@apache.org> on 2023/03/07 05:34:31 UTC

[GitHub] [hudi] lw-lin opened a new issue, #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

lw-lin opened a new issue, #8109:
URL: https://github.com/apache/hudi/issues/8109

   **Describe the problem you faced**
   
   I've got an Exception:
   
   > Caused by: java.lang.IllegalArgumentException: For input string: "null"
   	at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330)
   	at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289)
   	at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289)
   	at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.<init>(ParquetSchemaConverter.scala:70)
   	at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormatHelper$.buildImplicitSchemaChangeInfo(HoodieParquetFileFormatHelper.scala:30)
   
   This may be because Spark introduced a new config entry, `SQLConf.LEGACY_PARQUET_NANOS_AS_LONG` (see https://github.com/apache/spark/blob/v3.3.2/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3462). Should `Spark32PlusHoodieParquetFileFormat` set it?
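
One plausible reading of the stack trace is that the converter looks up a key that was never set, gets `null` back, and then calls `toBoolean` on it. The sketch below only illustrates that failure mode. It uses `java.util.Properties` as a stand-in for the real configuration object, and `NanosAsLongDemo` and `readFlag` are hypothetical names, not Hudi or Spark APIs.

```scala
import java.util.Properties
import scala.util.Try

object NanosAsLongDemo {
  val Key = "spark.sql.legacy.parquet.nanosAsLong"

  // Hypothetical stand-in for reading a boolean flag from a configuration:
  // getProperty returns null when the key was never set.
  def readFlag(conf: Properties, key: String): Try[Boolean] =
    Try(conf.getProperty(key).toBoolean)

  def main(args: Array[String]): Unit = {
    val conf = new Properties()

    // Key absent: toBoolean on a null String fails with IllegalArgumentException.
    println(readFlag(conf, Key))

    // Key present: the lookup parses normally.
    conf.setProperty(Key, "false")
    println(readFlag(conf, Key))
  }
}
```

If this reading is right, any mechanism that puts a value under the key before the converter is constructed should avoid the exception.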
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.3.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8109:
URL: https://github.com/apache/hudi/issues/8109#issuecomment-1514507066

   There is already a PR in progress for this: https://github.com/apache/hudi/pull/8082


[GitHub] [hudi] xushiyan closed issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.

xushiyan closed issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?
URL: https://github.com/apache/hudi/issues/8109


[GitHub] [hudi] jhchee commented on issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "jhchee (via GitHub)" <gi...@apache.org>.

jhchee commented on issue #8109:
URL: https://github.com/apache/hudi/issues/8109#issuecomment-1512948346

   OK, you should set this when using `spark.sql`:
   `.config("spark.sql.legacy.parquet.nanosAsLong", "true")`


[GitHub] [hudi] HuangFru commented on issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "HuangFru (via GitHub)" <gi...@apache.org>.

HuangFru commented on issue #8109:
URL: https://github.com/apache/hudi/issues/8109#issuecomment-1489958644

   I also ran into the same problem and I have the same environment as @lw-lin.
   ```
   Caused by: java.lang.IllegalArgumentException: For input string: "null"
   	at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330)
   	at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289)
   	at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289)
   	at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33)
   	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.<init>(ParquetSchemaConverter.scala:70)
   	at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormatHelper$.buildImplicitSchemaChangeInfo(HoodieParquetFileFormatHelper.scala:30)
   	at org.apache.spark.sql.execution.datasources.parquet.Spark32PlusHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32PlusHoodieParquetFileFormat.scala:231)
   	at org.apache.hudi.HoodieDataSourceHelper$.$anonfun$buildHoodieParquetReader$1(HoodieDataSourceHelper.scala:71)
   	at org.apache.hudi.HoodieBaseRelation.$anonfun$createBaseFileReader$1(HoodieBaseRelation.scala:554)
   	at org.apache.hudi.HoodieBaseRelation$BaseFileReader.apply(HoodieBaseRelation.scala:613)
   	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:87)
   ```


[GitHub] [hudi] jhchee commented on issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "jhchee (via GitHub)" <gi...@apache.org>.

jhchee commented on issue #8109:
URL: https://github.com/apache/hudi/issues/8109#issuecomment-1512920617

   @HuangFru May I know how you handled this?


[GitHub] [hudi] ad1happy2go commented on issue #8109: [SUPPORT] Spark32PlusHoodieParquetFileFormat should set "SQLConf.LEGACY_PARQUET_NANOS_AS_LONG" ?

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8109:
URL: https://github.com/apache/hudi/issues/8109#issuecomment-1514507892

   Until that PR is merged, we can add these confs to avoid the issue:
   --conf 'spark.hadoop.spark.sql.legacy.parquet.nanosAsLong=false' \
   --conf 'spark.hadoop.spark.sql.parquet.binaryAsString=false' \
   --conf 'spark.hadoop.spark.sql.parquet.int96AsTimestamp=true' \
   --conf 'spark.hadoop.spark.sql.caseSensitive=false'
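
The same workaround could presumably be applied when building the session in code rather than on the command line. The sketch below is an illustration under assumptions, not a confirmed fix: the object and application names are hypothetical, Spark must be on the classpath, and the four keys and values are copied verbatim from the flags above. Spark copies properties prefixed with `spark.hadoop.` into the Hadoop configuration, which is likely where the schema converter in the stack trace looks them up.

```scala
import org.apache.spark.sql.SparkSession

object HudiReadWorkaround {
  def main(args: Array[String]): Unit = {
    // Assumes the job is launched via spark-submit, which supplies the master.
    val spark = SparkSession.builder()
      .appName("hudi-read-workaround") // hypothetical application name
      // Same keys and values as the --conf flags suggested in this thread.
      .config("spark.hadoop.spark.sql.legacy.parquet.nanosAsLong", "false")
      .config("spark.hadoop.spark.sql.parquet.binaryAsString", "false")
      .config("spark.hadoop.spark.sql.parquet.int96AsTimestamp", "true")
      .config("spark.hadoop.spark.sql.caseSensitive", "false")
      .getOrCreate()

    // ... read the Hudi table here ...

    spark.stop()
  }
}
```

These settings need to be in place before the session is created, since `getOrCreate()` reuses an existing session if one is already running.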

