Posted to issues@spark.apache.org by "Senthil Kumar (Jira)" <ji...@apache.org> on 2021/10/01 19:25:00 UTC
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423402#comment-17423402 ]
Senthil Kumar commented on SPARK-36861:
---------------------------------------
Yes, in Spark 3.3 the "hour" column is inferred as DateType, but I can still see the hour part in the subdirectories that were created:
===============
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0-SNAPSHOT
      /_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 2)).toDF("hour", "i")
df: org.apache.spark.sql.DataFrame = [hour: string, i: int]
scala> df.write.partitionBy("hour").parquet("/tmp/t1")
scala> spark.read.parquet("/tmp/t1").schema
res1: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(hour,DateType,true))
scala>
===============
and the subdirectories created are:
===============
ls -l
total 0
-rw-r--r-- 1 senthilkumar wheel 0 Oct 2 00:44 _SUCCESS
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T00
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T01
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T02
===============
It would be helpful if you could share the list of subdirectories created in your case.
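As a side note, if the goal is simply to keep "hour" as a string, a possible workaround (a sketch only, reusing the /tmp/t1 path from the transcript above) is to disable partition column type inference via the existing `spark.sql.sources.partitionColumnTypeInference.enabled` config, or to supply an explicit schema on read:

```scala
// Sketch of two workarounds, run in spark-shell against the /tmp/t1
// data written above. Assumes an active SparkSession named `spark`.

// Option 1: turn off partition column type inference entirely, so all
// partition values (including "2021-01-01T00") stay StringType.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
val asStrings = spark.read.parquet("/tmp/t1")
asStrings.printSchema()  // "hour" should now be reported as string

// Option 2: keep inference on, but pin the schema explicitly for this read.
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
val schema = StructType(Seq(
  StructField("i", IntegerType),
  StructField("hour", StringType)))
val pinned = spark.read.schema(schema).parquet("/tmp/t1")
pinned.printSchema()
```

Neither of these addresses the inference change itself; they only sidestep it for reads where the string form of the partition value must be preserved.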
> Partition columns are overly eagerly parsed as dates
> ----------------------------------------------------
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Tanel Kiis
> Priority: Blocker
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as StringType, but in the 3.2 RC it is parsed as DateType and the hour part is lost.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)