Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/08/31 06:01:00 UTC

[jira] [Resolved] (SPARK-32720) Spark 3 Fails to Cast DateType to StringType when comparing result of Max

     [ https://issues.apache.org/jira/browse/SPARK-32720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-32720.
----------------------------------
    Resolution: Cannot Reproduce

> Spark 3 Fails to Cast DateType to StringType when comparing result of Max
> -------------------------------------------------------------------------
>
>                 Key: SPARK-32720
>                 URL: https://issues.apache.org/jira/browse/SPARK-32720
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Markus
>            Priority: Major
>
> When reading from parquet with the basePath option, the some_date partition column is inferred and read as a DateType column. I then use withColumn to cast it to StringType. When I run a query with a WHERE clause comparing some_date to MAX(some_date), it fails while casting from DateType to StringType.
> However, this works in Spark 2.4. The Spark 3.0 session is shown first, followed by Spark 2.4 below.
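> The HDFS paths in the sessions below are redacted. A minimal, self-contained reproduction sketch of the same steps (assuming a local SparkSession and a temporary directory standing in for the redacted HDFS base path; the object name Repro is illustrative) might look like:
> {code:java}
> // Hypothetical minimal reproduction of the reported steps, assuming
> // local mode and a temp directory in place of the redacted HDFS path.
> import java.nio.file.Files
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.StringType
>
> object Repro {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[1]").getOrCreate()
>     import spark.implicits._
>
>     val base = Files.createTempDirectory("repro").toString
>     // Write one partition directory so some_date is inferred as a
>     // DateType partition column when read back with basePath.
>     Seq(1, 2, 3).toDF("id").write.parquet(s"$base/some_date=2020-08-15")
>
>     spark.read.option("basePath", base)
>       .parquet(s"$base/some_date=2020-08-15")
>       .withColumn("some_date", col("some_date").cast(StringType))
>       .createOrReplaceTempView("testView")
>
>     // Reported to fail on Spark 3.0.0 with
>     // java.util.NoSuchElementException: None.get
>     spark.sql(
>       "select some_date from testView t1 " +
>       "where some_date = (select MAX(t2.some_date) from testView t2)"
>     ).count()
>   }
> }
> {code}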
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
>       /_/
> Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
> Type in expressions to have them evaluated.
> Type :help for more information.
>  scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> spark.read.option("basePath", "hdfs://<hdfs_path>").parquet("hdfs://<hdfs_path>/some_date=2020-08-15").withColumn("some_date", col("some_date").cast(StringType)).createOrReplaceTempView("testView")
> 20/08/26 23:04:47 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> scala> spark.sql("select some_date from testView t1 where some_date = (select MAX(t2.some_date) from testView t2)")
> res2: org.apache.spark.sql.DataFrame = [some_date: string]
> scala> res2.count
> 20/08/26 23:05:31 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:529)
>   at scala.None$.get(Option.scala:527)
>   at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:56)
>   at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:56)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.zoneId$lzycompute(Cast.scala:253)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.zoneId(Cast.scala:253)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter$lzycompute(Cast.scala:287)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter(Cast.scala:287)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$6(Cast.scala:295)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$6$adapted(Cast.scala:295)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$5(Cast.scala:295)
>   at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:815)
>   at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:461)
>   at org.apache.spark.sql.catalyst.expressions.InterpretedUnsafeProjection.apply(InterpretedUnsafeProjection.scala:76)
>   at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:329)
>   at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:113)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:926)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:914)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:96)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:182)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) {code}
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
>       /_/
> Using Scala version 2.12.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> spark.read.option("basePath", "hdfs://<hdfsPath>").parquet("<hdfsPath>/some_date=2020-08-15").withColumn("some_date", col("some_date").cast(StringType)).createOrReplaceTempView("testView")
> 20/08/26 16:20:13 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
> scala> spark.sql("select some_date from testView t1 where some_date = (select MAX(t2.some_date) from testView t2)")
> res1: org.apache.spark.sql.DataFrame = [some_date: string]
> scala> res1.count
> res2: Long = 10{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org