Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/01 10:49:00 UTC

[jira] [Commented] (SPARK-31885) Incorrect filtering of old millis timestamp in parquet

    [ https://issues.apache.org/jira/browse/SPARK-31885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120922#comment-17120922 ] 

Apache Spark commented on SPARK-31885:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28693

> Incorrect filtering of old millis timestamp in parquet
> ------------------------------------------------------
>
>                 Key: SPARK-31885
>                 URL: https://issues.apache.org/jira/browse/SPARK-31885
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> {code:scala}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
>       /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
> scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
> scala> Seq(java.sql.Timestamp.valueOf("1000-06-14 08:28:53.123")).toDF("ts").write.mode("overwrite").parquet("/Users/maximgekk/tmp/ts_millis_old_filter")
> scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").show(false)
> +-----------------------+
> |ts                     |
> +-----------------------+
> |1000-06-14 08:28:53.123|
> +-----------------------+
> scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123")
> res6: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ts: timestamp]
> scala> spark.read.parquet("/Users/maximgekk/tmp/ts_millis_old_filter").filter($"ts" === "1000-06-14 08:28:53.123").show(false)
> +---+
> |ts |
> +---+
> +---+
> {code}


