Posted to issues@spark.apache.org by "Sumeet (Jira)" <ji...@apache.org> on 2020/08/13 21:05:00 UTC
[jira] [Updated] (SPARK-32611) Querying ORC table in Spark3 using
spark.sql.orc.impl=hive produces incorrect results when timestamp is
present in predicate
[ https://issues.apache.org/jira/browse/SPARK-32611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sumeet updated SPARK-32611:
---------------------------
Description:
*How to reproduce this behavior?*
* TZ="America/Los_Angeles" ./bin/spark-shell
* sql("set spark.sql.hive.convertMetastoreOrc=true")
* sql("set spark.sql.orc.impl=hive")
* sql("create table t_spark(col timestamp) stored as orc;")
* sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return empty results, which is incorrect.*
* sql("set spark.sql.orc.impl=native")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return 1 row, which is the expected output.*
The above query, run with spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.impl=hive, returns *correct results if filter pushdown is turned off*.
* sql("set spark.sql.orc.filterPushdown=false")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return 1 row, which is the expected output.*
was:
*How to reproduce this behavior?*
* TZ="America/Los_Angeles" ./bin/spark-shell --conf spark.sql.catalogImplementation=hive
* sql("set spark.sql.hive.convertMetastoreOrc=true")
* sql("set spark.sql.orc.impl=hive")
* sql("create table t_spark(col timestamp) stored as orc;")
* sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return empty results, which is incorrect.*
* sql("set spark.sql.orc.impl=native")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return 1 row, which is the expected output.*
The above query, run with spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.impl=hive, returns *correct results if filter pushdown is turned off*.
* sql("set spark.sql.orc.filterPushdown=false")
* sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false)
*This will return 1 row, which is the expected output.*
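The reproduction steps in the current description can be condensed into one helper that runs the same query under each configuration and prints the row count. This is only a sketch: the object name Spark32611Repro and the helpers rowCount and run are hypothetical, the SQL text is copied from the report with the trailing semicolons dropped, and switching back to spark.sql.orc.impl=hive before disabling pushdown is one reading of "(True, hive)" rather than a step the report spells out. The row counts in the comments are those stated in the report, not output verified here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of Spark or of the original report.
object Spark32611Repro {
  // Timestamp literal and query text, copied from the report.
  val ts = "cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp)"
  val query = s"select col, date_format(col, 'DD') from t_spark where col = $ts"

  // Apply each setting with SQL `set`, as the report does, then run the
  // query and return the number of rows that came back.
  def rowCount(spark: SparkSession, settings: (String, String)*): Int = {
    settings.foreach { case (key, value) => spark.sql(s"set $key=$value") }
    spark.sql(query).collect().length
  }

  // Create the table, insert one row, and print the row count per configuration.
  def run(spark: SparkSession): Unit = {
    spark.sql("set spark.sql.hive.convertMetastoreOrc=true")
    spark.sql("set spark.sql.orc.impl=hive")
    spark.sql("create table t_spark(col timestamp) stored as orc")
    spark.sql(s"insert into t_spark values ($ts)")

    // The report describes an empty result here (the incorrect behavior).
    println("impl=hive: " + rowCount(spark, "spark.sql.orc.impl" -> "hive"))

    // The report describes 1 row here.
    println("impl=native: " + rowCount(spark, "spark.sql.orc.impl" -> "native"))

    // The report describes 1 row with pushdown off; resetting impl to hive
    // is this sketch's assumption about what "(True, hive)" means.
    println("impl=hive, filterPushdown=false: " + rowCount(spark,
      "spark.sql.orc.impl" -> "hive",
      "spark.sql.orc.filterPushdown" -> "false"))
  }
}
```

To try it, start the shell as in the first step (TZ="America/Los_Angeles" ./bin/spark-shell), enter the object with :paste, and call Spark32611Repro.run(spark). Because run creates t_spark, it should be used against a metastore where that table does not yet exist.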
> Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect results when timestamp is present in predicate
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-32611
> URL: https://issues.apache.org/jira/browse/SPARK-32611
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1
> Reporter: Sumeet
> Priority: Major