Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/05/23 00:07:00 UTC

[jira] [Resolved] (SPARK-39241) Spark SQL 'Like' operator behaves wrongly while filtering on partitioned column after Spark 3.1

     [ https://issues.apache.org/jira/browse/SPARK-39241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-39241.
----------------------------------
    Resolution: Cannot Reproduce

> Spark SQL 'Like' operator behaves wrongly while filtering on partitioned column after Spark 3.1
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39241
>                 URL: https://issues.apache.org/jira/browse/SPARK-39241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>         Environment: *Environment: EMR*
> Release label: emr-6.5.0
> Hadoop distribution: Amazon 3.2.1
> Applications: *Spark 3.1.2*, Hive 3.1.2, Livy 0.7.1
>            Reporter: Dmitry Gorbatsevich
>            Priority: Major
>
> It seems that the introduction of "like any" in Spark 3.1 breaks the "like" behaviour when filtering on a partitioned column. Here is an example:
> 1. Create test table:
> {code:java}
> scala> spark.sql(
>      | """
>      | CREATE EXTERNAL TABLE tmp(
>      |         f1 STRING
>      |     )
>      |     PARTITIONED BY (dt STRING)
>      |     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>      |     LINES TERMINATED BY '\n'
>      |     STORED AS TEXTFILE
>      |     LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
>      | """) 
> res2: org.apache.spark.sql.DataFrame = []{code}
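> A quick way to confirm that the table and its partition column were registered as intended is to inspect the catalog metadata. This is only a verification sketch against the same tmp table, not part of the original report:
> {code:java}
> // Check the table definition and the dt partition column in the metastore.
> spark.sql("DESCRIBE EXTENDED tmp").show(100, false)
> {code}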
> 2. Insert a row into the new partition:
> {code:java}
> scala> spark.sql(
>      | """
>      |     insert into table tmp partition(dt="2022051000") values("1")
>      | """
>      | )
> res3: org.apache.spark.sql.DataFrame = [] {code}
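> To rule out a missing partition as the cause, the partition list can be checked after the insert; again just a cross-check sketch on the same table:
> {code:java}
> // The inserted partition should be listed as dt=2022051000.
> spark.sql("SHOW PARTITIONS tmp").show(false)
> {code}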
> 3. Select using 'like':
> {code:java}
> scala> spark.sql(
>      |     """
>      |         select * from tmp
>      |         where dt like '202205100%'
>      |     """
>      |     ).show()
> +---+---+
> | f1| dt|
> +---+---+
> +---+---+ {code}
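> One way to see why the row is dropped is to look at the physical plan, in particular the partition filters reported for the scan. A diagnostic sketch, assuming the same session and table:
> {code:java}
> // Inspect how the LIKE predicate is pushed down as a partition filter.
> spark.sql("select * from tmp where dt like '202205100%'").explain(true)
> {code}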
> 4. Select using 'like any':
> {code:java}
> scala> spark.sql(
>      |     """
>      |         select * from tmp
>      |         where dt like any ('202205100%')
>      |     """
>      |     ).show()
> 22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
> +---+----------+
> | f1|        dt|
> +---+----------+
> |  1|2022051000|
> +---+----------+ {code}
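> For comparison, the plan for the 'like any' variant can be captured the same way; differences in the partition/pushed filters between the two plans would point at where the predicates diverge (a diagnostic sketch, not part of the original report):
> {code:java}
> // Capture the plan for the LIKE ANY form for a side-by-side comparison.
> spark.sql("select * from tmp where dt like any ('202205100%')").explain(true)
> {code}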
> The expectation is that results 3 and 4 are identical; however, they differ, and result #3 is clearly wrong: the row in partition dt=2022051000 matches the pattern '202205100%' but is not returned.
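> The same comparison can be reproduced through the DataFrame API, which bypasses SQL text parsing. This is a cross-check sketch using the standard Column.like and Column.startsWith predicates on the same table; on the affected version the 'like' form may still return no rows:
> {code:java}
> import org.apache.spark.sql.functions.col
>
> // Both predicates target the partition column and are expected to match dt=2022051000.
> spark.table("tmp").filter(col("dt").like("202205100%")).show()
> spark.table("tmp").filter(col("dt").startsWith("202205100")).show()
> {code}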
>  


