Posted to issues@spark.apache.org by "shezm (Jira)" <ji...@apache.org> on 2022/07/28 00:01:00 UTC
[jira] [Commented] (SPARK-39900) Issue with querying dataframe produced by 'binaryFile' format using 'not' operator
[ https://issues.apache.org/jira/browse/SPARK-39900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572166#comment-17572166 ]
shezm commented on SPARK-39900:
-------------------------------
I can try to fix this issue.
> Issue with querying dataframe produced by 'binaryFile' format using 'not' operator
> ----------------------------------------------------------------------------------
>
> Key: SPARK-39900
> URL: https://issues.apache.org/jira/browse/SPARK-39900
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1, 3.3.0
> Reporter: Benoit Roy
> Priority: Minor
>
> When creating a dataframe using the binaryFile format, I am getting unexpected results when filtering or querying with the 'not' operator.
>
> Here's a repo that will help describe and reproduce the issue.
> [https://github.com/cccs-br/spark-binaryfile-issue]
> {code:java}
> git@github.com:cccs-br/spark-binaryfile-issue.git {code}
>
> Here's a very simple test case that illustrates what's going on:
> [https://github.com/cccs-br/spark-binaryfile-issue/blob/main/src/test/scala/BinaryFileSuite.scala]
> TL;DR:
> {code:java}
> test("binary file dataframe") {
>   // Load the files directly into a dataframe using the 'binaryFile' format.
>   //
>   // - src/test/resources/files/
>   //   - test1.csv
>   //   - test2.json
>   //   - test3.txt
>   val df = spark
>     .read
>     .format("binaryFile")
>     .load("src/test/resources/files")
>   df.createOrReplaceTempView("files")
>
>   // This works as expected.
>   val like_count = spark.sql("select * from files where path like '%.csv'").count()
>   assert(like_count === 1)
>
>   // This does not work as expected.
>   val not_like_count = spark.sql("select * from files where path not like '%.csv'").count()
>   assert(not_like_count === 2)
>
>   // This used to work in 3.2.1
>   // df.filter(col("path").endsWith(".csv") === false).show()
> }{code}
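
For anyone who wants to check the behaviour without cloning the repository, the test above can be approximated by a standalone sketch. It is not part of the original report. The object name, the temp-directory fixture and the helper `counts` are hypothetical; only the three file names and the 'not like' predicate come from the test. It compares the count returned by the filtered binaryFile read with a count computed on the driver from the unfiltered paths, so a mismatch between the two numbers would point at the read-side filtering.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, not}

// Hypothetical standalone reproduction; names and fixture are assumptions.
object BinaryFileNotLikeRepro {

  /** Returns (count from the filtered read, count computed on the driver). */
  def counts(spark: SparkSession): (Long, Long) = {
    // Assumed fixture: three one-byte files mirroring the names used in the test.
    val dir = Files.createTempDirectory("binaryfile-repro")
    Seq("test1.csv", "test2.json", "test3.txt").foreach { name =>
      Files.write(dir.resolve(name), "x".getBytes("UTF-8"))
    }

    val df = spark.read.format("binaryFile").load(dir.toString)

    // Same predicate as the failing SQL query, via the DataFrame API.
    val filtered = df.filter(not(col("path").like("%.csv"))).count()

    // Reference value: read all paths without a filter, then filter on the driver.
    val onDriver = df.select("path").collect()
      .count(row => !row.getString(0).endsWith(".csv")).toLong

    (filtered, onDriver)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("binaryfile-not-like-repro")
      .getOrCreate()
    try {
      val (filtered, onDriver) = counts(spark)
      println(s"filtered read count = $filtered, driver-side count = $onDriver")
    } finally {
      spark.stop()
    }
  }
}
```

If the two printed counts differ on a given Spark version, that version is presumably affected by the behaviour described in this issue.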