Posted to issues@spark.apache.org by "Hanna Liashchuk (Jira)" <ji...@apache.org> on 2022/08/05 21:06:00 UTC

[jira] [Created] (SPARK-39993) Spark on Kubernetes doesn't filter data by date

Hanna Liashchuk created SPARK-39993:
---------------------------------------

             Summary: Spark on Kubernetes doesn't filter data by date
                 Key: SPARK-39993
                 URL: https://issues.apache.org/jira/browse/SPARK-39993
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.2.2
         Environment: Kubernetes v1.23.6
                      Spark 3.2.2
                      Java 1.8.0_312
                      Python 3.9.13
                      AWS dependencies: aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar
            Reporter: Hanna Liashchuk


I'm creating a Dataset with a column of type date and saving it to S3. When I read it back and filter it with a where() clause, no rows are returned even though the data is there.

Below is the code snippet I'm running:

{code:python}
from pyspark.sql.functions import col, lit

# Build a 10-row Dataset with a constant column of type date.
ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", col("date").cast("date"))
ds.where("date = '2022-01-01'").show()  # returns the 10 rows

# Write to S3 as Parquet, read it back, and apply the same filter.
ds.write.mode("overwrite").parquet("s3a://bucket/test")
df = spark.read.format("parquet").load("s3a://bucket/test")
df.where("date = '2022-01-01'").show()  # returns no rows on the Kubernetes master
{code}
The first show() returns data, while the second one returns nothing.

It seems to be related to the Kubernetes master, as the same code snippet works fine with master "local".
