Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/01/12 02:47:00 UTC

[jira] [Created] (SPARK-42001) Unexpected schema set to DefaultSource plan (ReadwriterTests.test_save_and_load)

Hyukjin Kwon created SPARK-42001:
------------------------------------

             Summary: Unexpected schema set to DefaultSource plan (ReadwriterTests.test_save_and_load)
                 Key: SPARK-42001
                 URL: https://issues.apache.org/jira/browse/SPARK-42001
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_readwriter.py:28 (ReadwriterParityTests.test_save_and_load)
self = <pyspark.sql.tests.connect.test_parity_readwriter.ReadwriterParityTests testMethod=test_save_and_load>

    def test_save_and_load(self):
        df = self.df
        tmpPath = tempfile.mkdtemp()
        shutil.rmtree(tmpPath)
        df.write.json(tmpPath)
        actual = self.spark.read.json(tmpPath)
        self.assertEqual(sorted(df.collect()), sorted(actual.collect()))
    
        schema = StructType([StructField("value", StringType(), True)])
        actual = self.spark.read.json(tmpPath, schema)
>       self.assertEqual(sorted(df.select("value").collect()), sorted(actual.collect()))

../test_readwriter.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../connect/dataframe.py:1246: in collect
    query = self._plan.to_proto(self._session.client)
../../connect/plan.py:93: in to_proto
    plan.root.CopyFrom(self.plan(session))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyspark.sql.connect.plan.DataSource object at 0x7fe0d09c22b0>
session = <pyspark.sql.connect.client.SparkConnectClient object at 0x7fe0d069b5b0>

    def plan(self, session: "SparkConnectClient") -> proto.Relation:
        plan = proto.Relation()
        if self.format is not None:
            plan.read.data_source.format = self.format
        if self.schema is not None:
>           plan.read.data_source.schema = self.schema
E           TypeError: StructType([StructField('value', StringType(), True)]) has type StructType, but expected one of: bytes, unicode

../../connect/plan.py:246: TypeError
{code}
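
The traceback above shows a StructType object assigned directly to plan.read.data_source.schema, a protobuf field that only accepts a string. The following is a minimal sketch of one possible way to avoid that error: serialize the schema to a string before it reaches the plan. It is not the fix that was committed for this issue. FakeStructType, FakeDataSourceProto and schema_to_str are hypothetical stand-ins written for illustration so the snippet runs without Spark; the only assumption about the real API is that pyspark's StructType provides a json() method returning a string.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FakeStructType:
    """Hypothetical stand-in for pyspark.sql.types.StructType."""

    fields: List[dict]

    def json(self) -> str:
        # Mirrors the idea of StructType.json(): a JSON string for the schema.
        return json.dumps({"type": "struct", "fields": self.fields})


class FakeDataSourceProto:
    """Hypothetical stand-in for a protobuf message with a string `schema` field."""

    def __init__(self) -> None:
        self._schema = ""

    @property
    def schema(self) -> str:
        return self._schema

    @schema.setter
    def schema(self, value: object) -> None:
        # Protobuf string fields reject non-string values with a TypeError.
        if not isinstance(value, str):
            raise TypeError(
                f"{value!r} has type {type(value).__name__}, "
                "but expected one of: bytes, unicode"
            )
        self._schema = value


def schema_to_str(schema: Optional[Union[str, FakeStructType]]) -> Optional[str]:
    """Hypothetical helper: normalize a schema argument to a string or None."""
    if schema is None or isinstance(schema, str):
        return schema
    return schema.json()


# Same schema as in the failing test, expressed with the stand-in type.
struct = FakeStructType(
    fields=[{"name": "value", "type": "string", "nullable": True, "metadata": {}}]
)

proto = FakeDataSourceProto()
try:
    proto.schema = struct  # reproduces the kind of TypeError seen in the traceback
except TypeError as exc:
    error_message = str(exc)

proto.schema = schema_to_str(struct)  # string form is accepted
```

Whether such a conversion belongs in the reader (before the DataSource plan is built) or inside plan() itself is a design choice this sketch leaves open.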


