Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/01/15 11:09:00 UTC

[jira] [Resolved] (SPARK-42011) Implement DataFrameReader.csv

     [ https://issues.apache.org/jira/browse/SPARK-42011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42011.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39559
[https://github.com/apache/spark/pull/39559]

> Implement DataFrameReader.csv
> -----------------------------
>
>                 Key: SPARK-42011
>                 URL: https://issues.apache.org/jira/browse/SPARK-42011
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.4.0
>
>
> {code}
>
> pyspark/sql/tests/test_datasources.py:147 (DataSourcesParityTests.test_checking_csv_header)
> self = <pyspark.sql.tests.connect.test_parity_datasources.DataSourcesParityTests testMethod=test_checking_csv_header>
>     def test_checking_csv_header(self):
>         path = tempfile.mkdtemp()
>         shutil.rmtree(path)
>         try:
>             self.spark.createDataFrame([[1, 1000], [2000, 2]]).toDF("f1", "f2").write.option(
>                 "header", "true"
>             ).csv(path)
>             schema = StructType(
>                 [
>                     StructField("f2", IntegerType(), nullable=True),
>                     StructField("f1", IntegerType(), nullable=True),
>                 ]
>             )
>             df = (
> >               self.spark.read.option("header", "true")
>                 .schema(schema)
>                 .csv(path, enforceSchema=False)
>             )
> ../test_datasources.py:162: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = <pyspark.sql.connect.readwriter.DataFrameReader object at 0x7fb118289520>
> args = ('/var/folders/0c/q8y15ybd3tn7sr2_jmbmftr80000gp/T/tmp4kdxohcw',)
> kwargs = {'enforceSchema': False}
>     def csv(self, *args: Any, **kwargs: Any) -> None:
> >       raise NotImplementedError("csv() is not implemented.")
> E       NotImplementedError: csv() is not implemented.
> ../../connect/readwriter.py:225: NotImplementedError
> {code}
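
For reference, a minimal sketch of the call pattern the test above exercises, once DataFrameReader.csv is available in the Spark Connect Python client. The Connect endpoint, temporary path, and session setup are assumptions for illustration only; this is not the implementation from pull request 39559.

{code}
import tempfile

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

# Connect to a Spark Connect server (the endpoint is an assumption for illustration).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

path = tempfile.mkdtemp()

# Write a small CSV dataset with a header row.
spark.createDataFrame([[1, 1000], [2000, 2]]).toDF("f1", "f2").write.option(
    "header", "true"
).csv(path, mode="overwrite")

# Read it back with an explicit schema. With enforceSchema=False the reader
# validates the CSV header against the supplied schema instead of applying
# the schema blindly by position.
schema = StructType(
    [
        StructField("f1", IntegerType(), nullable=True),
        StructField("f2", IntegerType(), nullable=True),
    ]
)
df = (
    spark.read.option("header", "true")
    .schema(schema)
    .csv(path, enforceSchema=False)
)
df.show()
{code}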



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org