
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/29 08:10:31 UTC

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38801: [SPARK-41317][CONNECT][PYTHON] Add basic support for DataFrameWriter

HyukjinKwon commented on code in PR #38801:
URL: https://github.com/apache/spark/pull/38801#discussion_r1034418111


##########
python/pyspark/sql/connect/readwriter.py:
##########
@@ -140,9 +161,891 @@ def load(
             self.option("path", path)
 
         plan = DataSource(format=self._format, schema=self._schema, options=self._options)
-        df = DataFrame.withPlan(plan, self._client)
-        return df
+        return self._df(plan)
+
+    def _df(self, plan: LogicalPlan) -> "DataFrame":
+        # The import is needed here to avoid circular import issues.
+        from pyspark.sql.connect.dataframe import DataFrame
+
+        return DataFrame.withPlan(plan, self._client)
 
     def table(self, tableName: str) -> "DataFrame":
-        df = DataFrame.withPlan(Read(tableName), self._client)
-        return df
+        return self._df(Read(tableName))
+
+
+class DataFrameWriter(OptionUtils):
+    """
+    Interface used to write a :class:`DataFrame` to external storage systems
+    (e.g. file systems, key-value stores, etc). Use :attr:`DataFrame.write`
+    to access this.
+
+    .. versionadded:: 3.4.0
+    """
+
+    def __init__(self, plan: "LogicalPlan", session: "SparkSession"):
+        self._df: "LogicalPlan" = plan
+        self._spark: "SparkSession" = session
+        self._write: "WriteOperation" = WriteOperation(self._df)
+
+    def mode(self, saveMode: Optional[str]) -> "DataFrameWriter":
+        """Specifies the behavior when data or table already exists.
+
+        Options include:
+
+        * `append`: Append contents of this :class:`DataFrame` to existing data.
+        * `overwrite`: Overwrite existing data.
+        * `error` or `errorifexists`: Throw an exception if data already exists.
+        * `ignore`: Silently ignore this operation if data already exists.
+
+        .. versionadded:: 3.4.0
+
+        Examples
+        --------
+        Raise an error when writing to an existing path.
+
+        >>> import tempfile

Review Comment:
   Quick comment: to run these examples as doctests, we should (1) set up the doctest globals, e.g. as in https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L1969-L1994, and (2) add this module to https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L506-L508.
   
   That would have to be done separately, I guess.
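
   For reference, a minimal stdlib-only sketch of what "setting up the globals" means for doctests. It is not the PySpark code linked above: `normalize_mode`, `default_mode`, and `run_doctests` are hypothetical names used only for illustration, and a plain string stands in for the shared fixture (in the real module this would presumably be a session object) that docstrings expect to find in scope.

```python
import doctest


def normalize_mode(save_mode):
    """Lower-case a save mode string (hypothetical helper, for illustration only).

    >>> normalize_mode("Overwrite")
    'overwrite'
    >>> default_mode
    'errorifexists'
    """
    return save_mode.lower()


def run_doctests():
    """Run the docstring examples above with an extra name injected via globs."""
    # Start from a copy of this namespace so the examples can see normalize_mode.
    globs = globals().copy()
    # Assumption: a plain string stands in for the fixture the examples rely on.
    globs["default_mode"] = "errorifexists"

    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(
        verbose=False,
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE,
    )
    # module=False tells the finder not to look up the defining module.
    for test in finder.find(normalize_mode, "normalize_mode", module=False, globs=globs):
        runner.run(test)
    # summarize() returns a TestResults(failed, attempted) named tuple.
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    results = run_doctests()
    raise SystemExit(1 if results.failed else 0)
```

   Without the injected name, the second example would fail with a NameError, which is why the docstring examples need the globals set up before they can run as tests.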



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

