Posted to commits@spark.apache.org by gu...@apache.org on 2023/06/02 00:08:49 UTC

[spark] branch master updated: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f2f6272dd97 [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`
f2f6272dd97 is described below

commit f2f6272dd97755760b28e623efa4cd258513f5cb
Author: Beishao Cao <be...@databricks.com>
AuthorDate: Fri Jun 2 09:08:34 2023 +0900

    [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`
    
    ### What changes were proposed in this pull request?
    Add a `_ipython_key_completions_()` method to the Python DataFrame class that returns its column names. The main benefit is that the IPython autocomplete engine (or anything built on IPython, e.g. the IPython kernel or Databricks Notebooks) will suggest column names at the completion point `df[|]`.
    
    ### Why are the changes needed?
    Users of IPython-based autocomplete engines get column names suggested for `df[|]`. This increases productivity for anyone using autocompletion on PySpark code.
    Example:
    
    https://github.com/apache/spark/assets/109033553/dd575144-bb87-47a9-8387-de2e51f1c8e2
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    New doctest
    <img width="1030" alt="Screenshot 2023-05-30 at 5 11 23 PM" src="https://github.com/apache/spark/assets/109033553/4b3a89c0-edf4-4ad2-80bf-2bba3824456f">
    
    Tested in a Databricks notebook using this workaround subclass:
    
    ```python
    class DataFrameWithColAttrs(DataFrame):
        def __init__(self, df):
            # Support both older (_sql_ctx) and newer (_session) DataFrame internals.
            super().__init__(df._jdf, df._sql_ctx if df._sql_ctx else df._session)

        def _ipython_key_completions_(self):
            return self.columns
    ```
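    
    The method added here follows the same protocol IPython uses for dict-style key completion on any object: when the cursor sits inside `obj[`, IPython calls `obj._ipython_key_completions_()` and offers the returned keys. A minimal sketch with a plain Python class (class and attribute names are hypothetical, not Spark code):
    
    ```python
    # Sketch of the IPython key-completion protocol: any object that defines
    # _ipython_key_completions_() gets its keys suggested after "obj[<TAB>".
    class Record:
        def __init__(self, data):
            self._data = dict(data)

        def __getitem__(self, key):
            return self._data[key]

        def _ipython_key_completions_(self):
            # Keys IPython should offer at the completion point record[|]
            return list(self._data)

    record = Record({"age": 2, "name": "Alice"})
    print(record._ipython_key_completions_())  # ['age', 'name']
    ```
    
    The DataFrame change in this commit is the same idea with `self.columns` as the key list.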
    
    Closes #41396 from BeishaoCao-db/SPARK-43892.
    
    Authored-by: Beishao Cao <be...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/sql/dataframe.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 12c445de21d..884dc997792 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4875,6 +4875,22 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
             self._jdf.stat().freqItems(_to_seq(self._sc, cols), support), self.sparkSession
         )
 
+    def _ipython_key_completions_(self) -> List[str]:
+        """Returns the names of columns in this :class:`DataFrame`.
+
+        Examples
+        --------
+        >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
+        >>> df._ipython_key_completions_()
+        ['age', 'name']
+
+        Column names that are not legal Python identifiers are returned as-is:
+        >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age 1", "name?1"])
+        >>> df._ipython_key_completions_()
+        ['age 1', 'name?1']
+        """
+        return self.columns
+
     def withColumns(self, *colsMap: Dict[str, Column]) -> "DataFrame":
         """
         Returns a new :class:`DataFrame` by adding multiple columns or replacing the


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org