Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/12 05:54:23 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

HyukjinKwon opened a new pull request, #37490:
URL: https://github.com/apache/spark/pull/37490

   ### What changes were proposed in this pull request?
   
   This PR proposes to improve the examples in `pyspark.sql.streaming.catalog` by making each example self-contained with a brief explanation and a bit more realistic example.
   
   ### Why are the changes needed?
   
   To make the documentation more readable, and to let users copy and paste the examples directly into the PySpark shell.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it changes the documentation.
   
   ### How was this patch tested?
   
   Manually ran each doctest. CI also runs them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944145436


##########
python/pyspark/sql/catalog.py:
##########
@@ -123,19 +132,52 @@ def listCatalogs(self) -> List[CatalogMetadata]:
             catalogs.append(CatalogMetadata(name=jcatalog.name, description=jcatalog.description))
         return catalogs
 
-    @since(2.0)

Review Comment:
   Because it breaks the docstring format. `since` adds the `versionadded` directive at the end of the docstring, which does not work in the NumPy documentation style when sections such as `Parameters` are specified. We should remove all of these eventually.
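
   For illustration, here is a rough sketch of what a `since`-style decorator does (a simplification for this thread, not the exact PySpark implementation): it appends the directive to the very end of `__doc__`, so the directive lands after sections like `Parameters`, which NumPy-style rendering does not handle.

   ```python
   def since(version):
       # Simplified sketch: append a `versionadded` directive to the end
       # of the wrapped function's docstring.
       def decorator(f):
           f.__doc__ = (f.__doc__ or "").rstrip() + (
               "\n\n.. versionadded:: %s" % version
           )
           return f
       return decorator

   @since("2.0.0")
   def isCached(tableName: str) -> bool:
       """Returns true if the table is currently cached in-memory.

       Parameters
       ----------
       tableName : str
           name of the table.
       """

   # The directive now trails the `Parameters` section instead of
   # preceding it:
   print(isCached.__doc__)
   ```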





[GitHub] [spark] viirya commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1214417949

   > This PR proposes to improve the examples in pyspark.sql.streaming.catalog by making each example self-contained with a brief explanation and a bit more realistic example.
   
   `pyspark.sql.streaming.catalog` or `pyspark.sql.catalog`?




[GitHub] [spark] cloud-fan commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944142452


##########
python/pyspark/sql/catalog.py:
##########
@@ -123,19 +132,52 @@ def listCatalogs(self) -> List[CatalogMetadata]:
             catalogs.append(CatalogMetadata(name=jcatalog.name, description=jcatalog.description))
         return catalogs
 
-    @since(2.0)

Review Comment:
   why do we remove `since`?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944473956


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +875,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...

Review Comment:
   We could. But the actual exception class thrown, as PySpark users see it, is `pyspark.sql.utils.AnalysisException`. I think it's fine to show the class.
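
   As a side note on how doctest matches these: the class name on the last traceback line must match what is rendered, while the message itself can be ellipsized. A minimal standalone sketch (plain `ValueError` here, standing in for `pyspark.sql.utils.AnalysisException`):

   ```python
   import doctest

   def lookup():
       """
       >>> lookup()
       Traceback (most recent call last):
           ...
       ValueError: ...
       """
       raise ValueError("Table or view not found")

   # With ELLIPSIS, "ValueError: ..." matches any message, but the class
   # name must still match; IGNORE_EXCEPTION_DETAIL additionally ignores
   # the module prefix on the class name.
   doctest.run_docstring_examples(lookup, {"lookup": lookup},
                                  optionflags=doctest.ELLIPSIS)
   ```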





[GitHub] [spark] HyukjinKwon commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1212756486

   cc @cloud-fan @zhengruifeng @amaliujia @viirya mind taking a look when you find some time please?




[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320563


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +868,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.uncacheTable("not_existing_table")  # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.uncacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.uncacheTable(tableName)
 
-    @since(2.0)
     def clearCache(self) -> None:
-        """Removes all cached tables from the in-memory cache."""
+        """Removes all cached tables from the in-memory cache.
+
+        .. versionadded:: 2.0.0
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.clearCache()
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
+        """
         self._jcatalog.clearCache()
 
-    @since(2.0)
     def refreshTable(self, tableName: str) -> None:
         """Invalidates and refreshes all the cached data and metadata of the given table.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        The example below caches a table, and then remove the data.
+
+        >>> import tempfile
+        >>> with tempfile.TemporaryDirectory() as d:
+        ...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        ...     _ = spark.sql("CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
+        ...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
+        ...     spark.catalog.cacheTable("tbl1")
+        ...     spark.table("tbl1").show()
+        +---+
+        |col|
+        +---+
+        |abc|
+        +---+
+
+        Because the table is cached, it computes from the cached data as below.
+
+        >>> spark.table("tbl1").count()
+        1
+
+        After refreshing the table, it shows 0 because the data does not exist anymore.
+
+        >>> spark.catalog.refreshTable("tbl1")
+        >>> spark.table("tbl1").count()
+        0
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.refreshTable("spark_catalog.default.tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.refreshTable(tableName)
 
-    @since("2.1.1")
     def recoverPartitions(self, tableName: str) -> None:
         """Recovers all the partitions of the given table and update the catalog.
 
+        .. versionadded:: 2.1.1
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+        Notes
+        -----
         Only works with a partitioned table, and not a view.
+
+        Examples
+        --------
+        The example below creates a partitioned table against the existing directory of
+        the partitioned table. After that, it recovers the partitions.
+
+        >>> import tempfile
+        >>> with tempfile.TemporaryDirectory() as d:
+        ...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        ...     spark.range(1).selectExpr(
+        ...         "id as key", "id as value").write.partitionBy("key").mode("overwrite").save(d)
+        ...     _ = spark.sql(
+        ...          "CREATE TABLE tbl1 (key LONG, value LONG)"
+        ...          "USING parquet OPTIONS (path '{}') PARTITIONED BY (key)".format(d))
+        ...     spark.table("tbl1").show()
+        ...     spark.catalog.recoverPartitions("tbl1")
+        ...     spark.table("tbl1").show()
+        +-----+---+
+        |value|key|
+        +-----+---+
+        +-----+---+
+        +-----+---+
+        |value|key|
+        +-----+---+
+        |    0|  0|
+        +-----+---+
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.recoverPartitions(tableName)
 
-    @since("2.2.0")
     def refreshByPath(self, path: str) -> None:
         """Invalidates and refreshes all the cached data (and the associated metadata) for any
         DataFrame that contains the given data source path.
+
+        .. versionadded:: 2.2.0
+
+        Parameters
+        ----------
+        path : str
+            the path to refresh the cache.
+
+        Examples
+        --------
+        The example below caches a table, and then remove the data.

Review Comment:
   ```suggestion
           The example below caches a table, and then removes the data.
   ```





[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320272


##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +778,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:
         """Drops the local temporary view with the given view name in the catalog.
         If the view has been cached before, then it will also be uncached.
         Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.0.0
 
-        Notes
-        -----
-        The return type of this method was None in Spark 2.0, but changed to Boolean
-        in Spark 2.1.
+        Parameters
+        ----------
+        viewName : str
+            name of the temporary view to drop.
+
+        Returns
+        -------
+        bool
+            If the temporary view was successfully drooped or not.
+
+            .. versionadded:: 2.1.0
+                The return type of this method was ``None`` in Spark 2.0, but changed to ``bool``
+                in Spark 2.1.
 
         Examples
         --------
         >>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
-        >>> spark.table("my_table").collect()
-        [Row(_1=1, _2=1)]
+
+        Droppping the temporary view.
+
         >>> spark.catalog.dropTempView("my_table")
         True
-        >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
+
+        Throw an exception if the temporary view does not exists.s
+
+        >>> spark.table("my_table")
         Traceback (most recent call last):
             ...
         AnalysisException: ...
         """
         return self._jcatalog.dropTempView(viewName)
 
-    def dropGlobalTempView(self, viewName: str) -> None:
+    def dropGlobalTempView(self, viewName: str) -> bool:
         """Drops the global temporary view with the given view name in the catalog.
-        If the view has been cached before, then it will also be uncached.
-        Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.1.0
 
+        Parameters
+        ----------
+        viewName : str
+            name of the global view to drop.
+
+        Returns
+        -------
+        bool
+            If the global view was successfully drooped or not.
+
+        Notes
+        -----
+        If the view has been cached before, then it will also be uncached.
+
         Examples
         --------
         >>> spark.createDataFrame([(1, 1)]).createGlobalTempView("my_table")
-        >>> spark.table("global_temp.my_table").collect()
-        [Row(_1=1, _2=1)]
+
+        Droppping the global view.
+
         >>> spark.catalog.dropGlobalTempView("my_table")
         True
-        >>> spark.table("global_temp.my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
+
+        Throw an exception if the global view does not exists.s

Review Comment:
   ```suggestion
           Throw an exception if the global view does not exists.
   ```





[GitHub] [spark] amaliujia commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944730697


##########
python/pyspark/sql/catalog.py:
##########
@@ -108,13 +108,22 @@ def setCurrentCatalog(self, catalogName: str) -> None:
         ----------
         catalogName : str
             name of the catalog to set
+
+        Examples
+        --------
+        >>> spark.catalog.setCurrentCatalog("spark_catalog")
         """
         return self._jcatalog.setCurrentCatalog(catalogName)
 
     def listCatalogs(self) -> List[CatalogMetadata]:
         """Returns a list of catalogs in this session.
 
         .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        list
+            A list of :class:`CatalogMetadata`.

Review Comment:
   Seeing you add 
   ```
           >>> spark.catalog.listDatabases()
           [Database(name='default', catalog='spark_catalog', description='default database', ...
   ```
   
   Why not do the same for this API?



##########
python/pyspark/sql/catalog.py:
##########
@@ -251,19 +329,31 @@ def getTable(self, tableName: str) -> Table:
         Parameters
         ----------
         tableName : str
-                    name of the table to check existence.
+            name of the table to get.

Review Comment:
   add
   ```
               .. versionchanged:: 3.4.0
                  Allow `tableName` to be qualified with catalog name.
   ```
   ?



##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +876,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.uncacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.uncacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.uncacheTable(tableName)
 
-    @since(2.0)
     def clearCache(self) -> None:
-        """Removes all cached tables from the in-memory cache."""
+        """Removes all cached tables from the in-memory cache.
+
+        .. versionadded:: 2.0.0
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.clearCache()
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
+        """
         self._jcatalog.clearCache()
 
-    @since(2.0)
     def refreshTable(self, tableName: str) -> None:
         """Invalidates and refreshes all the cached data and metadata of the given table.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        The example below caches a table, and then remove the data.
+
+        >>> import tempfile
+        >>> with tempfile.TemporaryDirectory() as d:
+        ...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        ...     _ = spark.sql("CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
+        ...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
+        ...     spark.catalog.cacheTable("tbl1")
+        ...     spark.table("tbl1").show()
+        +---+
+        |col|
+        +---+
+        |abc|
+        +---+
+
+        Because the table is cached, it computes from the cached data as below.
+
+        >>> spark.table("tbl1").count()
+        1
+
+        After refreshing the table, it shows 0 because the data does not exist anymore.
+
+        >>> spark.catalog.refreshTable("tbl1")

Review Comment:
   The data files were removed because of `tempfile.TemporaryDirectory()`: once its lifecycle ends, the directory is removed?



##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +786,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:

Review Comment:
   Is this an API change or a user-behavior change?



##########
python/pyspark/sql/catalog.py:
##########
@@ -327,25 +434,33 @@ def functionExists(self, functionName: str, dbName: Optional[str] = None) -> boo
         ----------
         functionName : str
             name of the function to check existence
+
+            .. versionchanged:: 3.4.0
+               Allow ``functionName`` to be qualified with catalog name
+
         dbName : str, optional
             name of the database to check function existence in.
-            If no database is specified, the current database is used
 
            .. deprecated:: 3.4.0

Review Comment:
   Same for other places if this makes sense.



##########
python/pyspark/sql/catalog.py:
##########
@@ -327,25 +434,33 @@ def functionExists(self, functionName: str, dbName: Optional[str] = None) -> boo
         ----------
         functionName : str
             name of the function to check existence
+
+            .. versionchanged:: 3.4.0
+               Allow ``functionName`` to be qualified with catalog name
+
         dbName : str, optional
             name of the database to check function existence in.
-            If no database is specified, the current database is used
 
            .. deprecated:: 3.4.0

Review Comment:
   Should we not mention that this is `deprecated`?

   On the Scala side, per the discussion with the community, we don't use the `@deprecated` annotation but just recommend that users use another API: https://github.com/databricks/runtime/blob/ef75e00fb2bd8d30aafae1eb281e6e9d0432d590/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L221





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945084295


##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +786,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:

Review Comment:
   Nope, this is a bug fix. It just corrects the type hint, which does not break anything.
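
   A minimal illustration of why the fix is non-breaking (hypothetical function, not the real API): annotations are not enforced at runtime, so only static checkers notice the wrong hint.

   ```python
   def drop_view() -> None:  # wrong annotation, like the old dropTempView hint
       return True

   # Runs fine at runtime; only a type checker such as mypy flags the
   # mismatch between the annotation and the actual return value.
   assert drop_view() is True
   ```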





[GitHub] [spark] HyukjinKwon closed pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained
URL: https://github.com/apache/spark/pull/37490




[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320319


##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +778,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:
         """Drops the local temporary view with the given view name in the catalog.
         If the view has been cached before, then it will also be uncached.
         Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.0.0
 
-        Notes
-        -----
-        The return type of this method was None in Spark 2.0, but changed to Boolean
-        in Spark 2.1.
+        Parameters
+        ----------
+        viewName : str
+            name of the temporary view to drop.
+
+        Returns
+        -------
+        bool
+            If the temporary view was successfully drooped or not.
+
+            .. versionadded:: 2.1.0
+                The return type of this method was ``None`` in Spark 2.0, but changed to ``bool``
+                in Spark 2.1.
 
         Examples
         --------
         >>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
-        >>> spark.table("my_table").collect()
-        [Row(_1=1, _2=1)]
+
+        Droppping the temporary view.
+
         >>> spark.catalog.dropTempView("my_table")
         True
-        >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
+
+        Throw an exception if the temporary view does not exists.s
+
+        >>> spark.table("my_table")
         Traceback (most recent call last):
             ...
         AnalysisException: ...
         """
         return self._jcatalog.dropTempView(viewName)
 
-    def dropGlobalTempView(self, viewName: str) -> None:
+    def dropGlobalTempView(self, viewName: str) -> bool:
         """Drops the global temporary view with the given view name in the catalog.
-        If the view has been cached before, then it will also be uncached.
-        Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.1.0
 
+        Parameters
+        ----------
+        viewName : str
+            name of the global view to drop.
+
+        Returns
+        -------
+        bool
+            If the global view was successfully drooped or not.

Review Comment:
   ```suggestion
               If the global view was successfully dropped or not.
   ```



##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +778,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:
         """Drops the local temporary view with the given view name in the catalog.
         If the view has been cached before, then it will also be uncached.
         Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.0.0
 
-        Notes
-        -----
-        The return type of this method was None in Spark 2.0, but changed to Boolean
-        in Spark 2.1.
+        Parameters
+        ----------
+        viewName : str
+            name of the temporary view to drop.
+
+        Returns
+        -------
+        bool
+            If the temporary view was successfully drooped or not.
+
+            .. versionadded:: 2.1.0
+                The return type of this method was ``None`` in Spark 2.0, but changed to ``bool``
+                in Spark 2.1.
 
         Examples
         --------
         >>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
-        >>> spark.table("my_table").collect()
-        [Row(_1=1, _2=1)]
+
+        Droppping the temporary view.
+
         >>> spark.catalog.dropTempView("my_table")
         True
-        >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
+
+        Throw an exception if the temporary view does not exists.s

Review Comment:
   ```suggestion
           Throw an exception if the temporary view does not exists.
   ```





[GitHub] [spark] amaliujia commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1212760824

   Thanks for working on these! A lot of good examples! I will take a pass over this PR.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945084386


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +876,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.uncacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.uncacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.uncacheTable(tableName)
 
-    @since(2.0)
     def clearCache(self) -> None:
-        """Removes all cached tables from the in-memory cache."""
+        """Removes all cached tables from the in-memory cache.
+
+        .. versionadded:: 2.0.0
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.clearCache()
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
+        """
         self._jcatalog.clearCache()
 
-    @since(2.0)
     def refreshTable(self, tableName: str) -> None:
         """Invalidates and refreshes all the cached data and metadata of the given table.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        The example below caches a table, and then remove the data.
+
+        >>> import tempfile
+        >>> with tempfile.TemporaryDirectory() as d:
+        ...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        ...     _ = spark.sql("CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
+        ...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
+        ...     spark.catalog.cacheTable("tbl1")
+        ...     spark.table("tbl1").show()
+        +---+
+        |col|
+        +---+
+        |abc|
+        +---+
+
+        Because the table is cached, it computes from the cached data as below.
+
+        >>> spark.table("tbl1").count()
+        1
+
+        After refreshing the table, it shows 0 because the data does not exist anymore.
+
+        >>> spark.catalog.refreshTable("tbl1")

Review Comment:
   Yup, the `TemporaryDirectory` gets removed when the `with` block finishes.
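
   A quick standalone check of that lifecycle:

   ```python
   import os
   import tempfile

   with tempfile.TemporaryDirectory() as d:
       print(os.path.isdir(d))  # True: the directory exists inside the block
   print(os.path.isdir(d))      # False: removed as soon as the block exits
   ```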





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944500663


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +876,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.uncacheTable("not_existing_table")  # doctest: +IGNORE_EXCEPTION_DETAIL

Review Comment:
   ```suggestion
           >>> spark.catalog.uncacheTable("not_existing_table")
   ```





[GitHub] [spark] zhengruifeng commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944206889


##########
python/pyspark/sql/catalog.py:
##########
@@ -184,38 +234,65 @@ def databaseExists(self, dbName: str) -> bool:
         Parameters
         ----------
         dbName : str
-             name of the database to check existence
+            name of the database to check existence
+
+            .. versionchanged:: 3.4.0
+               Allow ``dbName`` to be qualified with catalog name.
 
         Returns
         -------
         bool
             Indicating whether the database exists
 
-        .. versionchanged:: 3.4
-           Allowed ``dbName`` to be qualified with catalog name.
-
         Examples
         --------
+        Check if 'test_new_database' database exists
+
         >>> spark.catalog.databaseExists("test_new_database")
         False
-        >>> df = spark.sql("CREATE DATABASE test_new_database")
+        >>> _ = spark.sql("CREATE DATABASE test_new_database")
         >>> spark.catalog.databaseExists("test_new_database")
         True
+
+        Using the fully qualified name with the catalog name.
+
         >>> spark.catalog.databaseExists("spark_catalog.test_new_database")
         True
-        >>> df = spark.sql("DROP DATABASE test_new_database")
+        >>> _ = spark.sql("DROP DATABASE test_new_database")
         """
         return self._jcatalog.databaseExists(dbName)
 
     @since(2.0)

Review Comment:
   does this annotation also need to be removed?



##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +875,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...

Review Comment:
   can this be simplified to `AnalysisException`?



##########
python/pyspark/sql/catalog.py:
##########
@@ -404,26 +529,42 @@ def getFunction(self, functionName: str) -> Function:
     def listColumns(self, tableName: str, dbName: Optional[str] = None) -> List[Column]:
         """Returns a list of columns for the given table/view in the specified database.
 
-         If no database is specified, the current database is used.
-
         .. versionadded:: 2.0.0
 
         Parameters
         ----------
         tableName : str
-                    name of the table to check existence
+            name of the table to list columns.
+
+            .. versionchanged:: 3.4.0
+               Allow ``tableName`` to be qualified with catalog name when ``dbName`` is None.
+
         dbName : str, optional
-                 name of the database to check table existence in.

Review Comment:
   Dumb question: is it possible to make `lint-python` also check the style in docstrings?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945084066


##########
python/pyspark/sql/catalog.py:
##########
@@ -108,13 +108,22 @@ def setCurrentCatalog(self, catalogName: str) -> None:
         ----------
         catalogName : str
             name of the catalog to set
+
+        Examples
+        --------
+        >>> spark.catalog.setCurrentCatalog("spark_catalog")
         """
         return self._jcatalog.setCurrentCatalog(catalogName)
 
     def listCatalogs(self) -> List[CatalogMetadata]:
         """Returns a list of catalogs in this session.
 
         .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        list
+            A list of :class:`CatalogMetadata`.

Review Comment:
   Because it currently returns an empty catalog (that you're fixing at https://github.com/apache/spark/pull/37488)





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944472905


##########
python/pyspark/sql/catalog.py:
##########
@@ -184,38 +234,65 @@ def databaseExists(self, dbName: str) -> bool:
         Parameters
         ----------
         dbName : str
-             name of the database to check existence
+            name of the database to check existence
+
+            .. versionchanged:: 3.4.0
+               Allow ``dbName`` to be qualified with catalog name.
 
         Returns
         -------
         bool
             Indicating whether the database exists
 
-        .. versionchanged:: 3.4
-           Allowed ``dbName`` to be qualified with catalog name.
-
         Examples
         --------
+        Check if 'test_new_database' database exists
+
         >>> spark.catalog.databaseExists("test_new_database")
         False
-        >>> df = spark.sql("CREATE DATABASE test_new_database")
+        >>> _ = spark.sql("CREATE DATABASE test_new_database")
         >>> spark.catalog.databaseExists("test_new_database")
         True
+
+        Using the fully qualified name with the catalog name.
+
         >>> spark.catalog.databaseExists("spark_catalog.test_new_database")
         True
-        >>> df = spark.sql("DROP DATABASE test_new_database")
+        >>> _ = spark.sql("DROP DATABASE test_new_database")
         """
         return self._jcatalog.databaseExists(dbName)
 
     @since(2.0)

Review Comment:
   Yeah, thanks for pointing this out!





[GitHub] [spark] HyukjinKwon commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1214476573

   Merged to master.




[GitHub] [spark] HyukjinKwon commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1214470830

   Thank you @viirya !!




[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320427


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +868,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.

Review Comment:
   ```suggestion
           Throw an analysis exception when the table does not exist.
   ```





[GitHub] [spark] dcoliversun commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944256156


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +875,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...

Review Comment:
   doctest checks whether the result matches or not. If we only write `AnalysisException`, the doctest will fail.
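
   A standalone illustration of that matching behavior, using plain `doctest` (nothing PySpark-specific is assumed):

   ```python
   import doctest

   # Without +IGNORE_EXCEPTION_DETAIL, the expected output must reproduce the
   # exception line exactly, including the module prefix. With the directive,
   # the detail after the class name (and, since Python 3.2, the module
   # prefix) is ignored, so a bare class name plus "..." passes.
   src = '''
   >>> raise ValueError("boom")  # doctest: +IGNORE_EXCEPTION_DETAIL
   Traceback (most recent call last):
       ...
   ValueError: ...
   '''
   test = doctest.DocTestParser().get_doctest(src, {}, "example", None, 0)
   print(doctest.DocTestRunner().run(test))  # TestResults(failed=0, attempted=1)
   ```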





[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320484


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +868,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.cacheTable("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.cacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.cacheTable(tableName)
 
-    @since(2.0)
     def uncacheTable(self, tableName: str) -> None:
         """Removes the specified table from the in-memory cache.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.uncacheTable("not_existing_table")  # doctest: +IGNORE_EXCEPTION_DETAIL
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
+
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.uncacheTable("spark_catalog.default.tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         self._jcatalog.uncacheTable(tableName)
 
-    @since(2.0)
     def clearCache(self) -> None:
-        """Removes all cached tables from the in-memory cache."""
+        """Removes all cached tables from the in-memory cache.
+
+        .. versionadded:: 2.0.0
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.clearCache()
+        >>> spark.catalog.isCached("tbl1")
+        False
+        >>> _ = spark.sql("DROP TABLE tbl1")
+        """
         self._jcatalog.clearCache()
 
-    @since(2.0)
     def refreshTable(self, tableName: str) -> None:
         """Invalidates and refreshes all the cached data and metadata of the given table.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        The example below caches a table, and then remove the data.

Review Comment:
   ```suggestion
           The example below caches a table, and then removes the data.
   ```





[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320410


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +868,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
+
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        Using the fully qualified name for the table.
+
+        >>> spark.catalog.isCached("spark_catalog.default.tbl1")
+        True
+        >>> spark.catalog.uncacheTable("tbl1")
+        >>> _ = spark.sql("DROP TABLE tbl1")
         """
         return self._jcatalog.isCached(tableName)
 
-    @since(2.0)
     def cacheTable(self, tableName: str) -> None:
         """Caches the specified table in-memory.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+
+        Throw an analysis exception when the table does not exists.

Review Comment:
   ```suggestion
           Throw an analysis exception when the table does not exist.
   ```





[GitHub] [spark] viirya commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945320190


##########
python/pyspark/sql/catalog.py:
##########
@@ -613,47 +778,76 @@ def createTable(
             df = self._jcatalog.createTable(tableName, source, scala_datatype, description, options)
         return DataFrame(df, self._sparkSession)
 
-    def dropTempView(self, viewName: str) -> None:
+    def dropTempView(self, viewName: str) -> bool:
         """Drops the local temporary view with the given view name in the catalog.
         If the view has been cached before, then it will also be uncached.
         Returns true if this view is dropped successfully, false otherwise.
 
         .. versionadded:: 2.0.0
 
-        Notes
-        -----
-        The return type of this method was None in Spark 2.0, but changed to Boolean
-        in Spark 2.1.
+        Parameters
+        ----------
+        viewName : str
+            name of the temporary view to drop.
+
+        Returns
+        -------
+        bool
+            If the temporary view was successfully drooped or not.

Review Comment:
   ```suggestion
               If the temporary view was successfully dropped or not.
   ```





[GitHub] [spark] zhengruifeng commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944290256


##########
python/pyspark/sql/catalog.py:
##########
@@ -674,59 +875,267 @@ def registerFunction(
         warnings.warn("Deprecated in 2.3.0. Use spark.udf.register instead.", FutureWarning)
         return self._sparkSession.udf.register(name, f, returnType)
 
-    @since(2.0)
     def isCached(self, tableName: str) -> bool:
-        """Returns true if the table is currently cached in-memory.
+        """
+        Returns true if the table is currently cached in-memory.
+
+        .. versionadded:: 2.0.0
+
+        Parameters
+        ----------
+        tableName : str
+            name of the table to get.
+
+            .. versionchanged:: 3.4.0
+                Allow ``tableName`` to be qualified with catalog name.
+
+        Returns
+        -------
+        bool
+
+        Examples
+        --------
+        >>> _ = spark.sql("DROP TABLE IF EXISTS tbl1")
+        >>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
+        >>> spark.catalog.cacheTable("tbl1")
+        >>> spark.catalog.isCached("tbl1")
+        True
+
+        Throw an analysis exception when the table does not exists.
 
-        .. versionchanged:: 3.4
-           Allowed ``tableName`` to be qualified with catalog name.
+        >>> spark.catalog.isCached("not_existing_table")
+        Traceback (most recent call last):
+            ...
+        pyspark.sql.utils.AnalysisException: ...

Review Comment:
   oh, I mean what about adding `# doctest: +IGNORE_EXCEPTION_DETAIL` like other functions (`dropGlobalTempView`, `dropTempView`)?
   
   ```
           >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
           Traceback (most recent call last):
               ...
           AnalysisException: ...
   ```





[GitHub] [spark] HyukjinKwon commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1214345099

   Will merge this in a few days.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r944473353


##########
python/pyspark/sql/catalog.py:
##########
@@ -404,26 +529,42 @@ def getFunction(self, functionName: str) -> Function:
     def listColumns(self, tableName: str, dbName: Optional[str] = None) -> List[Column]:
         """Returns a list of columns for the given table/view in the specified database.
 
-         If no database is specified, the current database is used.
-
         .. versionadded:: 2.0.0
 
         Parameters
         ----------
         tableName : str
-                    name of the table to check existence
+            name of the table to list columns.
+
+            .. versionchanged:: 3.4.0
+               Allow ``tableName`` to be qualified with catalog name when ``dbName`` is None.
+
         dbName : str, optional
-                 name of the database to check table existence in.

Review Comment:
   Seems like it doesn't ... 😢 





[GitHub] [spark] HyukjinKwon commented on pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37490:
URL: https://github.com/apache/spark/pull/37490#issuecomment-1213587704

   Thanks @amaliujia. Let me address the comments soon.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37490: [SPARK-40051][PYTHON][SQL][DOCS] Make pyspark.sql.catalog examples self-contained

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37490:
URL: https://github.com/apache/spark/pull/37490#discussion_r945084133


##########
python/pyspark/sql/catalog.py:
##########
@@ -251,19 +329,31 @@ def getTable(self, tableName: str) -> Table:
         Parameters
         ----------
         tableName : str
-                    name of the table to check existence.
+            name of the table to get.

Review Comment:
   It's a new API in 3.4.0, so we won't have to add that changelog note.


