Posted to reviews@spark.apache.org by "khalidmammadov (via GitHub)" <gi...@apache.org> on 2023/04/23 16:20:30 UTC

[GitHub] [spark] khalidmammadov opened a new pull request, #40916: [SPARK-43243][PySpark][Connect] Add level param to printSchema for Python

khalidmammadov opened a new pull request, #40916:
URL: https://github.com/apache/spark/pull/40916

   ### What changes were proposed in this pull request?
   This is a feature parity improvement that adds a **level** param to df.printSchema in the Python API (PySpark & Connect).
   
   ## Connect API
   
   Examples:
   ```
   root@f53642e9adb0:/home/spark# bin/pyspark --remote "local[*]"
   Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
   [GCC 9.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   23/04/23 13:31:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.5.0.dev0
         /_/
   
   Using Python version 3.9.5 (default, Nov 23 2021 15:27:38)
   Client connected to the Spark Connect server at localhost
   SparkSession available as 'spark'.
   >>> df = spark.createDataFrame([(1, (2,2))], ["a", "b"])
   >>> df.printSchema(1)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
   
   >>> df.printSchema(2)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
   
   >>> df.printSchema(3)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
    
   >>> df.printSchema()
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
   
   root@f53642e9adb0:/home/spark# bin/pyspark                    
   Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
   [GCC 9.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   23/04/23 13:36:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.5.0-SNAPSHOT
         /_/
   
   Using Python version 3.9.5 (default, Nov 23 2021 15:27:38)
   Spark context Web UI available at http://localhost:4040
   Spark context available as 'sc' (master = local[*], app id = local-1682257002957).
   SparkSession available as 'spark'.
   >>> df = spark.createDataFrame([(1, (2,2))], ["a", "b"])
   >>> df.printSchema(1)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
   
   >>> df.printSchema(2)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
   
   >>> df.printSchema(3)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
   
   >>> df.printSchema(0)
   root
    |-- a: long (nullable = true)
    |-- b: struct (nullable = true)
    |    |-- _1: long (nullable = true)
    |    |-- _2: long (nullable = true)
   
   ```
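
The sessions above suggest a simple rule: a positive `level` caps how deep the tree is printed, while omitting it or passing 0 prints every level. The following is a minimal sketch of that rule in plain Python, not Spark's implementation. The dict-based schema encoding and the `tree_string` helper are hypothetical names introduced for illustration, and treating negative values like 0 is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional

# Hypothetical schema encoding for illustration only: a field maps either to a
# type name (str) or to a nested dict of fields (a struct).
Schema = Dict[str, Any]


def tree_string(schema: Schema, level: Optional[int] = None) -> str:
    """Render `schema` as a tree, descending at most `level` levels.

    None or 0 prints every level, as in the sessions above; negative values
    are treated the same way (an assumption of this sketch).
    """
    max_depth = level if level is not None and level > 0 else float("inf")
    lines: List[str] = ["root"]

    def walk(fields: Schema, depth: int) -> None:
        # Stop descending once the requested depth has been exceeded.
        if depth > max_depth:
            return
        prefix = " |" + "    |" * (depth - 1) + "-- "
        for name, dtype in fields.items():
            if isinstance(dtype, dict):
                lines.append(f"{prefix}{name}: struct (nullable = true)")
                walk(dtype, depth + 1)
            else:
                lines.append(f"{prefix}{name}: {dtype} (nullable = true)")

    walk(schema, 1)
    return "\n".join(lines)


# Same shape as the DataFrame used in the sessions above.
schema: Schema = {"a": "long", "b": {"_1": "long", "_2": "long"}}
print(tree_string(schema, 1))
print(tree_string(schema, 2))
```

With `level=1` the nested fields of `b` are omitted; with `level=2` or higher the output matches the unrestricted tree, because the example schema is only two levels deep.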
   
   
   ### Why are the changes needed?
   Feature parity with the Scala API, whose `printSchema` already accepts a `level` argument.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, `df.printSchema` now accepts an optional `level` argument.
   
   ### How was this patch tested?
   Existing and new test cases
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] khalidmammadov commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python

Posted by "khalidmammadov (via GitHub)" <gi...@apache.org>.
khalidmammadov commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1176026841


##########
python/pyspark/sql/dataframe.py:
##########
@@ -584,14 +584,14 @@ def printSchema(self, level: Optional[int] = None) -> None:
         .. versionchanged:: 3.4.0
             Supports Spark Connect.
 
-        .. versionchanged:: 3.5.0
-            Added Level parameter.
-
         Parameters
         ----------
         level : int, optional, default None
             How many levels to print for nested schemas.
 
+        .. versionchanged:: 3.5.0

Review Comment:
   Good spot! Fixed. Thanks.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1174756518


##########
python/pyspark/sql/dataframe.py:
##########
@@ -575,14 +575,23 @@ def schema(self) -> StructType:
                 )
         return self._schema
 
-    def printSchema(self) -> None:
+    def printSchema(self, level: Optional[int] = None) -> None:
         """Prints out the schema in the tree format.
+        Optionally allows to specify how many levels to print if schema is nested.
 
         .. versionadded:: 1.3.0
 
         .. versionchanged:: 3.4.0
             Supports Spark Connect.
 
+        .. versionchanged:: 3.5.0
+            Added Level parameter.
+
+        Parameters
+        ----------
+        level : int, optional, default None
+            How many levels to print for nested schemas.

Review Comment:
   Can we move:
   
   ```
           .. versionchanged:: 3.5.0
               Added Level parameter.
   ```
   
   to here?





[GitHub] [spark] zhengruifeng commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1175909780


##########
python/pyspark/sql/dataframe.py:
##########
@@ -584,14 +584,14 @@ def printSchema(self, level: Optional[int] = None) -> None:
         .. versionchanged:: 3.4.0
             Supports Spark Connect.
 
-        .. versionchanged:: 3.5.0
-            Added Level parameter.
-
         Parameters
         ----------
         level : int, optional, default None
             How many levels to print for nested schemas.
 
+        .. versionchanged:: 3.5.0

Review Comment:
   I think the indent is not right; you may refer to https://github.com/apache/spark/blob/7f724c3bc7567b0cddc09d5bed11b79879533368/python/pyspark/sql/dataframe.py#L1917-L1921
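
The two review requests on this docstring (put the `versionchanged` note with the parameter, and indent it correctly) can be sketched as follows. This shows the layout as I read the comments, not a copy of the merged code: `print_schema_stub` is a hypothetical stand-in, and the exact indentation is an assumption based on the numpydoc convention of nesting a directive under the parameter description it belongs to.

```python
from typing import Optional


def print_schema_stub(level: Optional[int] = None) -> None:
    """Prints out the schema in the tree format.

    .. versionadded:: 1.3.0

    .. versionchanged:: 3.4.0
        Supports Spark Connect.

    Parameters
    ----------
    level : int, optional, default None
        How many levels to print for nested schemas.

        .. versionchanged:: 3.5.0
            Added Level parameter.
    """
    # Hypothetical stand-in: the body is irrelevant here, only the docstring
    # layout matters.
    return None
```

The method-level notes stay at the top of the docstring, while the 3.5.0 note sits one indent level inside the `level` entry so that it renders as part of that parameter's description.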





[GitHub] [spark] zhengruifeng closed pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
URL: https://github.com/apache/spark/pull/40916




[GitHub] [spark] zhengruifeng commented on pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40916:
URL: https://github.com/apache/spark/pull/40916#issuecomment-1521346298

   merged to master

