Posted to reviews@spark.apache.org by "khalidmammadov (via GitHub)" <gi...@apache.org> on 2023/10/28 18:28:41 UTC

[PR] [SPARK-45716][PySpark][Connect] Add StructType.treeString to Python [spark]

khalidmammadov opened a new pull request, #43572:
URL: https://github.com/apache/spark/pull/43572

   ### What changes were proposed in this pull request?
   This PR adds the missing `StructType.treeString()` method to the Python API, for parity with Scala.
   
   Usage examples:
   
   **PySpark**
   ```
   root@4691452b5bbd:/home/spark# bin/pyspark                    
   Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
   ..... 
   >>> from pyspark.sql.types import *
   >>> s = StructType([StructField("level1", StructType([StructField("f1", StringType(), True)]), True)])
   >>> s.treeString()
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(9)
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(-1)
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(1)
   'root\n |-- level1: struct (nullable = true)\n'
   >>> print(s.treeString(1))
   root
    |-- level1: struct (nullable = true)
   ```
   
   
   **Connect**
   ```
   root@4691452b5bbd:/home/spark# bin/pyspark --remote "local[*]"
   Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
   ......
   >>> from pyspark.sql.types import *
   >>> s = StructType(
   ... [StructField("level1", StructType([StructField("f1", StringType(), True)]), True)])
   >>> s.treeString()
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(1)
   'root\n |-- level1: struct (nullable = true)\n'
   >>> s.treeString(0)
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(6)
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   >>> s.treeString(-1)
   'root\n |-- level1: struct (nullable = true)\n |    |-- f1: string (nullable = true)\n'
   ```
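
For readers who want to see how the depth argument shapes the output, here is a minimal pure-Python sketch. It is not the PR's implementation: `Field`, `Struct`, and `tree_string` are hypothetical stand-ins that do not depend on PySpark, and the rule that a missing, zero, or negative depth means "no limit" is an assumption inferred from the sessions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Field:
    # Hypothetical stand-in for StructField: dtype is a type name or a nested Struct.
    name: str
    dtype: Union[str, "Struct"]
    nullable: bool = True


@dataclass
class Struct:
    # Hypothetical stand-in for StructType.
    fields: List[Field] = field(default_factory=list)


def tree_string(schema: Struct, max_depth: Optional[int] = None) -> str:
    """Render a schema as a tree, descending at most max_depth levels.

    Assumption: None, zero, or a negative max_depth means unlimited depth.
    """
    limit = max_depth if max_depth is not None and max_depth > 0 else None
    lines = ["root"]

    def walk(struct: Struct, prefix: str, remaining: Optional[int]) -> None:
        # Stop once the depth budget is used up; None means no budget at all.
        if remaining is not None and remaining <= 0:
            return
        for f in struct.fields:
            nested = isinstance(f.dtype, Struct)
            type_name = "struct" if nested else f.dtype
            nullable = str(f.nullable).lower()
            lines.append(f"{prefix}-- {f.name}: {type_name} (nullable = {nullable})")
            if nested:
                next_remaining = None if remaining is None else remaining - 1
                walk(f.dtype, prefix + "    |", next_remaining)

    walk(schema, " |", limit)
    return "\n".join(lines) + "\n"


s = Struct([Field("level1", Struct([Field("f1", "string")]))])
print(tree_string(s, 1), end="")
```

Because the function returns a string rather than printing, the result can be logged, compared in a test, or embedded in an error message, which is the motivation given for exposing `treeString()` in Python.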
   
   ### Why are the changes needed?
   For parity with Scala, and to let users obtain the output of the commonly used `df.printSchema()` as a string without printing it.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds a new Python method, `StructType.treeString()`.
   
   
   ### How was this patch tested?
   Added PySpark and Spark Connect tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45716][PYTHON][CONNECT] Add StructType.treeString to Python [spark]

Posted by "khalidmammadov (via GitHub)" <gi...@apache.org>.
khalidmammadov commented on PR #43572:
URL: https://github.com/apache/spark/pull/43572#issuecomment-1803650470

   @HyukjinKwon does it look ok now?



Re: [PR] [SPARK-45716][PYTHON][CONNECT] Add StructType.treeString to Python [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #43572: [SPARK-45716][PYTHON][CONNECT] Add StructType.treeString to Python
URL: https://github.com/apache/spark/pull/43572



Re: [PR] [SPARK-45716][PySpark][Connect] Add StructType.treeString to Python [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43572:
URL: https://github.com/apache/spark/pull/43572#discussion_r1375409233


##########
python/pyspark/sql/types.py:
##########
@@ -1216,6 +1216,45 @@ def fieldNames(self) -> List[str]:
         """
         return list(self.names)
 
+    def treeString(self, maxDepth: Optional[int] = None) -> str:
+        """Return tree representation of the schema
+
+        .. versionadded:: 4.0.0
+
+        Parameters
+        ----------
+        maxDepth : int, optional, default None
+            Depth of the schema for nested schemas.
+
+        Examples
+        --------
+        >>> from pyspark.sql.types import *
+        >>> s = StructType(
+        ... [StructField("level1", StructType([StructField("f1", StringType(), True)]), True)]
+        ... )
+
+        >>> s.treeString()
+        'root\\n |-- level1: struct (nullable = true)\\n |    |-- f1: string (nullable = true)\\n'
+        >>> print(s.treeString())
+        root
+         |-- level1: struct (nullable = true)
+         |    |-- f1: string (nullable = true)
+
+        >>> print(s.treeString(1))
+        root
+         |-- level1: struct (nullable = true)
+        """
+        from pyspark.sql import SparkSession
+
+        # Intentionally uses SparkSession so one implementation can be shared with/without
+        # Spark Connect.

Review Comment:
   The Connect DataFrame has neither `_tree_string` nor access to the JVM, so this would fail there. We should add a test to `test_dataframe` as an example.




Re: [PR] [SPARK-45716][PySpark][Connect] Add StructType.treeString to Python [spark]

Posted by "khalidmammadov (via GitHub)" <gi...@apache.org>.
khalidmammadov commented on PR #43572:
URL: https://github.com/apache/spark/pull/43572#issuecomment-1783891928

   @HyukjinKwon can you please take a look?

