Posted to reviews@spark.apache.org by "khalidmammadov (via GitHub)" <gi...@apache.org> on 2023/04/23 16:20:30 UTC
[GitHub] [spark] khalidmammadov opened a new pull request, #40916: [SPARK-43243][PySpark][Connect] Add level param to printSchema for Python
khalidmammadov opened a new pull request, #40916:
URL: https://github.com/apache/spark/pull/40916
### What changes were proposed in this pull request?
This is a feature-parity improvement: it adds a **level** parameter to df.printSchema in the Python API (PySpark and Spark Connect).
## Connect API
Examples:
```
root@f53642e9adb0:/home/spark# bin/pyspark --remote "local[*]"
Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/04/23 13:31:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.5.0.dev0
/_/
Using Python version 3.9.5 (default, Nov 23 2021 15:27:38)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> df = spark.createDataFrame([(1, (2,2))], ["a", "b"])
>>> df.printSchema(1)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
>>> df.printSchema(2)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
>>> df.printSchema(3)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
>>> df.printSchema()
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
root@f53642e9adb0:/home/spark# bin/pyspark
Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/04/23 13:36:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.5.0-SNAPSHOT
/_/
Using Python version 3.9.5 (default, Nov 23 2021 15:27:38)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1682257002957).
SparkSession available as 'spark'.
>>> df = spark.createDataFrame([(1, (2,2))], ["a", "b"])
>>> df.printSchema(1)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
>>> df.printSchema(2)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
>>> df.printSchema(3)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
>>> df.printSchema(0)
root
|-- a: long (nullable = true)
|-- b: struct (nullable = true)
| |-- _1: long (nullable = true)
| |-- _2: long (nullable = true)
```
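The behaviour shown in the sessions above can be approximated in plain Python. This is only a minimal sketch of depth-limited tree printing, not Spark's implementation: the `tree_string` helper and the list-of-tuples schema model are hypothetical names invented for illustration, and every field is assumed to be nullable. As in the output above, a missing or non-positive `level` prints every level.

```python
from typing import List, Optional

# Hypothetical schema model (an assumption for this sketch, not a Spark type):
# each field is (name, dtype), where dtype is a type-name string or a list of
# nested fields representing a struct.


def tree_string(fields: list, level: Optional[int] = None) -> str:
    """Render a nested schema as a tree, stopping after `level` levels."""
    # None or a non-positive level means "no limit", matching the sessions above.
    max_depth = level if level is not None and level > 0 else None
    lines: List[str] = ["root"]

    def walk(children: list, depth: int) -> None:
        if max_depth is not None and depth > max_depth:
            return
        prefix = " |" + "    |" * (depth - 1)
        for name, dtype in children:
            if isinstance(dtype, list):
                lines.append(f"{prefix}-- {name}: struct (nullable = true)")
                walk(dtype, depth + 1)
            else:
                lines.append(f"{prefix}-- {name}: {dtype} (nullable = true)")

    walk(fields, 1)
    return "\n".join(lines)


schema = [("a", "long"), ("b", [("_1", "long"), ("_2", "long")])]
print(tree_string(schema, 1))  # top-level fields only
print(tree_string(schema))     # all levels
```

With the real API the equivalent calls are `df.printSchema(1)` and `df.printSchema()`, as in the sessions above.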
### Why are the changes needed?
Feature parity
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Existing and new test cases
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] khalidmammadov commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
Posted by "khalidmammadov (via GitHub)" <gi...@apache.org>.
khalidmammadov commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1176026841
##########
python/pyspark/sql/dataframe.py:
##########
@@ -584,14 +584,14 @@ def printSchema(self, level: Optional[int] = None) -> None:
.. versionchanged:: 3.4.0
Supports Spark Connect.
- .. versionchanged:: 3.5.0
- Added Level parameter.
-
Parameters
----------
level : int, optional, default None
How many levels to print for nested schemas.
+ .. versionchanged:: 3.5.0
Review Comment:
Good spot! Fixed. Thanks.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1174756518
##########
python/pyspark/sql/dataframe.py:
##########
@@ -575,14 +575,23 @@ def schema(self) -> StructType:
)
return self._schema
- def printSchema(self) -> None:
+ def printSchema(self, level: Optional[int] = None) -> None:
"""Prints out the schema in the tree format.
+ Optionally allows to specify how many levels to print if schema is nested.
.. versionadded:: 1.3.0
.. versionchanged:: 3.4.0
Supports Spark Connect.
+ .. versionchanged:: 3.5.0
+ Added Level parameter.
+
+ Parameters
+ ----------
+ level : int, optional, default None
+ How many levels to print for nested schemas.
Review Comment:
Can we move:
```
.. versionchanged:: 3.5.0
Added Level parameter.
```
to here?
[GitHub] [spark] zhengruifeng commented on a diff in pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #40916:
URL: https://github.com/apache/spark/pull/40916#discussion_r1175909780
##########
python/pyspark/sql/dataframe.py:
##########
@@ -584,14 +584,14 @@ def printSchema(self, level: Optional[int] = None) -> None:
.. versionchanged:: 3.4.0
Supports Spark Connect.
- .. versionchanged:: 3.5.0
- Added Level parameter.
-
Parameters
----------
level : int, optional, default None
How many levels to print for nested schemas.
+ .. versionchanged:: 3.5.0
Review Comment:
I think the indentation is not right; you may refer to https://github.com/apache/spark/blob/7f724c3bc7567b0cddc09d5bed11b79879533368/python/pyspark/sql/dataframe.py#L1917-L1921
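For readers following the docstring discussion, here is a hedged sketch of the layout the reviewers appear to be asking for: the `versionchanged` note sits under the `level` parameter description, indented one step deeper than the description, with its text indented one step further. The function name `print_schema_stub` is hypothetical and the body is a placeholder; only the docstring layout is the point, and the exact final wording in Spark may differ.

```python
from typing import Optional


def print_schema_stub(level: Optional[int] = None) -> None:
    """Prints out the schema in the tree format.

    .. versionadded:: 1.3.0

    .. versionchanged:: 3.4.0
        Supports Spark Connect.

    Parameters
    ----------
    level : int, optional, default None
        How many levels to print for nested schemas.

        .. versionchanged:: 3.5.0
            Added Level parameter.
    """
    # Placeholder body: this stub only illustrates the docstring layout.
    return None
```

Placing the directive inside the parameter entry ties the version note to `level` itself rather than to the method as a whole.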
[GitHub] [spark] zhengruifeng closed pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
URL: https://github.com/apache/spark/pull/40916
[GitHub] [spark] zhengruifeng commented on pull request #40916: [SPARK-43243][PYTHON][CONNECT] Add level param to printSchema for Python
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40916:
URL: https://github.com/apache/spark/pull/40916#issuecomment-1521346298
merged to master