Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2024/01/09 11:49:20 UTC
[PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
LuciferYang opened a new pull request, #44639:
URL: https://github.com/apache/spark/pull/44639
### What changes were proposed in this pull request?
This PR refines the docstrings of `from_csv`, `schema_of_csv`, and `to_csv`, and adds some new examples.
### Why are the changes needed?
To improve the PySpark documentation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44639:
URL: https://github.com/apache/spark/pull/44639#issuecomment-1883108328
```
Error: Internal server error occurred while resolving "actions/cache@v3". Internal server error occurred while resolving "actions/checkout@v4". Internal server error occurred while resolving "actions/setup-java@v4". Internal server error occurred while resolving "actions/upload-artifact@v3"
```
There seem to be some issues with GitHub Actions (GA); testing has to wait until they are resolved.
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446915049
##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
return _invoke_function("schema_of_csv", col, _options_to_str(options))
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Review Comment:
`Example 2: Converting a complex StructType to a CSV string` displays different results in Regular Spark and Spark Connect, so this PR skips testing it and adds `TODO(SPARK-46654)`:
```
**********************************************************************
File "/__w/spark/spark/python/pyspark/sql/connect/functions/builtin.py", line 2232, in pyspark.sql.connect.functions.builtin.to_csv
Failed example:
    df.select(sf.to_csv(df.value)).show(truncate=False)
Expected:
    +-----------------------+
    |to_csv(value)          |
    +-----------------------+
    |2,Alice,"[100,200,300]"|
    +-----------------------+
Got:
    +--------------------------------------------------------------------------+
    |to_csv(value)                                                             |
    +--------------------------------------------------------------------------+
    |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
    +--------------------------------------------------------------------------+
    <BLANKLINE>
**********************************************************************
   1 of  18 in pyspark.sql.connect.functions.builtin.to_csv
***Test Failed*** 1 failures.
```
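For readers unfamiliar with skipping a single docstring example, here is a minimal sketch using the standard-library `doctest` module. It assumes the skip is done with the `+SKIP` directive, which this thread does not confirm; the docstring text and the `count_failures` helper are hypothetical and do not come from the PR.

```python
import doctest

# Hypothetical docstring text: the second example carries a wrong expected
# output on purpose and is disabled with doctest's SKIP directive.
SKIPPED = '''
>>> ",".join(["2", "Alice"])
'2,Alice'
>>> ",".join(["2", "Alice"])  # doctest: +SKIP
'output that differs between backends'
'''

# Same text with the directive removed, so the second example really runs.
NOT_SKIPPED = SKIPPED.replace("  # doctest: +SKIP", "")


def count_failures(docstring: str) -> int:
    """Run the examples in `docstring` and return how many failed."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


print(count_failures(SKIPPED))      # the skipped example is never compared
print(count_failures(NOT_SKIPPED))  # the mismatching example now fails
```

With the directive in place the mismatching example is not executed, so the run reports no failures; removing it makes that one example fail.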
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446917191
##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
return _invoke_function("schema_of_csv", col, _options_to_str(options))
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Review Comment:
Regular Spark:
```
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/10 13:54:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/10 13:54:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/
Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022 15:24:06)
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local[*], app id = local-1704866093640).
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-----------------------+
|to_csv(value)          |
+-----------------------+
|2,Alice,"[100,200,300]"|
+-----------------------+
```
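The quotes around `[100,200,300]` in the expected output follow ordinary CSV quoting: a field that contains the delimiter must be quoted. The sketch below shows the same rule with Python's standard `csv` module. It is an analogy only, not Spark's CSV writer; rendering the array as the string `[100,200,300]` is an assumption made for illustration, and `to_csv_line` is a hypothetical helper.

```python
import csv
import io


def to_csv_line(fields):
    """Format one record as a CSV line using the csv module's default (minimal) quoting."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    # Drop the line terminator the writer appends.
    return buffer.getvalue().rstrip("\r\n")


# The third field contains commas, so the writer wraps it in double quotes.
print(to_csv_line([2, "Alice", "[100,200,300]"]))
```

Fields without a comma, such as `2` and `Alice`, are written unquoted.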
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446917814
##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
return _invoke_function("schema_of_csv", col, _options_to_str(options))
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Review Comment:
Regular Spark (4.0.0-SNAPSHOT):
```
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/10 13:56:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/
Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022 15:24:06)
Spark context Web UI available at http://localhost:4042
Spark context available as 'sc' (master = local[*], app id = local-1704866178807).
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-----------------------+
|to_csv(value)          |
+-----------------------+
|2,Alice,"[100,200,300]"|
+-----------------------+
```
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44639:
URL: https://github.com/apache/spark/pull/44639#issuecomment-1884419945
Merged into master. Thanks @HyukjinKwon
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446936561
##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
return _invoke_function("schema_of_csv", col, _options_to_str(options))
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Review Comment:
Spark Connect:
```
./bin/pyspark --remote "sc://localhost"
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
      /_/
Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022 15:24:06)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+--------------------------------------------------------------------------+
|to_csv(value)                                                             |
+--------------------------------------------------------------------------+
|2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
+--------------------------------------------------------------------------+
```
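The `UnsafeArrayData@99c5e30f` text has the shape of Java's default `Object.toString()` result (class name, `@`, hexadecimal hash code), which suggests the array was stringified as an object rather than formatted element by element. A small, hypothetical check for that symptom in rendered output might look like the sketch below; the pattern and the `looks_like_jvm_object` helper are illustrative assumptions, not part of Spark or of this PR.

```python
import re

# Java's default Object.toString() yields "<fully.qualified.ClassName>@<hex hash>".
JVM_DEFAULT_TOSTRING = re.compile(
    r"(?:[A-Za-z_$][\w$]*\.)+[A-Za-z_$][\w$]*@[0-9a-f]+"
)


def looks_like_jvm_object(text: str) -> bool:
    """Return True if `text` contains something shaped like a default JVM object string."""
    return JVM_DEFAULT_TOSTRING.search(text) is not None


# The two cell values shown in this thread.
connect_cell = "2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f"
regular_cell = '2,Alice,"[100,200,300]"'

print(looks_like_jvm_object(connect_cell))
print(looks_like_jvm_object(regular_cell))
```

The check is a heuristic: any dotted name followed by `@` and hexadecimal digits would match, so it is only suitable for spotting this kind of regression in test output.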
Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #44639: [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv`
URL: https://github.com/apache/spark/pull/44639