Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2024/01/09 11:49:20 UTC

[PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

LuciferYang opened a new pull request, #44639:
URL: https://github.com/apache/spark/pull/44639

   ### What changes were proposed in this pull request?
   This PR refines the docstrings of `from_csv`, `schema_of_csv`, and `to_csv` and adds some new examples.
   
   ### Why are the changes needed?
   To improve PySpark documentation
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Pass GitHub Actions
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44639:
URL: https://github.com/apache/spark/pull/44639#issuecomment-1883108328

   ```
   Error: Internal server error occurred while resolving "actions/cache@v3". Internal server error occurred while resolving "actions/checkout@v4". Internal server error occurred while resolving "actions/setup-java@v4". Internal server error occurred while resolving "actions/upload-artifact@v3"
   ```
   
   There seem to be some issues with GitHub Actions (GA); we need to wait until they are resolved to continue testing.
   
   




Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446915049


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
 
     return _invoke_function("schema_of_csv", col, _options_to_str(options))
 
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display

Review Comment:
   `Example 2: Converting a complex StructType to a CSV string` displays different results in regular Spark and in Spark Connect, so its test is skipped in this PR and a `TODO(SPARK-46654)` is added:
   ```
   **********************************************************************
   File "/__w/spark/spark/python/pyspark/sql/connect/functions/builtin.py", line 2232, in pyspark.sql.connect.functions.builtin.to_csv
   Failed example:
       df.select(sf.to_csv(df.value)).show(truncate=False)
   Expected:
       +-----------------------+
       |to_csv(value)          |
       +-----------------------+
       |2,Alice,"[100,200,300]"|
       +-----------------------+
   Got:
       +--------------------------------------------------------------------------+
       |to_csv(value)                                                             |
       +--------------------------------------------------------------------------+
       |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
       +--------------------------------------------------------------------------+
       <BLANKLINE>
   **********************************************************************
      1 of  18 in pyspark.sql.connect.functions.builtin.to_csv
   ***Test Failed*** 1 failures.
   ```
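For readers unfamiliar with skipping a single doctest example, the sketch below shows one standard way to do it, the `+SKIP` directive from Python's `doctest` module. It is only an illustration and not necessarily the mechanism used in this PR; `to_csv_line` is a hypothetical helper, not a PySpark function.

```python
import doctest


def to_csv_line(values):
    """Join values into one comma-separated line (hypothetical helper).

    >>> to_csv_line([2, "Alice"])
    '2,Alice'

    The next example is skipped, so its expected output is never compared.

    >>> to_csv_line([2, "Alice", [100, 200, 300]])  # doctest: +SKIP
    '2,Alice,"[100,200,300]"'
    """
    return ",".join(str(v) for v in values)


# Collect and run the docstring examples of the helper above.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(to_csv_line, "to_csv_line", globs={"to_csv_line": to_csv_line}):
    runner.run(test)

# Skipped examples are not counted as attempted, so they cannot fail.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

Only the first example is executed here; the skipped one stays visible in the rendered documentation but is not checked.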





Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446917191


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
 
     return _invoke_function("schema_of_csv", col, _options_to_str(options))
 
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display

Review Comment:
   Regular Spark:
   
   ```
   Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   24/01/10 13:54:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   24/01/10 13:54:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
         /_/
   
   Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022 15:24:06)
   Spark context Web UI available at http://localhost:4041
   Spark context available as 'sc' (master = local[*], app id = local-1704866093640).
   SparkSession available as 'spark'.
   >>> from pyspark.sql import Row, functions as sf
   >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
   >>> df = spark.createDataFrame(data, ("key", "value"))
   >>> df.select(sf.to_csv(df.value)).show(truncate=False)
   +-----------------------+
   |to_csv(value)          |
   +-----------------------+
   |2,Alice,"[100,200,300]"|
   +-----------------------+
   ```
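The quotes around `[100,200,300]` in the output above follow the usual CSV convention of quoting a field that contains the delimiter. The sketch below reproduces that convention with Python's standard `csv` module; it is an analogy only and makes no claim about how Spark's own CSV writer is implemented. The array is passed as an already-formatted string, which is an assumption made for illustration.

```python
import csv
import io

# Write one row; the third field contains commas, so the default
# QUOTE_MINIMAL policy wraps it in double quotes.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([2, "Alice", "[100,200,300]"])

line = buffer.getvalue().rstrip("\n")
print(line)  # 2,Alice,"[100,200,300]"
```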





Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446917814


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
 
     return _invoke_function("schema_of_csv", col, _options_to_str(options))
 
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display

Review Comment:
   ```
   Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   24/01/10 13:56:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
   24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
         /_/
   
   Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022 15:24:06)
   Spark context Web UI available at http://localhost:4042
   Spark context available as 'sc' (master = local[*], app id = local-1704866178807).
   SparkSession available as 'spark'.
   >>> from pyspark.sql import Row, functions as sf
   >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
   >>> df = spark.createDataFrame(data, ("key", "value"))
   >>> df.select(sf.to_csv(df.value)).show(truncate=False)
   +-----------------------+
   |to_csv(value)          |
   +-----------------------+
   |2,Alice,"[100,200,300]"|
   +-----------------------+
   
   ```





Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44639:
URL: https://github.com/apache/spark/pull/44639#issuecomment-1884419945

   Merged into master. Thanks @HyukjinKwon 




Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44639:
URL: https://github.com/apache/spark/pull/44639#discussion_r1446936561


##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -14968,11 +15006,12 @@ def schema_of_csv(csv: Union[Column, str], options: Optional[Dict[str, str]] = N
 
     return _invoke_function("schema_of_csv", col, _options_to_str(options))
 
-
+# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display

Review Comment:
   ```
   ./bin/pyspark --remote "sc://localhost"
   
   Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
         /_/
   
   Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022 15:24:06)
   Client connected to the Spark Connect server at localhost
   SparkSession available as 'spark'.
   >>> from pyspark.sql import Row, functions as sf
   >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
   >>> df = spark.createDataFrame(data, ("key", "value"))
   >>> df.select(sf.to_csv(df.value)).show(truncate=False)
   +--------------------------------------------------------------------------+
   |to_csv(value)                                                             |
   +--------------------------------------------------------------------------+
   |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
   +--------------------------------------------------------------------------+
   ```





Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #44639: [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv`
URL: https://github.com/apache/spark/pull/44639

