Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/10/20 22:36:17 UTC

[PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

ueshin opened a new pull request, #43470:
URL: https://github.com/apache/spark/pull/43470

   ### What changes were proposed in this pull request?
   
   Fix user-facing APIs related to Python UDTF to use camelCase.
   
   ### Why are the changes needed?
   
   To keep the naming convention for user-facing APIs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Updated the related tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1776166663

   Merged to master.



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1774646296

   Mind reverting this? (I'm outside so can't revert now)



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1773473975

   cc @dtenedor @HyukjinKwon 



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43470: [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase
URL: https://github.com/apache/spark/pull/43470



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1774665687

   > Mind reverting this? (I'm outside so can't revert now)
   
   OK, I'll revert this one first.



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43470:
URL: https://github.com/apache/spark/pull/43470#discussion_r1368458391


##########
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala:
##########
@@ -543,7 +543,7 @@ object IntegratedUDFTestUtils extends SQLHelper {
         |    def analyze(initial_count, input_table):
         |        buffer = ""
         |        if initial_count.value is not None:
-        |            assert(not initial_count.is_table)
+        |            assert(not initial_count.isTable)
         |            assert(initial_count.data_type == IntegerType())

Review Comment:
   `data_type` should be `dataType`




Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1774628434

   It seems some Scala-side tests started failing after this PR was merged:
   
   - https://github.com/apache/spark/actions/runs/6608112828/job/17946379914
   - https://github.com/apache/spark/actions/runs/6608763491/job/17947949938
   
   I tested `PythonUDTFSuite` locally:
   
   1. Before this PR
   
   ```
   // [SPARK-44753][PYTHON][CONNECT] XML: pyspark sql xml reader writer
   git reset --hard 9f675c54a56e8165e24e84a83c186c949ced5be8
   build/sbt clean "sql/testOnly org.apache.spark.sql.execution.python.PythonUDTFSuite"
   ```
   then 
   
   ```
   [info] Run completed in 8 seconds, 301 milliseconds.
   [info] Total number of tests run: 9
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   ```
   
   2. After this PR
   
   ```
   // [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase
   git reset --hard e3ba9cf0403ade734f87621472088687e533b2cd
   build/sbt clean "sql/testOnly org.apache.spark.sql.execution.python.PythonUDTFSuite"
   ```
   
   then 
   
   ```
   15:46:02.673 WARN org.apache.spark.sql.catalyst.analysis.SimpleTableFunctionRegistry: The function testudtf replaced a previously registered function.
   [info] - SPARK-44503: Specify PARTITION BY and ORDER BY for TABLE arguments *** FAILED *** (420 milliseconds)
   [info]   org.apache.spark.sql.AnalysisException: [TABLE_VALUED_FUNCTION_FAILED_TO_ANALYZE_IN_PYTHON] Failed to analyze the Python user defined table function: Traceback (most recent call last):
   [info]   File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/worker/analyze_udtf.py", line 119, in main
   [info]     result = handler.analyze(*args, **kwargs)  # type: ignore[attr-defined]
   [info]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   [info]   File "<string>", line 21, in analyze
   [info] AttributeError: 'AnalyzeArgument' object has no attribute 'is_table'
   [info]  SQLSTATE: 38000; line 8 pos 5
   [info]   at org.apache.spark.sql.errors.QueryCompilationErrors$.tableValuedFunctionFailedToAnalyseInPythonError(QueryCompilationErrors.scala:1985)
   [info]   at org.apache.spark.sql.execution.python.UserDefinedPythonTableFunctionAnalyzeRunner.receiveFromPython(UserDefinedPythonFunction.scala:229)
   [info]   at org.apache.spark.sql.execution.python.UserDefinedPythonTableFunctionAnalyzeRunner.receiveFromPython(UserDefinedPythonFunction.scala:186)
   [info]   at org.apache.spark.sql.execution.python.PythonPlannerRunner.runInPython(PythonPlannerRunner.scala:103)
   
   ...
   [info] - SPARK-45402: Add UDTF API for 'analyze' to return a buffer to consume on class creation *** FAILED *** (39 milliseconds)
   [info]   org.apache.spark.sql.AnalysisException: [TABLE_VALUED_FUNCTION_FAILED_TO_ANALYZE_IN_PYTHON] Failed to analyze the Python user defined table function: Traceback (most recent call last):
   [info]   File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/worker/analyze_udtf.py", line 119, in main
   [info]     result = handler.analyze(*args, **kwargs)  # type: ignore[attr-defined]
   [info]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   [info]   File "<string>", line 16, in analyze
   [info] AttributeError: 'AnalyzeArgument' object has no attribute 'data_type'
   [info]  SQLSTATE: 38000; line 1 pos 14
   ...
   [info] Run completed in 8 seconds, 26 milliseconds.
   [info] Total number of tests run: 9
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 7, failed 2, canceled 0, ignored 0, pending 0
   [info] *** 2 TESTS FAILED ***
   [error] Failed tests:
   [error] 	org.apache.spark.sql.execution.python.PythonUDTFSuite
   [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
   ```
   
   The GitHub Actions run for this PR seems to have passed because the PR did not touch any Scala code, so all tests on the Scala side were skipped.
   
   - https://github.com/apache/spark/actions/runs/6607626526/job/17945278871
   
   
   Could you take a look? Thanks @ueshin 
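
Both `AttributeError`s in the log above come from test snippets that still read the old snake_case attributes. A minimal sketch of that failure mode follows. It runs without Spark: `AnalyzeArgumentSketch`, `analyze_old`, and `analyze_new` are hypothetical stand-ins, not the real pyspark class or the real test code, and only the `isTable` and `dataType` names are taken from this thread.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AnalyzeArgumentSketch:
    """Hypothetical stand-in for the argument object passed to analyze().

    Assumption: after the rename, only camelCase attributes exist.
    """

    dataType: str
    value: Optional[Any]
    isTable: bool


def analyze_old(arg):
    # Snippet that was not updated: still uses the snake_case names.
    assert not arg.is_table
    return arg.data_type


def analyze_new(arg):
    # Snippet updated to the camelCase names.
    assert not arg.isTable
    return arg.dataType


arg = AnalyzeArgumentSketch(dataType="int", value=1, isTable=False)

try:
    analyze_old(arg)
    old_error = None
except AttributeError as exc:
    # Same kind of error as in the log: the attribute no longer exists.
    old_error = str(exc)

new_result = analyze_new(arg)
print(old_error)
print(new_result)
```

The stale snippet fails at the first snake_case access, which is why the embedded Python in the Scala test utilities has to be renamed together with the Python API.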



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1774886767

   I pushed some changes



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43470: [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase
URL: https://github.com/apache/spark/pull/43470



Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43470:
URL: https://github.com/apache/spark/pull/43470#discussion_r1368461012


##########
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala:
##########
@@ -543,7 +543,7 @@ object IntegratedUDFTestUtils extends SQLHelper {
         |    def analyze(initial_count, input_table):
         |        buffer = ""
         |        if initial_count.value is not None:
-        |            assert(not initial_count.is_table)
+        |            assert(not initial_count.isTable)
         |            assert(initial_count.data_type == IntegerType())

Review Comment:
   `with_single_partition`, `partition_by` and `order_by` should also be changed accordingly.
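
For reference, the renames discussed in this thread follow the mechanical snake_case to camelCase rule. The helper below is a hypothetical sketch for listing them, not part of Spark. The thread confirms only `isTable` and `dataType` as final names; the other three outputs assume the same rule was applied.

```python
def snake_to_camel(name: str) -> str:
    """Convert a lowercase snake_case identifier to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Attribute names mentioned in the review comments.
names = [
    "is_table",
    "data_type",
    "with_single_partition",
    "partition_by",
    "order_by",
]

renames = {name: snake_to_camel(name) for name in names}
for old, new in renames.items():
    print(f"{old} -> {new}")
```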




Re: [PR] [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43470:
URL: https://github.com/apache/spark/pull/43470#issuecomment-1774298126

   Merged to master.

