Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2024/01/23 01:14:59 UTC
[PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]
HyukjinKwon opened a new pull request, #44842:
URL: https://github.com/apache/spark/pull/44842
### What changes were proposed in this pull request?
This PR cleans up obsolete code in the PySpark coverage script.
### Why are the changes needed?
We used to use `coverage_daemon.py` to track coverage on the Python worker side (e.g., coverage within Python UDFs); it was added in https://github.com/apache/spark/pull/20204. However, it no longer works, and in fact it stopped working multiple years ago. The approach of replacing the Python worker itself was a hacky workaround. We should remove it first, and then find a proper way.
This should also deflake the scheduled jobs, and speed up the build.
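For context, coverage tools such as coverage.py work by installing a trace hook in the interpreter, and the removed `coverage_daemon.py` relied on starting such tracking inside each forked Python worker. Below is a minimal stdlib-only sketch of how line tracing works in principle; the names `make_tracer`, `traced_call`, and `example` are illustrative, not Spark code, and real coverage.py is far more sophisticated:

```python
import sys


def make_tracer(executed):
    """Build a trace function that records (filename, lineno) for each executed line."""
    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # keep tracing nested frames and line events
    return tracer


def traced_call(func, *args):
    """Run func under the tracer and return (result, set of executed lines)."""
    executed = set()
    sys.settrace(make_tracer(executed))
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always uninstall the hook
    return result, executed


def example(x):
    if x > 0:
        return x * 2
    return -x


result, lines = traced_call(example, 3)
print(result)          # 6
print(len(lines) > 0)  # True: lines of example() were recorded
```

In the worker-replacement scheme, a hook like this had to be started in every forked worker process and its data files merged afterwards, which is exactly the fragile part this PR removes.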
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually tested via:
```bash
./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
```
```
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71607.501653
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71798.177503
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71417.646740
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71419.320617
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71418.130736
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71415.781423
Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71416.272012
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71799.843181
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71421.946328
Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71411.225487
Creating XML report file at python/coverage.xml
Wrote XML report to coverage.xml
Reporting the coverage data at /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data/coverage
Name Stmts Miss Branch BrPart Cover
-------------------------------------------------------------------------
pyspark/__init__.py 48 7 10 3 76%
pyspark/_globals.py 16 3 4 2 75%
pyspark/accumulators.py 123 38 26 5 66%
pyspark/broadcast.py 121 79 40 3 33%
pyspark/conf.py 99 33 50 5 64%
pyspark/context.py 451 216 151 26 51%
pyspark/errors/__init__.py 3 0 0 0 100%
pyspark/errors/error_classes.py 3 0 0 0 100%
pyspark/errors/exceptions/__init__.py 0 0 0 0 100%
pyspark/errors/exceptions/base.py 91 15 24 4 83%
pyspark/errors/exceptions/captured.py 168 81 57 17 48%
pyspark/errors/utils.py 34 8 6 2 70%
pyspark/files.py 34 15 12 3 57%
pyspark/find_spark_home.py 30 24 12 2 19%
pyspark/java_gateway.py 114 31 30 12 69%
pyspark/join.py 66 58 58 0 6%
pyspark/profiler.py 244 182 92 3 22%
pyspark/rdd.py 1064 741 378 9 27%
pyspark/rddsampler.py 68 50 32 0 18%
pyspark/resource/__init__.py 5 0 0 0 100%
pyspark/resource/information.py 11 4 4 0 73%
pyspark/resource/profile.py 110 82 58 1 27%
pyspark/resource/requests.py 139 90 70 0 35%
pyspark/resultiterable.py 14 6 2 1 56%
pyspark/serializers.py 349 185 90 13 43%
pyspark/shuffle.py 397 322 180 1 13%
pyspark/sql/__init__.py 14 0 0 0 100%
pyspark/sql/catalog.py 203 127 66 2 30%
pyspark/sql/column.py 268 78 64 12 67%
pyspark/sql/conf.py 40 16 10 3 58%
pyspark/sql/context.py 170 95 58 2 47%
pyspark/sql/dataframe.py 900 475 459 40 45%
pyspark/sql/functions/__init__.py 3 0 0 0 100%
pyspark/sql/functions/builtin.py 1741 542 1126 26 76%
pyspark/sql/functions/partitioning.py 41 19 18 3 59%
pyspark/sql/group.py 81 30 32 3 65%
pyspark/sql/observation.py 54 37 22 1 26%
pyspark/sql/pandas/__init__.py 1 0 0 0 100%
pyspark/sql/pandas/conversion.py 277 249 156 2 8%
pyspark/sql/pandas/functions.py 67 49 34 0 18%
pyspark/sql/pandas/group_ops.py 89 65 22 2 25%
pyspark/sql/pandas/map_ops.py 37 27 10 2 26%
pyspark/sql/pandas/serializers.py 381 323 172 0 10%
pyspark/sql/pandas/typehints.py 41 32 26 1 15%
pyspark/sql/pandas/types.py 407 383 326 1 3%
pyspark/sql/pandas/utils.py 29 11 10 5 59%
pyspark/sql/profiler.py 80 47 54 1 39%
pyspark/sql/readwriter.py 362 253 146 7 27%
pyspark/sql/session.py 469 206 228 22 56%
pyspark/sql/sql_formatter.py 41 26 16 1 28%
pyspark/sql/streaming/__init__.py 4 0 0 0 100%
pyspark/sql/streaming/listener.py 400 200 186 1 61%
pyspark/sql/streaming/query.py 102 63 40 1 39%
pyspark/sql/streaming/readwriter.py 268 207 118 2 21%
pyspark/sql/streaming/state.py 100 68 44 0 29%
pyspark/sql/tests/__init__.py 0 0 0 0 100%
pyspark/sql/tests/test_functions.py 646 2 244 7 99%
pyspark/sql/types.py 1013 355 528 74 62%
pyspark/sql/udf.py 240 132 90 20 42%
pyspark/sql/udtf.py 152 98 52 2 33%
pyspark/sql/utils.py 160 83 54 10 45%
pyspark/sql/window.py 89 23 56 5 77%
pyspark/statcounter.py 79 58 20 0 21%
pyspark/status.py 36 13 6 0 55%
pyspark/storagelevel.py 41 9 0 0 78%
pyspark/taskcontext.py 111 63 40 1 40%
pyspark/testing/__init__.py 2 0 0 0 100%
pyspark/testing/sqlutils.py 149 44 52 1 75%
pyspark/testing/utils.py 312 238 162 2 17%
pyspark/traceback_utils.py 38 4 14 6 81%
pyspark/util.py 153 120 56 2 18%
pyspark/version.py 1 0 0 0 100%
...
```
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44842:
URL: https://github.com/apache/spark/pull/44842#issuecomment-1905151734
Merged to master.
Re: [PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44842: [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
URL: https://github.com/apache/spark/pull/44842