Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2024/01/23 01:14:59 UTC

[PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]

HyukjinKwon opened a new pull request, #44842:
URL: https://github.com/apache/spark/pull/44842

   ### What changes were proposed in this pull request?
   
   This PR cleans up the obsolete code in the PySpark coverage script.
   
   ### Why are the changes needed?
   
   We used to use `coverage_daemon.py` in Python workers to track coverage on the Python worker side (e.g., coverage within Python UDFs); it was added in https://github.com/apache/spark/pull/20204. However, it no longer works, and in fact it stopped working multiple years ago. Replacing the Python worker itself was a hacky workaround in the first place, so we should remove it now and find a proper approach later.
   
   This should also deflake the scheduled jobs and speed up the build.
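
   For context, the removed mechanism worked by swapping the regular Python worker for a wrapper that ran it under `coverage`. The sketch below illustrates the general shape of such a wrapper; it is not the actual `coverage_daemon.py`, and the environment variable and `pyspark.daemon` entry point shown here are illustrative assumptions.

   ```python
   # Hypothetical sketch of a worker-replacing coverage wrapper; names are
   # illustrative, not Spark's actual coverage_daemon.py.
   import os

   import coverage  # coverage.py must be installed in the worker's Python

   # Pick up the same config file the coverage script would export.
   config_file = os.environ.get("COVERAGE_PROCESS_START", ".coveragerc")

   cov = coverage.Coverage(config_file=config_file)
   cov.start()
   try:
       # Hand control to the real worker daemon so UDF execution is traced.
       from pyspark import daemon
       daemon.manager()
   finally:
       cov.stop()
       cov.save()  # writes a per-process data file to be combined later
   ```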
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually tested via:
   
   ```bash
   ./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
   ```
   
   ```
   Finished test(python3): pyspark.sql.tests.test_functions (87s)
   Tests passed in 87 seconds
   Combining collected coverage data under /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71607.501653
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71798.177503
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71417.646740
   Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71419.320617
   Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71418.130736
   Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71415.781423
   Skipping duplicate data test_coverage/coverage_data/coverage.C02CV6VVMD6R.71416.272012
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71799.843181
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71421.946328
   Combined data file test_coverage/coverage_data/coverage.C02CV6VVMD6R.71411.225487
   Creating XML report file at python/coverage.xml
   Wrote XML report to coverage.xml
   Reporting the coverage data at /Users/hyukjin.kwon/workspace/forked/spark/python/test_coverage/coverage_data/coverage
   Name                                    Stmts   Miss Branch BrPart  Cover
   -------------------------------------------------------------------------
   pyspark/__init__.py                        48      7     10      3    76%
   pyspark/_globals.py                        16      3      4      2    75%
   pyspark/accumulators.py                   123     38     26      5    66%
   pyspark/broadcast.py                      121     79     40      3    33%
   pyspark/conf.py                            99     33     50      5    64%
   pyspark/context.py                        451    216    151     26    51%
   pyspark/errors/__init__.py                  3      0      0      0   100%
   pyspark/errors/error_classes.py             3      0      0      0   100%
   pyspark/errors/exceptions/__init__.py       0      0      0      0   100%
   pyspark/errors/exceptions/base.py          91     15     24      4    83%
   pyspark/errors/exceptions/captured.py     168     81     57     17    48%
   pyspark/errors/utils.py                    34      8      6      2    70%
   pyspark/files.py                           34     15     12      3    57%
   pyspark/find_spark_home.py                 30     24     12      2    19%
   pyspark/java_gateway.py                   114     31     30     12    69%
   pyspark/join.py                            66     58     58      0     6%
   pyspark/profiler.py                       244    182     92      3    22%
   pyspark/rdd.py                           1064    741    378      9    27%
   pyspark/rddsampler.py                      68     50     32      0    18%
   pyspark/resource/__init__.py                5      0      0      0   100%
   pyspark/resource/information.py            11      4      4      0    73%
   pyspark/resource/profile.py               110     82     58      1    27%
   pyspark/resource/requests.py              139     90     70      0    35%
   pyspark/resultiterable.py                  14      6      2      1    56%
   pyspark/serializers.py                    349    185     90     13    43%
   pyspark/shuffle.py                        397    322    180      1    13%
   pyspark/sql/__init__.py                    14      0      0      0   100%
   pyspark/sql/catalog.py                    203    127     66      2    30%
   pyspark/sql/column.py                     268     78     64     12    67%
   pyspark/sql/conf.py                        40     16     10      3    58%
   pyspark/sql/context.py                    170     95     58      2    47%
   pyspark/sql/dataframe.py                  900    475    459     40    45%
   pyspark/sql/functions/__init__.py           3      0      0      0   100%
   pyspark/sql/functions/builtin.py         1741    542   1126     26    76%
   pyspark/sql/functions/partitioning.py      41     19     18      3    59%
   pyspark/sql/group.py                       81     30     32      3    65%
   pyspark/sql/observation.py                 54     37     22      1    26%
   pyspark/sql/pandas/__init__.py              1      0      0      0   100%
   pyspark/sql/pandas/conversion.py          277    249    156      2     8%
   pyspark/sql/pandas/functions.py            67     49     34      0    18%
   pyspark/sql/pandas/group_ops.py            89     65     22      2    25%
   pyspark/sql/pandas/map_ops.py              37     27     10      2    26%
   pyspark/sql/pandas/serializers.py         381    323    172      0    10%
   pyspark/sql/pandas/typehints.py            41     32     26      1    15%
   pyspark/sql/pandas/types.py               407    383    326      1     3%
   pyspark/sql/pandas/utils.py                29     11     10      5    59%
   pyspark/sql/profiler.py                    80     47     54      1    39%
   pyspark/sql/readwriter.py                 362    253    146      7    27%
   pyspark/sql/session.py                    469    206    228     22    56%
   pyspark/sql/sql_formatter.py               41     26     16      1    28%
   pyspark/sql/streaming/__init__.py           4      0      0      0   100%
   pyspark/sql/streaming/listener.py         400    200    186      1    61%
   pyspark/sql/streaming/query.py            102     63     40      1    39%
   pyspark/sql/streaming/readwriter.py       268    207    118      2    21%
   pyspark/sql/streaming/state.py            100     68     44      0    29%
   pyspark/sql/tests/__init__.py               0      0      0      0   100%
   pyspark/sql/tests/test_functions.py       646      2    244      7    99%
   pyspark/sql/types.py                     1013    355    528     74    62%
   pyspark/sql/udf.py                        240    132     90     20    42%
   pyspark/sql/udtf.py                       152     98     52      2    33%
   pyspark/sql/utils.py                      160     83     54     10    45%
   pyspark/sql/window.py                      89     23     56      5    77%
   pyspark/statcounter.py                     79     58     20      0    21%
   pyspark/status.py                          36     13      6      0    55%
   pyspark/storagelevel.py                    41      9      0      0    78%
   pyspark/taskcontext.py                    111     63     40      1    40%
   pyspark/testing/__init__.py                 2      0      0      0   100%
   pyspark/testing/sqlutils.py               149     44     52      1    75%
   pyspark/testing/utils.py                  312    238    162      2    17%
   pyspark/traceback_utils.py                 38      4     14      6    81%
   pyspark/util.py                           153    120     56      2    18%
   pyspark/version.py                          1      0      0      0   100%
   ...
   ```
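
   For reference, the combining and reporting steps in the log above match coverage.py's standard combine/report workflow. Below is a minimal sketch of that flow using the `coverage` Python API; the directory paths are illustrative, not what the script hardcodes.

   ```python
   # Minimal sketch of the combine-and-report flow seen in the log above,
   # using coverage.py's public API; paths are illustrative.
   import coverage

   data_dir = "python/test_coverage/coverage_data"  # hypothetical location
   cov = coverage.Coverage(data_file=f"{data_dir}/coverage")

   # Merge the per-process data files (coverage.<host>.<pid>.<random>) into
   # one; identical files are skipped, as in the "Skipping duplicate data"
   # lines above.
   cov.combine(data_paths=[data_dir])
   cov.save()

   cov.xml_report(outfile="python/coverage.xml")  # the XML report step
   cov.report()  # prints the Name/Stmts/Miss/Branch/BrPart/Cover table
   ```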
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.



Re: [PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44842:
URL: https://github.com/apache/spark/pull/44842#issuecomment-1905151734

   Merged to master.



Re: [PR] [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44842: [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
URL: https://github.com/apache/spark/pull/44842

