Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/17 16:21:58 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #10118: ARROW-12468: [Python][R] Expose ScannerBuilder::UseAsync to Python & R

jorisvandenbossche commented on pull request #10118:
URL: https://github.com/apache/arrow/pull/10118#issuecomment-842458877


   One concern I had when seeing the heavy parameterization (basically a lot of tests x4) was the test run time, but checking the most expensive tests, the slow ones are basically all S3-related:
   
   ```
   $ pytest python/pyarrow/tests/test_dataset.py --durations=20
   ...
   ============================================================================================== slowest 20 durations ===============================================================================================
   15.39s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[threaded-sync]
   15.36s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[serial-sync]
   15.36s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[serial-async]
   15.35s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[threaded-async]
   3.13s call     pyarrow/tests/test_dataset.py::test_write_dataset_s3
   2.03s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[threaded-async]
   2.02s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3_fsspec[threaded-async]
   2.02s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[serial-async]
   2.02s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3_fsspec[serial-async]
   2.02s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[threaded-sync]
   2.02s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[serial-sync]
   2.02s setup    pyarrow/tests/test_dataset.py::test_write_dataset_s3
   2.01s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3_fsspec[serial-sync]
   1.58s setup    pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3_fsspec[threaded-sync]
   1.07s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[threaded-async]
   1.06s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[threaded-sync]
   1.05s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[serial-sync]
   1.05s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3[serial-async]
   0.47s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_uri_s3_fsspec[threaded-async]
   0.30s setup    pyarrow/tests/test_dataset.py::test_make_fragment
   ```
   
   So we now run those 4 times. But instead of objecting to the parameterization (since I suppose it's especially useful for the S3 tests?), it's probably more useful to find out why those tests take such a long time in the first place.
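
To make the cost per test easier to see, the durations can be summed across the parameter variants. This is only a rough sketch: `SAMPLE` holds a few lines copied from the listing above (not the full output), and `LINE_RE` and `total_by_test` are hypothetical helper names, not part of pytest or pyarrow.

```python
import re
from collections import defaultdict

# A few lines copied from the --durations listing above (not the full output).
SAMPLE = """\
15.39s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[threaded-sync]
15.36s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[serial-sync]
15.36s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[serial-async]
15.35s call     pyarrow/tests/test_dataset.py::test_open_dataset_from_s3_with_filesystem_uri[threaded-async]
3.13s call     pyarrow/tests/test_dataset.py::test_write_dataset_s3
0.30s setup    pyarrow/tests/test_dataset.py::test_make_fragment
"""

# Matches lines of the form "<seconds>s <phase> <node id>".
LINE_RE = re.compile(
    r"^\s*(?P<secs>\d+(?:\.\d+)?)s\s+(?P<phase>call|setup|teardown)\s+(?P<nodeid>\S+)\s*$"
)


def total_by_test(report):
    """Sum durations per test function, ignoring the [param-id] suffix."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip separators, "..." and anything else that is not a timing line
        # Keep only the function name: drop the file path and the parameter id.
        test_name = match.group("nodeid").split("::")[-1].split("[", 1)[0]
        totals[test_name] += float(match.group("secs"))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(total_by_test(SAMPLE).items(), key=lambda item: -item[1])
    for name, secs in ranked:
        print(f"{secs:7.2f}s  {name}")
```

Grouping this way puts the four variants of `test_open_dataset_from_s3_with_filesystem_uri` on a single line, which makes it clearer that one test accounts for most of the time.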

