Posted to github@arrow.apache.org by "iajoiner (via GitHub)" <gi...@apache.org> on 2023/02/20 16:05:08 UTC

[GitHub] [arrow-datafusion] iajoiner opened a new issue, #5347: Dataframe doctests in the main branch are taking very long to run (over 60 seconds)

iajoiner opened a new issue, #5347:
URL: https://github.com/apache/arrow-datafusion/issues/5347

   **Describe the bug**
   Doctests in `dataframe.rs` take a very long time to run on the main branch (over 60 seconds).
   **To Reproduce**
   
   **Expected behavior**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Dataframe doctests in the main branch are taking very long to run (over 60 seconds) [arrow-datafusion]

Posted by "devinjdangelo (via GitHub)" <gi...@apache.org>.
devinjdangelo commented on issue #5347:
URL: https://github.com/apache/arrow-datafusion/issues/5347#issuecomment-1970986598

   These tests lock up because running them in parallel, which is cargo's default behavior, drives memory utilization very high. My system peaked at over 100GB of memory utilization :exploding_head: ! I looked through the dataframe doc tests and don't see any inherent reason for such extreme memory usage. I believe @Jefffrey is correct that the cause is Rust loading many copies of a large debug binary into memory.
   
   I think a reasonable workaround to improve the developer experience would be to find a way to make cargo default to running these specific tests with a maximum parallelism somewhere in the 1-4 range, which should work on most systems.
   
   You can do this manually by running `cargo test --doc dataframe -- --test-threads 1`
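   
   As a rough sketch of what a capped default could look like, the shell helpers below clamp a core count to the 1-4 range and build the same command with that value. The names `doctest_threads` and `doctest_command` are hypothetical, not part of cargo or DataFusion, and the cap of 4 is only the upper end of the range suggested above.
   
   ```shell
   # Hypothetical helper: clamp a core count to the 1-4 range.
   # The bounds come from the suggestion above, not from any project setting.
   doctest_threads() {
       cores="${1:-1}"
       if [ "$cores" -lt 1 ]; then
           echo 1
       elif [ "$cores" -gt 4 ]; then
           echo 4
       else
           echo "$cores"
       fi
   }
   
   # Print the command instead of running it, so the sketch has no side effects.
   doctest_command() {
       echo "cargo test --doc dataframe -- --test-threads $(doctest_threads "$1")"
   }
   ```
   
   For example, `doctest_command 16` prints the command with `--test-threads 4`. On Linux, `eval "$(doctest_command "$(nproc)")"` would run it, assuming `nproc` is available.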




Re: [I] Dataframe doctests in the main branch are taking very long to run (over 60 seconds) [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead closed issue #5347: Dataframe doctests in the main branch are taking very long to run (over 60 seconds)
URL: https://github.com/apache/arrow-datafusion/issues/5347




[GitHub] [arrow-datafusion] Jefffrey commented on issue #5347: Dataframe doctests in the main branch are taking very long to run (over 60 seconds)

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on issue #5347:
URL: https://github.com/apache/arrow-datafusion/issues/5347#issuecomment-1454317374

   I think this may be an issue with Rust doctests in general: https://github.com/rust-lang/rust/issues/75341
   
   Not sure what can be done here. Maybe try to reduce the number of doctests (not really ideal), or make it possible to omit the doctests from the default `cargo test`?
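   
   One way to approximate the last option without changing the project, as a sketch: when test targets are selected explicitly, cargo runs only those targets, and doctests are not among them unless `--doc` is passed. The wrapper name `run_non_doc_tests` and the `CARGO` override used for previewing are illustrative assumptions, not existing tooling.
   
   ```shell
   # Hypothetical wrapper: run unit and integration tests but no doctests.
   # Naming targets explicitly (--lib --bins --tests) leaves doctests out;
   # they run only by default or when --doc is given.
   # CARGO can be overridden (for example CARGO=echo) to preview the command.
   run_non_doc_tests() {
       "${CARGO:-cargo}" test --lib --bins --tests "$@"
   }
   ```
   
   Extra arguments are passed through, so `run_non_doc_tests -p datafusion` would limit the run to one package. The doctests can still be run separately with `cargo test --doc`.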

