Posted to github@arrow.apache.org by "sergiimk (via GitHub)" <gi...@apache.org> on 2023/05/28 00:32:28 UTC

[GitHub] [arrow-datafusion] sergiimk opened a new issue, #6470: Consider providing an option for dynamic linking

sergiimk opened a new issue, #6470:
URL: https://github.com/apache/arrow-datafusion/issues/6470

   ### Is your feature request related to a problem or challenge?
   
   `datafusion` is a "swiss army knife" kind of library - it's large and developing against it means **~13 second link times** every time you re-run tests, even with modern linkers like `mold`.
   
   I recently tried the approach described in [this blog post](https://robert.kra.hn/posts/2022-09-09-speeding-up-incremental-rust-compilation-with-dylibs/) - wrapping `datafusion` into a dynamic library. This reduced my incremental test build times by ~10 seconds!
   
   ### Describe the solution you'd like
   
   The Bevy project (also a very large library) now [offers a feature flag](https://bevyengine.org/learn/book/getting-started/setup/#enable-fast-compiles-optional) to link it dynamically.
   
   Specifying `crate-type = ["rlib", "dylib"]` might also be enough to let users easily link `datafusion` dynamically in their `dev` builds using `RUSTFLAGS="-C prefer-dynamic"`. 
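   
   As a rough illustration of the `crate-type` idea above, the manifest change might look something like the fragment below. This is a sketch only: it is not taken from `datafusion`'s actual `Cargo.toml`, and whether this setting alone is sufficient has not been verified here.
   
   ```toml
   # Hypothetical fragment of a library's Cargo.toml (an assumption for
   # illustration, not the project's real manifest).
   [lib]
   # "rlib" keeps the usual statically linked Rust library;
   # "dylib" additionally produces a Rust dynamic library that
   # downstream builds can choose to link against.
   crate-type = ["rlib", "dylib"]
   ```
   
   Note that Rust `dylib`s have no stable ABI, so the library and everything linking against it would need to be built with the same compiler version, which makes this mainly suited to local `dev` builds.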
   
   It may also improve iteration speeds for `datafusion`'s own development.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sergiimk commented on issue #6470: Consider providing an option for dynamic linking

Posted by "sergiimk (via GitHub)" <gi...@apache.org>.
sergiimk commented on issue #6470:
URL: https://github.com/apache/arrow-datafusion/issues/6470#issuecomment-1567276336

   As a user of datafusion I rarely recompile the library itself, but I do pay 13s in link time every time I touch my app or tests.
   
   So while splitting up crates can speed up your dev iterations, think of dynamic linking as primarily a user-facing speedup feature.




[GitHub] [arrow-datafusion] alamb commented on issue #6470: Consider providing an option for dynamic linking

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6470:
URL: https://github.com/apache/arrow-datafusion/issues/6470#issuecomment-1566124974

   Thanks @sergiimk  -- this is quite interesting
   
   I think one reason for the long cycle time is not just re-link but also that datafusion-core is large and any change to it requires recompiling the entire crate.  
   
   Another approach I have seen taken in the Rust community is to break such large crates into smaller ones. Not only does this tend to reduce test cycle times, it also encourages more encapsulation.
   
   We have been chipping away at doing this (e.g. https://github.com/apache/arrow-datafusion/issues/1754), but `datafusion-physical-plan` and `datafusion-datasource` are especially interconnected, which makes them hard to extract.
   
   Maybe it is time to invest some more time trying to split out those crates 🤔 

