Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/02/21 05:27:00 UTC

[GitHub] [arrow] benibus commented on issue #25025: [C++] Split non-cast compute kernels into a separate shared library

benibus commented on issue #25025:
URL: https://github.com/apache/arrow/issues/25025#issuecomment-1437881001

   @lidavidm @westonpace @zeroshade @felipecrv
   I've been working on the PR for this - just wanted to give an update and get some design opinions.
   
   Currently, I've set things up in the build system so that everything outside of `compute/kernels` is built into libarrow unconditionally - along with:
   - `scalar_cast_*.cc`
   - `vector_selection.cc` for `take` (used by dictionary casts, parquet)
   - `vector_hash.cc` for `unique` (used by parquet)
   
   If `ARROW_COMPUTE` is enabled, then the remaining kernel sources are built into libarrow_compute, which links against libarrow. Alternatively, we could introduce a new flag for this, but as it stands, `-DARROW_COMPUTE=ON` still gives you all the kernels (and we could unconditionally compile code that uses casts, e.g. the CSV writer, STL tests, etc.).
   
   Assuming that sounds reasonable (if not, let me know), I just need to determine how registration will work for the extra kernels. Currently, all kernels are registered on the first invocation of `compute::GetFunctionRegistry` in libarrow.so. However, it'll no longer be possible to register the extra kernels this way, given that their registration functions will live in a different shared library that libarrow doesn't link against. As I see it, there are a couple of possibilities:
   - Load the kernels from libarrow_compute at runtime (more flexible/forward-looking, but platform-dependent)
   - Circularly link libarrow and libarrow_compute (would require fancy linker flags). You might be able to get the same effect with an intermediate library but I'm not sure if/how it would work in practice.
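
To make the dependency direction concrete, here is a minimal single-file sketch of one way the registration could be inverted: the core library owns the registry and only registers its own kernels, while the extension library exposes an explicit initialization function that pushes the extra kernels into that registry. Everything here (`FunctionRegistry`, `Kernel`, `RegisterCoreKernels`, `RegisterExtraKernels`, `InitializeExtraKernels`, and the kernel names) is a hypothetical stand-in for illustration, not the actual Arrow API, and the two "libraries" are just sections of one translation unit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compute kernel; real kernels are far richer.
using Kernel = std::function<int(int)>;

// Hypothetical stand-in for the registry owned by the core library.
class FunctionRegistry {
 public:
  // Returns false if a function with this name is already registered.
  bool AddFunction(const std::string& name, Kernel kernel) {
    assert(kernel && "a registered kernel must be callable");
    return functions_.emplace(name, std::move(kernel)).second;
  }
  bool Contains(const std::string& name) const {
    return functions_.count(name) > 0;
  }
  std::size_t size() const { return functions_.size(); }

 private:
  std::map<std::string, Kernel> functions_;
};

// ---- Would live in the core library (assumption) ----
void RegisterCoreKernels(FunctionRegistry* registry) {
  registry->AddFunction("cast", [](int x) { return x; });
  registry->AddFunction("take", [](int x) { return x; });
  registry->AddFunction("unique", [](int x) { return x; });
}

// Lazily creates the registry and registers only the core kernels.
// Note that nothing here references symbols from the extension library.
FunctionRegistry* GetFunctionRegistry() {
  static FunctionRegistry* registry = [] {
    auto* r = new FunctionRegistry();
    RegisterCoreKernels(r);
    return r;
  }();
  return registry;
}

// ---- Would live in the extension library (assumption) ----
void RegisterExtraKernels(FunctionRegistry* registry) {
  registry->AddFunction("add_one", [](int x) { return x + 1; });
  registry->AddFunction("negate", [](int x) { return -x; });
}

// Explicit, idempotent entry point: the function-local static guarantees
// that the registration body runs at most once, even across threads.
bool InitializeExtraKernels() {
  static const bool done = [] {
    RegisterExtraKernels(GetFunctionRegistry());
    return true;
  }();
  return done;
}
```

In this arrangement the extension library depends on the core library and never the reverse, so no circular link is needed. The trade-off is that something (user code or a static initializer in the extension library) has to call the initialization function before the extra kernels become visible.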
   
   Any suggestions? Perhaps I'm missing a more obvious solution. @wesm's original post suggests creating a "plugin hook" for loading kernels from an external lib. To me, that sounds like a `dlopen` type of deal, but I'm not positive.
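
For reference, a `dlopen`-style plugin hook could look roughly like the sketch below. It is POSIX-only (Windows would need `LoadLibrary`/`GetProcAddress` instead, which is the platform dependence mentioned above). The names `LoadKernelPlugin`, `PluginLoadResult`, and `RegisterKernelsFn`, as well as the convention that a plugin exports a C-linkage function taking an opaque registry pointer and returning 0 on success, are assumptions made for illustration and not something Arrow defines.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlerror, dlclose

#include <cassert>
#include <string>

// Hypothetical entry point a plugin would export with C linkage.
// Returns 0 on success.
using RegisterKernelsFn = int (*)(void* registry);

struct PluginLoadResult {
  bool ok;
  std::string error;  // empty when ok is true
};

// Loads the shared library at `path`, looks up `symbol`, and calls it with
// the given (opaque) registry pointer.
PluginLoadResult LoadKernelPlugin(const std::string& path,
                                  const std::string& symbol,
                                  void* registry) {
  assert(!symbol.empty() && "an entry point name is required");

  void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* msg = dlerror();
    return {false, msg != nullptr ? msg : "dlopen failed"};
  }

  dlerror();  // clear any stale error state before calling dlsym
  void* sym = dlsym(handle, symbol.c_str());
  if (sym == nullptr) {
    const char* msg = dlerror();
    std::string error = msg != nullptr ? msg : "symbol not found";
    dlclose(handle);  // nothing was registered, so unloading is safe
    return {false, error};
  }

  auto register_fn = reinterpret_cast<RegisterKernelsFn>(sym);
  if (register_fn(registry) != 0) {
    // The library stays loaded: a partially completed registration may have
    // left function pointers into it inside the registry.
    return {false, "plugin registration reported failure"};
  }

  // On success the handle is intentionally never closed, because the
  // registered kernels point into the loaded library.
  return {true, ""};
}
```

A caller would pass the library path and entry point name, for example `LoadKernelPlugin("libarrow_compute.so", "RegisterKernels", registry)`, where both strings are placeholders. On older glibc versions this needs to be linked with `-ldl`.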
   

