Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/09/08 10:55:00 UTC

[jira] [Created] (ARROW-9940) [Rust][DataFusion] Generic "extension package" mechanism

Andrew Lamb created ARROW-9940:
----------------------------------

             Summary: [Rust][DataFusion] Generic "extension package" mechanism
                 Key: ARROW-9940
                 URL: https://issues.apache.org/jira/browse/ARROW-9940
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Andrew Lamb


This came from [~jorgecarleitao]'s suggestion on this PR: 
 https://github.com/apache/arrow/pull/8097/files#r482968858

The high-level idea is to design and implement an improvement to the DataFusion APIs that allows registering composable sets of UserDefinedLogicalNode implementations, logical planning rules, and physical planning rules for a given piece of functionality.

h2. The use case:

You publish the TopK extension as a (library) crate called datafusion-topk, and I publish a crate datafusion-s3 with another extension.

A user wants to use both extensions. They do so by:

# adding each crate to Cargo.toml
# initializing the default planner with both of them
# planning them
# executing them
I.e. freaking easy!
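
To make the steps above concrete, here is a minimal sketch of what such a mechanism could look like. Everything in it is hypothetical: `ExtensionPackage`, `Planner`, `with_extension`, `build_planner`, and the rule names are invented for illustration and are not existing DataFusion APIs. Rules are represented by plain string names so the example stays self-contained.

```rust
/// Hypothetical bundle of everything one extension crate contributes.
/// Not a DataFusion trait; rule names stand in for real rule objects.
pub trait ExtensionPackage {
    fn name(&self) -> &'static str;
    fn logical_rules(&self) -> Vec<&'static str>;
    fn physical_rules(&self) -> Vec<&'static str>;
}

/// Stand-in for what a `datafusion-topk` crate might export.
pub struct TopK;

impl ExtensionPackage for TopK {
    fn name(&self) -> &'static str {
        "datafusion-topk"
    }
    fn logical_rules(&self) -> Vec<&'static str> {
        vec!["topk_rewrite"]
    }
    fn physical_rules(&self) -> Vec<&'static str> {
        vec!["topk_exec"]
    }
}

/// Stand-in for what a `datafusion-s3` crate might export.
pub struct S3;

impl ExtensionPackage for S3 {
    fn name(&self) -> &'static str {
        "datafusion-s3"
    }
    fn logical_rules(&self) -> Vec<&'static str> {
        Vec::new() // assumed: this extension adds no logical rules
    }
    fn physical_rules(&self) -> Vec<&'static str> {
        vec!["s3_scan_exec"]
    }
}

/// Hypothetical planner that accumulates the rules of every registered package.
#[derive(Default)]
pub struct Planner {
    pub packages: Vec<&'static str>,
    pub logical_rules: Vec<&'static str>,
    pub physical_rules: Vec<&'static str>,
}

impl Planner {
    /// Register one package; calls can be chained to compose several.
    pub fn with_extension(mut self, ext: &dyn ExtensionPackage) -> Self {
        self.packages.push(ext.name());
        self.logical_rules.extend(ext.logical_rules());
        self.physical_rules.extend(ext.physical_rules());
        self
    }
}

/// Step 2 from the list: initialize the default planner with both extensions.
pub fn build_planner() -> Planner {
    Planner::default()
        .with_extension(&TopK)
        .with_extension(&S3)
}
```

With something along these lines, step 2 reduces to a single chained call such as `build_planner()`, and the resulting planner carries the rules of both packages in registration order.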

Broadly speaking, this allows the existence of an ecosystem of extensions/user-defined plans: people can share hand-crafted plans and plans can be added as dependencies to the crate and registered to the planner to be used by other people. 🤯

This also reduces the pressure to place everything in DataFusion's codebase: if we offer an API to extend DataFusion in this way, people can just distribute libraries with the extension/user-defined plan without having to go through the decision process of whether X is part of DataFusion's core or not (e.g. a scan of format Y, or a scan over protocol Z).

For me, this use case does require an easy way to achieve step 2 (initializing the default planner with both of them). But again, this PR is definitely a major step in this direction!


