Posted to github@arrow.apache.org by "waynexia (via GitHub)" <gi...@apache.org> on 2023/05/11 08:46:39 UTC

[GitHub] [arrow-datafusion] waynexia opened a new issue, #6335: Add extension plan supports for substrait

waynexia opened a new issue, #6335:
URL: https://github.com/apache/arrow-datafusion/issues/6335

   ### Is your feature request related to a problem or challenge?
   
   DataFusion provides good extensibility for user-defined plans and expressions. An extension plan is represented by the branch `LogicalPlan::Extension(_)`, which is built on the trait `UserDefinedLogicalNode`.
   
   This issue requests support for such extension plans in the Substrait integration.
   
   ### Describe the solution you'd like
   
   Substrait defines [User Defined Relations](https://substrait.io/relations/user_defined_relations/) (currently a placeholder page), and the `substrait` crate has a corresponding branch, [`ExtensionSingleRel`](https://docs.rs/substrait/latest/substrait/proto/struct.ExtensionSingleRel.html), in [`RelType`](https://docs.rs/substrait/latest/substrait/proto/rel/enum.RelType.html). The extension carries its payload in protobuf's [`Any`](https://docs.rs/prost-types/0.11.9/prost_types/struct.Any.html) type, which is essentially raw bytes tagged with a type URL:
   ```rust
   pub struct Any {
       pub type_url: String,
       pub value: Vec<u8>,
   }
   ```
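
   To make the "type URL plus raw bytes" shape concrete, here is a minimal sketch of how a node's serialized payload could travel inside such a value. It uses a local mirror of the struct so that it needs no external crates. The helpers `pack_node` and `unpack_node`, and the `/<name>` type URL scheme, are assumptions made up for this illustration and are not part of `prost-types`, `substrait`, or DataFusion.

   ```rust
   // Local mirror of `prost_types::Any`, so the sketch compiles on its own.
   #[derive(Debug, Clone, PartialEq)]
   pub struct Any {
       pub type_url: String,
       pub value: Vec<u8>,
   }

   // Hypothetical helper: tag an already-serialized node payload with its name.
   // The "/<name>" URL scheme is an assumption, not an established convention.
   pub fn pack_node(node_name: &str, payload: Vec<u8>) -> Any {
       Any {
           type_url: format!("/{}", node_name),
           value: payload,
       }
   }

   // Hypothetical helper: recover the node name and the raw payload bytes.
   pub fn unpack_node(any: &Any) -> (&str, &[u8]) {
       (any.type_url.trim_start_matches('/'), any.value.as_slice())
   }
   ```

   The consumer only sees a name and opaque bytes, so something on the receiving side has to know how to decode each named payload.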
   
   I propose adding two methods to `UserDefinedLogicalNode`, as shown below. Because this trait is widely implemented, I think they would need default implementations to avoid a breaking change. An alternative is to define them in a separate trait, such as `UserDefinedLogicalNodeSubstraitExt` (admittedly a long name), and require that trait only on the Substrait-related interfaces.
   ```rust
   
       fn to_substrait(&self) -> Result<Vec<u8>, ()>;
   
       fn from_substrait(&self, buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>, ()>;
   ```
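
   As a rough sketch of the separate-trait alternative, the block below uses a cut-down stand-in for `UserDefinedLogicalNode` (the real trait has many more methods). The trait name `SubstraitNodeExt`, the example `LimitNode`, its 8-byte encoding, and the `String` error type are all assumptions chosen for illustration.

   ```rust
   use std::convert::TryFrom;
   use std::fmt::Debug;
   use std::sync::Arc;

   // Stand-in for DataFusion's `UserDefinedLogicalNode`, reduced to one method.
   pub trait UserDefinedLogicalNode: Debug {
       fn name(&self) -> &str;
   }

   // Hypothetical opt-in trait: only nodes that want Substrait support implement it,
   // so existing `UserDefinedLogicalNode` implementations are unaffected.
   pub trait SubstraitNodeExt: UserDefinedLogicalNode {
       fn to_substrait(&self) -> Result<Vec<u8>, String>;
       fn from_substrait(&self, buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>, String>;
   }

   // Example node invented for this sketch.
   #[derive(Debug)]
   pub struct LimitNode {
       pub n: u64,
   }

   impl UserDefinedLogicalNode for LimitNode {
       fn name(&self) -> &str {
           "LimitNode"
       }
   }

   impl SubstraitNodeExt for LimitNode {
       // Encode the limit as 8 little-endian bytes (an arbitrary choice).
       fn to_substrait(&self) -> Result<Vec<u8>, String> {
           Ok(self.n.to_le_bytes().to_vec())
       }

       // Decode the same 8-byte layout, rejecting any other length.
       fn from_substrait(&self, buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>, String> {
           let raw = <[u8; 8]>::try_from(buf)
               .map_err(|_| format!("expected 8 bytes, got {}", buf.len()))?;
           Ok(Arc::new(LimitNode {
               n: u64::from_le_bytes(raw),
           }))
       }
   }
   ```

   The trade-off is that Substrait code paths would need a way to find out whether a given `dyn UserDefinedLogicalNode` also implements the extension trait, which default methods on the main trait avoid.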
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on issue #6335: Add extension plan supports for substrait

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on issue #6335:
URL: https://github.com/apache/arrow-datafusion/issues/6335#issuecomment-1543599756

   cc @alamb @nseekhao Maybe you have some insights about the API




[GitHub] [arrow-datafusion] alamb commented on issue #6335: Add extension plan supports for substrait

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6335:
URL: https://github.com/apache/arrow-datafusion/issues/6335#issuecomment-1545959085

   Adding these functions to `UserDefinedLogicalNode` makes a lot of sense to me, since serializing a user-defined node will eventually require the extension to define its mapping to and from Substrait.
   
   I think the default implementations can return a `NotImplemented` error, which would not be a breaking change:
   
   ```rust
   
       fn to_substrait(&self) -> Result<Vec<u8>> {
          Err(DataFusionError::NotImplemented(format!("nice error message here")))
       }
   
       fn from_substrait(&self, buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>> {
          Err(DataFusionError::NotImplemented(format!("nice error message here")))
       }
   ```
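
   To show why this stays backward compatible, here is a self-contained sketch with stand-ins for `DataFusionError` and `UserDefinedLogicalNode` (both reduced to what the example needs). `LegacyNode` and `TaggedNode` are made-up nodes: the first never mentions the new methods and still compiles, the second overrides them.

   ```rust
   use std::fmt::Debug;
   use std::sync::Arc;

   // Stand-in for `DataFusionError`, reduced to two variants.
   #[derive(Debug, PartialEq)]
   pub enum DataFusionError {
       NotImplemented(String),
       Internal(String),
   }

   pub type Result<T> = std::result::Result<T, DataFusionError>;

   // Stand-in for `UserDefinedLogicalNode` with the two proposed methods
   // given default bodies, so existing implementors need no changes.
   pub trait UserDefinedLogicalNode: Debug {
       fn name(&self) -> &str;

       fn to_substrait(&self) -> Result<Vec<u8>> {
           Err(DataFusionError::NotImplemented(format!(
               "{} does not support Substrait serialization",
               self.name()
           )))
       }

       fn from_substrait(&self, _buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>> {
           Err(DataFusionError::NotImplemented(format!(
               "{} does not support Substrait deserialization",
               self.name()
           )))
       }
   }

   // A pre-existing node that knows nothing about Substrait.
   #[derive(Debug)]
   pub struct LegacyNode;

   impl UserDefinedLogicalNode for LegacyNode {
       fn name(&self) -> &str {
           "LegacyNode"
       }
   }

   // A node that opts in by overriding both methods (one-byte encoding, for brevity).
   #[derive(Debug)]
   pub struct TaggedNode {
       pub tag: u8,
   }

   impl UserDefinedLogicalNode for TaggedNode {
       fn name(&self) -> &str {
           "TaggedNode"
       }

       fn to_substrait(&self) -> Result<Vec<u8>> {
           Ok(vec![self.tag])
       }

       fn from_substrait(&self, buf: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>> {
           match buf {
               [tag] => Ok(Arc::new(TaggedNode { tag: *tag })),
               _ => Err(DataFusionError::Internal(format!(
                   "TaggedNode expects 1 byte, got {}",
                   buf.len()
               ))),
           }
       }
   }
   ```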




[GitHub] [arrow-datafusion] alamb closed issue #6335: Add extension plan supports for substrait

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6335: Add extension plan supports for substrait
URL: https://github.com/apache/arrow-datafusion/issues/6335




[GitHub] [arrow-datafusion] waynexia commented on issue #6335: Add extension plan supports for substrait

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on issue #6335:
URL: https://github.com/apache/arrow-datafusion/issues/6335#issuecomment-1551180123

   I realized that the deserialize method needs a registry to dispatch on node types. I plan to add a field to `SessionState` for that. It would look like this:
   ```rust
   pub trait ExtensionDeserializer {
       fn deserialize_logical_plan(
           &self,
           name: &str,
           bytes: &[u8],
       ) -> Result<Arc<dyn UserDefinedLogicalNode>>;
   
       // Won't be added this time
       fn deserialize_physical_plan(
           &self,
           name: &str,
           bytes: &[u8],
       ) -> Result<Arc<dyn ExecutionPlan>>;
   }
   ```
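
   One possible shape for such a registry, as a sketch only: a map from node name to a decode function, implementing the logical-plan half of the trait above. `DecoderRegistry`, `DecodeFn`, `TopK`, and `decode_top_k` are hypothetical names, `UserDefinedLogicalNode` is a one-method stand-in, and the `String` error type replaces `DataFusionError` to keep the example free of external crates.

   ```rust
   use std::collections::HashMap;
   use std::fmt::Debug;
   use std::sync::Arc;

   // Stand-in for DataFusion's `UserDefinedLogicalNode`.
   pub trait UserDefinedLogicalNode: Debug {
       fn name(&self) -> &str;
   }

   // Simplified result type; the real code would use `DataFusionError`.
   pub type Result<T> = std::result::Result<T, String>;

   // Logical-plan half of the proposed trait.
   pub trait ExtensionDeserializer {
       fn deserialize_logical_plan(
           &self,
           name: &str,
           bytes: &[u8],
       ) -> Result<Arc<dyn UserDefinedLogicalNode>>;
   }

   // Hypothetical decode function signature: raw bytes in, node out.
   pub type DecodeFn = fn(&[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>>;

   // Hypothetical registry that dispatches on the node name.
   #[derive(Default)]
   pub struct DecoderRegistry {
       decoders: HashMap<String, DecodeFn>,
   }

   impl DecoderRegistry {
       pub fn register(&mut self, name: &str, decode: DecodeFn) {
           self.decoders.insert(name.to_string(), decode);
       }
   }

   impl ExtensionDeserializer for DecoderRegistry {
       fn deserialize_logical_plan(
           &self,
           name: &str,
           bytes: &[u8],
       ) -> Result<Arc<dyn UserDefinedLogicalNode>> {
           // Look up the decoder by name; unknown names are an error.
           let decode = self
               .decoders
               .get(name)
               .ok_or_else(|| format!("no decoder registered for extension node `{}`", name))?;
           decode(bytes)
       }
   }

   // Example node and decoder invented for this sketch.
   #[derive(Debug)]
   pub struct TopK {
       pub k: u8,
   }

   impl UserDefinedLogicalNode for TopK {
       fn name(&self) -> &str {
           "TopK"
       }
   }

   pub fn decode_top_k(bytes: &[u8]) -> Result<Arc<dyn UserDefinedLogicalNode>> {
       match bytes {
           [k] => Ok(Arc::new(TopK { k: *k })),
           _ => Err(format!("TopK expects 1 byte, got {}", bytes.len())),
       }
   }
   ```

   With this shape, decoding no longer needs an existing node instance: a caller registers `decode_top_k` under `"TopK"` once, then calls `deserialize_logical_plan("TopK", bytes)` whenever an extension relation with that name is encountered.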

