Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/23 10:22:35 UTC

[GitHub] [arrow] rdettai opened a new pull request #8513: ARROW-10368 [Rust] [DataFusion] Refactor scan nodes to allow extensions

rdettai opened a new pull request #8513:
URL: https://github.com/apache/arrow/pull/8513


   Replace all specific XxxScan nodes with a generic SourceScan that dynamically dispatches to any source implementation.
   
   The main goal is to make implementations of custom data sources possible (read from S3, custom file formats...). It might also make the LogicalPlan enum a bit more readable.
   
   Initial discussion: https://issues.apache.org/jira/browse/ARROW-10368


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai closed pull request #8513: ARROW-10368 [Rust] [DataFusion] Refactor scan nodes to allow extensions

Posted by GitBox <gi...@apache.org>.
rdettai closed pull request #8513:
URL: https://github.com/apache/arrow/pull/8513


   





[GitHub] [arrow] rdettai commented on pull request #8513: ARROW-10368 [Rust] [DataFusion] Refactor scan nodes to allow extensions

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8513:
URL: https://github.com/apache/arrow/pull/8513#issuecomment-715378341


   I just found something out! It seems that DataFusion is full of interesting mysteries 😄! The abstractions required to do what I want already exist, but they are a little hidden.
   
   With `ExecutionContext::register_table(&mut self, name: &str, provider: Box<dyn TableProvider>)` you can already register a custom source implementation directly. You have to implement the `TableProvider` trait, which gives you projection pushdown, and then implement the `ExecutionPlan` trait directly. You can then run your queries on it, and you can even use the new source from the DataFrame API with `ExecutionContext::table(&mut self, table_name: &str) -> Result<Arc<dyn DataFrame>>`.
   
   Isn't that wonderful?
   
   I guess the only things left to do are:
   - add a convenience function like `ExecutionContext::read_provider(&mut self, provider: Box<dyn TableProvider>) -> Result<Arc<dyn DataFrame>>` that shortcuts the two calls mentioned above. This is mainly meant to make the feature more explicit.
   - add a new example?
   - enjoy... ❤️ 
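
A minimal sketch of how the proposed `read_provider` shortcut could chain `register_table` and `table`. It does not use the DataFusion crate: `TableProvider`, `DataFrame` and `ExecutionContext` below are simplified stand-ins with hypothetical method shapes, `MemSource` is an invented toy source, and registering under a generated name is an assumption about how the shortcut might work.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for DataFusion's `TableProvider` (hypothetical shape).
trait TableProvider {
    /// Column names exposed by the source.
    fn schema(&self) -> Vec<String>;
    /// Scan the source, keeping only the projected column indices if given.
    fn scan(&self, projection: Option<&[usize]>) -> Vec<Vec<i64>>;
}

/// Toy in-memory source (invented for this sketch).
struct MemSource {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

impl TableProvider for MemSource {
    fn schema(&self) -> Vec<String> {
        self.columns.clone()
    }

    fn scan(&self, projection: Option<&[usize]>) -> Vec<Vec<i64>> {
        match projection {
            None => self.rows.clone(),
            // Projection pushdown: only the requested columns are materialized.
            Some(idx) => self
                .rows
                .iter()
                .map(|row| idx.iter().map(|&i| row[i]).collect())
                .collect(),
        }
    }
}

/// Stand-in for a DataFrame: a handle on a registered provider.
struct DataFrame {
    provider: Arc<dyn TableProvider>,
}

impl DataFrame {
    fn collect(&self, projection: Option<&[usize]>) -> Vec<Vec<i64>> {
        self.provider.scan(projection)
    }
}

/// Stand-in for the execution context: a registry of named providers.
#[derive(Default)]
struct ExecutionContext {
    tables: HashMap<String, Arc<dyn TableProvider>>,
    next_id: usize,
}

impl ExecutionContext {
    fn register_table(&mut self, name: &str, provider: Box<dyn TableProvider>) {
        self.tables.insert(name.to_string(), Arc::from(provider));
    }

    fn table(&mut self, table_name: &str) -> Result<DataFrame, String> {
        self.tables
            .get(table_name)
            .cloned()
            .map(|provider| DataFrame { provider })
            .ok_or_else(|| format!("no table named {}", table_name))
    }

    /// Proposed shortcut: register under a generated name (assumption),
    /// then look the table up again.
    fn read_provider(&mut self, provider: Box<dyn TableProvider>) -> Result<DataFrame, String> {
        let name = format!("__provider_{}", self.next_id);
        self.next_id += 1;
        self.register_table(&name, provider);
        self.table(&name)
    }
}
```

The caller only has to build a provider and hand it to `read_provider`; the generated table name stays an internal detail of the context.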





[GitHub] [arrow] rdettai commented on pull request #8513: ARROW-10368 [Rust] [DataFusion] Refactor scan nodes to allow extensions

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8513:
URL: https://github.com/apache/arrow/pull/8513#issuecomment-715257359


   @jorgecarleitao I have implemented the basic structure, with a proposal for a `SourceScanner` trait. I have shown how the `SourceScan` logical plan is mapped to an execution plan by just passing the source implementation through. This basically means that there is no "real" conversion from logical to physical plan: we are just passing the same implementation from one to the other.
   
   If this is "conceptually" okay with you, I will port all the currently implemented sources to this model.
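
To illustrate the pass-through idea, here is a minimal sketch that does not use the DataFusion crate. The `SourceScanner` methods, the reduced `LogicalPlan` and `PhysicalPlan` enums, and `RangeSource` are all hypothetical shapes chosen for illustration, not the code in this PR.

```rust
use std::sync::Arc;

/// Hypothetical shape for the proposed `SourceScanner` trait.
trait SourceScanner {
    fn name(&self) -> String;
    fn scan(&self) -> Vec<i64>;
}

/// Reduced logical plan: one generic scan node instead of one node per source type.
enum LogicalPlan {
    SourceScan { source: Arc<dyn SourceScanner> },
    Limit { n: usize, input: Box<LogicalPlan> },
}

/// Reduced physical plan mirroring the logical one.
enum PhysicalPlan {
    SourceExec { source: Arc<dyn SourceScanner> },
    LimitExec { n: usize, input: Box<PhysicalPlan> },
}

/// The scan node is not really converted: the same trait object is shared
/// between the logical and the physical plan.
fn create_physical_plan(plan: &LogicalPlan) -> PhysicalPlan {
    match plan {
        LogicalPlan::SourceScan { source } => PhysicalPlan::SourceExec {
            source: Arc::clone(source),
        },
        LogicalPlan::Limit { n, input } => PhysicalPlan::LimitExec {
            n: *n,
            input: Box::new(create_physical_plan(input)),
        },
    }
}

/// Execution dispatches dynamically to whatever source was plugged in.
fn execute(plan: &PhysicalPlan) -> Vec<i64> {
    match plan {
        PhysicalPlan::SourceExec { source } => source.scan(),
        PhysicalPlan::LimitExec { n, input } => execute(input).into_iter().take(*n).collect(),
    }
}

/// Toy custom source (invented for this sketch): yields the integers in `start..end`.
struct RangeSource {
    start: i64,
    end: i64,
}

impl SourceScanner for RangeSource {
    fn name(&self) -> String {
        format!("range({}..{})", self.start, self.end)
    }

    fn scan(&self) -> Vec<i64> {
        (self.start..self.end).collect()
    }
}
```

Adding a new source then only requires a new `SourceScanner` implementation; neither plan enum needs a new variant.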





[GitHub] [arrow] github-actions[bot] commented on pull request #8513: ARROW-10368 [Rust] [DataFusion] Refactor scan nodes to allow extensions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8513:
URL: https://github.com/apache/arrow/pull/8513#issuecomment-715255711


   https://issues.apache.org/jira/browse/ARROW-10368

