Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/23 11:27:02 UTC

[GitHub] [arrow-datafusion] returnString commented on pull request #552: mv register_schema() to impl

returnString commented on pull request #552:
URL: https://github.com/apache/arrow-datafusion/pull/552#issuecomment-866756001


   Hrm, I think if we're moving this to the interface, we need to codify some notion of an "unsupported operation". That's actually why I left it out initially: registering a schema inside e.g. the `information_schema` catalog doesn't really make sense, because it's a read-only projection of DB internals, and I didn't want to commit to more public API than was necessary.
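
One way the "unsupported operation" idea could be expressed is a trait method whose default implementation returns an error, so read-only catalogs opt out without extra code. This is only a sketch: `CatalogError`, `Schema`, `MemoryCatalog` and `InformationSchemaCatalog` below are hypothetical stand-ins, not DataFusion's actual types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical error type (assumption: not DataFusion's real error enum).
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    Unsupported(String),
}

/// Hypothetical stand-in for a schema: just a list of table names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub tables: Vec<String>,
}

/// Hypothetical catalog trait with `register_schema` on the interface.
pub trait CatalogProvider {
    fn schema_names(&self) -> Vec<String>;

    /// Default: registration is unsupported. Writable catalogs override this.
    /// On success, returns the schema previously registered under `name`, if any.
    fn register_schema(
        &mut self,
        name: &str,
        _schema: Schema,
    ) -> Result<Option<Schema>, CatalogError> {
        Err(CatalogError::Unsupported(format!(
            "cannot register schema '{}' in a read-only catalog",
            name
        )))
    }
}

/// Writable in-memory catalog: overrides the default.
#[derive(Debug, Default)]
pub struct MemoryCatalog {
    schemas: HashMap<String, Schema>,
}

impl CatalogProvider for MemoryCatalog {
    fn schema_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.schemas.keys().cloned().collect();
        names.sort();
        names
    }

    fn register_schema(
        &mut self,
        name: &str,
        schema: Schema,
    ) -> Result<Option<Schema>, CatalogError> {
        Ok(self.schemas.insert(name.to_string(), schema))
    }
}

/// Read-only projection of internals: keeps the "unsupported" default.
pub struct InformationSchemaCatalog;

impl CatalogProvider for InformationSchemaCatalog {
    fn schema_names(&self) -> Vec<String> {
        vec!["information_schema".to_string()]
    }
}
```

The trade-off is that callers holding only a trait object cannot tell at compile time whether registration will succeed, which is the extra public API surface mentioned above.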
   
   In my DataFusion-powered projects, I typically treat `ExecutionContext` instances as immutable, which simplifies a lot of the setup. Essentially, this entails creating catalogs using concrete types like `MemoryCatalogProvider` and then attaching them to a new context, so I can work with the type-specific impls rather than just the trait methods. I'm not sure how widely adopted this methodology is, but I've found it works well!
   
   For example, if I were building a traditional database, here's how I'd execute queries:
   - build the list of catalogs (and internally, schemas/tables) the user has permissions to access (this relies on out-of-band data)
   - create an execution context populated with said catalog list
   - run the query using the context
   - discard the context
   
   Obviously this relies on the context setup being quite cheap, but I don't see any moves toward making that a particularly intensive process 😄 
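
The per-query lifecycle above can be sketched roughly as follows. Everything here is hypothetical and deliberately simplified: `QueryContext` and `MemoryCatalog` are toy stand-ins for an execution context and a concrete catalog type, `catalogs_for_user` fakes the out-of-band permission lookup, and "running a query" is reduced to resolving a table name.

```rust
use std::collections::HashMap;

/// Toy concrete catalog: a list of table names (assumption, not a DataFusion type).
#[derive(Debug, Clone, Default)]
pub struct MemoryCatalog {
    tables: Vec<String>,
}

impl MemoryCatalog {
    /// Type-specific builder method, available because we hold the concrete type.
    pub fn with_table(mut self, name: &str) -> Self {
        self.tables.push(name.to_string());
        self
    }
}

/// Toy context: populated once at construction and never mutated afterwards.
pub struct QueryContext {
    catalogs: HashMap<String, MemoryCatalog>,
}

impl QueryContext {
    pub fn new(catalogs: HashMap<String, MemoryCatalog>) -> Self {
        QueryContext { catalogs }
    }

    /// Stand-in for query execution: can `catalog.table` be resolved?
    pub fn resolve(&self, catalog: &str, table: &str) -> bool {
        self.catalogs
            .get(catalog)
            .map_or(false, |c| c.tables.iter().any(|t| t.as_str() == table))
    }
}

/// Step 1: build the catalogs this user may access (hypothetical permission rules).
fn catalogs_for_user(user: &str) -> HashMap<String, MemoryCatalog> {
    let mut catalogs = HashMap::new();
    catalogs.insert(
        "public".to_string(),
        MemoryCatalog::default().with_table("events"),
    );
    if user == "admin" {
        catalogs.insert(
            "internal".to_string(),
            MemoryCatalog::default().with_table("audit_log"),
        );
    }
    catalogs
}

/// Steps 2-4: create a context from the catalog list, use it, then discard it.
pub fn can_query(user: &str, catalog: &str, table: &str) -> bool {
    let ctx = QueryContext::new(catalogs_for_user(user));
    ctx.resolve(catalog, table)
    // `ctx` is dropped here; the next query gets a fresh context.
}
```

Because the context never changes after construction, permissions are enforced simply by what gets put into it, and no registration method on the trait is needed for this workflow.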


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org