Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/31 04:08:34 UTC

[GitHub] [arrow-datafusion] matthewmturner commented on issue #2656: Is ObjectStoreSchemaProvider Still Needed?

matthewmturner commented on issue #2656:
URL: https://github.com/apache/arrow-datafusion/issues/2656#issuecomment-1141646635

   FYI, https://github.com/apache/arrow-datafusion/issues/1836 and https://github.com/apache/arrow-datafusion/pull/1863 are the background conversations on the API design for this.
   
   The original idea was to use this as shown below (slightly updated from how it was discussed in the above issue) - i.e. easily register multiple tables from a prefix into a schema.
   
   ```
   // Share the store behind an Arc so it can be registered and reused below.
   let object_store = Arc::new(S3FileSystem::default());
   let schema = ObjectStoreSchemaProvider::new();
   schema.register_store(object_store.clone());
   let tables = object_store.list_dir("s3://active/schema1");
   // Register every file under the prefix as a listing table.
   for file in tables.iter() {
       let config = ListingTableConfig::new(object_store.clone(), file).infer().await?;
       let name = extract_name_from_file(file);
       schema.register_listing_table(name, file, config);
   }
   ```
   
   Thinking about it now, I think the same could be done without this abstraction, using a plain `MemorySchemaProvider`:
   
   ```
   let object_store = Arc::new(S3FileSystem::default());
   let schema = MemorySchemaProvider::new();
   let tables = object_store.list_dir("s3://active/schema1");
   // Build a ListingTable per file and register it directly on the schema.
   for file in tables.iter() {
       let config = ListingTableConfig::new(object_store.clone(), file).infer().await?;
       let table = ListingTable::try_new(config)?;
       let name = extract_name_from_file(file);
       schema.register_table(name, Arc::new(table))?;
   }
   ```
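   
   Both snippets lean on an `extract_name_from_file` helper that is not defined anywhere in this thread. As a minimal sketch only, assuming the listed files arrive as plain path strings and that the table name should simply be the file name without its extension, it could look something like this (the signature and the `Option` return type are assumptions for illustration, not an existing DataFusion API):
   
   ```rust
   use std::path::Path;

   /// Hypothetical helper: derive a table name from a listed file path by
   /// taking the final path component and dropping its last extension.
   /// Returns `None` when the path has no final component or is not valid UTF-8.
   fn extract_name_from_file(file: &str) -> Option<String> {
       Path::new(file)
           .file_stem()
           .and_then(|stem| stem.to_str())
           .map(|stem| stem.to_string())
   }
   ```
   
   With this version the caller would have to decide what to do with a `None` (skip the file or return an error) before calling `register_table`.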
   
   If you agree, then yes I do think we can remove `ObjectStoreSchemaProvider`.

