Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 05:30:48 UTC

[GitHub] [arrow-datafusion] elliot14A opened a new issue, #4269: `infer_schema` is not working with s3 Urls or http endpoints

elliot14A opened a new issue, #4269:
URL: https://github.com/apache/arrow-datafusion/issues/4269

   **Describe the bug**
   I was not able to use the `infer_schema` function of `datafusion::datasource::listing::ListingOptions` with S3 URLs or HTTP endpoints, whereas it works fine with local file paths.
   
   **To Reproduce**
   ```
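   use std::sync::Arc;
   
   use datafusion::datasource::file_format::parquet::ParquetFormat;
   use datafusion::datasource::listing::{ListingOptions, ListingTableUrl};
   use datafusion::prelude::SessionContext;
   
   #[tokio::main]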
   async fn main() -> anyhow::Result<()> {
       let ctx = SessionContext::new();
       let url = "s3://roapi-test/blogs-flattened.parquet";
       // let url = "https://s3.eu-central-1.wasabisys.com/roapi-test/blogs_flattened.parquet";
       // let url = "./test_data/blogs.parquet";
   
       let options = ListingOptions::new(Arc::new(ParquetFormat::default()));
       let table_url = ListingTableUrl::parse(url)?;
       let s = options.infer_schema(&ctx.state(), &table_url).await?;
       println!("{}", s);
       Ok(())
   }
   ```
   This is the error returned when running the code above:
   ```
   Error: Internal error: No suitable object store found for s3://roapi-test/blogs-flattened.parquet. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   ```
   
   **Expected behavior**
   It should infer the schema of a file on S3 or behind an HTTP endpoint, just as it does for local files.
   
   **Additional context**
   I did some debugging and found that the error is raised in `datafusion/core/src/datasource/object_store.rs`, in this function:
   ```
   pub fn get_by_url(&self, url: impl AsRef<Url>) -> Result<Arc<dyn ObjectStore>> {
       let url = url.as_ref();
       // First check whether can get object store from registry
       let s = &url[url::Position::BeforeScheme..url::Position::BeforePath];
       let store = self.object_stores.get(s).map(|o| o.value().clone());
   
       match store {
           Some(store) => Ok(store),
           None => match &self.provider {
               Some(provider) => {
                   let store = provider.get_by_url(url)?;
                   let key =
                       &url[url::Position::BeforeScheme..url::Position::BeforePath];
                   self.object_stores.insert(key.to_owned(), store.clone());
                   Ok(store)
               }
               None => Err(DataFusionError::Internal(format!(
                   "No suitable object store found for {}",
                   url
               ))),
           },
       }
   }
   ```
   The `self.object_stores` DashMap does not contain an entry for the `s3://bucket_name` URL, so the lookup falls through to the error. The code comments mention that it returns an S3 store, so how should I register this S3 URL?
   
   
   Any help is appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] elliot14A closed issue #4269: `infer_schema` function is not working with s3 Urls or http endpoints

Posted by GitBox <gi...@apache.org>.
elliot14A closed issue #4269: `infer_schema` function is not working with s3 Urls or http endpoints
URL: https://github.com/apache/arrow-datafusion/issues/4269



[GitHub] [arrow-datafusion] Cheappie commented on issue #4269: `infer_schema` function is not working with s3 Urls or http endpoints

Posted by GitBox <gi...@apache.org>.
Cheappie commented on issue #4269:
URL: https://github.com/apache/arrow-datafusion/issues/4269#issuecomment-1321152852

   From what I see, you haven't registered an S3 object store. The error message says so too: `No suitable object store found`.
   
   Here are examples that might help you:
   https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples/examples
   
   An example that configures an S3 object store:
   https://github.com/apache/arrow-datafusion/blob/master/datafusion-examples/examples/query-aws-s3.rs
   
   An example that configures Parquet:
   https://github.com/apache/arrow-datafusion/blob/master/datafusion-examples/examples/parquet_sql.rs
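
To illustrate why the lookup in the quoted `get_by_url` fails until a store is registered, here is a minimal, std-only sketch of a registry keyed by the `scheme://host` prefix of the URL. It is a toy model, not DataFusion's implementation: `FakeStore`, `Registry`, `store_key`, and `register` are hypothetical names introduced for this example, and the string slicing only approximates what `url::Position::BeforeScheme..BeforePath` yields. In DataFusion itself the store is registered through `register_object_store` on the runtime environment, as in the linked `query-aws-s3.rs` example; its exact signature has varied between releases, so check the version you use.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for `dyn ObjectStore`; it only carries a label.
#[derive(Debug)]
struct FakeStore(&'static str);

/// Toy registry keyed by "scheme://host", loosely mirroring the
/// `url[BeforeScheme..BeforePath]` key used in the quoted `get_by_url`.
#[derive(Default)]
struct Registry {
    stores: HashMap<String, Arc<FakeStore>>,
}

/// Extract the "scheme://host" prefix from a URL string.
/// Simplified assumption: no validation, ports and userinfo are kept as-is.
fn store_key(url: &str) -> Option<&str> {
    let scheme_end = url.find("://")? + 3;
    let host_len = url[scheme_end..]
        .find('/')
        .unwrap_or(url.len() - scheme_end);
    Some(&url[..scheme_end + host_len])
}

impl Registry {
    /// Register a store for a scheme and host (the bucket name, for S3).
    fn register(&mut self, scheme: &str, host: &str, store: Arc<FakeStore>) {
        self.stores.insert(format!("{scheme}://{host}"), store);
    }

    /// Look up the store for a URL; fails if nothing was registered for its key.
    fn get_by_url(&self, url: &str) -> Result<Arc<FakeStore>, String> {
        store_key(url)
            .and_then(|key| self.stores.get(key).cloned())
            .ok_or_else(|| format!("No suitable object store found for {url}"))
    }
}

fn main() {
    let url = "s3://roapi-test/blogs-flattened.parquet";
    let mut registry = Registry::default();

    // Nothing is registered under "s3://roapi-test" yet, so the lookup fails.
    assert!(registry.get_by_url(url).is_err());

    // Once a store is registered for that scheme and bucket, the lookup succeeds.
    registry.register("s3", "roapi-test", Arc::new(FakeStore("s3 bucket")));
    assert_eq!(registry.get_by_url(url).unwrap().0, "s3 bucket");
}
```

The key is derived from the scheme and bucket only, so one registration covers every object path in that bucket, while a different bucket or an `https://` host needs its own entry.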



[GitHub] [arrow-datafusion] elliot14A commented on issue #4269: `infer_schema` function is not working with s3 Urls or http endpoints

Posted by GitBox <gi...@apache.org>.
elliot14A commented on issue #4269:
URL: https://github.com/apache/arrow-datafusion/issues/4269#issuecomment-1326050279

   Thank you, now I understand.

