Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/14 15:08:46 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #4204: Cannot register table to represent multiple parquet files in S3 bucket

andygrove opened a new issue, #4204:
URL: https://github.com/apache/arrow-datafusion/issues/4204

   **Describe the bug**
   This works:
   
   ```sql
   CREATE EXTERNAL TABLE yellow_2019_01 STORED AS PARQUET LOCATION "s3://ossb-nyctaxi/yellow/2019/yellow_tripdata_2019-01.parquet";
   ```
   
   This does not work:
   
   ```sql
   CREATE EXTERNAL TABLE yellow_2019 STORED AS PARQUET LOCATION "s3://ossb-nyctaxi/yellow/2019";
   ```
   
   Fails with:
   
   ```
   ObjectStore(NotFound { path: "yellow/2019", source: Error { retries: 0, message: "No Body", source: Some(reqwest::Error { kind: Status(404), url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("s3.us-east-2.amazonaws.com")), port: None, path: "/ossb-nyctaxi/yellow/2019", query: None, fragment: None } }) } }
   ```
   
   **To Reproduce**
   As described.
   
   **Expected behavior**
   Should be able to register a directory containing parquet files.
   
   **Additional context**
   Possibly related to https://github.com/apache/arrow-datafusion/issues/1736
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] cfraz89 commented on issue #4204: Cannot register table to represent multiple parquet files in S3 bucket

Posted by GitBox <gi...@apache.org>.
cfraz89 commented on issue #4204:
URL: https://github.com/apache/arrow-datafusion/issues/4204#issuecomment-1368496243

   I found that, when using the library directly, `ListingSchemaProvider` would also pass directories as table paths without the trailing slash, which causes the same issue when it is used together with `ListingTableFactory`. I made a PR to fix this.




[GitHub] [arrow-datafusion] kylebrooks-8451 commented on issue #4204: Cannot register table to represent multiple parquet files in S3 bucket

Posted by "kylebrooks-8451 (via GitHub)" <gi...@apache.org>.
kylebrooks-8451 commented on issue #4204:
URL: https://github.com/apache/arrow-datafusion/issues/4204#issuecomment-1525726739

   > I found that adding a trailing `/` to the URL fixes this, so I will file a PR to add documentation
   
   In addition to documenting this, could we change the [`is_dir` logic in this code](https://github.com/apache/arrow-datafusion/blob/a38480951f40abce7ee2d5919251a1d1607f1dee/datafusion/core/src/datasource/listing/url.rs#L149) to detect if the path is a folder?
   
   I'm thinking that we could try to list the path and catch a failure to test whether it is a directory.




[GitHub] [arrow-datafusion] andygrove commented on issue #4204: Cannot register table to represent multiple parquet files in S3 bucket

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #4204:
URL: https://github.com/apache/arrow-datafusion/issues/4204#issuecomment-1313987186

   I found that adding a trailing `/` to the URL fixes this, so I will file a PR to add documentation
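   
   For reference, a sketch of that workaround applied to the failing statement from the issue description. It reuses the original bucket path and quoting unchanged and only appends the trailing `/`. It has not been re-run here, so treat it as illustrative:
   
   ```sql
   -- Trailing slash added to the location; this is the only change from the failing statement
   CREATE EXTERNAL TABLE yellow_2019 STORED AS PARQUET LOCATION "s3://ossb-nyctaxi/yellow/2019/";
   ```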

