Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/05 17:46:21 UTC

[GitHub] [arrow-datafusion] matthewmturner opened a new issue #1930: Add `ObjectStore` support via SQL

matthewmturner opened a new issue #1930:
URL: https://github.com/apache/arrow-datafusion/issues/1930


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   I am working towards making datafusion-cli a powerful local tool for ad-hoc data analysis. The first step was #1875, which enables defining a local "database" through a `.datafusionrc` file that runs on startup. As a second step, I would like to be able to connect to object stores such as S3 directly from SQL. That will of course require adding S3 as a feature to datafusion-cli, but the feature is useless unless an `ObjectStore` can be registered for it. This is the current behaviour:
   
   ```
   ❯ CREATE EXTERNAL TABLE t STORED AS CSV LOCATION 's3://bucket/t.csv';
   Internal("No suitable object store found for s3")
   ```
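The error suggests that the table location is resolved by its URI scheme and that no store is registered for `s3`. A minimal std-only sketch of that kind of lookup, assuming a registry keyed by scheme; `SchemeRegistry`, `StoreHandle` and `resolve` are hypothetical names, not DataFusion's API, and the error text simply mirrors the message above:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a registered object store. A real store would
/// know how to list and read objects; here it only carries a name.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreHandle {
    pub name: String,
}

/// Hypothetical registry mapping a URI scheme ("file", "s3", ...) to a store.
#[derive(Default)]
pub struct SchemeRegistry {
    stores: HashMap<String, StoreHandle>,
}

impl SchemeRegistry {
    /// Register (or replace) the store used for `scheme`.
    pub fn register(&mut self, scheme: &str, store: StoreHandle) {
        self.stores.insert(scheme.to_string(), store);
    }

    /// Find the store for a location such as "s3://bucket/t.csv".
    /// Locations without "://" are treated as local files.
    pub fn resolve(&self, uri: &str) -> Result<&StoreHandle, String> {
        let scheme = match uri.split_once("://") {
            Some((scheme, _)) => scheme,
            None => "file",
        };
        self.stores
            .get(scheme)
            .ok_or_else(|| format!("No suitable object store found for {}", scheme))
    }
}
```

With only a local store registered, resolving `s3://bucket/t.csv` fails exactly as in the session above; registering a store for `s3` is the missing step this issue asks to expose.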
   
   **Describe the solution you'd like**
   
   I would like to be able to register an `ObjectStore` directly from SQL. Since `ObjectStore` is a DataFusion concept rather than a SQL one, I was thinking we could add a function such as `register_object_store` instead of a new SQL statement.
   
   It would look something like this:
   
   Default credentials
   ```
   ❯   register_object_store('s3');
   ```
   
   MinIO
   ```
   ❯   register_object_store('s3', ACCESS_KEY, SECRET_KEY, PROVIDER, ENDPOINT);
   ```
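To make the two proposed call forms concrete, here is a minimal std-only sketch of how their positional arguments might be mapped onto settings, assuming exactly one or five arguments as in the examples. `S3Settings` and `parse_register_args` are hypothetical names, not part of DataFusion:

```rust
/// Hypothetical settings gathered from the proposed
/// `register_object_store(...)` call. `None` means "use the default
/// credential chain" for that setting.
#[derive(Debug, Default, PartialEq)]
pub struct S3Settings {
    pub access_key: Option<String>,
    pub secret_key: Option<String>,
    pub provider: Option<String>,
    pub endpoint: Option<String>,
}

/// Map the positional arguments of the proposed function onto settings.
/// Only the two forms shown above are accepted: 1 argument (default
/// credentials) or 5 arguments (explicit credentials, e.g. for MinIO).
pub fn parse_register_args(args: &[&str]) -> Result<(String, S3Settings), String> {
    match args {
        [scheme] => Ok((scheme.to_string(), S3Settings::default())),
        [scheme, access_key, secret_key, provider, endpoint] => Ok((
            scheme.to_string(),
            S3Settings {
                access_key: Some(access_key.to_string()),
                secret_key: Some(secret_key.to_string()),
                provider: Some(provider.to_string()),
                endpoint: Some(endpoint.to_string()),
            },
        )),
        _ => Err(format!("expected 1 or 5 arguments, got {}", args.len())),
    }
}
```

Every argument after the scheme has an S3-specific meaning, which is what makes this positional form hard to reuse for other stores.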
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] matthewmturner commented on issue #1930: Add `ObjectStore` support via SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #1930:
URL: https://github.com/apache/arrow-datafusion/issues/1930#issuecomment-1059805594


   Actually, I'm not sure how well the parameters of `register_object_store` would generalize to `ObjectStore` implementations other than S3, so now I'm not sure a general function like that could be used.
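One way around S3-specific positional parameters would be to pass named options and let each store kind declare which keys it accepts. A minimal sketch under that assumption; `allowed_keys`, `validate_options` and the option names are hypothetical, and only `s3` and `file` are modelled:

```rust
use std::collections::HashMap;

// Hypothetical option keys per store kind.
const S3_KEYS: &[&str] = &["access_key", "secret_key", "provider", "endpoint"];
const FILE_KEYS: &[&str] = &[];

/// Return the option keys a store kind accepts, or `None` for an unknown scheme.
pub fn allowed_keys(scheme: &str) -> Option<&'static [&'static str]> {
    match scheme {
        "s3" => Some(S3_KEYS),
        "file" => Some(FILE_KEYS),
        _ => None,
    }
}

/// Reject options the chosen store does not understand, so a single generic
/// registration function can serve stores with different settings.
pub fn validate_options(scheme: &str, options: &HashMap<String, String>) -> Result<(), String> {
    let allowed =
        allowed_keys(scheme).ok_or_else(|| format!("unknown object store: {}", scheme))?;
    for key in options.keys() {
        if !allowed.contains(&key.as_str()) {
            return Err(format!("option '{}' is not supported by {}", key, scheme));
        }
    }
    Ok(())
}
```

The design choice is to keep the call signature generic (scheme plus key/value pairs) and push store-specific knowledge into each store's list of accepted keys.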





[GitHub] [arrow-datafusion] matthewmturner commented on issue #1930: Add `ObjectStore` support via SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #1930:
URL: https://github.com/apache/arrow-datafusion/issues/1930#issuecomment-1059804187


   @seddonm1 @yjshen @houqp FYI - in case you have thoughts on this.





[GitHub] [arrow-datafusion] matthewmturner commented on issue #1930: Add `ObjectStore` support via SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #1930:
URL: https://github.com/apache/arrow-datafusion/issues/1930#issuecomment-1059872236


   Maybe my objective could be achieved with command-line options instead. For example:
   
   Default credentials
   ```
   $ datafusion-cli --object-store s3
   ```
   
   Minio
   ```
   $ datafusion-cli --object-store s3 --access-key KEY --secret-key ABC --provider PROVIDER --endpoint ENDPOINT
   ```
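A minimal std-only sketch of parsing these flags, assuming each flag takes exactly one value. The flag names are the proposal above, not existing datafusion-cli options, and a real implementation would more likely use an argument-parsing crate:

```rust
/// Values of the proposed flags; `None` means the flag was not given.
#[derive(Debug, Default, PartialEq)]
pub struct ObjectStoreArgs {
    pub object_store: Option<String>,
    pub access_key: Option<String>,
    pub secret_key: Option<String>,
    pub provider: Option<String>,
    pub endpoint: Option<String>,
}

/// Parse `--flag value` pairs. In a real binary, `args` would come from
/// `std::env::args().skip(1).collect::<Vec<String>>()`.
pub fn parse_args(args: &[String]) -> Result<ObjectStoreArgs, String> {
    let mut out = ObjectStoreArgs::default();
    let mut iter = args.iter();
    while let Some(flag) = iter.next() {
        // Every proposed flag expects a value right after it.
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?
            .clone();
        match flag.as_str() {
            "--object-store" => out.object_store = Some(value),
            "--access-key" => out.access_key = Some(value),
            "--secret-key" => out.secret_key = Some(value),
            "--provider" => out.provider = Some(value),
            "--endpoint" => out.endpoint = Some(value),
            other => return Err(format!("unknown flag: {}", other)),
        }
    }
    Ok(out)
}
```

The default-credentials form leaves every field except `object_store` empty, so the store would fall back to its own credential lookup.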
   
   @houqp @yjshen @seddonm1 do you have a view on whether `ObjectStore` registration should be done via SQL or handled by datafusion-cli itself?





[GitHub] [arrow-datafusion] houqp commented on issue #1930: Add `ObjectStore` support via SQL

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #1930:
URL: https://github.com/apache/arrow-datafusion/issues/1930#issuecomment-1062579261


   I think it can be done through both, because the secret key credentials and the endpoint can also be provided through environment variables. In that case, the user only needs to provide the S3 path in the SQL query.
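A minimal sketch of the precedence this implies: an explicitly supplied value wins, otherwise the environment is consulted. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names; `S3_ENDPOINT` and the `resolve`/`pick` helpers are hypothetical. The lookup is passed in as a function so it can be replaced by a fake in tests:

```rust
use std::env;

// Conventional AWS credential variables.
const ACCESS_KEY_ENV: &str = "AWS_ACCESS_KEY_ID";
const SECRET_KEY_ENV: &str = "AWS_SECRET_ACCESS_KEY";
// Hypothetical endpoint variable, for illustration only.
const ENDPOINT_ENV: &str = "S3_ENDPOINT";

#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub access_key: Option<String>,
    pub secret_key: Option<String>,
    pub endpoint: Option<String>,
}

/// Prefer the explicit value; otherwise ask `lookup` for `var`.
fn pick(explicit: Option<&str>, var: &str, lookup: &dyn Fn(&str) -> Option<String>) -> Option<String> {
    explicit.map(str::to_string).or_else(|| lookup(var))
}

/// Combine explicit settings (e.g. from flags) with environment fallbacks.
pub fn resolve(
    access_key: Option<&str>,
    secret_key: Option<&str>,
    endpoint: Option<&str>,
    lookup: &dyn Fn(&str) -> Option<String>,
) -> Resolved {
    Resolved {
        access_key: pick(access_key, ACCESS_KEY_ENV, lookup),
        secret_key: pick(secret_key, SECRET_KEY_ENV, lookup),
        endpoint: pick(endpoint, ENDPOINT_ENV, lookup),
    }
}

/// Lookup backed by the real process environment.
pub fn from_env(var: &str) -> Option<String> {
    env::var(var).ok()
}
```

In a CLI, `resolve(None, None, None, &from_env)` would pick everything up from the environment, leaving only the `s3://` location to appear in the SQL.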

