You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/14 14:22:22 UTC

[GitHub] [superset] Mageswaran1989 opened a new issue, #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Mageswaran1989 opened a new issue, #20708:
URL: https://github.com/apache/superset/issues/20708

   I was following the DuckDB setup as per this [PR](https://github.com/apache/superset/pull/19317) and was able to load the DB file and create charts.
   
   As a next step I wanted to load S3 parquet files in to Superset using DuckDB in memory option `duckdb:///:memory:` 
   
   Before trying with Superset, I used below Python code to check the DuckDB S3 parquet loading and found it to be working:
   
   ```python
   import duckdb
   cursor = duckdb.connect()
   cursor.execute("INSTALL httpfs;")
   cursor.execute("LOAD httpfs;")
   cursor.execute("SET s3_region='******'")
   cursor.execute("SET s3_access_key_id=''**************")
   cursor.execute("SET s3_secret_access_key='*****************************'")
   cursor.execute("PRAGMA enable_profiling;")
   cursor.execute("SELECT count(*) FROM read_parquet('s3://<bucket>/prefix/*.parquet'")
   ```
   
   When I tried to `SET` the S3 environment values I was getting below error:
   
   **Error:**
   
   ```
   DuckDB Error
   duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"
   
   Did you mean: "TimeZone"
   
   
   This may be triggered by:
   Issue 1002 - The database returned an unexpected error. 
   
   See less
   ````
   
   Full Trace:
   
   ```shell
   
   superset_cache          | 1:M 14 Jul 2022 14:18:00.250 * Background saving terminated with success
   superset_app            | 127.0.0.1 - - [14/Jul/2022:14:18:24 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/7.74.0"
   superset_app            | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
   superset_app            | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
   superset_app            | Triggering query_id: 60
   superset_app            | 2022-07-14 14:18:35,809:INFO:superset.sqllab.command:Triggering query_id: 60
   superset_app            | Query 60: Executing 1 statement(s)
   superset_app            | 2022-07-14 14:18:35,942:INFO:superset.sql_lab:Query 60: Executing 1 statement(s)
   superset_app            | Query 60: Set query to 'running'
   superset_app            | 2022-07-14 14:18:35,943:INFO:superset.sql_lab:Query 60: Set query to 'running'
   superset_app            | Query 60: Running statement 1 out of 1
   superset_app            | 2022-07-14 14:18:36,083:INFO:superset.sql_lab:Query 60: Running statement 1 out of 1
   superset_app            | Query 60: <class 'RuntimeError'>
   superset_app            | Traceback (most recent call last):
   superset_app            |   File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
   superset_app            |     db_engine_spec.execute(cursor, sql, async_=True)
   superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1261, in execute
   superset_app            |     raise cls.get_dbapi_mapped_exception(ex)
   superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1259, in execute
   superset_app            |     cursor.execute(query)
   superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
   superset_app            |     raise e
   superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
   superset_app            |     self.c.execute(statement)
   superset_app            | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
   superset_app            | 
   superset_app            | Did you mean: "TimeZone"
   superset_app            | 2022-07-14 14:18:36,157:ERROR:superset.sql_lab:Query 60: <class 'RuntimeError'>
   superset_app            | Traceback (most recent call last):
   superset_app            |   File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
   superset_app            |     db_engine_spec.execute(cursor, sql, async_=True)
   superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1261, in execute
   superset_app            |     raise cls.get_dbapi_mapped_exception(ex)
   superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1259, in execute
   superset_app            |     cursor.execute(query)
   superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
   superset_app            |     raise e
   superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
   superset_app            |     self.c.execute(statement)
   superset_app            | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
   superset_app            | 
   superset_app            | Did you mean: "TimeZone"
   superset_app            | [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
   superset_app            | Traceback (most recent call last):
   superset_app            |   File "/app/superset/views/base.py", line 207, in wraps
   superset_app            |     return f(self, *args, **kwargs)
   superset_app            |   File "/app/superset/utils/log.py", line 244, in wrapper
   superset_app            |     value = f(*args, **kwargs)
   superset_app            |   File "/app/superset/views/core.py", line 2412, in sql_json
   superset_app            |     command_result: CommandResult = command.run()
   superset_app            |   File "/app/superset/sqllab/command.py", line 114, in run
   superset_app            |     raise ex
   superset_app            |   File "/app/superset/sqllab/command.py", line 98, in run
   superset_app            |     status = self._run_sql_json_exec_from_scratch()
   superset_app            |   File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
   superset_app            |     raise ex
   superset_app            |   File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
   superset_app            |     return self._sql_json_executor.execute(
   superset_app            |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
   superset_app            |     raise SupersetErrorsException(
   superset_app            | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
   superset_app            | 2022-07-14 14:18:36,229:WARNING:superset.views.base:[SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
   superset_app            | Traceback (most recent call last):
   superset_app            |   File "/app/superset/views/base.py", line 207, in wraps
   superset_app            |     return f(self, *args, **kwargs)
   superset_app            |   File "/app/superset/utils/log.py", line 244, in wrapper
   superset_app            |     value = f(*args, **kwargs)
   superset_app            |   File "/app/superset/views/core.py", line 2412, in sql_json
   superset_app            |     command_result: CommandResult = command.run()
   superset_app            |   File "/app/superset/sqllab/command.py", line 114, in run
   superset_app            |     raise ex
   superset_app            |   File "/app/superset/sqllab/command.py", line 98, in run
   superset_app            |     status = self._run_sql_json_exec_from_scratch()
   superset_app            |   File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
   superset_app            |     raise ex
   superset_app            |   File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
   superset_app            |     return self._sql_json_executor.execute(
   superset_app            |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
   superset_app            |     raise SupersetErrorsException(
   superset_app            | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mageswaran1989 commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mageswaran1989 commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1211817295

   @alitrack Thanks for the response. I am abale to read data from S3 as you mentioned.
   
   But for each query, I had to give S3 credentials. 
   Any option to set it once and use it across queries?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1212031495

   If you can pass configuration to duckdb using superset (https://github.com/Mause/duckdb_engine/#configuration) I would recommend doing that instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1211577816

   @Mageswaran1989 , @Mause 
   
   I tested with  minio, 
   ```bash 
   /usr/local/opt/minio/bin/minio server --config-dir=/usr/local/etc/minio --address=:9900 /usr/local/var/minio
   ```
   
   and it works, 
   
   ```sql
   install 'httpfs';
   load 'httpfs';
   SET s3_endpoint='127.0.0.1:9900';
   SET s3_access_key_id='minioadmin';
   SET s3_secret_access_key='minioadmin';
   SET s3_url_style = 'path';
   SET s3_use_ssl=false;
   select count(*) from 's3://ontime/*.parquet';
   ```
   
   <img width="442" alt="image" src="https://user-images.githubusercontent.com/20972179/184071422-b4128cd0-f9a8-4579-936f-23621260e64d.png">
   
   
   you need check `Allow DML`
    
   <img width="502" alt="image" src="https://user-images.githubusercontent.com/20972179/184071760-05ad5da6-1cb6-4f7d-ae9c-8fd064aebd43.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224093868

   @Mause 
   Does not work,  even in python code 
   
   ```python
   connect_args={ 
           "preload_extensions": ["httpfs"],
           "config": {
             "s3_endpoint":"127.0.0.1:9900",
             "s3_access_key_id":"minioadmin",
             "s3_secret_access_key":"minioadmin",
             "s3_url_style":"path",
             "s3_use_ssl":False
     }
   }
   from curses import endwin
   from sqlalchemy import create_engine
   
   engine = create_engine("duckdb:///",connect_args=connect_args)
   
   import pandas as pd
   
   def test_s3():
       df = pd.read_sql("""
   
       select count(*) from 's3://ontime/*.parquet'
   
       """, engine)
       print(df)
   try:
       test_s3()
   except Exception as e:
       print(e)
       engine.execute("""
       SET s3_endpoint='127.0.0.1:9900';
       SET s3_access_key_id='minioadmin';
       SET s3_secret_access_key='minioadmin';
       SET s3_url_style = 'path';
       SET s3_use_ssl=0;
       """)
   
   test_s3()
   ```
   
   btw, you can try the Minio Play account (it is public)
   
   ``` sql
   SET s3_endpoint='play.min.io:9000';
   SET s3_access_key_id='Q3AM3UQ867SPQQA43P2F';
   SET s3_secret_access_key='zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG';
   SET s3_url_style = 'path';
   SET s3_use_ssl=true;
   SET s3_region = 'us-east-1';
   
   select * from 's3://sales5m/sales_5m_0.parquet' limit  3;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224282935

   
   works now,
   connect_args should be 
   
   ```python
   
   connect_args={ 
           "preload_extensions": ["httpfs"],
           "config": {
             "s3_endpoint":"127.0.0.1:9900",
             "s3_access_key_id":"minioadmin",
             "s3_secret_access_key":"minioadmin",
             "s3_url_style":"path",
             "s3_use_ssl":"False"
     }
   }
   ```
   
   
   and  in superset, should be 
   
   ```json
   {
       "connect_args": {
           "preload_extensions": [
               "httpfs"
           ],
           "config": {
               "s3_endpoint": "127.0.0.1:9900",
               "s3_access_key_id": "minioadmin",
               "s3_secret_access_key": "minioadmin",
               "s3_url_style": "path",
               "s3_use_ssl": "False"
           }
       }
   }
   
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224288087

   attention, 
   superset  config does not support boolean, must be string 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] Unable to read S3 parquet files using DuckDB as Database Connector [superset]

Posted by "rajivpatki (via GitHub)" <gi...@apache.org>.
rajivpatki commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1923597807

   > appends a method at the bottom of [db_engine_specs/duckdb.py](https://github.com/apache/superset/blob/master/superset/db_engine_specs/duckdb.py#L80)
   
   This works well, but someone with SQL Lab access can read plaintext credentials with `SELECT * FROM duckdb_settings();`
   
   > If you can pass configuration to duckdb using superset (https://github.com/Mause/duckdb_engine/#configuration) I would recommend doing that instead
   
   Using @Mause's suggestion: any admin with rights to edit the database connection can read the secrets in plain text.
   
   Suggestion: override `execute()` for `DuckDBEngineSpec` in this manner:
   ```
       @classmethod
       def execute(cls, cursor: Any, query: str, **kwargs: Any) -> None:
           sql = f"""
           INSTALL aws;
           LOAD aws;
           INSTALL httpfs;
           LOAD httpfs;
           CALL load_aws_credentials();
           """
   
           if 's3://' in query and 'duckdb_settings()' not in query:
               cursor.execute(sql)
           return super().execute(cursor, query, **kwargs)
   ```
   
   Note that `CALL load_aws_credentials();` from aws extension of duckdb has a **significant hit on query performance**. The above approach allows us to:
   - Bypass loading credentials and extensions when not required
   - Not have to specify credentials in plain text and instead use [AWS Default Credentials Provider Chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html)
   - Plug a security hole where credentials can be displayed in plain text
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1211938637

   There is a dirty job,  
   
   appends a method at the bottom of [db_engine_specs/duckdb.py](https://github.com/apache/superset/blob/master/superset/db_engine_specs/duckdb.py#L80) 
   
   here is a example, 
   
   ```python
       @classmethod
       def execute(cls, cursor: Any, query: str, **kwargs: Any) -> None:
           sql = f"""
           install 'httpfs';
           load 'httpfs';
           SET s3_endpoint='127.0.0.1:9900';
           SET s3_access_key_id='minioadmin';
           SET s3_secret_access_key='minioadmin';
           SET s3_url_style = 'path';
           SET s3_use_ssl=false;
           """
           cursor.execute(sql)
   
           return super().execute(cursor, query, **kwargs)
   
   ```
   
   it overrides the execute method of BaseEngineSpec.
   
   <img width="468" alt="image" src="https://user-images.githubusercontent.com/20972179/184135676-d9d7a7b7-095f-4ddd-9def-d8a14ebe857e.png">
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1212812693

   Superset supports ENGINE PARAMETERS, but DuckDB need install and load httpfs extension first, 
   then can pass s3 related configuration
   ``` python 
   from sqlalchemy import create_engine
   connect_args = {'config': {'s3_endpoint': '127.0.0.1:9900', 's3_access_key_id': 'minioadmin', 's3_secret_access_key': 'minioadmin', 's3_url_style': 'path', 's3_use_ssl': 0}}
   engine = create_engine("duckdb:///",connect_args=connect_args)
   ```
   get the issue, 
   
   ```
   duckdb.InvalidInputException: Invalid Input Error: Unrecognized configuration property "s3_endpoint"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] Unable to read S3 parquet files using DuckDB as Database Connector [superset]

Posted by "include (via GitHub)" <gi...@apache.org>.
include commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1968675348

   Hi all,
   
   I almost there too. I am facing another problem but I've successfully connected to S3 via DuckDB to retrieve parquet files using the following setup.
   
   Go to advanced Database connections / (DuckDB) / settings / security and just drop the `connect_args` without the `s3_endpoint`.
   
   ```
   {
       "connect_args": {
           "preload_extensions": ["httpfs"],
           "config": { 
               "s3_region": "eu-west-1",
               "s3_access_key_id": "FOO", 
               "s3_secret_access_key": "BAR"
           }
       }
   }
   ``` 
   
   your query should be something like:
   
   ```
   SELECT * FROM read_parquet('s3://bucket/reports/data/*');
   ```
   
   My problem is that when I try to create a new dataset from this database, I get a weird Schema / Table list which does not represet anything (blank) from the parquet.
   
   I've tried to create a temporary table but without effect too and I know it is reading correctly the file because the `SELECT` and `DESCRIBE` works just nice and I see the contents without issues. Perhaps there might be something when building the dataset which cannot find a "physical" table?!
   
   Would love to get your tips!.
   
   Cheers,
   F
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] alitrack commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
alitrack commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1225098467

   > > attention, superset config does not support boolean, must be string
   > 
   > If it's JSON, it probably just needs to be `false` instead of `False`
   
   you got, 
   with the last version of duckdb-engine, boolean false works too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224314886

   > attention, superset config does not support boolean, must be string
   
   If it's JSON, it probably just needs to be `false` instead of `False`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] Unable to read S3 parquet files using DuckDB as Database Connector [superset]

Posted by "include (via GitHub)" <gi...@apache.org>.
include commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1968676425

   Hi all,
   
   I almost there too. I am facing another problem but I've successfully connected to S3 via DuckDB to retrieve parquet files using the following setup.
   
   Go to advanced Database connections / (DuckDB) / settings / security and just drop the `connect_args` without the `s3_endpoint`.
   
   ```
   {
       "connect_args": {
           "preload_extensions": ["httpfs"],
           "config": { 
               "s3_region": "eu-west-1",
               "s3_access_key_id": "FOO", 
               "s3_secret_access_key": "BAR"
           }
       }
   }
   ``` 
   
   your query should be something like:
   
   ```
   SELECT * FROM read_parquet('s3://bucket/reports/data/*');
   ```
   
   My problem is that when I try to create a new dataset from this database, I get a weird Schema / Table list which does not represet anything (blank) from the parquet.
   
   I've tried to create a temporary table but without effect too and I know it is reading correctly the file because the `SELECT` and `DESCRIBE` works just nice and I see the contents without issues. Perhaps there might be something when building the dataset which cannot find a "physical" table?!
   
   Would love to get your tips!.
   
   Cheers,
   F
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] Unable to read S3 parquet files using DuckDB as Database Connector [superset]

Posted by "rajivpatki (via GitHub)" <gi...@apache.org>.
rajivpatki commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1968791911

   @include 
   > your query should be something like:
   > 
   > ```
   > SELECT * FROM read_parquet('s3://bucket/reports/data/*');
   > ```
   > 
   > My problem is that when I try to create a new dataset from this database, I get a weird Schema / Table list which does not represet anything (blank) from the parquet.
   
   
   I think you have not created a physical table on duckdb. You're only running a query that reads a s3 parquet dataset. Create a table: `CREATE OR REPLACE TABLE main.table_name AS SELECT * FROM read_parquet('s3://...')` or define the dataset as a view with query being `SELECT * FROM read_parquet('s3://...')`
   
   Temporary tables do not survive beyond the life of the connection/cursor: https://duckdb.org/docs/sql/statements/create_table.html#temporary-tables 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mageswaran1989 commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mageswaran1989 commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1187739081

   @Mause Could you please have look on this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1221862526

   @alitrack for now, I've added experimental support to `duckdb-engine` (as of 0.6.0) for the preloading of extensions via engine parameters - https://github.com/Mause/duckdb_engine#preloading-extensions-experimental


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1225166925

   I believe this issue is resolved then?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1193280993

   @Mageswaran1989 unfortunately I'm not a superset developer, and wouldn't know where to start investigating this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mageswaran1989 commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mageswaran1989 commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1187735730

   Update: By patching the `conenct` method of the DuckDB engine [here](https://github.com/Mause/duckdb_engine/blob/master/duckdb_engine/__init__.py#L160) @sachintyagi22 was able to read the parquet files. 
   
   Something like this:
   
   ```python
       def connect(self, *args: Any, **kwargs: Any) -> ConnectionWrapper:
           cursor = duckdb.connect(*args, **kwargs)
           cursor.execute("INSTALL httpfs;")
           cursor.execute("LOAD httpfs;")
           cursor.execute("SET s3_region='******'")
           cursor.execute("SET s3_access_key_id=''**************")
           cursor.execute("SET s3_secret_access_key='*****************************'")
           return ConnectionWrapper(cursor)
   ```
   
   Without this patch, the duckdb session is not abel to recognize the `SET` commands. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224140788

   @alitrack what exception are you getting?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1224285181

   > 
   > works now,
   > connect_args should be 
   > 
   > ```python
   > 
   > connect_args={ 
   >         "preload_extensions": ["httpfs"],
   >         "config": {
   >           "s3_endpoint":"127.0.0.1:9900",
   >           "s3_access_key_id":"minioadmin",
   >           "s3_secret_access_key":"minioadmin",
   >           "s3_url_style":"path",
   >           "s3_use_ssl":"False"
   >   }
   > }
   > ```
   > 
   > 
   > and  in superset, should be 
   > 
   > ```json
   > {
   >     "connect_args": {
   >         "preload_extensions": [
   >             "httpfs"
   >         ],
   >         "config": {
   >             "s3_endpoint": "127.0.0.1:9900",
   >             "s3_access_key_id": "minioadmin",
   >             "s3_secret_access_key": "minioadmin",
   >             "s3_url_style": "path",
   >             "s3_use_ssl": "False"
   >         }
   >     }
   > }
   > 
   > ```
   > 
   > 
   
   Yeah I realized the boolean bug after you posted, I'll push a patch release to fix that shortly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] sapcode commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
sapcode commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1342768294

   Dear @Mause,
   
   we have installed the superset with the current helm chart and also installed the DuckDB engine, it shows now in the Databases Connections dropdown. The superset version in the right menu is shown as "Version: 0.0.0-dev".
   
   When we create a new DuckDB database without engine parameter but with "Allow DML" and call SQL-Lab we are able to use the connection settings when executing as DML statements and the data is delivered correctly:
   
   install 'httpfs';
   load 'httpfs';
   SET s3_endpoint='argo-artifacts:9000';
   SET s3_access_key_id='XXX';
   SET s3_secret_access_key='XXX';
   SET s3_url_style = 'path';
   SET s3_use_ssl=false;
   select count(*) from 's3://raw/*.parquet';
   
   However we don't want to provide the credentials in the SQL-Lab but would like to store them in the DuckDB Connector Advanced ENGINE PARAMETER dialog, we have tested the following parameter JSON strings but none of them seems to work:
   
   Test1:
   `{
   	"engine_params": {
   		 "connect_args":{
   				"preload_extensions": ["httpfs"],
   				"config": {
   				  "s3_endpoint": "argo-artifacts:9000",
   				  "s3_access_key_id": "XXX",
   				  "s3_secret_access_key": "XXX",
   				  "s3_url_style": "path",
   				  "s3_use_ssl": "False"
   		  }
   		}
   	}
   }`
   
   Test2:
   `{
   	 "connect_args":{
   		"preload_extensions": ["httpfs"],
   		"config": {
   			  "s3_endpoint": "argo-artifacts:9000",
   			  "s3_access_key_id": "XXX",
   			  "s3_secret_access_key": "XXX",
   			  "s3_url_style": "path",
   			  "s3_use_ssl": "False"
   		  }
   	}
   }`
   
   We always get the same error:
   **An error occurred while fetching databases: "Connection failed, please check your connection settings"**
   In chrome debug we see:
   PUT http://superset.mini.kube/api/v1/database/1 422 (UNPROCESSABLE ENTITY)
   Response {type: 'basic', url: 'http://superset.mini.kube/api/v1/database/1'
   
   When looking at the core.py line 394 - 405 it looks like that "engine_params" is searched first, then "connect_args"
   https://github.com/apache/superset/blob/master/superset/models/core.py
   
           params = extra.get("engine_params", {})
           if nullpool:
               params["poolclass"] = NullPool
   
           connect_args = params.get("connect_args", {})
           if self.impersonate_user:
               self.db_engine_spec.update_impersonation_config(
                   connect_args, str(sqlalchemy_url), effective_username
               )
   
           if connect_args:
               params["connect_args"] = connect_args
   
   
   In Connector Advanced ENGINE PARAMETER dialog, what is the specific connection string which need to be passed ? 
   Are the any restrictions in usage of double " or single ' quotes in the JSON string ? 
   
   Thank you very much!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on issue #20708: Unable to read S3 parquet files using DuckDB as Database Connector

Posted by GitBox <gi...@apache.org>.
Mause commented on issue #20708:
URL: https://github.com/apache/superset/issues/20708#issuecomment-1342846806

   @sapcode if you have issues with duckdb-engine specifically, please raise them in the duckdb-engine repo. I cannot help with general superset issues, but would recommend that you create a new issue instead of piggy-backing on this one
   
    @apache / @robdiciuccio I'd appreciate it if you closed and locked this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org