You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/03/20 14:04:22 UTC

[GitHub] [superset] rwhaling opened a new pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

rwhaling opened a new pull request #19265:
URL: https://github.com/apache/superset/pull/19265


   ### SUMMARY
   Adding DuckDB as a datasource, as it's SQLAlchemy dialect appears to have matured a good deal
   https://github.com/Mause/duckdb_engine/pull/219
   To be clear, this is @alitrack's hard work, not mine, just opening up a PR to keep momentum, and working on testing it myself.  Haven't set up for local dev on superset before, so might take me a few days.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   TBD, working on it.
   
   ### TESTING INSTRUCTIONS
   TBD, working on it.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [x] Has associated issue: https://github.com/apache/superset/issues/14563
   - [ ] Required feature flags:
   - [x] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [x] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a change in pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on a change in pull request #19265:
URL: https://github.com/apache/superset/pull/19265#discussion_r830773104



##########
File path: superset/db_engine_specs/duckdb.py
##########
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import re
+from datetime import datetime
+from typing import Any, Dict, List, Optional, Pattern, Tuple, TYPE_CHECKING
+
+from flask_babel import gettext as __
+from sqlalchemy.engine.reflection import Inspector
+
+from superset.db_engine_specs.base import BaseEngineSpec
+from superset.errors import SupersetErrorType
+from superset.utils import core as utils
+
+if TYPE_CHECKING:
+    # prevent circular imports
+    from superset.models.core import Database
+
+
+COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")
+
+
+class DuckDBEngineSpec(BaseEngineSpec):
+    engine = "duckdb"
+    engine_name = "DuckDB"
+
+    # pylint: disable=line-too-long
+    _time_grain_expressions = {
+        None: "{col}",
+        "PT1S": "DATE_TRUNC('second', {col})",
+        "PT1M": "DATE_TRUNC('minute', {col})",
+        "PT1H": "DATE_TRUNC('hour', {col})",
+        "P1D": "DATE_TRUNC('day', {col})",
+        "P1W": "DATE_TRUNC('week', {col})",
+        "P1M": "DATE_TRUNC('month', {col})",
+        "P0.25Y": "DATE_TRUNC('quarter', {col})",
+        "P1Y": "DATE_TRUNC('year', {col})",
+    }
+
+    custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {
+        COLUMN_DOES_NOT_EXIST_REGEX: (
+            __('We can\'t seem to resolve the column "%(column_name)s"'),
+            SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
+            {},
+        ),
+    }
+
+    @classmethod
+    def epoch_to_dttm(cls) -> str:
+        return "datetime({col}, 'unixepoch')"
+
+    @classmethod
+    def get_all_datasource_names(
+        cls, database: "Database", datasource_type: str
+    ) -> List[utils.DatasourceName]:
+        schemas = database.get_all_schema_names(
+            cache=database.schema_cache_enabled,
+            cache_timeout=database.schema_cache_timeout,
+            force=True,
+        )
+        schema = schemas[0]
+        if datasource_type == "table":
+            return database.get_all_table_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        if datasource_type == "view":
+            return database.get_all_view_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        raise Exception(f"Unsupported datasource_type: {datasource_type}")

Review comment:
       This part looks like it can directly inherit the [Super Class](https://github.com/apache/superset/blob/e1d0b83885d89c50e96c6eef872beffe0de7cf50/superset/db_engine_specs/base.py#L825).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Mause commented on a change in pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
Mause commented on a change in pull request #19265:
URL: https://github.com/apache/superset/pull/19265#discussion_r830715250



##########
File path: superset/db_engine_specs/duckdb.py
##########
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import re
+from datetime import datetime
+from typing import Any, Dict, List, Optional, Pattern, Tuple, TYPE_CHECKING
+
+from flask_babel import gettext as __
+from sqlalchemy.engine.reflection import Inspector
+
+from superset.db_engine_specs.base import BaseEngineSpec
+from superset.errors import SupersetErrorType
+from superset.utils import core as utils
+
+if TYPE_CHECKING:
+    # prevent circular imports
+    from superset.models.core import Database
+
+
+COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")
+
+
+class DuckDBEngineSpec(BaseEngineSpec):
+    engine = "duckdb"
+    engine_name = "DuckDB"
+
+    # pylint: disable=line-too-long
+    _time_grain_expressions = {
+        None: "{col}",
+        "PT1S": "DATE_TRUNC('second', {col})",
+        "PT1M": "DATE_TRUNC('minute', {col})",
+        "PT1H": "DATE_TRUNC('hour', {col})",
+        "P1D": "DATE_TRUNC('day', {col})",
+        "P1W": "DATE_TRUNC('week', {col})",
+        "P1M": "DATE_TRUNC('month', {col})",
+        "P0.25Y": "DATE_TRUNC('quarter', {col})",
+        "P1Y": "DATE_TRUNC('year', {col})",
+    }
+
+    custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {

Review comment:
       Would this be better solved on the duckdb-engine side?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling commented on pull request #19265: feat: add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling commented on pull request #19265:
URL: https://github.com/apache/superset/pull/19265#issuecomment-1075609455


   progress - got a local development environment mostly working, screenshots above.  Working through some bugs still
   Struggling to get a local version of the `duckdb_engine` running - the way either superset or sqlalchemy discover dialects isn't loving a `pip install -e` copy, throwing `AttributeError: module 'duckdb_engine' has no attribute 'name'`
   
   Will do a little more debugging, might close this and open a new PR from a branch off my own fork.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling commented on a change in pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling commented on a change in pull request #19265:
URL: https://github.com/apache/superset/pull/19265#discussion_r832374775



##########
File path: superset/db_engine_specs/duckdb.py
##########
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import re
+from datetime import datetime
+from typing import Any, Dict, List, Optional, Pattern, Tuple, TYPE_CHECKING
+
+from flask_babel import gettext as __
+from sqlalchemy.engine.reflection import Inspector
+
+from superset.db_engine_specs.base import BaseEngineSpec
+from superset.errors import SupersetErrorType
+from superset.utils import core as utils
+
+if TYPE_CHECKING:
+    # prevent circular imports
+    from superset.models.core import Database
+
+
+COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")
+
+
+class DuckDBEngineSpec(BaseEngineSpec):
+    engine = "duckdb"
+    engine_name = "DuckDB"
+
+    # pylint: disable=line-too-long
+    _time_grain_expressions = {
+        None: "{col}",
+        "PT1S": "DATE_TRUNC('second', {col})",
+        "PT1M": "DATE_TRUNC('minute', {col})",
+        "PT1H": "DATE_TRUNC('hour', {col})",
+        "P1D": "DATE_TRUNC('day', {col})",
+        "P1W": "DATE_TRUNC('week', {col})",
+        "P1M": "DATE_TRUNC('month', {col})",
+        "P0.25Y": "DATE_TRUNC('quarter', {col})",
+        "P1Y": "DATE_TRUNC('year', {col})",
+    }
+
+    custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {

Review comment:
       you would have a better judgment call on that than me; however, this looks to be more or less copy-pasted from the equivalent SQLite datasource - https://github.com/apache/superset/blob/master/superset/db_engine_specs/sqlite.py#L61
   so presumably fine for now at least?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling commented on a change in pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling commented on a change in pull request #19265:
URL: https://github.com/apache/superset/pull/19265#discussion_r832379224



##########
File path: superset/db_engine_specs/duckdb.py
##########
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import re
+from datetime import datetime
+from typing import Any, Dict, List, Optional, Pattern, Tuple, TYPE_CHECKING
+
+from flask_babel import gettext as __
+from sqlalchemy.engine.reflection import Inspector
+
+from superset.db_engine_specs.base import BaseEngineSpec
+from superset.errors import SupersetErrorType
+from superset.utils import core as utils
+
+if TYPE_CHECKING:
+    # prevent circular imports
+    from superset.models.core import Database
+
+
+COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")
+
+
+class DuckDBEngineSpec(BaseEngineSpec):
+    engine = "duckdb"
+    engine_name = "DuckDB"
+
+    # pylint: disable=line-too-long
+    _time_grain_expressions = {
+        None: "{col}",
+        "PT1S": "DATE_TRUNC('second', {col})",
+        "PT1M": "DATE_TRUNC('minute', {col})",
+        "PT1H": "DATE_TRUNC('hour', {col})",
+        "P1D": "DATE_TRUNC('day', {col})",
+        "P1W": "DATE_TRUNC('week', {col})",
+        "P1M": "DATE_TRUNC('month', {col})",
+        "P0.25Y": "DATE_TRUNC('quarter', {col})",
+        "P1Y": "DATE_TRUNC('year', {col})",
+    }
+
+    custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {
+        COLUMN_DOES_NOT_EXIST_REGEX: (
+            __('We can\'t seem to resolve the column "%(column_name)s"'),
+            SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
+            {},
+        ),
+    }
+
+    @classmethod
+    def epoch_to_dttm(cls) -> str:
+        return "datetime({col}, 'unixepoch')"
+
+    @classmethod
+    def get_all_datasource_names(
+        cls, database: "Database", datasource_type: str
+    ) -> List[utils.DatasourceName]:
+        schemas = database.get_all_schema_names(
+            cache=database.schema_cache_enabled,
+            cache_timeout=database.schema_cache_timeout,
+            force=True,
+        )
+        schema = schemas[0]
+        if datasource_type == "table":
+            return database.get_all_table_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        if datasource_type == "view":
+            return database.get_all_view_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        raise Exception(f"Unsupported datasource_type: {datasource_type}")

Review comment:
       made the change here - https://github.com/rwhaling/superset/commit/9fa02d7c1cc4b1c672f5f68637b3c922bfae32d6
   I forgot that I can't point an existing PR to a new branch, might need to close this one - will wait for any additional feedback first though.  Still working on getting set up for local testing as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling commented on pull request #19265: feat: add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling commented on pull request #19265:
URL: https://github.com/apache/superset/pull/19265#issuecomment-1075639343


   Got it working on a new branch, still kind of hackish - going to close this and open a fresh PR, apologies.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling closed pull request #19265: feat: add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling closed pull request #19265:
URL: https://github.com/apache/superset/pull/19265


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rwhaling commented on a change in pull request #19265: feat(duckdb) - add duckdb as DataSource - Fixes #14563

Posted by GitBox <gi...@apache.org>.
rwhaling commented on a change in pull request #19265:
URL: https://github.com/apache/superset/pull/19265#discussion_r832370878



##########
File path: superset/db_engine_specs/duckdb.py
##########
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import re
+from datetime import datetime
+from typing import Any, Dict, List, Optional, Pattern, Tuple, TYPE_CHECKING
+
+from flask_babel import gettext as __
+from sqlalchemy.engine.reflection import Inspector
+
+from superset.db_engine_specs.base import BaseEngineSpec
+from superset.errors import SupersetErrorType
+from superset.utils import core as utils
+
+if TYPE_CHECKING:
+    # prevent circular imports
+    from superset.models.core import Database
+
+
+COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")
+
+
+class DuckDBEngineSpec(BaseEngineSpec):
+    engine = "duckdb"
+    engine_name = "DuckDB"
+
+    # pylint: disable=line-too-long
+    _time_grain_expressions = {
+        None: "{col}",
+        "PT1S": "DATE_TRUNC('second', {col})",
+        "PT1M": "DATE_TRUNC('minute', {col})",
+        "PT1H": "DATE_TRUNC('hour', {col})",
+        "P1D": "DATE_TRUNC('day', {col})",
+        "P1W": "DATE_TRUNC('week', {col})",
+        "P1M": "DATE_TRUNC('month', {col})",
+        "P0.25Y": "DATE_TRUNC('quarter', {col})",
+        "P1Y": "DATE_TRUNC('year', {col})",
+    }
+
+    custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {
+        COLUMN_DOES_NOT_EXIST_REGEX: (
+            __('We can\'t seem to resolve the column "%(column_name)s"'),
+            SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
+            {},
+        ),
+    }
+
+    @classmethod
+    def epoch_to_dttm(cls) -> str:
+        return "datetime({col}, 'unixepoch')"
+
+    @classmethod
+    def get_all_datasource_names(
+        cls, database: "Database", datasource_type: str
+    ) -> List[utils.DatasourceName]:
+        schemas = database.get_all_schema_names(
+            cache=database.schema_cache_enabled,
+            cache_timeout=database.schema_cache_timeout,
+            force=True,
+        )
+        schema = schemas[0]
+        if datasource_type == "table":
+            return database.get_all_table_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        if datasource_type == "view":
+            return database.get_all_view_names_in_schema(
+                schema=schema,
+                force=True,
+                cache=database.table_cache_enabled,
+                cache_timeout=database.table_cache_timeout,
+            )
+        raise Exception(f"Unsupported datasource_type: {datasource_type}")

Review comment:
       Hmm, good catch - 
   https://github.com/apache/superset/blob/master/superset/db_engine_specs/base.py#L825
   incidentally, looks like the SQLite datasource follows the same pattern
   https://github.com/apache/superset/blob/master/superset/db_engine_specs/sqlite.py#L74
   I don't think it would be wise to clean that up in this PR but might be a quick win.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org