You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/03/29 18:35:34 UTC

[GitHub] [superset] betodealmeida opened a new pull request #19416: perf: improve perf in SIP-68 migration

betodealmeida opened a new pull request #19416:
URL: https://github.com/apache/superset/pull/19416


   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   Improve performance on the SIP-68 migration script by using `sqloxide` (https://pypi.org/project/sqloxide/) to parse the SQL when extracting dependencies. In case the parsing fails it falls back to using `sqlparse`.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   N/A
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   WIP
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r838001744



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -36,21 +37,82 @@
 from sqlalchemy.orm import backref, relationship, Session
 from sqlalchemy.schema import UniqueConstraint
 from sqlalchemy_utils import UUIDType
+from sqloxide import parse_sql
 
 from superset import app, db
 from superset.connectors.sqla.models import ADDITIVE_METRIC_TYPES
 from superset.extensions import encrypted_field_factory
-from superset.sql_parse import ParsedQuery
+from superset.sql_parse import ParsedQuery, Table
 
 # revision identifiers, used by Alembic.
 revision = "b8d3a24d9131"
 down_revision = "5afbb1a5849b"
 
+logger = logging.getLogger("alembic")
+
 Base = declarative_base()
 custom_password_store = app.config["SQLALCHEMY_CUSTOM_PASSWORD_STORE"]
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+# mapping between sqloxide and SQLAlchemy dialects
+sqloxide_dialects = {
+    "ansi": {"trino", "trinonative", "presto"},
+    "hive": {"hive", "databricks"},
+    "ms": {"mssql"},
+    "mysql": {"mysql"},
+    "postgres": {
+        "cockroachdb",
+        "hana",
+        "netezza",
+        "postgres",
+        "postgresql",
+        "redshift",
+        "vertica",
+    },
+    "snowflake": {"snowflake"},
+    "sqlite": {"sqlite", "gsheets", "shillelagh"},
+    "clickhouse": {"clickhouse"},
+}
+
+
+def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
+    """
+    Find all nodes in a SQL tree matching a given key.
+    """
+    if isinstance(element, list):
+        for child in element:
+            yield from find_nodes_by_key(child, target)
+    elif isinstance(element, dict):
+        for key, value in element.items():
+            if key == target:
+                yield value
+            else:
+                yield from find_nodes_by_key(value, target)
+
+
+def get_dependencies(expression: str, sqla_dialect: str) -> Set[Table]:

Review comment:
       Yeah, `extract`. Didn't notice it was `sqla` not `sql`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #19416:
URL: https://github.com/apache/superset/pull/19416#issuecomment-1082400670


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19416](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f92077c) into [master](https://codecov.io/gh/apache/superset/commit/816a2c3e1e9fb1cc82923f8c0948267697c1d234?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (816a2c3) will **decrease** coverage by `0.09%`.
   > The diff coverage is `93.05%`.
   
   > :exclamation: Current head f92077c differs from pull request most recent head f78d837. Consider uploading reports for the commit f78d837 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19416      +/-   ##
   ==========================================
   - Coverage   66.48%   66.39%   -0.10%     
   ==========================================
     Files        1670     1670              
     Lines       63968    63824     -144     
     Branches     6512     6510       -2     
   ==========================================
   - Hits        42531    42374     -157     
   - Misses      19748    19761      +13     
     Partials     1689     1689              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `81.86% <92.68%> (+0.21%)` | :arrow_up: |
   | postgres | `81.91% <92.68%> (+0.21%)` | :arrow_up: |
   | presto | `?` | |
   | python | `82.00% <92.68%> (-0.13%)` | :arrow_down: |
   | sqlite | `81.67% <92.68%> (+0.20%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ntrols/src/components/CertifiedIconWithTooltip.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL2NvbXBvbmVudHMvQ2VydGlmaWVkSWNvbldpdGhUb29sdGlwLnRzeA==) | `80.00% <ø> (ø)` | |
   | [...d/packages/superset-ui-chart-controls/src/index.ts](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL2luZGV4LnRz) | `100.00% <ø> (ø)` | |
   | [...omponents/ColumnConfigControl/ColumnConfigItem.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL3NoYXJlZC1jb250cm9scy9jb21wb25lbnRzL0NvbHVtbkNvbmZpZ0NvbnRyb2wvQ29sdW1uQ29uZmlnSXRlbS50c3g=) | `0.00% <ø> (ø)` | |
   | [...tiveFilters/FiltersConfigModal/DraggableFilter.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL25hdGl2ZUZpbHRlcnMvRmlsdGVyc0NvbmZpZ01vZGFsL0RyYWdnYWJsZUZpbHRlci50c3g=) | `71.87% <ø> (ø)` | |
   | [...t/annotation\_layers/annotations/commands/update.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvYW5ub3RhdGlvbl9sYXllcnMvYW5ub3RhdGlvbnMvY29tbWFuZHMvdXBkYXRlLnB5) | `88.23% <ø> (ø)` | |
   | [superset/annotation\_layers/annotations/schemas.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvYW5ub3RhdGlvbl9sYXllcnMvYW5ub3RhdGlvbnMvc2NoZW1hcy5weQ==) | `100.00% <ø> (ø)` | |
   | [superset/cli/examples.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL2V4YW1wbGVzLnB5) | `0.00% <ø> (ø)` | |
   | [superset/cli/importexport.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL2ltcG9ydGV4cG9ydC5weQ==) | `80.00% <ø> (ø)` | |
   | [superset/cli/main.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL21haW4ucHk=) | `0.00% <ø> (ø)` | |
   | [superset/cli/thumbnails.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL3RodW1ibmFpbHMucHk=) | `0.00% <ø> (ø)` | |
   | ... and [111 more](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [816a2c3...f78d837](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r837984072



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -36,21 +37,82 @@
 from sqlalchemy.orm import backref, relationship, Session
 from sqlalchemy.schema import UniqueConstraint
 from sqlalchemy_utils import UUIDType
+from sqloxide import parse_sql
 
 from superset import app, db
 from superset.connectors.sqla.models import ADDITIVE_METRIC_TYPES
 from superset.extensions import encrypted_field_factory
-from superset.sql_parse import ParsedQuery
+from superset.sql_parse import ParsedQuery, Table
 
 # revision identifiers, used by Alembic.
 revision = "b8d3a24d9131"
 down_revision = "5afbb1a5849b"
 
+logger = logging.getLogger("alembic")
+
 Base = declarative_base()
 custom_password_store = app.config["SQLALCHEMY_CUSTOM_PASSWORD_STORE"]
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+# mapping between sqloxide and SQLAlchemy dialects
+sqloxide_dialects = {
+    "ansi": {"trino", "trinonative", "presto"},
+    "hive": {"hive", "databricks"},
+    "ms": {"mssql"},
+    "mysql": {"mysql"},
+    "postgres": {
+        "cockroachdb",
+        "hana",
+        "netezza",
+        "postgres",
+        "postgresql",
+        "redshift",
+        "vertica",
+    },
+    "snowflake": {"snowflake"},
+    "sqlite": {"sqlite", "gsheets", "shillelagh"},
+    "clickhouse": {"clickhouse"},
+}
+
+
+def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
+    """
+    Find all nodes in a SQL tree matching a given key.
+    """
+    if isinstance(element, list):
+        for child in element:
+            yield from find_nodes_by_key(child, target)
+    elif isinstance(element, dict):
+        for key, value in element.items():
+            if key == target:
+                yield value
+            else:
+                yield from find_nodes_by_key(value, target)
+
+
+def get_dependencies(expression: str, sqla_dialect: str) -> Set[Table]:
+    """
+    Return all the dependencies from a SQL expression.
+    """
+    dialect = "generic"
+    for dialect, sqla_dialects in sqloxide_dialects.items():
+        if sqla_dialect in sqla_dialects:
+            break
+    try:
+        tree = parse_sql(expression, dialect=dialect)
+    except Exception:  # pylint: disable=broad-except
+        logger.warning("Unable to parse query with sqloxide: %s", expression)
+        # fallback to sqlparse
+        parsed = ParsedQuery(expression)
+        return parsed.tables
+
+    return {
+        Table(*[part["value"] for part in table["name"][::-1]])
+        for table in find_nodes_by_key(tree, "Table")
+    }

Review comment:
       Good point, will do!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r838001724



##########
File path: superset/connectors/sqla/models.py
##########
@@ -2278,8 +2285,7 @@ def write_shadow_dataset(  # pylint: disable=too-many-locals
             )
 
         # physical dataset
-        tables = []
-        if dataset.sql is None:
+        if not dataset.sql:

Review comment:
       Some of our example datasets have `.sql == ''`, which made them to be marked as virtual during the migration. it's not a big deal, since they will still work when we switch to the new models (in the new `Dataset` model the difference between virtual and physical is greatly reduced).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r838002113



##########
File path: superset/connectors/sqla/models.py
##########
@@ -2303,18 +2309,14 @@ def write_shadow_dataset(  # pylint: disable=too-many-locals
             # find referenced tables
             parsed = ParsedQuery(dataset.sql)
             referenced_tables = parsed.tables
-
-            # predicate for finding the referenced tables
-            predicate = or_(
-                *[
-                    and_(
-                        NewTable.schema == (table.schema or dataset.schema),
-                        NewTable.name == table.table,
-                    )
-                    for table in referenced_tables
-                ]
+            tables = load_or_create_tables(
+                session,
+                dataset.database_id,
+                dataset.schema,
+                referenced_tables,
+                conditional_quote,
+                engine,
             )
-            tables = session.query(NewTable).filter(predicate).all()

Review comment:
       We were only assigning tables that already exist. The `load_or_create_tables` function will create any tables that are referenced in the SQL but don't exist yet.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r837984488



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -36,21 +37,82 @@
 from sqlalchemy.orm import backref, relationship, Session
 from sqlalchemy.schema import UniqueConstraint
 from sqlalchemy_utils import UUIDType
+from sqloxide import parse_sql
 
 from superset import app, db
 from superset.connectors.sqla.models import ADDITIVE_METRIC_TYPES
 from superset.extensions import encrypted_field_factory
-from superset.sql_parse import ParsedQuery
+from superset.sql_parse import ParsedQuery, Table
 
 # revision identifiers, used by Alembic.
 revision = "b8d3a24d9131"
 down_revision = "5afbb1a5849b"
 
+logger = logging.getLogger("alembic")
+
 Base = declarative_base()
 custom_password_store = app.config["SQLALCHEMY_CUSTOM_PASSWORD_STORE"]
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+# mapping between sqloxide and SQLAlchemy dialects
+sqloxide_dialects = {
+    "ansi": {"trino", "trinonative", "presto"},
+    "hive": {"hive", "databricks"},
+    "ms": {"mssql"},
+    "mysql": {"mysql"},
+    "postgres": {
+        "cockroachdb",
+        "hana",
+        "netezza",
+        "postgres",
+        "postgresql",
+        "redshift",
+        "vertica",
+    },
+    "snowflake": {"snowflake"},
+    "sqlite": {"sqlite", "gsheets", "shillelagh"},
+    "clickhouse": {"clickhouse"},
+}
+
+
+def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
+    """
+    Find all nodes in a SQL tree matching a given key.
+    """
+    if isinstance(element, list):
+        for child in element:
+            yield from find_nodes_by_key(child, target)
+    elif isinstance(element, dict):
+        for key, value in element.items():
+            if key == target:
+                yield value
+            else:
+                yield from find_nodes_by_key(value, target)
+
+
+def get_dependencies(expression: str, sqla_dialect: str) -> Set[Table]:

Review comment:
       I assume you mean `extract_table_references`?
   
   I used `sqla_dialect` to differentiate from the `sqloxide` dialect.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r837954242



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -36,21 +37,82 @@
 from sqlalchemy.orm import backref, relationship, Session
 from sqlalchemy.schema import UniqueConstraint
 from sqlalchemy_utils import UUIDType
+from sqloxide import parse_sql
 
 from superset import app, db
 from superset.connectors.sqla.models import ADDITIVE_METRIC_TYPES
 from superset.extensions import encrypted_field_factory
-from superset.sql_parse import ParsedQuery
+from superset.sql_parse import ParsedQuery, Table
 
 # revision identifiers, used by Alembic.
 revision = "b8d3a24d9131"
 down_revision = "5afbb1a5849b"
 
+logger = logging.getLogger("alembic")
+
 Base = declarative_base()
 custom_password_store = app.config["SQLALCHEMY_CUSTOM_PASSWORD_STORE"]
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+# mapping between sqloxide and SQLAlchemy dialects
+sqloxide_dialects = {
+    "ansi": {"trino", "trinonative", "presto"},
+    "hive": {"hive", "databricks"},
+    "ms": {"mssql"},
+    "mysql": {"mysql"},
+    "postgres": {
+        "cockroachdb",
+        "hana",
+        "netezza",
+        "postgres",
+        "postgresql",
+        "redshift",
+        "vertica",
+    },
+    "snowflake": {"snowflake"},
+    "sqlite": {"sqlite", "gsheets", "shillelagh"},
+    "clickhouse": {"clickhouse"},
+}
+
+
+def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
+    """
+    Find all nodes in a SQL tree matching a given key.
+    """
+    if isinstance(element, list):
+        for child in element:
+            yield from find_nodes_by_key(child, target)
+    elif isinstance(element, dict):
+        for key, value in element.items():
+            if key == target:
+                yield value
+            else:
+                yield from find_nodes_by_key(value, target)
+
+
+def get_dependencies(expression: str, sqla_dialect: str) -> Set[Table]:
+    """
+    Return all the dependencies from a SQL expression.
+    """
+    dialect = "generic"
+    for dialect, sqla_dialects in sqloxide_dialects.items():
+        if sqla_dialect in sqla_dialects:
+            break
+    try:
+        tree = parse_sql(expression, dialect=dialect)
+    except Exception:  # pylint: disable=broad-except
+        logger.warning("Unable to parse query with sqloxide: %s", expression)
+        # fallback to sqlparse
+        parsed = ParsedQuery(expression)
+        return parsed.tables
+
+    return {
+        Table(*[part["value"] for part in table["name"][::-1]])
+        for table in find_nodes_by_key(tree, "Table")
+    }

Review comment:
       Maybe this can be moved to `/superset/migrations/shared/utils.py`? Then later moved to `superset.sql_parse` once we start swapping out sqlparse in other places as well...

##########
File path: tests/unit_tests/migrations/versions/b8d3a24d9131_new_dataset_models_test.py
##########
@@ -0,0 +1,42 @@
+"""
+Test the SIP-68 migration.
+"""
+# pylint: disable=import-outside-toplevel, unused-argument
+
+from pytest_mock import MockerFixture
+
+from superset.sql_parse import Table
+
+
+def test_get_dependencies(mocker: MockerFixture, app_context: None) -> None:

Review comment:
       Thanks for adding the tests! 

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -36,21 +37,82 @@
 from sqlalchemy.orm import backref, relationship, Session
 from sqlalchemy.schema import UniqueConstraint
 from sqlalchemy_utils import UUIDType
+from sqloxide import parse_sql
 
 from superset import app, db
 from superset.connectors.sqla.models import ADDITIVE_METRIC_TYPES
 from superset.extensions import encrypted_field_factory
-from superset.sql_parse import ParsedQuery
+from superset.sql_parse import ParsedQuery, Table
 
 # revision identifiers, used by Alembic.
 revision = "b8d3a24d9131"
 down_revision = "5afbb1a5849b"
 
+logger = logging.getLogger("alembic")
+
 Base = declarative_base()
 custom_password_store = app.config["SQLALCHEMY_CUSTOM_PASSWORD_STORE"]
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+# mapping between sqloxide and SQLAlchemy dialects
+sqloxide_dialects = {
+    "ansi": {"trino", "trinonative", "presto"},
+    "hive": {"hive", "databricks"},
+    "ms": {"mssql"},
+    "mysql": {"mysql"},
+    "postgres": {
+        "cockroachdb",
+        "hana",
+        "netezza",
+        "postgres",
+        "postgresql",
+        "redshift",
+        "vertica",
+    },
+    "snowflake": {"snowflake"},
+    "sqlite": {"sqlite", "gsheets", "shillelagh"},
+    "clickhouse": {"clickhouse"},
+}
+
+
+def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
+    """
+    Find all nodes in a SQL tree matching a given key.
+    """
+    if isinstance(element, list):
+        for child in element:
+            yield from find_nodes_by_key(child, target)
+    elif isinstance(element, dict):
+        for key, value in element.items():
+            if key == target:
+                yield value
+            else:
+                yield from find_nodes_by_key(value, target)
+
+
+def get_dependencies(expression: str, sqla_dialect: str) -> Set[Table]:

Review comment:
       ```suggestion
   def extra_table_references(sql_text: str, dialect: str) -> Set[Table]:
   ```
   
   Just to be more explicit




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a change in pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #19416:
URL: https://github.com/apache/superset/pull/19416#discussion_r838001056



##########
File path: scripts/benchmark_migration.py
##########
@@ -102,7 +102,10 @@ def find_models(module: ModuleType) -> List[Type[Model]]:
     while tables:
         table = tables.pop()
         seen.add(table)
-        model = getattr(Base.classes, table)
+        try:
+            model = getattr(Base.classes, table)
+        except AttributeError:
+            continue

Review comment:
       This is needed to run the bechmark migration script on the SIP-68 migration.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] edited a comment on pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19416:
URL: https://github.com/apache/superset/pull/19416#issuecomment-1082400670


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19416](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f92077c) into [master](https://codecov.io/gh/apache/superset/commit/816a2c3e1e9fb1cc82923f8c0948267697c1d234?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (816a2c3) will **decrease** coverage by `0.09%`.
   > The diff coverage is `93.05%`.
   
   > :exclamation: Current head f92077c differs from pull request most recent head 96d513c. Consider uploading reports for the commit 96d513c to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19416      +/-   ##
   ==========================================
   - Coverage   66.48%   66.39%   -0.10%     
   ==========================================
     Files        1670     1670              
     Lines       63968    63824     -144     
     Branches     6512     6510       -2     
   ==========================================
   - Hits        42531    42374     -157     
   - Misses      19748    19761      +13     
     Partials     1689     1689              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `81.86% <92.68%> (+0.21%)` | :arrow_up: |
   | postgres | `81.91% <92.68%> (+0.21%)` | :arrow_up: |
   | presto | `?` | |
   | python | `82.00% <92.68%> (-0.13%)` | :arrow_down: |
   | sqlite | `81.67% <92.68%> (+0.20%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ntrols/src/components/CertifiedIconWithTooltip.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL2NvbXBvbmVudHMvQ2VydGlmaWVkSWNvbldpdGhUb29sdGlwLnRzeA==) | `80.00% <ø> (ø)` | |
   | [...d/packages/superset-ui-chart-controls/src/index.ts](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL2luZGV4LnRz) | `100.00% <ø> (ø)` | |
   | [...omponents/ColumnConfigControl/ColumnConfigItem.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGFja2FnZXMvc3VwZXJzZXQtdWktY2hhcnQtY29udHJvbHMvc3JjL3NoYXJlZC1jb250cm9scy9jb21wb25lbnRzL0NvbHVtbkNvbmZpZ0NvbnRyb2wvQ29sdW1uQ29uZmlnSXRlbS50c3g=) | `0.00% <ø> (ø)` | |
   | [...tiveFilters/FiltersConfigModal/DraggableFilter.tsx](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL25hdGl2ZUZpbHRlcnMvRmlsdGVyc0NvbmZpZ01vZGFsL0RyYWdnYWJsZUZpbHRlci50c3g=) | `71.87% <ø> (ø)` | |
   | [...t/annotation\_layers/annotations/commands/update.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvYW5ub3RhdGlvbl9sYXllcnMvYW5ub3RhdGlvbnMvY29tbWFuZHMvdXBkYXRlLnB5) | `88.23% <ø> (ø)` | |
   | [superset/annotation\_layers/annotations/schemas.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvYW5ub3RhdGlvbl9sYXllcnMvYW5ub3RhdGlvbnMvc2NoZW1hcy5weQ==) | `100.00% <ø> (ø)` | |
   | [superset/cli/examples.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL2V4YW1wbGVzLnB5) | `0.00% <ø> (ø)` | |
   | [superset/cli/importexport.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL2ltcG9ydGV4cG9ydC5weQ==) | `80.00% <ø> (ø)` | |
   | [superset/cli/main.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL21haW4ucHk=) | `0.00% <ø> (ø)` | |
   | [superset/cli/thumbnails.py](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2xpL3RodW1ibmFpbHMucHk=) | `0.00% <ø> (ø)` | |
   | ... and [111 more](https://codecov.io/gh/apache/superset/pull/19416/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [816a2c3...96d513c](https://codecov.io/gh/apache/superset/pull/19416?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida merged pull request #19416: perf: improve perf in SIP-68 migration

Posted by GitBox <gi...@apache.org>.
betodealmeida merged pull request #19416:
URL: https://github.com/apache/superset/pull/19416


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org