Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/03/29 23:57:37 UTC

[GitHub] [superset] ktmud opened a new pull request #19421: perf: migrate new dataset models with INSERT FROM

ktmud opened a new pull request #19421:
URL: https://github.com/apache/superset/pull/19421


   ### SUMMARY
   
   This is another take on #19406 and #19416. The goal is to move the bulk of the load-and-rewrite operations from Python into native SQL statements by utilizing `INSERT INTO ... SELECT FROM`.
   
   There is still a lot of work to do, but this route seems promising: loading millions of columns + metrics took only 20 seconds on my test box, and I'd expect the other operations to be less expensive than that.
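   
   As a rough sketch of the idea (a hypothetical helper, assuming SQLAlchemy 1.4-style reflection inside an Alembic migration; the actual `insert_from_select` used by this PR may differ):
   
   ```python
   import sqlalchemy as sa
   from alembic import op
   from sqlalchemy.sql import Select
   
   
   def insert_from_select(target_table: str, source: Select) -> None:
       """Copy rows entirely inside the database with a single
       INSERT INTO <target> (...) SELECT ... statement."""
       bind = op.get_bind()  # the migration's connection
       target = sa.Table(target_table, sa.MetaData(), autoload_with=bind)
       # every SELECT expression must be labeled to match a target column name
       column_names = [col.name for col in source.selected_columns]
       bind.execute(target.insert().from_select(column_names, source))
   ```
   
   Because rows never round-trip through Python objects, each copy runs as a single statement on the database server, which is what makes the million-row case fast.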
   
   The whole migration happens in five steps:
   
   - [ ] Copy columns and metrics to the new `sl_columns` table
   - [ ] Tuck additional metadata columns (`verbose_name`, etc.) under `extra_json`
   - [ ] Copy `SqlaTable` to the new `sl_datasets` and `sl_tables` tables
   - [ ] Copy the relationship tables
      - [ ] table + columns (via SQL joins)
      - [ ] dataset + columns (via SQL joins)
      - [ ] dataset + tables (via SQL parsing)
   - [ ] Apply dataset-level `is_managed_externally` and `external_url` to columns
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (53b3aef) into [master](https://codecov.io/gh/apache/superset/commit/6b136c2bc9a6c9756e5319b045e3c42da06243cb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6b136c2) will **decrease** coverage by `0.01%`.
   > The diff coverage is `92.85%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.57%   66.56%   -0.02%     
   ==========================================
     Files        1675     1675              
     Lines       64092    64122      +30     
     Branches     6519     6519              
   ==========================================
   + Hits        42672    42681       +9     
   - Misses      19729    19750      +21     
     Partials     1691     1691              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.67% <25.00%> (-0.04%)` | :arrow_down: |
   | mysql | `81.91% <92.85%> (+<0.01%)` | :arrow_up: |
   | postgres | `?` | |
   | presto | `52.52% <25.00%> (-0.04%)` | :arrow_down: |
   | python | `82.34% <92.85%> (-0.05%)` | :arrow_down: |
   | sqlite | `81.73% <92.85%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `89.33% <100.00%> (+0.01%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/sql\_parse.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3BhcnNlLnB5) | `97.38% <0.00%> (-0.92%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `94.73% <0.00%> (-0.53%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | ... and [4 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6b136c2...53b3aef](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] john-bodley commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
john-bodley commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r838006821



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -292,78 +305,70 @@ def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
         columns.append(
             NewColumn(
                 name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
                 description=metric.description,
+                expression=metric.expression,

Review comment:
       Oh my. I do love me some ABC.

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -292,78 +305,70 @@ def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
         columns.append(
             NewColumn(
                 name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
                 description=metric.description,
+                expression=metric.expression,
+                external_url=target.external_url,
+                extra_json=json.dumps(extra_json) if extra_json else None,
                 is_aggregation=True,
                 is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
                 is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
                 is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
+                is_partition=False,
+                is_physical=False,
+                is_spatial=False,
+                is_temporal=False,
+                type="Unknown",  # figuring this out would require a type inferrer
+                warning_text=metric.warning_text,
             ),
         )
 
-    # physical dataset
-    tables = []
-    if target.sql is None:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
+    if is_physical_table:
+        # create physical sl_table
         table = NewTable(
             name=target.table_name,
             schema=target.schema,
             catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
+            database_id=database_id,
+            # only save physical columns
+            columns=[column for column in columns if column.is_physical],
             is_managed_externally=target.is_managed_externally,
             external_url=target.external_url,
         )
-        tables.append(table)
-
-    # virtual dataset
+        tables = [table]
+        expression = conditional_quote(target.table_name)
     else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
+        # find referenced tables and link to dataset
         parsed = ParsedQuery(target.sql)
         referenced_tables = parsed.tables
-
-        # predicate for finding the referenced tables
         predicate = or_(
             *[
                 and_(
+                    NewTable.database_id == database_id,

Review comment:
       Nice!






[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840089807



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable ids from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# filter for columns and metrics that have a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,

Review comment:
       I'm porting over the same `id` and `uuid` from the original tables so relationship mapping can be easier.
   
   Info from AuditMixin is also copied over. As the new tables are intended to fully replace the original ones, retaining this info is also useful for the end-user experience (especially `changed_on` and `created_on`).
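   
   In other words, because both `sl_tables.id` and `sl_datasets.id` are copied verbatim from `SqlaTable.id`, linking a physical dataset to its table needs no join at all; a minimal sketch of the pattern (mirroring `copy_datasets` elsewhere in this PR, with `NewTable` and `insert_from_select` as the migration-local helpers):
   
   ```python
   # select the shared id twice, labeled once for each side of the link
   link_select = select(
       [
           NewTable.id.label("dataset_id"),
           NewTable.id.label("table_id"),
       ]
   )
   insert_from_select("sl_dataset_tables", link_select)
   ```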






[GitHub] [superset] github-actions[bot] commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1083850622


   ⚠️ @ktmud Your base branch `master` has just been updated with changes to `superset/migrations`.
   
   ❗ **Please consider rebasing your branch to avoid db migration conflicts.**




[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r839913456



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,481 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable ids from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# filter for columns and metrics that have a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to keep only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(

Review comment:
       The manual specification of these `create_table` commands is not needed anymore. Tables are now created with `Base.metadata.create_all(bind=bind, tables=new_tables)`.
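   
   A minimal sketch of what that looks like in the migration's `upgrade()`, assuming `Base`, `NewTable`, `NewDataset`, and `NewColumn` are the migration-local models defined above (the exact table list in the PR may differ):
   
   ```python
   from alembic import op
   
   
   def upgrade() -> None:
       bind = op.get_bind()
       new_tables = [
           NewTable.__table__,
           NewDataset.__table__,
           NewColumn.__table__,
       ]
       # emits CREATE TABLE for each model, replacing the long hand-written
       # op.create_table(...) blocks that used to live here
       Base.metadata.create_all(bind=bind, tables=new_tables)
   ```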






[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (435844c) into [master](https://codecov.io/gh/apache/superset/commit/ab3770667c0b11043b177838f8c2eddd717fcfcc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ab37706) will **decrease** coverage by `0.19%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head 435844c differs from pull request most recent head 05d39a1. Consider uploading reports for the commit 05d39a1 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.58%   66.39%   -0.20%     
   ==========================================
     Files        1676     1675       -1     
     Lines       64176    64111      -65     
     Branches     6525     6519       -6     
   ==========================================
   - Hits        42732    42566     -166     
   - Misses      19745    19854     +109     
   + Partials     1699     1691       -8     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `81.95% <93.54%> (+<0.01%)` | :arrow_up: |
   | presto | `?` | |
   | python | `82.00% <93.54%> (-0.38%)` | :arrow_down: |
   | sqlite | `81.72% <93.54%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.11% <100.00%> (-1.20%)` | :arrow_down: |
   | [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `63.90% <0.00%> (-16.55%)` | :arrow_down: |
   | [superset/db\_engine\_specs/hive.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL2hpdmUucHk=) | `70.00% <0.00%> (-15.77%)` | :arrow_down: |
   | [superset-frontend/src/utils/urlUtils.ts](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL3V0aWxzL3VybFV0aWxzLnRz) | `36.73% <0.00%> (-10.21%)` | :arrow_down: |
   | [superset/common/utils/dataframe\_utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3V0aWxzL2RhdGFmcmFtZV91dGlscy5weQ==) | `85.71% <0.00%> (-7.15%)` | :arrow_down: |
   | [superset/db\_engine\_specs/presto.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3ByZXN0by5weQ==) | `83.68% <0.00%> (-5.44%)` | :arrow_down: |
   | [superset-frontend/src/components/Icons/Icon.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvSWNvbnMvSWNvbi50c3g=) | `95.23% <0.00%> (-4.77%)` | :arrow_down: |
   | ... and [22 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ab37706...05d39a1](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840092583



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable ids from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# filter for columns and metrics that have a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to keep only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added more easily
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,
+                TableColumn.created_by_fk,
+                TableColumn.changed_by_fk,
+                TableColumn.column_name.label("name"),
+                TableColumn.description,
+                func.coalesce(TableColumn.expression, TableColumn.column_name).label(
+                    "expression"
+                ),
+                sa.literal(False).label("is_aggregation"),
+                or_(
+                    TableColumn.expression.is_(None), (TableColumn.expression == "")
+                ).label("is_physical"),
+                TableColumn.is_dttm.label("is_temporal"),
+                func.coalesce(TableColumn.type, "Unknown").label("type"),
+                TableColumn.extra.label("extra_json"),
+            ]
+        ).select_from(active_table_columns),
     )
-    with op.batch_alter_table("sl_columns") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_columns_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_tables",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Table
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("database_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("catalog", sa.TEXT(), nullable=True),
-        sa.Column("schema", sa.TEXT(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.ForeignKeyConstraint(["database_id"], ["dbs.id"], name="sl_tables_ibfk_1"),
-        sa.PrimaryKeyConstraint("id"),
+    print("   Link physical table columns to `sl_tables`...")
+    insert_from_select(
+        "sl_table_columns",
+        select(
+            [
+                TableColumn.table_id,
+                TableColumn.id.label("column_id"),
+            ]
+        )
+        .select_from(active_table_columns)
+        .where(is_physical_table),
     )
-    with op.batch_alter_table("sl_tables") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_tables_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_table_columns",
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_table_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_table_columns_ibfk_1"
-        ),
+    print("   Link all columns to `sl_datasets`...")
+    insert_from_select(
+        "sl_dataset_columns",
+        select(
+            [
+                TableColumn.table_id.label("dataset_id"),
+                TableColumn.id.label("column_id"),
+            ],
+        ).select_from(active_table_columns),
     )
 
-    op.create_table(
-        "sl_datasets",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Dataset
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("sqlatable_id", sa.INTEGER(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+
+def copy_metrics(session: Session) -> None:
+    """Copy metrics as virtual columns"""
+    metrics_count = session.query(SqlMetric).select_from(active_metrics).count()
+    if not metrics_count:
+        return
+    # offset metric column ids by the last id of table columns
+    id_offset = session.query(func.max(NewColumn.id)).scalar()
+
+    print(f">> Copy {metrics_count:,} metrics to `sl_columns`...")
+    insert_from_select(
+        "sl_columns",
+        select(
+            [
+                (SqlMetric.id + id_offset).label("id"),
+                SqlMetric.uuid,
+                SqlMetric.created_on,
+                SqlMetric.changed_on,
+                SqlMetric.created_by_fk,
+                SqlMetric.changed_by_fk,
+                SqlMetric.metric_name.label("name"),
+                SqlMetric.expression,
+                SqlMetric.description,
+                sa.literal("Unknown").label("type"),
+                (
+                    sa.func.lower(SqlMetric.metric_type)
+                    .in_(ADDITIVE_METRIC_TYPES_LOWER)
+                    .label("is_additive")
+                ),
+                sa.literal(False).label("is_physical"),
+                sa.literal(False).label("is_temporal"),
+                sa.literal(True).label("is_aggregation"),
+                SqlMetric.extra.label("extra_json"),
+                SqlMetric.warning_text,
+            ]
+        ).select_from(active_metrics),
     )
-    with op.batch_alter_table("sl_datasets") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_datasets_uuid", ["uuid"])
-        batch_op.create_unique_constraint(
-            "uq_sl_datasets_sqlatable_id", ["sqlatable_id"]
-        )
 
-    op.create_table(
+    print("   Link metric columns to datasets...")
+    insert_from_select(
         "sl_dataset_columns",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_dataset_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_columns_ibfk_1"
-        ),
+        select(
+            [
+                SqlMetric.table_id.label("dataset_id"),
+                (SqlMetric.id + id_offset).label("column_id"),
+            ],
+        ).select_from(active_metrics),
     )
 
-    op.create_table(
-        "sl_dataset_tables",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_tables_ibfk_1"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_dataset_tables_ibfk_2"
-        ),
+
+def postprocess_datasets(session: Session) -> None:
+    """
+    Postprocess datasets after insertion to
+      - Quote table names for physical datasets (if needed)
+      - Link referenced tables to virtual datasets
+    """
+    total = session.query(SqlaTable).count()
+    if not total:
+        return
+
+    offset = 0
+    limit = 10000
+
+    joined_tables = sa.join(
+        NewDataset,
+        SqlaTable,
+        NewDataset.sqlatable_id == SqlaTable.id,
+    ).join(
+        Database,
+        Database.id == SqlaTable.database_id,
+        isouter=True,
     )
+    assert session.query(func.count()).select_from(joined_tables).scalar() == total
 
-    # migrate existing datasets to the new models
-    bind = op.get_bind()
-    session = db.Session(bind=bind)  # pylint: disable=no-member
+    print(f">> Run postprocessing on {total} datasets")
+
+    update_count = 0
+
+    def print_update_count():
+        if SHOW_PROGRESS:
+            print(
+                f"   Will update {update_count} datasets" + " " * 20,
+                end="\r",
+            )
+
+    while offset < total:
+        if SHOW_PROGRESS:
+            print(
+                f"   Postprocess dataset {offset + 1}~{min(total, offset + limit)}..."
+                + " " * 30
+            )
+        for (
+            dataset_id,
+            is_physical,
+            expression,
+            database_id,
+            schema,
+            sqlalchemy_uri,
+        ) in session.execute(
+            select(
+                [
+                    NewDataset.id,
+                    NewDataset.is_physical,
+                    NewDataset.expression,
+                    SqlaTable.database_id,
+                    SqlaTable.schema,
+                    Database.sqlalchemy_uri,
+                ]
+            )
+            .select_from(joined_tables)
+            .offset(offset)
+            .limit(limit)
+        ):
+            drivername = (sqlalchemy_uri or "").split("://")[0]
+            if is_physical and drivername:
+                quoted_expression = get_identifier_quoter(drivername)(expression)
+                if quoted_expression != expression:
+                    session.execute(
+                        sa.update(NewDataset)
+                        .where(NewDataset.id == dataset_id)
+                        .values(expression=quoted_expression)
+                    )
+                    update_count += 1
+                    print_update_count()
+            elif not is_physical and expression:
+                table_references = extract_table_references(
+                    expression, get_dialect_name(drivername), show_warning=False
+                )

Review comment:
       This table reference extraction should probably live in a separate process/script as well; removing it from the migration would cut at least another 30 minutes of migration time for us.
   
   @betodealmeida will this info be used anytime soon, or should we design for it later, when we actually want to implement features around it? There will be some manual syncing again anyway once users update their SQL queries, and this extraction doesn't capture table references in Jinja-templated SQL well.
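
   As a concrete illustration of the Jinja problem (a minimal, hypothetical snippet, not part of this PR; it assumes `extract_table_references(sql, dialect)` is importable from `superset.sql_parse` and returns the set of referenced tables, as the migration code above suggests):

   ```python
   # Hypothetical usage sketch; the dialect name is illustrative.
   from superset.sql_parse import extract_table_references

   plain_sql = "SELECT * FROM sales JOIN regions ON sales.region_id = regions.id"
   jinja_sql = "SELECT * FROM {{ get_table_name() }} WHERE ds = '{{ ds }}'"

   # Static SQL parses cleanly: both `sales` and `regions` are found.
   print(extract_table_references(plain_sql, "postgresql"))

   # The un-rendered "{{ get_table_name() }}" is not a valid identifier, so
   # the real table reference is lost (or mis-parsed) until the Jinja
   # template is rendered.
   print(extract_table_references(jinja_sql, "postgresql"))
   ```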

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look for NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,

Review comment:
       I'm porting over the same `id`, `uuid`, `created_on`, and `changed_on` from the original tables so that relationship mapping is easier. As the new tables are intended to fully replace the original ones, retaining this information is also useful for the end-user experience (especially `changed_on` and `created_on`).
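
   To make the benefit concrete, here is a hypothetical sketch of what the same link-building step would look like if the new tables got fresh autoincrement ids instead (none of this is in the PR; it reuses the models and helpers defined in the migration above):

   ```python
   # Without preserved ids, every relationship INSERT ... SELECT would need
   # extra joins through stable keys (sqlatable_id / uuid) just to translate
   # old ids into new ones:
   mapping = sa.join(
       NewDataset, SqlaTable, NewDataset.sqlatable_id == SqlaTable.id
   ).join(NewTable, NewTable.uuid == SqlaTable.uuid)
   insert_from_select(
       "sl_dataset_tables",
       select(
           [
               NewDataset.id.label("dataset_id"),
               NewTable.id.label("table_id"),
           ]
       ).select_from(mapping),
   )
   # Reusing SqlaTable.id verbatim collapses this into the trivial
   # "SELECT id, id FROM sl_tables" seen in copy_datasets above.
   ```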

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look for NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,

Review comment:
       The previous migration did not copy the values of these columns to the new tables. I think it'd be useful to retain them, especially the properties from AuditMixin.
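
   For reference, a rough sketch of what a mixin carrying these columns can look like (the actual `AuxiliaryColumnsMixin` in this PR may be declared differently):

   ```python
   from datetime import datetime
   from uuid import uuid4

   import sqlalchemy as sa
   from sqlalchemy.ext.declarative import declared_attr
   from sqlalchemy_utils import UUIDType

   class AuxiliaryColumnsMixin:
       # ImportExportMixin
       uuid = sa.Column(UUIDType(binary=True), unique=True, default=uuid4)
       # AuditMixinNullable
       created_on = sa.Column(sa.DateTime, default=datetime.utcnow, nullable=True)
       changed_on = sa.Column(
           sa.DateTime,
           default=datetime.utcnow,
           onupdate=datetime.utcnow,
           nullable=True,
       )

       @declared_attr
       def created_by_fk(cls):
           return sa.Column(sa.Integer, sa.ForeignKey("ab_user.id"), nullable=True)

       @declared_attr
       def changed_by_fk(cls):
           return sa.Column(sa.Integer, sa.ForeignKey("ab_user.id"), nullable=True)
   ```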

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look for NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables

Review comment:
       This logic of syncing table schemas from data sources has been removed. It should live in a separate offline script.
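
   A hedged sketch of what such an offline script could do (illustrative only; it borrows the NewColumn/NewTable models from the migration above and omits batching, diffing of changed types, and error handling):

   ```python
   import sqlalchemy as sa

   def sync_physical_columns(session, database, table):
       """Refresh NewColumn rows for one physical table from its source DB."""
       engine = sa.create_engine(database.sqlalchemy_uri)
       inspector = sa.inspect(engine)
       known = {column.name for column in table.columns}
       for info in inspector.get_columns(table.name, schema=table.schema):
           if info["name"] not in known:
               # add columns that exist upstream but not in the metastore yet
               table.columns.append(
                   NewColumn(
                       name=info["name"],
                       type=str(info["type"]),
                       expression=info["name"],
                       is_physical=True,
                       is_aggregation=False,
                   )
               )
       session.commit()
   ```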

##########
File path: superset/migrations/versions/07071313dd52_change_fetch_values_predicate_to_text.py
##########
@@ -30,9 +30,7 @@
 
 import sqlalchemy as sa
 from alembic import op
-from sqlalchemy import and_, func, or_
-from sqlalchemy.dialects import postgresql
-from sqlalchemy.sql.schema import Table

Review comment:
       Bycatch: clean up unused imports.






[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/eab9388f7cdaca20588d4c94c929225fd9d59870?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (eab9388) will **decrease** coverage by `0.07%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head 194d572. Consider uploading reports for the commit 194d572 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.58%   66.51%   -0.08%     
   ==========================================
     Files        1676     1676              
     Lines       64176    64191      +15     
     Branches     6525     6525              
   ==========================================
   - Hits        42732    42694      -38     
   - Misses      19745    19798      +53     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (-0.03%)` | :arrow_down: |
   | mysql | `81.90% <93.54%> (+<0.01%)` | :arrow_up: |
   | postgres | `?` | |
   | presto | `?` | |
   | python | `82.22% <93.54%> (-0.16%)` | :arrow_down: |
   | sqlite | `81.72% <93.54%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (-1.02%)` | :arrow_down: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/db\_engine\_specs/presto.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3ByZXN0by5weQ==) | `84.30% <0.00%> (-4.82%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `94.73% <0.00%> (-0.53%)` | :arrow_down: |
   | ... and [5 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eab9388...194d572](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (53b3aef) into [master](https://codecov.io/gh/apache/superset/commit/6b136c2bc9a6c9756e5319b045e3c42da06243cb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6b136c2) will **decrease** coverage by `0.01%`.
   > The diff coverage is `92.85%`.
   
   > :exclamation: Current head 53b3aef differs from pull request most recent head 6f430fb. Consider uploading reports for the commit 6f430fb to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.57%   66.56%   -0.02%     
   ==========================================
     Files        1675     1675              
     Lines       64092    64122      +30     
     Branches     6519     6519              
   ==========================================
   + Hits        42672    42681       +9     
   - Misses      19729    19750      +21     
     Partials     1691     1691              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.67% <25.00%> (-0.04%)` | :arrow_down: |
   | mysql | `81.91% <92.85%> (+<0.01%)` | :arrow_up: |
   | postgres | `?` | |
   | presto | `52.52% <25.00%> (-0.04%)` | :arrow_down: |
   | python | `82.34% <92.85%> (-0.05%)` | :arrow_down: |
   | sqlite | `81.73% <92.85%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `89.33% <100.00%> (+0.01%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/sql\_parse.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3BhcnNlLnB5) | `97.38% <0.00%> (-0.92%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `94.73% <0.00%> (-0.53%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | ... and [4 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6b136c2...6f430fb](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] eschutho commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
eschutho commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840782172



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -150,55 +176,66 @@ def fetch_columns_and_metrics(self, session: Session) -> None:
     Base.metadata,
     sa.Column("table_id", sa.ForeignKey("sl_tables.id")),
     sa.Column("column_id", sa.ForeignKey("sl_columns.id")),
+    UniqueConstraint("table_id", "column_id"),

Review comment:
       For people who have already run this migration, none of the additions in this update will take effect. Can we instead create a new migration for any additions or changes that don't affect performance?
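
   A sketch of the follow-up migration suggested here (the revision id and constraint name are placeholders):

   ```python
   from alembic import op

   revision = "ffffffffffff"  # placeholder
   down_revision = "b8d3a24d9131"

   def upgrade():
       with op.batch_alter_table("sl_table_columns") as batch_op:
           batch_op.create_unique_constraint(
               "uq_sl_table_columns", ["table_id", "column_id"]
           )

   def downgrade():
       with op.batch_alter_table("sl_table_columns") as batch_op:
           batch_op.drop_constraint("uq_sl_table_columns", type_="unique")
   ```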






[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (435844c) into [master](https://codecov.io/gh/apache/superset/commit/6b136c2bc9a6c9756e5319b045e3c42da06243cb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6b136c2) will **decrease** coverage by `0.18%`.
   > The diff coverage is `91.08%`.
   
   > :exclamation: Current head 435844c differs from pull request most recent head 3a1e9c3. Consider uploading reports for the commit 3a1e9c3 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.57%   66.39%   -0.19%     
   ==========================================
     Files        1675     1675              
     Lines       64092    64111      +19     
     Branches     6519     6519              
   ==========================================
   - Hits        42672    42566     -106     
   - Misses      19729    19854     +125     
     Partials     1691     1691              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `81.95% <93.75%> (-0.01%)` | :arrow_down: |
   | presto | `?` | |
   | python | `82.00% <93.75%> (-0.39%)` | :arrow_down: |
   | sqlite | `81.72% <93.75%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ugins/legacy-plugin-chart-calendar/src/Calendar.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy9DYWxlbmRhci5qcw==) | `0.00% <ø> (ø)` | |
   | [...legacy-plugin-chart-calendar/src/ReactCalendar.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy9SZWFjdENhbGVuZGFyLmpzeA==) | `0.00% <0.00%> (ø)` | |
   | [...cy-plugin-chart-calendar/src/vendor/cal-heatmap.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy92ZW5kb3IvY2FsLWhlYXRtYXAuanM=) | `0.00% <ø> (ø)` | |
   | [...plugins/legacy-plugin-chart-heatmap/src/Heatmap.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWhlYXRtYXAvc3JjL0hlYXRtYXAuanM=) | `0.00% <ø> (ø)` | |
   | [...plugins/legacy-preset-chart-nvd3/src/ReactNVD3.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcHJlc2V0LWNoYXJ0LW52ZDMvc3JjL1JlYWN0TlZEMy5qc3g=) | `0.00% <ø> (ø)` | |
   | [...n-chart-pivot-table/src/react-pivottable/Styles.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9wbHVnaW4tY2hhcnQtcGl2b3QtdGFibGUvc3JjL3JlYWN0LXBpdm90dGFibGUvU3R5bGVzLmpz) | `0.00% <ø> (ø)` | |
   | [...set-frontend/src/components/ModalTrigger/index.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvTW9kYWxUcmlnZ2VyL2luZGV4LmpzeA==) | `100.00% <ø> (ø)` | |
   | [...frontend/src/dashboard/components/Header/index.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL0hlYWRlci9pbmRleC5qc3g=) | `60.92% <ø> (ø)` | |
   | [superset-frontend/src/views/CRUD/utils.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL3ZpZXdzL0NSVUQvdXRpbHMudHN4) | `65.57% <ø> (ø)` | |
   | [...perset-frontend/src/views/CRUD/welcome/Welcome.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL3ZpZXdzL0NSVUQvd2VsY29tZS9XZWxjb21lLnRzeA==) | `75.00% <ø> (ø)` | |
   | ... and [33 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6b136c2...3a1e9c3](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/08aca83f6cba81d37d6d70cfddc7980ae95a7bb5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (08aca83) will **increase** coverage by `0.11%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head f4b827e. Consider uploading reports for the commit f4b827e to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   + Coverage   66.39%   66.51%   +0.11%     
   ==========================================
     Files        1676     1676              
     Lines       64211    64191      -20     
     Branches     6537     6525      -12     
   ==========================================
   + Hits        42635    42694      +59     
   + Misses      19877    19798      -79     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (?)` | |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | python | `82.22% <93.54%> (+0.23%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (+0.19%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `80.45% <0.00%> (-0.58%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [...t-frontend/src/components/AsyncAceEditor/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQXN5bmNBY2VFZGl0b3IvaW5kZXgudHN4) | `90.90% <0.00%> (-0.21%)` | :arrow_down: |
   | ... and [20 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [08aca83...f4b827e](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (05d39a1) into [master](https://codecov.io/gh/apache/superset/commit/eab9388f7cdaca20588d4c94c929225fd9d59870?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (eab9388) will **increase** coverage by `14.44%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head 05d39a1 differs from pull request most recent head 194d572. Consider uploading reports for the commit 194d572 to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #19421       +/-   ##
   ===========================================
   + Coverage   52.12%   66.56%   +14.44%     
   ===========================================
     Files        1676     1676               
     Lines       64176    64191       +15     
     Branches     6525     6525               
   ===========================================
   + Hits        33450    42729     +9279     
   + Misses      29027    19763     -9264     
     Partials     1699     1699               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (-0.03%)` | :arrow_down: |
   | mysql | `81.90% <93.54%> (?)` | |
   | presto | `52.51% <32.25%> (-0.03%)` | :arrow_down: |
   | python | `82.33% <93.54%> (+29.37%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+83.01%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (+24.23%)` | :arrow_up: |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `89.32% <100.00%> (+16.55%)` | :arrow_up: |
   | [superset/config.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29uZmlnLnB5) | `91.41% <0.00%> (+0.33%)` | :arrow_up: |
   | [superset/views/database/views.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvZGF0YWJhc2Uvdmlld3MucHk=) | `31.36% <0.00%> (+0.90%)` | :arrow_up: |
   | [superset/common/utils/query\_cache\_manager.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3V0aWxzL3F1ZXJ5X2NhY2hlX21hbmFnZXIucHk=) | `89.41% <0.00%> (+1.17%)` | :arrow_up: |
   | [superset/initialization/\_\_init\_\_.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvaW5pdGlhbGl6YXRpb24vX19pbml0X18ucHk=) | `91.28% <0.00%> (+1.74%)` | :arrow_up: |
   | [superset/sql\_lab.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX2xhYi5weQ==) | `81.64% <0.00%> (+2.73%)` | :arrow_up: |
   | [superset/charts/data/commands/get\_data\_command.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2hhcnRzL2RhdGEvY29tbWFuZHMvZ2V0X2RhdGFfY29tbWFuZC5weQ==) | `100.00% <0.00%> (+3.70%)` | :arrow_up: |
   | [superset/exceptions.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhjZXB0aW9ucy5weQ==) | `90.65% <0.00%> (+3.73%)` | :arrow_up: |
   | ... and [299 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eab9388...194d572](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/5fed10dae2723bbcab46e54c9eb6ca55272c34d3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5fed10d) will **decrease** coverage by `0.07%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head c21e3d1. Consider uploading reports for the commit c21e3d1 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.58%   66.51%   -0.08%     
   ==========================================
     Files        1677     1676       -1     
     Lines       64238    64191      -47     
     Branches     6538     6525      -13     
   ==========================================
   - Hits        42773    42694      -79     
   - Misses      19766    19798      +32     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (-0.03%)` | :arrow_down: |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | presto | `?` | |
   | python | `82.22% <93.54%> (-0.17%)` | :arrow_down: |
   | sqlite | `81.72% <93.54%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (-1.02%)` | :arrow_down: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [superset/db\_engine\_specs/presto.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3ByZXN0by5weQ==) | `84.30% <0.00%> (-4.70%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | ... and [27 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5fed10d...c21e3d1](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] ktmud commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086218951


   @eschutho I'm proposing to change the current migration to a no-op and move my updated code to a new migration. 
   
   See an earlier message I sent in Slack:
   
   ---
   
   Hi, I noticed we are making more adjustments to the SIP-68 models and have prepared a [couple](https://github.com/apache/superset/pull/19425) [more](https://github.com/apache/superset/pull/19487) db migrations. I’m wondering whether we should bundle all of these migrations (including the first one that’s already merged) into one new migration and change the original migration to a no-op.
   
   **Pros:**
   
   - Reduced total migration time: bundling everything should be faster than running each migration separately
   - We get a chance to fix a couple more errors, such as [using MediumText for MySQL](https://github.com/apache/superset/pull/19421#discussion_r839942807) and [incorrect additive_metric_types matching](https://github.com/apache/superset/pull/19421#discussion_r839903477)
   - We get a chance to copy over other missing data, such as [changed on and last updated](https://github.com/apache/superset/pull/19421#discussion_r840089807)
   - We can re-ID the copied entities to follow the original ones, making it easier to spot-check potential data inconsistency bugs down the road
   - Everyone’s db is in a clean and consistent state
   
   **Cons:**
   - Those who already ran the migration and bore the slowness may have to experience it again
   
   Happy to incorporate [#19487](https://github.com/apache/superset/pull/19487/) and [#19425](https://github.com/apache/superset/pull/19425) into [my PR](https://github.com/apache/superset/pull/19421) if they are still needed. (edited)




[GitHub] [superset] ktmud edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086218951


   @eschutho I propose changing the current migration to a no-op and moving my updated code to a new migration. 
   
   I DM'ed @betodealmeida and @hughhhh earlier on Slack. Reposting the messages here for visibility:
   
   ---
   
   Hi, I noticed we are making more adjustments to the SIP-68 models and have prepared a [couple](https://github.com/apache/superset/pull/19425) [more](https://github.com/apache/superset/pull/19487) db migrations. I’m wondering whether we should bundle all of these migrations (including the first one that’s already merged) into one new migration and change the original migration to a no-op.
   
   **Pros:**
   
   - Reduced total migration time: bundling everything should be faster than running each migration separately
   - We get a chance to fix a couple more errors, such as [using MediumText for MySQL](https://github.com/apache/superset/pull/19421#discussion_r839942807) and [incorrect additive_metric_types matching](https://github.com/apache/superset/pull/19421#discussion_r839903477)
   - We get a chance to copy over other missing data, such as [changed on and last updated](https://github.com/apache/superset/pull/19421#discussion_r840089807)
   - We can re-ID the copied entities to follow the original ones, making it easier to spot-check potential data inconsistency bugs down the road
   - Everyone’s db is in a clean and consistent state
   - It's easier to review the db structure in the future
   
   **Cons:**
   - Those who already ran the migration and bore the slowness may have to experience it again
   
   Happy to incorporate [#19487](https://github.com/apache/superset/pull/19487/) and [#19425](https://github.com/apache/superset/pull/19425) into [my PR](https://github.com/apache/superset/pull/19421) if they are still needed.
   
   Btw, I think the `Dataset` model may need a `database_id` column as well. There is an implicit assumption that a dataset can only run on one database; I cannot imagine a case where we'd need to support a virtual dataset spanning tables in different databases. A direct link to the database ensures existing virtual datasets can be associated with the correct database without relying on an unreliable table-name extraction process. Currently, if table name extraction fails, a virtual dataset loses its association with the correct table, and hence its only link to a database; recovering it would require joining `SqlaTable` on `sqlatable_id` to get the correct database id.
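   
   A hedged sketch of the extra lookup this implies today (`session`, `dataset_id`, and the model handles are assumed to come from the migration module; the query is illustrative, not code from this PR):
   
   ```python
   # Without Dataset.database_id, a virtual dataset's database can only be
   # recovered by joining back to the legacy SqlaTable through sqlatable_id.
   database_id = (
       session.query(SqlaTable.database_id)
       .join(NewDataset, NewDataset.sqlatable_id == SqlaTable.id)
       .filter(NewDataset.id == dataset_id)
       .scalar()
   )
   ```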




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (05d39a1) into [master](https://codecov.io/gh/apache/superset/commit/ab3770667c0b11043b177838f8c2eddd717fcfcc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ab37706) will **decrease** coverage by `0.02%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head 05d39a1 differs from pull request most recent head ad6e167. Consider uploading reports for the commit ad6e167 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.58%   66.56%   -0.03%     
   ==========================================
     Files        1676     1676              
     Lines       64176    64191      +15     
     Branches     6525     6525              
   ==========================================
   - Hits        42732    42729       -3     
   - Misses      19745    19763      +18     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (-0.03%)` | :arrow_down: |
   | mysql | `81.90% <93.54%> (+<0.01%)` | :arrow_up: |
   | postgres | `?` | |
   | presto | `52.51% <32.25%> (-0.03%)` | :arrow_down: |
   | python | `82.33% <93.54%> (-0.05%)` | :arrow_down: |
   | sqlite | `81.72% <93.54%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `89.32% <100.00%> (+<0.01%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `94.73% <0.00%> (-0.53%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [superset/reports/commands/execute.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9leGVjdXRlLnB5) | `91.14% <0.00%> (-0.37%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eab9388...ad6e167](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r839902326



##########
File path: superset/connectors/base/models.py
##########
@@ -586,7 +586,7 @@ class BaseColumn(AuditMixinNullable, ImportExportMixin):
     type = Column(Text)
     groupby = Column(Boolean, default=True)
     filterable = Column(Boolean, default=True)
-    description = Column(Text)
+    description = Column(MediumText())

Review comment:
       `MediumText` is the current type for these fields; they were updated in db migrations at some point. Updating here for consistency.

##########
File path: superset/connectors/sqla/models.py
##########
@@ -130,6 +131,7 @@
     "sum",
     "doubleSum",
 }
+ADDITIVE_METRIC_TYPES_LOWER = {op.lower() for op in ADDITIVE_METRIC_TYPES}

Review comment:
       `metric_type.lower()` is compared against the mixed-case `doubleSum`, so that check can never be `True`. Not sure whether the original casing is used elsewhere, so I added a new variable instead of lower-casing the existing set.
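   
   A minimal sketch of the casing bug (set values abridged from the diff above):
   
   ```python
   ADDITIVE_METRIC_TYPES = {"sum", "doubleSum"}
   ADDITIVE_METRIC_TYPES_LOWER = {op.lower() for op in ADDITIVE_METRIC_TYPES}
   
   metric_type = "doubleSum"
   # Old check: lower-cased input vs. mixed-case set entry -- never matches.
   assert metric_type.lower() not in ADDITIVE_METRIC_TYPES
   # New check: both sides normalized to lower case -- matches as intended.
   assert metric_type.lower() in ADDITIVE_METRIC_TYPES_LOWER
   ```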

##########
File path: superset/migrations/shared/utils.py
##########
@@ -84,18 +97,29 @@ def find_nodes_by_key(element: Any, target: str) -> Iterator[Any]:
                 yield from find_nodes_by_key(value, target)
 
 
-def extract_table_references(sql_text: str, sqla_dialect: str) -> Set[Table]:
+RE_JINJA_VAR = re.compile(r"\{\{[^\{\}]+\}\}")
+RE_JINJA_BLOCK = re.compile(r"\{[%#][^\{\}%#]+[%#]\}")
+
+
+def extract_table_references(
+    sql_text: str, sqla_dialect: str, show_warning=True
+) -> Set[Table]:
     """
     Return all the dependencies from a SQL sql_text.
     """
     dialect = "generic"
     for dialect, sqla_dialects in sqloxide_dialects.items():
         if sqla_dialect in sqla_dialects:
             break
+    sql_text = RE_JINJA_BLOCK.sub(" ", sql_text)
+    sql_text = RE_JINJA_VAR.sub("abc", sql_text)

Review comment:
       Interpolate Jinja vars to give `sqloxide` a higher chance of successfully parsing the SQL text.
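   
   For illustration, a standalone run of the two substitutions above (regexes copied from the diff; the sample SQL is made up):
   
   ```python
   import re
   
   RE_JINJA_VAR = re.compile(r"\{\{[^\{\}]+\}\}")
   RE_JINJA_BLOCK = re.compile(r"\{[%#][^\{\}%#]+[%#]\}")
   
   sql = "SELECT * FROM logs WHERE ds = {{ ds }} {% if limit %}LIMIT 10{% endif %}"
   sql = RE_JINJA_BLOCK.sub(" ", sql)   # drop {% ... %} and {# ... #} blocks
   sql = RE_JINJA_VAR.sub("abc", sql)   # swap {{ ... }} for a parseable literal
   print(sql)  # roughly: SELECT * FROM logs WHERE ds = abc  LIMIT 10
   ```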

##########
File path: superset/connectors/sqla/models.py
##########
@@ -522,7 +524,7 @@ class SqlaTable(Model, BaseDatasource):  # pylint: disable=too-many-public-metho
         foreign_keys=[database_id],
     )
     schema = Column(String(255))
-    sql = Column(Text)
+    sql = Column(MediumText())

Review comment:
       I found out some columns need to be `MediumText` only after noticing SQL parsing was failing because some of the SQL statements were cut off when copied to the new table.
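   
   For context, a sketch of what `MediumText()` resolves to (assuming the usual Superset helper; other dialects keep a regular `Text`):
   
   ```python
   from sqlalchemy import Text
   from sqlalchemy.dialects.mysql import MEDIUMTEXT
   
   def MediumText():
       # MySQL TEXT caps at ~64 KB, which silently truncated long virtual
       # dataset SQL; MEDIUMTEXT raises the cap to ~16 MB. Other dialects
       # keep plain, unbounded TEXT.
       return Text().with_variant(MEDIUMTEXT(), "mysql")
   ```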

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,481 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up the ids of NewTable entries from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# filter to columns and metrics that have a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to filter out only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added easier
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,
+                TableColumn.column_name.label("name"),
+                TableColumn.description,
+                func.coalesce(TableColumn.expression, TableColumn.column_name).label(
+                    "expression"
+                ),
+                sa.literal(False).label("is_aggregation"),
+                (
+                    TableColumn.expression.is_(None)
+                    | (TableColumn.expression == "")
+                ).label("is_physical"),
+                TableColumn.is_dttm.label("is_temporal"),
+                func.coalesce(TableColumn.type, "Unknown").label("type"),
+                TableColumn.extra.label("extra_json"),
+            ]
+        ).select_from(active_table_columns),
     )
-    with op.batch_alter_table("sl_columns") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_columns_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_tables",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Table
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("database_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("catalog", sa.TEXT(), nullable=True),
-        sa.Column("schema", sa.TEXT(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.ForeignKeyConstraint(["database_id"], ["dbs.id"], name="sl_tables_ibfk_1"),
-        sa.PrimaryKeyConstraint("id"),
+    print("   Link physical table columns to `sl_tables`...")
+    insert_from_select(
+        "sl_table_columns",
+        select(
+            [
+                TableColumn.table_id,
+                TableColumn.id.label("column_id"),
+            ]

Review comment:
       Since table and column ids are the same, we can just fill the association table with ids from the original data.
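
       For illustration, the truncated statement above presumably completes along these lines — a sketch only, reusing the `insert_from_select` helper and the `active_table_columns` join defined elsewhere in this diff:

       ```python
       # Sketch: because column ids were copied verbatim into `sl_columns`,
       # the association rows can be derived directly from the legacy
       # `table_columns` rows, with no id lookup or ORM round-trip.
       insert_from_select(
           "sl_table_columns",
           select(
               [
                   TableColumn.table_id,               # same value as sl_tables.id
                   TableColumn.id.label("column_id"),  # same value as sl_columns.id
               ]
           ).select_from(active_table_columns),
           # the real statement may add further filters (e.g. physical tables only)
       )
       ```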

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -53,6 +61,31 @@
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+class AuxiliaryColumnsMixin:
+    """
+    Auxiliary columns, a combination of columns added by
+       AuditMixin + ImportExportMixin
+    """
+
+    created_on = sa.Column(sa.DateTime, default=datetime.now, nullable=True)
+    changed_on = sa.Column(
+        sa.DateTime, default=datetime.now, onupdate=datetime.now, nullable=True
+    )
+    uuid = sa.Column(
+        UUIDType(binary=True), primary_key=False, unique=True, default=uuid4
+    )

Review comment:
       The previous migration did not include these columns in the new tables, which resulted in them having `null` values.
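
       A minimal sketch of how the mixin is consumed (model names from this diff; most columns elided):

       ```python
       # Sketch: models that mix in AuxiliaryColumnsMixin now declare these columns,
       # so newly inserted rows get non-null defaults instead of NULL values.
       class NewTable(Base, AuxiliaryColumnsMixin):  # as declared in the script
           __tablename__ = "sl_tables"

           id = sa.Column(sa.Integer, primary_key=True)
           name = sa.Column(sa.Text)
           # ... other columns elided; created_on / changed_on / uuid come from the mixin
       ```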

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,492 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to filter out only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),

Review comment:
       Quotes around `table_name` will be added in a later step.
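
       A sketch of what that later quoting pass could look like, reusing the dialect-specific identifier preparer from the old code (assumption: run per database after the bulk copy; `session` and `database` come from the migration context):

       ```python
       from sqlalchemy.engine.url import make_url

       # Sketch: add dialect-appropriate quotes to physical dataset expressions.
       quote = make_url(database.sqlalchemy_uri).get_dialect()().identifier_preparer.quote

       for dataset in session.query(NewDataset).filter(NewDataset.is_physical):
           # quote() only adds quotes when the identifier needs them,
           # e.g. my table -> "my table" on PostgreSQL, `my table` on MySQL
           dataset.expression = quote(dataset.expression)
       ```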

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,481 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to filter out only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(

Review comment:
       The manual, duplicated specification of these `create_table` commands is no longer needed. Tables are now created with `Base.metadata.create_all(bind=bind, tables=new_tables)`.
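
       A sketch of the replacement, assuming `new_tables` collects the Table objects already declared on `Base.metadata` in this script:

       ```python
       # Sketch: create all new tables from the declarative metadata in one call,
       # instead of hand-maintaining duplicated op.create_table() definitions.
       bind = op.get_bind()
       new_tables = [
           NewTable.__table__,
           NewDataset.__table__,
           NewColumn.__table__,
           table_column_association_table,
           dataset_column_association_table,
           dataset_table_association_table,
       ]
       Base.metadata.create_all(bind=bind, tables=new_tables)
       ```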

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -150,55 +180,66 @@ def fetch_columns_and_metrics(self, session: Session) -> None:
     Base.metadata,
     sa.Column("table_id", sa.ForeignKey("sl_tables.id")),
     sa.Column("column_id", sa.ForeignKey("sl_columns.id")),
+    UniqueConstraint("table_id", "column_id"),
 )
 
 dataset_column_association_table = sa.Table(
     "sl_dataset_columns",
     Base.metadata,
     sa.Column("dataset_id", sa.ForeignKey("sl_datasets.id")),
     sa.Column("column_id", sa.ForeignKey("sl_columns.id")),
+    UniqueConstraint("dataset_id", "column_id"),
 )
 
 dataset_table_association_table = sa.Table(
     "sl_dataset_tables",
     Base.metadata,
     sa.Column("dataset_id", sa.ForeignKey("sl_datasets.id")),
     sa.Column("table_id", sa.ForeignKey("sl_tables.id")),
+    UniqueConstraint("dataset_id", "table_id"),
 )
 
 
-class NewColumn(Base):
+class NewColumn(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_columns"
 
     id = sa.Column(sa.Integer, primary_key=True)
     name = sa.Column(sa.Text)
     type = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+
+    # TODO: jesseyang
+    # this should probably be nullable=False and default=False
+    # do a migration later
     is_physical = sa.Column(sa.Boolean, default=True)
-    description = sa.Column(sa.Text)
-    warning_text = sa.Column(sa.Text)
+
+    description = sa.Column(MediumText())
+    warning_text = sa.Column(MediumText())
+    unit = sa.Column(sa.Text)

Review comment:
       @betodealmeida `unit` is in [superset.columns.models](https://github.com/apache/superset/blob/a619cb4ea98342a2fdf7f77587d8aa078c7dccef/superset/columns/models.py#L75) but not in the migration script. Should I keep this or not?

##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,481 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to filter out only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added easier
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,

Review comment:
       I'm porting over the same `id`, `uuid`, `created_on`, and `changed_on` from the original tables so that relationship mapping is easier. As the new tables are intended to fully replace the original tables, retaining this information is also useful for the end-user experience (especially `changed_on` and `created_on`).
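
       The `insert_from_select` helper itself isn't shown in this hunk; a hypothetical sketch of what it might look like — compiling each SELECT into a single `INSERT INTO ... SELECT` on the migration connection:

       ```python
       def insert_from_select(table_name, query) -> None:
           # Hypothetical helper (the real implementation is not in this diff).
           bind = op.get_bind()
           target = Base.metadata.tables[table_name]
           # label names (e.g. "column_id") become the target column names
           col_names = [col.name for col in query.inner_columns]
           bind.execute(target.insert().from_select(col_names, query))
       ```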




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r839907037



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -53,6 +61,31 @@
 DB_CONNECTION_MUTATOR = app.config["DB_CONNECTION_MUTATOR"]
 
 
+class AuxiliaryColumnsMixin:
+    """
+    Auxiliary columns, a combination of columns added by
+       AuditMixin + ImportExportMixin
+    """
+
+    created_on = sa.Column(sa.DateTime, default=datetime.now, nullable=True)
+    changed_on = sa.Column(
+        sa.DateTime, default=datetime.now, onupdate=datetime.now, nullable=True
+    )
+    uuid = sa.Column(
+        UUIDType(binary=True), primary_key=False, unique=True, default=uuid4
+    )

Review comment:
       The previous migration did not include these columns in the new tables, which resulted in them having `null` values.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r838135084



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -292,78 +305,70 @@ def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
         columns.append(
             NewColumn(
                 name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
                 description=metric.description,
+                expression=metric.expression,

Review comment:
       I know this would make you happy, John! But unfortunately I think this block will eventually be removed once we migrate to INSERT FROM SELECT.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
eschutho commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086425617


   >Btw, I think the Dataset model may need a database_id column as well. There is the implicit assumption that a dataset can only run on one database. I cannot imagine a case where we need to support a virtual dataset being used on different tables in different databases. Having direct link to databases makes sure existing virtual datasets can be linked to the correct database without relying on an unreliable table name extraction process. Currently if table name extraction fails, a virtual dataset lost its association with a correct table, hence the only link to database. It would require joining SqlaTable with sqlatable_id to get the correct database id.
   
   I believe that having the db id on the table will be important for future features where we need to power a chart by a Table without a dataset, and I'm wary of having the db id in both places in case they become out of sync. I'm not sure if I follow the use case of a table name extraction failing and then the dataset doesn't have a relationship to a db. Doesn't a virtual dataset break if the table extraction doesn't work anyway? What would be the value of having a link to the db but not the table?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
eschutho edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086425617


   >Btw, I think the Dataset model may need a database_id column as well. There is the implicit assumption that a dataset can only run on one database. I cannot imagine a case where we need to support a virtual dataset being used on different tables in different databases. Having direct link to databases makes sure existing virtual datasets can be linked to the correct database without relying on an unreliable table name extraction process. Currently if table name extraction fails, a virtual dataset lost its association with a correct table, hence the only link to database. It would require joining SqlaTable with sqlatable_id to get the correct database id.
   
   Having the db id on the table will be important for future features where we need to power a chart by a Table without a dataset, and I'm wary of having the db id in both places in case they become out of sync. I'm not sure if I follow the use case of a table name extraction failing and then the dataset doesn't have a relationship to a db. Doesn't a virtual dataset break if the table extraction doesn't work anyway? What would be the value of having a link to the db but not the table?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840099780



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up NewTable records from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join so only tables with valid database ids are kept
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added more easily
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,
+                TableColumn.created_by_fk,
+                TableColumn.changed_by_fk,
+                TableColumn.column_name.label("name"),
+                TableColumn.description,
+                func.coalesce(TableColumn.expression, TableColumn.column_name).label(
+                    "expression"
+                ),
+                sa.literal(False).label("is_aggregation"),
+                or_(
+                    TableColumn.expression.is_(None), (TableColumn.expression == "")
+                ).label("is_physical"),
+                TableColumn.is_dttm.label("is_temporal"),
+                func.coalesce(TableColumn.type, "Unknown").label("type"),
+                TableColumn.extra.label("extra_json"),
+            ]
+        ).select_from(active_table_columns),
     )
-    with op.batch_alter_table("sl_columns") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_columns_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_tables",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Table
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("database_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("catalog", sa.TEXT(), nullable=True),
-        sa.Column("schema", sa.TEXT(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.ForeignKeyConstraint(["database_id"], ["dbs.id"], name="sl_tables_ibfk_1"),
-        sa.PrimaryKeyConstraint("id"),
+    print("   Link physical table columns to `sl_tables`...")
+    insert_from_select(
+        "sl_table_columns",
+        select(
+            [
+                TableColumn.table_id,
+                TableColumn.id.label("column_id"),
+            ]
+        )
+        .select_from(active_table_columns)
+        .where(is_physical_table),
     )
-    with op.batch_alter_table("sl_tables") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_tables_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_table_columns",
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_table_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_table_columns_ibfk_1"
-        ),
+    print("   Link all columns to `sl_datasets`...")
+    insert_from_select(
+        "sl_dataset_columns",
+        select(
+            [
+                TableColumn.table_id.label("dataset_id"),
+                TableColumn.id.label("column_id"),
+            ],
+        ).select_from(active_table_columns),
     )
 
-    op.create_table(
-        "sl_datasets",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Dataset
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("sqlatable_id", sa.INTEGER(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+
+def copy_metrics(session: Session) -> None:
+    """Copy metrics as virtual columns"""
+    metrics_count = session.query(SqlMetric).select_from(active_metrics).count()
+    if not metrics_count:
+        return
+    # offset metric column ids by the last id of table columns
+    id_offset = session.query(func.max(NewColumn.id)).scalar()
+
+    print(f">> Copy {metrics_count:,} metrics to `sl_columns`...")
+    insert_from_select(
+        "sl_columns",
+        select(
+            [
+                (SqlMetric.id + id_offset).label("id"),
+                SqlMetric.uuid,
+                SqlMetric.created_on,
+                SqlMetric.changed_on,
+                SqlMetric.created_by_fk,
+                SqlMetric.changed_by_fk,
+                SqlMetric.metric_name.label("name"),
+                SqlMetric.expression,
+                SqlMetric.description,
+                sa.literal("Unknown").label("type"),
+                (
+                    sa.func.lower(SqlMetric.metric_type)
+                    .in_(ADDITIVE_METRIC_TYPES_LOWER)
+                    .label("is_additive")
+                ),
+                sa.literal(False).label("is_physical"),
+                sa.literal(False).label("is_temporal"),
+                sa.literal(True).label("is_aggregation"),
+                SqlMetric.extra.label("extra_json"),
+                SqlMetric.warning_text,
+            ]
+        ).select_from(active_metrics),
     )
-    with op.batch_alter_table("sl_datasets") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_datasets_uuid", ["uuid"])
-        batch_op.create_unique_constraint(
-            "uq_sl_datasets_sqlatable_id", ["sqlatable_id"]
-        )
 
-    op.create_table(
+    print("   Link metric columns to datasets...")
+    insert_from_select(
         "sl_dataset_columns",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_dataset_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_columns_ibfk_1"
-        ),
+        select(
+            [
+                SqlMetric.table_id.label("dataset_id"),
+                (SqlMetric.id + id_offset).label("column_id"),
+            ],
+        ).select_from(active_metrics),
     )
 
-    op.create_table(
-        "sl_dataset_tables",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_tables_ibfk_1"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_dataset_tables_ibfk_2"
-        ),
+
+def postprocess_datasets(session: Session) -> None:
+    """
+    Postprocess datasets after insertion to
+      - Quote table names for physical datasets (if needed)
+      - Link referenced tables to virtual datasets
+    """
+    total = session.query(SqlaTable).count()
+    if not total:
+        return
+
+    offset = 0
+    limit = 10000
+
+    joined_tables = sa.join(
+        NewDataset,
+        SqlaTable,
+        NewDataset.sqlatable_id == SqlaTable.id,
+    ).join(
+        Database,
+        Database.id == SqlaTable.database_id,
+        isouter=True,
     )
+    assert session.query(func.count()).select_from(joined_tables).scalar() == total
 
-    # migrate existing datasets to the new models
-    bind = op.get_bind()
-    session = db.Session(bind=bind)  # pylint: disable=no-member
+    print(f">> Run postprocessing on {total} datasets")
+
+    update_count = 0
+
+    def print_update_count():
+        if SHOW_PROGRESS:
+            print(
+                f"   Will update {update_count} datasets" + " " * 20,
+                end="\r",
+            )
+
+    while offset < total:
+        if SHOW_PROGRESS:
+            print(
+                f"   Postprocess dataset {offset + 1}~{min(total, offset + limit)}..."
+                + " " * 30
+            )
+        for (
+            dataset_id,
+            is_physical,
+            expression,
+            database_id,
+            schema,
+            sqlalchemy_uri,
+        ) in session.execute(
+            select(
+                [
+                    NewDataset.id,
+                    NewDataset.is_physical,
+                    NewDataset.expression,
+                    SqlaTable.database_id,
+                    SqlaTable.schema,
+                    Database.sqlalchemy_uri,
+                ]
+            )
+            .select_from(joined_tables)
+            .offset(offset)
+            .limit(limit)
+        ):
+            drivername = (sqlalchemy_uri or "").split("://")[0]
+            if is_physical and drivername:
+                quoted_expression = get_identifier_quoter(drivername)(expression)
+                if quoted_expression != expression:
+                    session.execute(
+                        sa.update(NewDataset)
+                        .where(NewDataset.id == dataset_id)
+                        .values(expression=quoted_expression)
+                    )
+                    update_count += 1
+                    print_update_count()
+            elif not is_physical and expression:
+                table_references = extract_table_references(
+                    expression, get_dialect_name(drivername), show_warning=False
+                )
+                found_tables = find_tables(
+                    session, database_id, schema, table_references
+                )
+                if found_tables:
+                    op.bulk_insert(
+                        dataset_table_association_table,
+                        [
+                            {"dataset_id": dataset_id, "table_id": table.id}
+                            for table in found_tables
+                        ],
+                    )
+                    update_count += 1
+                    print_update_count()
+        session.flush()
+        offset += limit
+    if SHOW_PROGRESS:
+        print("")
+
+
+def postprocess_columns(session: Session) -> None:
+    """
+    At this step, we will
+      - Add engine-specific quotes to the `expression` of physical columns
+      - Tuck some extra metadata into `extra_json`
+    """
+    total = session.query(NewColumn).count()
+    if not total:
+        return
+
+    id_offset = (
+        session.query(func.max(NewColumn.id))
+        .filter(not_(NewColumn.is_aggregation))
+        .scalar()
+    )
+
+    def get_joined_tables(offset, limit):
+        return (
+            sa.join(
+                session.query(NewColumn)
+                .offset(offset)
+                .limit(limit)
+                .subquery("sl_columns"),
+                TableColumn,
+                TableColumn.id == NewColumn.id,
+                isouter=True,
+            )
+            .join(
+                SqlMetric,
+                # use NewColumn.id - id_offset instead of SqlMetric.id + id_offset
+                # to improve join performance.
+                and_(
+                    NewColumn.id > id_offset, SqlMetric.id == NewColumn.id - id_offset
+                ),
+                isouter=True,
+            )
+            .join(
+                SqlaTable,
+                SqlaTable.id == func.coalesce(TableColumn.table_id, SqlMetric.table_id),
+                isouter=True,
+            )
+            .join(Database, Database.id == SqlaTable.database_id, isouter=True)
+        )
+
+    offset = 0
+    limit = 100000
+
+    print(f">> Run postprocessing on {total:,} columns")
+
+    update_count = 0
+
+    def print_update_count():
+        if SHOW_PROGRESS:
+            print(
+                f"   Will update {update_count} columns" + " " * 20,
+                end="\r",
+            )
 
-    datasets = session.query(SqlaTable).all()
-    for dataset in datasets:
-        dataset.fetch_columns_and_metrics(session)
-        after_insert(target=dataset)
+    while offset < total:
+        query = (
+            select(
+                [
+                    NewColumn.id,
+                    NewColumn.is_physical,
+                    TableColumn.column_name,
+                    Database.sqlalchemy_uri,
+                    TableColumn.groupby,
+                    TableColumn.filterable,
+                    func.coalesce(
+                        TableColumn.verbose_name, SqlMetric.verbose_name
+                    ).label("verbose_name"),
+                    TableColumn.python_date_format,
+                    SqlMetric.d3format,
+                    SqlMetric.metric_type,
+                    NewColumn.extra_json,
+                    SqlaTable.is_managed_externally,
+                    SqlaTable.external_url,
+                ]
+            )
+            .select_from(get_joined_tables(offset, limit))
+            .where(
+                # pre-filter to columns with potential updates
+                or_(
+                    NewColumn.is_physical,
+                    TableColumn.groupby.is_(False),
+                    TableColumn.filterable.is_(False),
+                    TableColumn.verbose_name.isnot(None),
+                    SqlMetric.verbose_name.isnot(None),
+                    SqlMetric.d3format.isnot(None),
+                    SqlMetric.metric_type.isnot(None),
+                )
+            )
+        )
+
+        if SHOW_PROGRESS:
+            start = offset + 1
+            end = min(total, offset + limit)
+            count = session.query(func.count()).select_from(query).scalar()
+            print(f"   Column {start:,} to {end:,}: {count:,} may be updated")
+
+        for (
+            column_id,
+            is_physical,
+            column_name,
+            sqlalchemy_uri,
+            groupby,
+            filterable,
+            verbose_name,
+            python_date_format,
+            d3format,
+            metric_type,
+            extra_json,
+            is_managed_externally,
+            external_url,
+        ) in session.execute(query):
+            try:
+                extra = json.loads(extra_json or "{}")
+            except json.decoder.JSONDecodeError:
+                extra = {}
+            updated_extra = {**extra}
+            updates = {}
+
+            # update expression for physical table columns
+            if is_physical and column_name and sqlalchemy_uri:
+                drivername = sqlalchemy_uri.split("://")[0]
+                if drivername:
+                    quoted_expression = get_identifier_quoter(drivername)(column_name)
+                    if quoted_expression != column_name:
+                        updates["expression"] = quoted_expression
+
+            if is_managed_externally:
+                updates["is_managed_externally"] = True
+            if external_url:
+                updates["external_url"] = external_url
+
+            # update extra json
+            if groupby is False:
+                updated_extra["groupby"] = groupby
+            if filterable is False:
+                updated_extra["filterable"] = filterable
+            if verbose_name is not None:
+                updated_extra["verbose_name"] = verbose_name
+            if python_date_format is not None:
+                updated_extra["python_date_format"] = verbose_name

Review comment:
       Thanks for the catch! I changed this to a dict to avoid this kind of typo.
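
   For anyone skimming the thread, here is a minimal illustrative sketch of the dict-based pattern being described (an assumption on my part; the exact refactor in the branch may differ). It reuses the `session` and `query` names from the diff above and assumes SQLAlchemy 1.3-style result rows:

   ```python
   # Hypothetical sketch: read row values by column label instead of
   # positional tuple unpacking, so a reordered or copy-pasted SELECT
   # list cannot silently bind one column's value to another variable.
   for row in session.execute(query):
       values = dict(row)  # RowProxy -> plain dict keyed by label
       updated_extra = {}
       if values["verbose_name"] is not None:
           updated_extra["verbose_name"] = values["verbose_name"]
       if values["python_date_format"] is not None:
           # a mistyped key raises KeyError here instead of quietly
           # storing the wrong column's value
           updated_extra["python_date_format"] = values["python_date_format"]
   ```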




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] serenajiang commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
serenajiang commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840096020



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Find NewTable records matching the given tables in a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# joins that keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join so only tables with valid database ids are kept
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both are from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    if not count:
+        return
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added more easily
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,
+                TableColumn.created_by_fk,
+                TableColumn.changed_by_fk,
+                TableColumn.column_name.label("name"),
+                TableColumn.description,
+                func.coalesce(TableColumn.expression, TableColumn.column_name).label(
+                    "expression"
+                ),
+                sa.literal(False).label("is_aggregation"),
+                or_(
+                    TableColumn.expression.is_(None), (TableColumn.expression == "")
+                ).label("is_physical"),
+                TableColumn.is_dttm.label("is_temporal"),
+                func.coalesce(TableColumn.type, "Unknown").label("type"),
+                TableColumn.extra.label("extra_json"),
+            ]
+        ).select_from(active_table_columns),
     )
-    with op.batch_alter_table("sl_columns") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_columns_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_tables",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Table
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("database_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("catalog", sa.TEXT(), nullable=True),
-        sa.Column("schema", sa.TEXT(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.ForeignKeyConstraint(["database_id"], ["dbs.id"], name="sl_tables_ibfk_1"),
-        sa.PrimaryKeyConstraint("id"),
+    print("   Link physical table columns to `sl_tables`...")
+    insert_from_select(
+        "sl_table_columns",
+        select(
+            [
+                TableColumn.table_id,
+                TableColumn.id.label("column_id"),
+            ]
+        )
+        .select_from(active_table_columns)
+        .where(is_physical_table),
     )
-    with op.batch_alter_table("sl_tables") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_tables_uuid", ["uuid"])
 
-    op.create_table(
-        "sl_table_columns",
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_table_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_table_columns_ibfk_1"
-        ),
+    print("   Link all columns to `sl_datasets`...")
+    insert_from_select(
+        "sl_dataset_columns",
+        select(
+            [
+                TableColumn.table_id.label("dataset_id"),
+                TableColumn.id.label("column_id"),
+            ],
+        ).select_from(active_table_columns),
     )
 
-    op.create_table(
-        "sl_datasets",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Dataset
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("sqlatable_id", sa.INTEGER(), nullable=True),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+
+def copy_metrics(session: Session) -> None:
+    """Copy metrics as virtual columns"""
+    metrics_count = session.query(SqlMetric).select_from(active_metrics).count()
+    if not metrics_count:
+        return
+    # offset metric column ids by the last id of table columns
+    id_offset = session.query(func.max(NewColumn.id)).scalar()
+
+    print(f">> Copy {metrics_count:,} metrics to `sl_columns`...")
+    insert_from_select(
+        "sl_columns",
+        select(
+            [
+                (SqlMetric.id + id_offset).label("id"),
+                SqlMetric.uuid,
+                SqlMetric.created_on,
+                SqlMetric.changed_on,
+                SqlMetric.created_by_fk,
+                SqlMetric.changed_by_fk,
+                SqlMetric.metric_name.label("name"),
+                SqlMetric.expression,
+                SqlMetric.description,
+                sa.literal("Unknown").label("type"),
+                (
+                    sa.func.lower(SqlMetric.metric_type)
+                    .in_(ADDITIVE_METRIC_TYPES_LOWER)
+                    .label("is_additive")
+                ),
+                sa.literal(False).label("is_physical"),
+                sa.literal(False).label("is_temporal"),
+                sa.literal(True).label("is_aggregation"),
+                SqlMetric.extra.label("extra_json"),
+                SqlMetric.warning_text,
+            ]
+        ).select_from(active_metrics),
     )
-    with op.batch_alter_table("sl_datasets") as batch_op:
-        batch_op.create_unique_constraint("uq_sl_datasets_uuid", ["uuid"])
-        batch_op.create_unique_constraint(
-            "uq_sl_datasets_sqlatable_id", ["sqlatable_id"]
-        )
 
-    op.create_table(
+    print("   Link metric columns to datasets...")
+    insert_from_select(
         "sl_dataset_columns",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("column_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["column_id"], ["sl_columns.id"], name="sl_dataset_columns_ibfk_2"
-        ),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_columns_ibfk_1"
-        ),
+        select(
+            [
+                SqlMetric.table_id.label("dataset_id"),
+                (SqlMetric.id + id_offset).label("column_id"),
+            ],
+        ).select_from(active_metrics),
     )
 
-    op.create_table(
-        "sl_dataset_tables",
-        sa.Column("dataset_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.Column("table_id", sa.INTEGER(), autoincrement=False, nullable=False),
-        sa.ForeignKeyConstraint(
-            ["dataset_id"], ["sl_datasets.id"], name="sl_dataset_tables_ibfk_1"
-        ),
-        sa.ForeignKeyConstraint(
-            ["table_id"], ["sl_tables.id"], name="sl_dataset_tables_ibfk_2"
-        ),
+
+def postprocess_datasets(session: Session) -> None:
+    """
+    Postprocess datasets after insertion to
+      - Quote table names for physical datasets (if needed)
+      - Link referenced tables to virtual datasets
+    """
+    total = session.query(SqlaTable).count()
+    if not total:
+        return
+
+    offset = 0
+    limit = 10000
+
+    joined_tables = sa.join(
+        NewDataset,
+        SqlaTable,
+        NewDataset.sqlatable_id == SqlaTable.id,
+    ).join(
+        Database,
+        Database.id == SqlaTable.database_id,
+        isouter=True,
     )
+    assert session.query(func.count()).select_from(joined_tables).scalar() == total
 
-    # migrate existing datasets to the new models
-    bind = op.get_bind()
-    session = db.Session(bind=bind)  # pylint: disable=no-member
+    print(f">> Run postprocessing on {total} datasets")
+
+    update_count = 0
+
+    def print_update_count():
+        if SHOW_PROGRESS:
+            print(
+                f"   Will update {update_count} datasets" + " " * 20,
+                end="\r",
+            )
+
+    while offset < total:
+        if SHOW_PROGRESS:
+            print(
+                f"   Postprocess dataset {offset + 1}~{min(total, offset + limit)}..."
+                + " " * 30
+            )
+        for (
+            dataset_id,
+            is_physical,
+            expression,
+            database_id,
+            schema,
+            sqlalchemy_uri,
+        ) in session.execute(
+            select(
+                [
+                    NewDataset.id,
+                    NewDataset.is_physical,
+                    NewDataset.expression,
+                    SqlaTable.database_id,
+                    SqlaTable.schema,
+                    Database.sqlalchemy_uri,
+                ]
+            )
+            .select_from(joined_tables)
+            .offset(offset)
+            .limit(limit)
+        ):
+            drivername = (sqlalchemy_uri or "").split("://")[0]
+            if is_physical and drivername:
+                quoted_expression = get_identifier_quoter(drivername)(expression)
+                if quoted_expression != expression:
+                    session.execute(
+                        sa.update(NewDataset)
+                        .where(NewDataset.id == dataset_id)
+                        .values(expression=quoted_expression)
+                    )
+                    update_count += 1
+                    print_update_count()
+            elif not is_physical and expression:
+                table_references = extract_table_references(
+                    expression, get_dialect_name(drivername), show_warning=False
+                )
+                found_tables = find_tables(
+                    session, database_id, schema, table_references
+                )
+                if found_tables:
+                    op.bulk_insert(
+                        dataset_table_association_table,
+                        [
+                            {"dataset_id": dataset_id, "table_id": table.id}
+                            for table in found_tables
+                        ],
+                    )
+                    update_count += 1
+                    print_update_count()
+        session.flush()
+        offset += limit
+    if SHOW_PROGRESS:
+        print("")
+
+
+def postprocess_columns(session: Session) -> None:
+    """
+    At this step, we will
+      - Add engine-specific quotes to the `expression` of physical columns
+      - Tuck some extra metadata into `extra_json`
+    """
+    total = session.query(NewColumn).count()
+    if not total:
+        return
+
+    id_offset = (
+        session.query(func.max(NewColumn.id))
+        .filter(not_(NewColumn.is_aggregation))
+        .scalar()
+    )
+
+    def get_joined_tables(offset, limit):
+        return (
+            sa.join(
+                session.query(NewColumn)
+                .offset(offset)
+                .limit(limit)
+                .subquery("sl_columns"),
+                TableColumn,
+                TableColumn.id == NewColumn.id,
+                isouter=True,
+            )
+            .join(
+                SqlMetric,
+                # use NewColumn.id - id_offset instead of SqlMetric.id + id_offset
+                # to improve join performance.
+                and_(
+                    NewColumn.id > id_offset, SqlMetric.id == NewColumn.id - id_offset
+                ),
+                isouter=True,
+            )
+            .join(
+                SqlaTable,
+                SqlaTable.id == func.coalesce(TableColumn.table_id, SqlMetric.table_id),
+                isouter=True,
+            )
+            .join(Database, Database.id == SqlaTable.database_id, isouter=True)
+        )
+
+    offset = 0
+    limit = 100000
+
+    print(f">> Run postprocessing on {total:,} columns")
+
+    update_count = 0
+
+    def print_update_count():
+        if SHOW_PROGRESS:
+            print(
+                f"   Will update {update_count} columns" + " " * 20,
+                end="\r",
+            )
 
-    datasets = session.query(SqlaTable).all()
-    for dataset in datasets:
-        dataset.fetch_columns_and_metrics(session)
-        after_insert(target=dataset)
+    while offset < total:
+        query = (
+            select(
+                [
+                    NewColumn.id,
+                    NewColumn.is_physical,
+                    TableColumn.column_name,
+                    Database.sqlalchemy_uri,
+                    TableColumn.groupby,
+                    TableColumn.filterable,
+                    func.coalesce(
+                        TableColumn.verbose_name, SqlMetric.verbose_name
+                    ).label("verbose_name"),
+                    TableColumn.python_date_format,
+                    SqlMetric.d3format,
+                    SqlMetric.metric_type,
+                    NewColumn.extra_json,
+                    SqlaTable.is_managed_externally,
+                    SqlaTable.external_url,
+                ]
+            )
+            .select_from(get_joined_tables(offset, limit))
+            .where(
+                # pre-filter to columns with potential updates
+                or_(
+                    NewColumn.is_physical,
+                    TableColumn.groupby.is_(False),
+                    TableColumn.filterable.is_(False),
+                    TableColumn.verbose_name.isnot(None),
+                    SqlMetric.verbose_name.isnot(None),
+                    SqlMetric.d3format.isnot(None),
+                    SqlMetric.metric_type.isnot(None),
+                )
+            )
+        )
+
+        if SHOW_PROGRESS:
+            start = offset + 1
+            end = min(total, offset + limit)
+            count = session.query(func.count()).select_from(query).scalar()
+            print(f"   Column {start:,} to {end:,}: {count:,} may be updated")
+
+        for (
+            column_id,
+            is_physical,
+            column_name,
+            sqlalchemy_uri,
+            groupby,
+            filterable,
+            verbose_name,
+            python_date_format,
+            d3format,
+            metric_type,
+            extra_json,
+            is_managed_externally,
+            external_url,
+        ) in session.execute(query):
+            try:
+                extra = json.loads(extra_json or "{}")
+            except json.decoder.JSONDecodeError:
+                extra = {}
+            updated_extra = {**extra}
+            updates = {}
+
+            # update expression for physical table columns
+            if is_physical and column_name and sqlalchemy_uri:
+                drivername = sqlalchemy_uri.split("://")[0]
+                if drivername:
+                    quoted_expression = get_identifier_quoter(drivername)(column_name)
+                    if quoted_expression != column_name:
+                        updates["expression"] = quoted_expression
+
+            if is_managed_externally:
+                updates["is_managed_externally"] = True
+            if external_url:
+                updates["external_url"] = external_url
+
+            # update extra json
+            if groupby is False:
+                updated_extra["groupby"] = groupby
+            if filterable is False:
+                updated_extra["filterable"] = filterable
+            if verbose_name is not None:
+                updated_extra["verbose_name"] = verbose_name
+            if python_date_format is not None:
+                updated_extra["python_date_format"] = verbose_name

Review comment:
       ```suggestion
                   updated_extra["python_date_format"] = python_date_format
   ```
   Copy-paste error?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
eschutho commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086423019


   @ktmud I believe that even if this PR changes the initial migration into a no-op, the create-table calls will still fail when the tables already exist, unless you pass something like `checkfirst=True`. But even then, we still won't pick up any new columns or indexes.
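
   A rough sketch of the kind of guard being described, for illustration only (the helper below is hypothetical, not code from this branch; `checkfirst=True` is the actual parameter on SQLAlchemy's `Table.create()` / `MetaData.create_all()`):

   ```python
   import sqlalchemy as sa
   from alembic import op

   def create_table_if_not_exists(table_name: str, *columns: sa.Column) -> None:
       # Hypothetical guard: skip creation when the table already exists.
       # Caveat from the comment above: skipping creation means any columns
       # or indexes added to the model later are NOT picked up here.
       bind = op.get_bind()
       inspector = sa.inspect(bind)  # Inspector for the migration connection
       if table_name not in inspector.get_table_names():
           op.create_table(table_name, *columns)

   # usage (hypothetical):
   # create_table_if_not_exists(
   #     "sl_tables",
   #     sa.Column("id", sa.Integer, primary_key=True),
   # )
   ```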


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (435844c) into [master](https://codecov.io/gh/apache/superset/commit/6b136c2bc9a6c9756e5319b045e3c42da06243cb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6b136c2) will **decrease** coverage by `0.18%`.
   > The diff coverage is `91.08%`.
   
   > :exclamation: Current head 435844c differs from pull request most recent head 6f430fb. Consider uploading reports for the commit 6f430fb to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.57%   66.39%   -0.19%     
   ==========================================
     Files        1675     1675              
     Lines       64092    64111      +19     
     Branches     6519     6519              
   ==========================================
   - Hits        42672    42566     -106     
   - Misses      19729    19854     +125     
     Partials     1691     1691              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `81.95% <93.75%> (-0.01%)` | :arrow_down: |
   | presto | `?` | |
   | python | `82.00% <93.75%> (-0.39%)` | :arrow_down: |
   | sqlite | `81.72% <93.75%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ugins/legacy-plugin-chart-calendar/src/Calendar.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy9DYWxlbmRhci5qcw==) | `0.00% <ø> (ø)` | |
   | [...legacy-plugin-chart-calendar/src/ReactCalendar.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy9SZWFjdENhbGVuZGFyLmpzeA==) | `0.00% <0.00%> (ø)` | |
   | [...cy-plugin-chart-calendar/src/vendor/cal-heatmap.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWNhbGVuZGFyL3NyYy92ZW5kb3IvY2FsLWhlYXRtYXAuanM=) | `0.00% <ø> (ø)` | |
   | [...plugins/legacy-plugin-chart-heatmap/src/Heatmap.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcGx1Z2luLWNoYXJ0LWhlYXRtYXAvc3JjL0hlYXRtYXAuanM=) | `0.00% <ø> (ø)` | |
   | [...plugins/legacy-preset-chart-nvd3/src/ReactNVD3.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9sZWdhY3ktcHJlc2V0LWNoYXJ0LW52ZDMvc3JjL1JlYWN0TlZEMy5qc3g=) | `0.00% <ø> (ø)` | |
   | [...n-chart-pivot-table/src/react-pivottable/Styles.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvcGx1Z2lucy9wbHVnaW4tY2hhcnQtcGl2b3QtdGFibGUvc3JjL3JlYWN0LXBpdm90dGFibGUvU3R5bGVzLmpz) | `0.00% <ø> (ø)` | |
   | [...set-frontend/src/components/ModalTrigger/index.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvTW9kYWxUcmlnZ2VyL2luZGV4LmpzeA==) | `100.00% <ø> (ø)` | |
   | [...frontend/src/dashboard/components/Header/index.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL0hlYWRlci9pbmRleC5qc3g=) | `60.92% <ø> (ø)` | |
   | [superset-frontend/src/views/CRUD/utils.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL3ZpZXdzL0NSVUQvdXRpbHMudHN4) | `65.57% <ø> (ø)` | |
   | [...perset-frontend/src/views/CRUD/welcome/Welcome.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL3ZpZXdzL0NSVUQvd2VsY29tZS9XZWxjb21lLnRzeA==) | `75.00% <ø> (ø)` | |
   | ... and [33 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6b136c2...6f430fb](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/08aca83f6cba81d37d6d70cfddc7980ae95a7bb5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (08aca83) will **increase** coverage by `0.11%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head 38f34de. Consider uploading reports for the commit 38f34de to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   + Coverage   66.39%   66.51%   +0.11%     
   ==========================================
     Files        1676     1676              
     Lines       64211    64191      -20     
     Branches     6537     6525      -12     
   ==========================================
   + Hits        42635    42694      +59     
   + Misses      19877    19798      -79     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (?)` | |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | python | `82.22% <93.54%> (+0.23%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (+0.19%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `80.45% <0.00%> (-0.58%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [...t-frontend/src/components/AsyncAceEditor/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQXN5bmNBY2VFZGl0b3IvaW5kZXgudHN4) | `90.90% <0.00%> (-0.21%)` | :arrow_down: |
   | ... and [20 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [08aca83...38f34de](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] ktmud edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086218951


   @eschutho I propose changing the current migration to a no-op and moving my updated code to a new migration.
   
   See an earlier message I sent in Slack:
   
   ---
   
   Hi, I noticed we are making more adjustments to the SIP-68 models and have prepared a [couple](https://github.com/apache/superset/pull/19425) [more](https://github.com/apache/superset/pull/19487) db migrations. I’m wondering whether we should bundle all of these migrations (including the first one that’s already merged) into one new migration and change the original migration to a no-op.
   
   **Pros:**
   
   - Reduced total migration time: bundling everything should be faster than running the migrations separately
   - We get a chance to fix a couple more errors, such as [using MediumText for MySQL](https://github.com/apache/superset/pull/19421#discussion_r839942807) and [incorrect additive_metric_types matching](https://github.com/apache/superset/pull/19421#discussion_r839903477)
   - We get a chance to copy over other missing data such as [changed on and last updated](https://github.com/apache/superset/pull/19421#discussion_r840089807)
   - We can re-ID the copied entities to match the original ones, making it easier to spot-check potential data inconsistency bugs down the road (see the sketch after this message)
   - Everyone’s db is in a clean and consistent state
   
   **Cons:**
   - Those who already ran the migration and bore the slowness may have to experience it again
   
   Happy to incorporate [#19487](https://github.com/apache/superset/pull/19487/) and [#19425](https://github.com/apache/superset/pull/19425) into [my PR](https://github.com/apache/superset/pull/19421) if they are still needed.
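
   To make the re-ID idea concrete, here is a rough sketch of the copy step, reusing the PR's `insert_from_select` helper, its `is_physical_table` filter, and SQLAlchemy 1.x-style `select`; treat it as pseudocode rather than the final migration code:

   ```python
   from sqlalchemy import select

   insert_from_select(
       "sl_tables",
       select(
           [
               SqlaTable.id,  # reuse the original id as the new primary key
               SqlaTable.uuid,
               SqlaTable.created_on,
               SqlaTable.changed_on,
           ]
       ).where(is_physical_table),
   )
   ```

   Because the copied rows keep the original ids, the old and new tables can be joined directly when spot-checking for inconsistencies.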




[GitHub] [superset] github-actions[bot] commented on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1082640316


   ⚠️ @ktmud Your base branch `master` has just also updated `superset/migrations`.
   
   ❗ **Please consider rebasing your branch to avoid db migration conflicts.**




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (05d39a1) into [master](https://codecov.io/gh/apache/superset/commit/ab3770667c0b11043b177838f8c2eddd717fcfcc?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ab37706) will **decrease** coverage by `0.14%`.
   > The diff coverage is `93.54%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.58%   66.43%   -0.15%     
   ==========================================
     Files        1676     1676              
     Lines       64176    64191      +15     
     Branches     6525     6525              
   ==========================================
   - Hits        42732    42648      -84     
   - Misses      19745    19844      +99     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (-0.03%)` | :arrow_down: |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `52.51% <32.25%> (-0.03%)` | :arrow_down: |
   | python | `82.07% <93.54%> (-0.31%)` | :arrow_down: |
   | sqlite | `81.72% <93.54%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.85% <100.00%> (-0.46%)` | :arrow_down: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/databases/commands/create.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YWJhc2VzL2NvbW1hbmRzL2NyZWF0ZS5weQ==) | `64.70% <0.00%> (-27.46%)` | :arrow_down: |
   | [superset/views/database/mixins.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvZGF0YWJhc2UvbWl4aW5zLnB5) | `60.34% <0.00%> (-20.69%)` | :arrow_down: |
   | [superset/databases/commands/update.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YWJhc2VzL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `85.71% <0.00%> (-8.17%)` | :arrow_down: |
   | [superset/common/utils/dataframe\_utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3V0aWxzL2RhdGFmcmFtZV91dGlscy5weQ==) | `85.71% <0.00%> (-7.15%)` | :arrow_down: |
   | [superset/databases/api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YWJhc2VzL2FwaS5weQ==) | `87.98% <0.00%> (-6.01%)` | :arrow_down: |
   | [superset/databases/schemas.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YWJhc2VzL3NjaGVtYXMucHk=) | `94.42% <0.00%> (-4.09%)` | :arrow_down: |
   | ... and [12 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ab37706...05d39a1](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   






[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r840089433



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +244,557 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look for NewTables from a specific database
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.created_by_fk,
+                SqlaTable.changed_by_fk,

Review comment:
       The previous migration did not copy the values of these columns to the new tables. I think it'd be useful to retain them, especially the properties from AuditMixin.
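
       As a rough sketch of how an INSERT ... FROM SELECT can carry these values over (SQLAlchemy 1.x style, to match the diff above; the helper below is invented for illustration):

       ```python
       import sqlalchemy as sa

       def copy_audit_columns(conn, src: sa.Table, dst: sa.Table) -> None:
           audit_cols = [
               "id", "uuid", "created_on", "changed_on",
               "created_by_fk", "changed_by_fk",
           ]
           # Select the audit values straight from the source table...
           select_stmt = sa.select([src.c[name] for name in audit_cols])
           # ...and bulk-insert them, preserving the original values.
           conn.execute(dst.insert().from_select(audit_cols, select_stmt))
       ```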






[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/08aca83f6cba81d37d6d70cfddc7980ae95a7bb5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (08aca83) will **increase** coverage by `0.11%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head cc7168b. Consider uploading reports for the commit cc7168b to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   + Coverage   66.39%   66.51%   +0.11%     
   ==========================================
     Files        1676     1676              
     Lines       64211    64191      -20     
     Branches     6537     6525      -12     
   ==========================================
   + Hits        42635    42694      +59     
   + Misses      19877    19798      -79     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (?)` | |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | python | `82.22% <93.54%> (+0.23%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (+0.19%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `80.45% <0.00%> (-0.58%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [...t-frontend/src/components/AsyncAceEditor/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQXN5bmNBY2VFZGl0b3IvaW5kZXgudHN4) | `90.90% <0.00%> (-0.21%)` | :arrow_down: |
   | ... and [20 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [08aca83...cc7168b](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/08aca83f6cba81d37d6d70cfddc7980ae95a7bb5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (08aca83) will **increase** coverage by `0.11%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head c9a121a. Consider uploading reports for the commit c9a121a to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   + Coverage   66.39%   66.51%   +0.11%     
   ==========================================
     Files        1676     1676              
     Lines       64211    64191      -20     
     Branches     6537     6525      -12     
   ==========================================
   + Hits        42635    42694      +59     
   + Misses      19877    19798      -79     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (?)` | |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | python | `82.22% <93.54%> (+0.23%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (+0.19%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `80.45% <0.00%> (-0.58%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [...t-frontend/src/components/AsyncAceEditor/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQXN5bmNBY2VFZGl0b3IvaW5kZXgudHN4) | `90.90% <0.00%> (-0.21%)` | :arrow_down: |
   | ... and [20 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [08aca83...c9a121a](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] eschutho edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
eschutho edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086425617


   >Btw, I think the Dataset model may need a database_id column as well. There is an implicit assumption that a dataset can only run on one database; I cannot imagine a case where we need to support a virtual dataset being used on different tables in different databases. Having a direct link to databases makes sure existing virtual datasets can be linked to the correct database without relying on an unreliable table-name extraction process. Currently, if table name extraction fails, a virtual dataset loses its association with the correct table, and hence its only link to the database. Recovering it would require joining SqlaTable on sqlatable_id to get the correct database id.
   
   Having the db id on the table will be important for future features where we need to power a chart by a Table without a dataset, and I'm wary of having the db id in both places in case they become out of sync. I'm not sure I follow the use case where table-name extraction fails and the dataset is then left without a relationship to a db. Doesn't a virtual dataset break if the table extraction doesn't work anyway? What would be the value of having a link to the db but not the table?
   
   cc @betodealmeida for further context.
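
   For context, the join being described might look roughly like this (reusing the migration's `NewDataset` and `SqlaTable` models; purely illustrative):

   ```python
   from typing import Optional

   from sqlalchemy.orm import Session

   def dataset_database_id(session: Session, dataset_id: int) -> Optional[int]:
       # Recover a virtual dataset's database through the legacy SqlaTable
       # link (NewDataset.sqlatable_id -> SqlaTable.id) instead of relying
       # on table-name extraction from the dataset's SQL.
       return (
           session.query(SqlaTable.database_id)
           .join(NewDataset, NewDataset.sqlatable_id == SqlaTable.id)
           .filter(NewDataset.id == dataset_id)
           .scalar()
       )
   ```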




[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (53b3aef) into [master](https://codecov.io/gh/apache/superset/commit/6b136c2bc9a6c9756e5319b045e3c42da06243cb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6b136c2) will **decrease** coverage by `0.01%`.
   > The diff coverage is `92.85%`.
   
   > :exclamation: Current head 53b3aef differs from pull request most recent head feb2ff4. Consider uploading reports for the commit feb2ff4 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   - Coverage   66.57%   66.56%   -0.02%     
   ==========================================
     Files        1675     1675              
     Lines       64092    64122      +30     
     Branches     6519     6519              
   ==========================================
   + Hits        42672    42681       +9     
   - Misses      19729    19750      +21     
     Partials     1691     1691              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.67% <25.00%> (-0.04%)` | :arrow_down: |
   | mysql | `81.91% <92.85%> (+<0.01%)` | :arrow_up: |
   | postgres | `?` | |
   | presto | `52.52% <25.00%> (-0.04%)` | :arrow_down: |
   | python | `82.34% <92.85%> (-0.05%)` | :arrow_down: |
   | sqlite | `81.73% <92.85%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `89.33% <100.00%> (+0.01%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/sql\_parse.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3BhcnNlLnB5) | `97.38% <0.00%> (-0.92%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `94.73% <0.00%> (-0.53%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | ... and [4 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [6b136c2...feb2ff4](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] ktmud commented on a change in pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud commented on a change in pull request #19421:
URL: https://github.com/apache/superset/pull/19421#discussion_r839914415



##########
File path: superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
##########
@@ -207,427 +241,481 @@ class NewTable(Base):
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=table_column_association_table, cascade="all, delete"
     )
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
-class NewDataset(Base):
+class NewDataset(Base, AuxiliaryColumnsMixin):
 
     __tablename__ = "sl_datasets"
 
     id = sa.Column(sa.Integer, primary_key=True)
     sqlatable_id = sa.Column(sa.Integer, nullable=True, unique=True)
     name = sa.Column(sa.Text)
-    expression = sa.Column(sa.Text)
+    expression = sa.Column(MediumText())
+    is_physical = sa.Column(sa.Boolean, default=False)
+    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
+    external_url = sa.Column(sa.Text, nullable=True)
+    extra_json = sa.Column(sa.Text, default="{}")
     tables: List[NewTable] = relationship(
         "NewTable", secondary=dataset_table_association_table
     )
     columns: List[NewColumn] = relationship(
         "NewColumn", secondary=dataset_column_association_table, cascade="all, delete"
     )
-    is_physical = sa.Column(sa.Boolean, default=False)
-    is_managed_externally = sa.Column(sa.Boolean, nullable=False, default=False)
-    external_url = sa.Column(sa.Text, nullable=True)
 
 
 TEMPORAL_TYPES = {"DATETIME", "DATE", "TIME", "TIMEDELTA"}
 
 
-def load_or_create_tables(
+def find_tables(
     session: Session,
     database_id: int,
     default_schema: Optional[str],
     tables: Set[Table],
-    conditional_quote: Callable[[str], str],
-) -> List[NewTable]:
+) -> List[int]:
     """
-    Load or create new table model instances.
+    Look up IDs of NewTable records in a specific database.
     """
     if not tables:
         return []
 
-    # set the default schema in tables that don't have it
-    if default_schema:
-        tables = list(tables)
-        for i, table in enumerate(tables):
-            if table.schema is None:
-                tables[i] = Table(table.table, default_schema, table.catalog)
-
-    # load existing tables
     predicate = or_(
         *[
             and_(
                 NewTable.database_id == database_id,
-                NewTable.schema == table.schema,
+                NewTable.schema == (table.schema or default_schema),
                 NewTable.name == table.table,
             )
             for table in tables
         ]
     )
-    new_tables = session.query(NewTable).filter(predicate).all()
-
-    # use original database model to get the engine
-    engine = (
-        session.query(OriginalDatabase)
-        .filter_by(id=database_id)
-        .one()
-        .get_sqla_engine(default_schema)
-    )
-    inspector = inspect(engine)
-
-    # add missing tables
-    existing = {(table.schema, table.name) for table in new_tables}
-    for table in tables:
-        if (table.schema, table.table) not in existing:
-            column_metadata = inspector.get_columns(table.table, schema=table.schema)
-            columns = [
-                NewColumn(
-                    name=column["name"],
-                    type=str(column["type"]),
-                    expression=conditional_quote(column["name"]),
-                    is_temporal=column["type"].python_type.__name__.upper()
-                    in TEMPORAL_TYPES,
-                    is_aggregation=False,
-                    is_physical=True,
-                    is_spatial=False,
-                    is_partition=False,
-                    is_increase_desired=True,
-                )
-                for column in column_metadata
-            ]
-            new_tables.append(
-                NewTable(
-                    name=table.table,
-                    schema=table.schema,
-                    catalog=None,
-                    database_id=database_id,
-                    columns=columns,
-                )
-            )
-            existing.add((table.schema, table.table))
+    return session.query(NewTable.id).filter(predicate).all()
 
-    return new_tables
 
+# helper SQLA elements for easier querying
+is_physical_table = or_(SqlaTable.sql.is_(None), SqlaTable.sql == "")
 
-def after_insert(target: SqlaTable) -> None:  # pylint: disable=too-many-locals
-    """
-    Copy old datasets to the new models.
-    """
-    session = inspect(target).session
+# keep only columns and metrics with a valid associated SqlaTable
+active_table_columns = sa.join(
+    TableColumn,
+    SqlaTable,
+    and_(
+        TableColumn.table_id == SqlaTable.id,
+        TableColumn.is_active,
+    ),
+)
+active_metrics = sa.join(SqlMetric, SqlaTable, SqlMetric.table_id == SqlaTable.id)
 
-    # get DB-specific conditional quoter for expressions that point to columns or
-    # table names
-    database = (
-        target.database
-        or session.query(Database).filter_by(id=target.database_id).first()
-    )
-    if not database:
-        return
-    url = make_url(database.sqlalchemy_uri)
-    dialect_class = url.get_dialect()
-    conditional_quote = dialect_class().identifier_preparer.quote
-
-    # create columns
-    columns = []
-    for column in target.columns:
-        # ``is_active`` might be ``None`` at this point, but it defaults to ``True``.
-        if column.is_active is False:
-            continue
-
-        try:
-            extra_json = json.loads(column.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"groupby", "filterable", "verbose_name", "python_date_format"}:
-            value = getattr(column, attr)
-            if value:
-                extra_json[attr] = value
-
-        columns.append(
-            NewColumn(
-                name=column.column_name,
-                type=column.type or "Unknown",
-                expression=column.expression or conditional_quote(column.column_name),
-                description=column.description,
-                is_temporal=column.is_dttm,
-                is_aggregation=False,
-                is_physical=column.expression is None or column.expression == "",
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # create metrics
-    for metric in target.metrics:
-        try:
-            extra_json = json.loads(metric.extra or "{}")
-        except json.decoder.JSONDecodeError:
-            extra_json = {}
-        for attr in {"verbose_name", "metric_type", "d3format"}:
-            value = getattr(metric, attr)
-            if value:
-                extra_json[attr] = value
-
-        is_additive = (
-            metric.metric_type and metric.metric_type.lower() in ADDITIVE_METRIC_TYPES
+def copy_tables(session: Session) -> None:
+    """Copy Physical tables"""
+    count = session.query(SqlaTable).filter(is_physical_table).count()
+    print(f">> Copy {count:,} physical tables to `sl_tables`...")
+    insert_from_select(
+        "sl_tables",
+        select(
+            [
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.table_name.label("name"),
+                SqlaTable.schema,
+                SqlaTable.database_id,
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+            ]
         )
+        # use an inner join to keep only tables with valid database ids
+        .select_from(
+            sa.join(SqlaTable, Database, SqlaTable.database_id == Database.id)
+        ).where(is_physical_table),
+    )
 
-        columns.append(
-            NewColumn(
-                name=metric.metric_name,
-                type="Unknown",  # figuring this out would require a type inferrer
-                expression=metric.expression,
-                warning_text=metric.warning_text,
-                description=metric.description,
-                is_aggregation=True,
-                is_additive=is_additive,
-                is_physical=False,
-                is_spatial=False,
-                is_partition=False,
-                is_increase_desired=True,
-                extra_json=json.dumps(extra_json) if extra_json else None,
-                is_managed_externally=target.is_managed_externally,
-                external_url=target.external_url,
-            ),
-        )
 
-    # physical dataset
-    if not target.sql:
-        physical_columns = [column for column in columns if column.is_physical]
-
-        # create table
-        table = NewTable(
-            name=target.table_name,
-            schema=target.schema,
-            catalog=None,  # currently not supported
-            database_id=target.database_id,
-            columns=physical_columns,
-            is_managed_externally=target.is_managed_externally,
-            external_url=target.external_url,
-        )
-        tables = [table]
-
-    # virtual dataset
-    else:
-        # mark all columns as virtual (not physical)
-        for column in columns:
-            column.is_physical = False
-
-        # find referenced tables
-        referenced_tables = extract_table_references(target.sql, dialect_class.name)
-        tables = load_or_create_tables(
-            session,
-            target.database_id,
-            target.schema,
-            referenced_tables,
-            conditional_quote,
-        )
+def copy_datasets(session: Session) -> None:
+    """Copy all datasets"""
+    count = session.query(SqlaTable).count()
+    print(f">> Copy {count:,} SqlaTable to `sl_datasets`...")
+    insert_from_select(
+        "sl_datasets",
+        select(
+            [
+                # keep the ids the same for easier migration of relationships
+                SqlaTable.id,
+                SqlaTable.uuid,
+                SqlaTable.created_on,
+                SqlaTable.changed_on,
+                SqlaTable.id.label("sqlatable_id"),
+                SqlaTable.table_name.label("name"),
+                func.coalesce(SqlaTable.sql, SqlaTable.table_name).label("expression"),
+                is_physical_table.label("is_physical"),
+                SqlaTable.is_managed_externally,
+                SqlaTable.external_url,
+                SqlaTable.extra.label("extra_json"),
+            ]
+        ),
+    )
 
-    # create the new dataset
-    dataset = NewDataset(
-        sqlatable_id=target.id,
-        name=target.table_name,
-        expression=target.sql or conditional_quote(target.table_name),
-        tables=tables,
-        columns=columns,
-        is_physical=not target.sql,
-        is_managed_externally=target.is_managed_externally,
-        external_url=target.external_url,
+    print("   Link physical datasets with tables...")
+    # Physical datasets (tables) have the same dataset.id and table.id
+    # as both come from SqlaTable.id
+    insert_from_select(
+        "sl_dataset_tables",
+        select(
+            [
+                NewTable.id.label("dataset_id"),
+                NewTable.id.label("table_id"),
+            ]
+        ),
     )
-    session.add(dataset)
 
 
-def upgrade():
-    # Create tables for the new models.
-    op.create_table(
+def copy_columns(session: Session) -> None:
+    """Copy columns with active associated SqlTable"""
+    count = session.query(TableColumn).select_from(active_table_columns).count()
+    print(f">> Copy {count:,} active table columns to `sl_columns`...")
+    insert_from_select(
         "sl_columns",
-        # AuditMixinNullable
-        sa.Column("created_on", sa.DateTime(), nullable=True),
-        sa.Column("changed_on", sa.DateTime(), nullable=True),
-        sa.Column("created_by_fk", sa.Integer(), nullable=True),
-        sa.Column("changed_by_fk", sa.Integer(), nullable=True),
-        # ExtraJSONMixin
-        sa.Column("extra_json", sa.Text(), nullable=True),
-        # ImportExportMixin
-        sa.Column("uuid", UUIDType(binary=True), primary_key=False, default=uuid4),
-        # Column
-        sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
-        sa.Column("name", sa.TEXT(), nullable=False),
-        sa.Column("type", sa.TEXT(), nullable=False),
-        sa.Column("expression", sa.TEXT(), nullable=False),
-        sa.Column(
-            "is_physical",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column("description", sa.TEXT(), nullable=True),
-        sa.Column("warning_text", sa.TEXT(), nullable=True),
-        sa.Column("unit", sa.TEXT(), nullable=True),
-        sa.Column("is_temporal", sa.BOOLEAN(), nullable=False),
-        sa.Column(
-            "is_spatial",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_partition",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_aggregation",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_additive",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=False,
-        ),
-        sa.Column(
-            "is_increase_desired",
-            sa.BOOLEAN(),
-            nullable=False,
-            default=True,
-        ),
-        sa.Column(
-            "is_managed_externally",
-            sa.Boolean(),
-            nullable=False,
-            server_default=sa.false(),
-        ),
-        sa.Column("external_url", sa.Text(), nullable=True),
-        sa.PrimaryKeyConstraint("id"),
+        select(
+            [
+                # keep the same column.id so later relationships can be added more easily
+                TableColumn.id,
+                TableColumn.uuid,
+                TableColumn.created_on,
+                TableColumn.changed_on,

Review comment:
       I'm porting over the same `id`, `uuid`, `created_on`, and `changed_on` from the original tables so relationship mapping is easier. As the new tables are intended to fully replace the original tables, retaining this info is also useful for the end-user experience (especially `changed_on` and `created_on`).
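
   For readers following the diff above, here is a minimal sketch of how an `INSERT ... SELECT` that carries the original ids over can be expressed with SQLAlchemy Core. The `insert_from_select` helper below is a hypothetical stand-in for the one in `superset/migrations/shared/utils.py`, not the actual implementation, and the `engine` is assumed to point at Superset's metadata database:

   ```python
   # A sketch only: reflect the target table, then issue a single
   # INSERT INTO <table> (cols...) SELECT ... statement.
   import sqlalchemy as sa
   from sqlalchemy.sql.expression import Select


   def insert_from_select(engine: sa.engine.Engine, table_name: str, query: Select) -> None:
       target = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
       # The SELECT's column labels must line up with the target's column names;
       # that is how the original `id`, `uuid`, `created_on`, and `changed_on`
       # values are carried over unchanged.
       columns = [col.name for col in query.selected_columns]
       with engine.begin() as conn:
           conn.execute(target.insert().from_select(columns, query))
   ```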






[GitHub] [superset] ktmud edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
ktmud edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086218951


   @eschutho I propose changing the current migration to a no-op and moving my updated code to a new migration.
   
   I DM'ed @betodealmeida and @hughhhh earlier on Slack. Here are the messages just for the record:
   
   ---
   
   Hi, I noticed we are making more adjustments to SIP-68 models and have prepared a [couple](https://github.com/apache/superset/pull/19425) [more](https://github.com/apache/superset/pull/19487) db migrations. I’m wondering whether we should bundle all these migrations (including the first one that’s already merged) into one new migration and change the original migration to a no-op.
   
   **Pros:**
   
   - Reduced total migration time: bundling everything should be faster than running the migrations separately
   - We get a chance to fix a couple more errors, such as [using MediumText for MySQL](https://github.com/apache/superset/pull/19421#discussion_r839942807) and [incorrect additive_metric_types matching](https://github.com/apache/superset/pull/19421#discussion_r839903477)
   - We get a chance to copy over other missing data, such as [changed on and last updated](https://github.com/apache/superset/pull/19421#discussion_r840089807)
   - We can re-ID the copied entities to follow the original ones, making it easier to spot-check potential data inconsistency bugs down the road
   - Everyone’s db is in a clean and consistent state
   - It's easier to review the db structure in the future
   
   **Cons:**
   - Those who already ran the migration and bore the slowness may have to experience it again
   
   Happy to incorporate [#19487](https://github.com/apache/superset/pull/19487/) and [#19425](https://github.com/apache/superset/pull/19425) to [my PR](https://github.com/apache/superset/pull/19421) if they are still needed.
   
   Btw, I think the `Dataset` model may need a `database_id` column as well. There is an implicit assumption that a dataset can only run on one database; I cannot imagine a case where we would need to support a virtual dataset spanning tables in different databases. Having a direct link to databases ensures existing virtual datasets can be linked to the correct database without relying on an unreliable table name extraction process. Currently, if table name extraction fails, a virtual dataset loses its association with the correct table, and hence its only link to a database; recovering it would require joining `SqlaTable` on `sqlatable_id` to get the correct database id.
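
   The recovery join described above might look roughly like this (a sketch only, reusing the `NewDataset` and `SqlaTable` model names defined in this PR's migration and assuming an active `session`):

   ```python
   # A sketch, not code from this PR: recover each virtual dataset's
   # database id through the legacy SqlaTable row it points to.
   import sqlalchemy as sa
   from sqlalchemy import select

   query = select(
       [NewDataset.id.label("dataset_id"), SqlaTable.database_id]
   ).select_from(
       sa.join(NewDataset, SqlaTable, NewDataset.sqlatable_id == SqlaTable.id)
   )
   dataset_database_ids = session.execute(query).fetchall()
   ```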



[GitHub] [superset] codecov[bot] edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1085118333


   # [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#19421](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c4ad786) into [master](https://codecov.io/gh/apache/superset/commit/08aca83f6cba81d37d6d70cfddc7980ae95a7bb5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (08aca83) will **increase** coverage by `0.11%`.
   > The diff coverage is `93.54%`.
   
   > :exclamation: Current head c4ad786 differs from pull request most recent head 55c8c94. Consider uploading reports for the commit 55c8c94 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #19421      +/-   ##
   ==========================================
   + Coverage   66.39%   66.51%   +0.11%     
   ==========================================
     Files        1676     1676              
     Lines       64211    64191      -20     
     Branches     6537     6525      -12     
   ==========================================
   + Hits        42635    42694      +59     
   + Misses      19877    19798      -79     
     Partials     1699     1699              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.66% <32.25%> (?)` | |
   | mysql | `81.90% <93.54%> (-0.01%)` | :arrow_down: |
   | postgres | `?` | |
   | python | `82.22% <93.54%> (+0.23%)` | :arrow_up: |
   | sqlite | `81.72% <93.54%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `83.01% <89.47%> (+0.96%)` | :arrow_up: |
   | [superset/connectors/base/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9iYXNlL21vZGVscy5weQ==) | `88.65% <100.00%> (ø)` | |
   | [superset/connectors/sqla/models.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `88.30% <100.00%> (+0.19%)` | :arrow_up: |
   | [superset/sql\_validators/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvc3FsX3ZhbGlkYXRvcnMvcG9zdGdyZXMucHk=) | `50.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...erset-frontend/src/components/EmptyState/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvRW1wdHlTdGF0ZS9pbmRleC50c3g=) | `69.23% <0.00%> (-5.13%)` | :arrow_down: |
   | [...nd/src/dashboard/components/gridComponents/Tab.jsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2Rhc2hib2FyZC9jb21wb25lbnRzL2dyaWRDb21wb25lbnRzL1RhYi5qc3g=) | `80.48% <0.00%> (-3.19%)` | :arrow_down: |
   | [superset/db\_engine\_specs/postgres.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3Bvc3RncmVzLnB5) | `95.45% <0.00%> (-1.82%)` | :arrow_down: |
   | [...uperset-frontend/src/explore/exploreUtils/index.js](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2V4cGxvcmUvZXhwbG9yZVV0aWxzL2luZGV4Lmpz) | `80.45% <0.00%> (-0.58%)` | :arrow_down: |
   | [superset/views/base\_api.py](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvYmFzZV9hcGkucHk=) | `97.89% <0.00%> (-0.43%)` | :arrow_down: |
   | [...t-frontend/src/components/AsyncAceEditor/index.tsx](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQXN5bmNBY2VFZGl0b3IvaW5kZXgudHN4) | `90.90% <0.00%> (-0.21%)` | :arrow_down: |
   | ... and [20 more](https://codecov.io/gh/apache/superset/pull/19421/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [08aca83...55c8c94](https://codecov.io/gh/apache/superset/pull/19421?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   

