You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/19 04:04:35 UTC

[GitHub] [superset] ktmud opened a new pull request, #20761: fix(db): use paginated_update for area chart migration

ktmud opened a new pull request, #20761:
URL: https://github.com/apache/superset/pull/20761

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   
   DB migration introduced by #20359 did not run through in Airbnb environment and throws this error in our MySQL database:
   
   ```
   INFO  [alembic.runtime.migration] Running upgrade c747c78868b6 -> 06e1e70058c7, Migrating legacy Area
   Upgrading (1/21668): another#127
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 732, in do_execute
       cursor.execute(statement, parameters)
     File "/usr/local/lib/python3.9/dist-packages/MySQLdb/cursors.py", line 183, in execute
       while self.nextset():
     File "/usr/local/lib/python3.9/dist-packages/MySQLdb/cursors.py", line 137, in nextset
       nr = db.next_result()
   MySQLdb._exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
   ```
   
   This is because updating objects within `iter_per` does not work well with MySQL cursors. (Maybe other databases will have similar issues if anyone with more than 1000 area charts can test?)
   
   The solution is to migrate to [paginated_update](https://github.com/apache/superset/blob/cadd259788c99415862cef7e8a5da9aaf4ed12cd/superset/migrations/shared/utils.py#L94) in `superset.migrations.shared.utils`, which runs manual pagination with `OFFSET` and `LIMIT` instead of streaming results with DBAPI cursors.
   
   While optimizing this, I also relocated to the migration files (and the corresponding tests) from superset root to `superset.migrations` as migration code should be as self-contained and stable as possible, since app code are updated much more frequently.
   
   While testing, I also noticed that some charts will fail at reloading `query_context` as the combined JSON payload is too large and the default Text column in MySQL was not able to save the full string. We need to migrate both `Slice.query_context` and `Slice.params` to `MediumText`, which I will address in another PR.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A
   
   ### TESTING INSTRUCTIONS
   
   CI and tested locally with Airbnb db instances.
   
   The area chart migration was finished in ~20 seconds for about 
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [x] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [x] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [x] Migration is atomic, supports rollback & is backwards-compatible
     - [x] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud merged pull request #20761: fix(db): use paginated_update for viz migration

Posted by GitBox <gi...@apache.org>.
ktmud merged PR #20761:
URL: https://github.com/apache/superset/pull/20761


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #20761: fix(db): use paginated_update for viz migration

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20761:
URL: https://github.com/apache/superset/pull/20761#issuecomment-1188585837

   # [Codecov](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20761](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f70a6fe) into [master](https://codecov.io/gh/apache/superset/commit/e60083b45b8953220e54c67544ce2381d7c96f2e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e60083b) will **decrease** coverage by `11.73%`.
   > The diff coverage is `47.36%`.
   
   > :exclamation: Current head f70a6fe differs from pull request most recent head b67a49f. Consider uploading reports for the commit b67a49f to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #20761       +/-   ##
   ===========================================
   - Coverage   66.35%   54.62%   -11.74%     
   ===========================================
     Files        1754     1756        +2     
     Lines       66689    66725       +36     
     Branches     7049     7049               
   ===========================================
   - Hits        44253    36446     -7807     
   - Misses      20639    28482     +7843     
     Partials     1797     1797               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.14% <47.36%> (+0.04%)` | :arrow_up: |
   | python | `57.48% <47.36%> (-24.21%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `50.28% <3.94%> (-0.30%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/migrations/shared/utils.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvdXRpbHMucHk=) | `32.78% <37.50%> (-0.55%)` | :arrow_down: |
   | [...perset/migrations/shared/migrate\_viz/processors.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvbWlncmF0ZV92aXovcHJvY2Vzc29ycy5weQ==) | `45.83% <45.83%> (ø)` | |
   | [superset/migrations/shared/migrate\_viz/base.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvbWlncmF0ZV92aXovYmFzZS5weQ==) | `39.08% <48.83%> (ø)` | |
   | [superset/migrations/shared/migrate\_viz/\_\_init\_\_.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvbWlncmF0aW9ucy9zaGFyZWQvbWlncmF0ZV92aXovX19pbml0X18ucHk=) | `100.00% <100.00%> (ø)` | |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-88.89%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | ... and [280 more](https://codecov.io/gh/apache/superset/pull/20761/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [e60083b...b67a49f](https://codecov.io/gh/apache/superset/pull/20761?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on a diff in pull request #20761: fix(db): use paginated_update for viz migration

Posted by GitBox <gi...@apache.org>.
ktmud commented on code in PR #20761:
URL: https://github.com/apache/superset/pull/20761#discussion_r924045280


##########
superset/migrations/shared/migrate_viz/base.py:
##########
@@ -45,11 +63,11 @@ def _migrate(self) -> None:
 
         rv_data = {}
         for key, value in self.data.items():
-            if key in self.mapping_keys and self.mapping_keys[key] in rv_data:
+            if key in self.rename_keys and self.rename_keys[key] in rv_data:
                 raise ValueError("Duplicate key in target viz")

Review Comment:
   I'm not sure if we should raise an error here. Maybe just silently override?



##########
superset/migrations/shared/migrate_viz/base.py:
##########
@@ -72,77 +90,56 @@ def upgrade(cls, slc: Slice) -> Slice:
         clz._post_action()
 
         # only backup params
-        slc.params = json.dumps({**clz.data, "form_data_bak": form_data_bak})
+        slc.params = json.dumps({**clz.data, FORM_DATA_BAK_FIELD_NAME: form_data_bak})
 
-        query_context = json.loads(slc.query_context or "{}")
+        query_context = try_load_json(slc.query_context)
         if "form_data" in query_context:
             query_context["form_data"] = clz.data
             slc.query_context = json.dumps(query_context)
         return slc
 
     @classmethod
-    def downgrade(cls, slc: Slice) -> Slice:
-        form_data = json.loads(slc.params)
-        if "form_data_bak" in form_data and "viz_type" in form_data.get(
-            "form_data_bak"
-        ):
-            form_data_bak = form_data["form_data_bak"]
+    def downgrade_slice(cls, slc: Slice) -> Slice:
+        form_data = try_load_json(slc.params)
+        form_data_bak = form_data.get(FORM_DATA_BAK_FIELD_NAME, {})
+        if "viz_type" in form_data_bak:
             slc.params = json.dumps(form_data_bak)
             slc.viz_type = form_data_bak.get("viz_type")
-
-            query_context = json.loads(slc.query_context or "{}")
+            query_context = try_load_json(slc.query_context)
             if "form_data" in query_context:
                 query_context["form_data"] = form_data_bak
                 slc.query_context = json.dumps(query_context)
         return slc
 
-
-class MigrateTreeMap(MigrateViz):
-    source_viz_type = "treemap"
-    target_viz_type = "treemap_v2"
-    remove_keys = {"metrics"}
-
-    def _pre_action(self) -> None:
-        if (
-            "metrics" in self.data
-            and isinstance(self.data["metrics"], list)
-            and len(self.data["metrics"]) > 0
+    @classmethod
+    def upgrade(cls) -> None:
+        bind = op.get_bind()
+        session = db.Session(bind=bind)
+        slices = session.query(Slice).filter(Slice.viz_type == cls.source_viz_type)
+        for slc in paginated_update(
+            slices,
+            lambda current, total: print(
+                f"  Updating {current}/{total} charts", end="\r"
+            ),
         ):
-            self.data["metric"] = self.data["metrics"][0]
-
+            new_viz = cls.upgrade_slice(slc)
+            session.merge(new_viz)
 
-class MigrateArea(MigrateViz):
-    source_viz_type = "area"
-    target_viz_type = "echarts_area"
-    remove_keys = {"contribution", "stacked_style", "x_axis_label"}
-
-    def _pre_action(self) -> None:
-        if self.data.get("contribution"):
-            self.data["contributionMode"] = "row"
-
-        stacked = self.data.get("stacked_style")
-        if stacked:
-            stacked_map = {
-                "expand": "Expand",
-                "stack": "Stack",
-            }
-            self.data["show_extra_controls"] = True
-            self.data["stack"] = stacked_map.get(stacked)
-
-        x_axis_label = self.data.get("x_axis_label")
-        if x_axis_label:
-            self.data["x_axis_title"] = x_axis_label
-            self.data["x_axis_title_margin"] = 30
-
-
-# pylint: disable=invalid-name
-class MigrateVizEnum(str, Enum):
-    # the Enum member name is viz_type in database
-    treemap = "treemap"
-    area = "area"

Review Comment:
   Not need for such enum since we will import different stuff for different migrations anyway.



##########
superset/migrations/shared/migrate_viz/base.py:
##########
@@ -17,21 +17,39 @@
 from __future__ import annotations
 
 import json
-from enum import Enum
-from typing import Dict, Set, Type, TYPE_CHECKING
+from typing import Dict, Set
 
-if TYPE_CHECKING:
-    from superset.models.slice import Slice
+from alembic import op
+from sqlalchemy import and_, Column, Integer, String, Text
+from sqlalchemy.ext.declarative import declarative_base
+
+from superset import db
+from superset.migrations.shared.utils import paginated_update, try_load_json
+
+Base = declarative_base()
+
+
+class Slice(Base):  # type: ignore
+    __tablename__ = "slices"
+
+    id = Column(Integer, primary_key=True)
+    slice_name = Column(String(250))
+    viz_type = Column(String(250))
+    params = Column(Text)
+    query_context = Column(Text)
+
+
+FORM_DATA_BAK_FIELD_NAME = "form_data_bak"
 
 
 class MigrateViz:
     remove_keys: Set[str] = set()
-    mapping_keys: Dict[str, str] = {}
+    rename_keys: Dict[str, str] = {}
     source_viz_type: str
     target_viz_type: str
 
     def __init__(self, form_data: str) -> None:
-        self.data = json.loads(form_data)
+        self.data = try_load_json(form_data)

Review Comment:
   Data may be corrupted, let's always try/catch just to be safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org