You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/10/12 08:23:41 UTC

[GitHub] [incubator-superset] ktmud commented on a change in pull request #11234: perf: cache dashboard bootstrap data

ktmud commented on a change in pull request #11234:
URL: https://github.com/apache/incubator-superset/pull/11234#discussion_r503094641



##########
File path: superset/config.py
##########
@@ -369,6 +370,7 @@ def _try_json_readsha(  # pylint: disable=unused-argument
 CACHE_DEFAULT_TIMEOUT = 60 * 60 * 24
 CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
 TABLE_NAMES_CACHE_CONFIG: CacheConfig = {"CACHE_TYPE": "null"}
+DASHBOARD_CACHE_TIMEOUT = 60 * 60 * 24 * 14

Review comment:
       Cache dashboard page for 2 weeks by default, assuming a typical "shelf time" for a new dashboard.

##########
File path: superset/connectors/base/models.py
##########
@@ -477,6 +477,10 @@ def update_from_object(self, obj: Dict[str, Any]) -> None:
             if self.column_class and "columns" in obj
             else []
         )
+        if is_feature_enabled("DASHBOARD_CACHE"):
+            from superset.models.dashboard import Dashboard
+
+            Dashboard.clear_cache_for_datasource(datasource_id=self.id)

Review comment:
       Tried to add this to SQLAlchemy events, just like what we did with slices, but it was somehow triggered multiple times (4 or 5), which could have performance risks when a datasource is used in multiple dashboards.
   
   So I moved it to this `BaseDatasource` API instead. The drawback is it may not trigger if the datasource is updated from FAB CRUD view (an unlikely use case).

##########
File path: superset/models/dashboard.py
##########
@@ -239,6 +248,41 @@ def data(self) -> Dict[str, Any]:
             "position_json": positions,
         }
 
+    @cache.memoize(
+        # manually maintain cache key version
+        make_name=lambda fname: f"{fname}-v1",
+        timeout=config["DASHBOARD_CACHE_TIMEOUT"],
+        unless=lambda: not is_feature_enabled("DASHBOARD_CACHE"),
+    )
+    def full_data(
+        self,
+        reduce_payload: bool = is_feature_enabled("REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD"),

Review comment:
       @etr2460 @villebro  This feature (`REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD`) has been [default on](https://github.com/apache/incubator-superset/pull/9585) for about 6 months now. Should we clean it up?

##########
File path: superset/models/dashboard.py
##########
@@ -239,6 +248,41 @@ def data(self) -> Dict[str, Any]:
             "position_json": positions,
         }
 
+    @cache.memoize(
+        # manually maintain cache key version
+        make_name=lambda fname: f"{fname}-v1",
+        timeout=config["DASHBOARD_CACHE_TIMEOUT"],
+        unless=lambda: not is_feature_enabled("DASHBOARD_CACHE"),
+    )
+    def full_data(
+        self,
+        reduce_payload: bool = is_feature_enabled("REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD"),
+    ) -> Dict[str, Any]:
+        """Bootstrap data for rendering the dashboard page."""
+        slices = self.slices
+        datasource_slices = defaultdict(list)
+        for slc in slices:
+            datasource = slc.datasource
+            if datasource:
+                datasource_slices[datasource].append(slc)
+
+        data = {
+            # dashboard metadata
+            "dashboard": self.data,
+            # slices data
+            "slices": [slc.data for slc in slices],
+            # datasource data
+            "datasources": {
+                # Filter out unneeded fields from the datasource payload
+                datasource.uid: datasource.data_for_slices(slices)
+                if reduce_payload

Review comment:
       TODO: clean up when `REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD` is cleaned up.

##########
File path: superset/models/dashboard.py
##########
@@ -145,7 +154,7 @@ class Dashboard(  # pylint: disable=too-many-instance-attributes
     ]
 
     def __repr__(self) -> str:
-        return self.dashboard_title or str(self.id)
+        return f"Dashboard<{self.slug or self.id}>"

Review comment:
       `__repr__` is used in cache key.

##########
File path: superset/models/dashboard.py
##########
@@ -131,7 +140,7 @@ class Dashboard(  # pylint: disable=too-many-instance-attributes
     css = Column(Text)
     json_metadata = Column(Text)
     slug = Column(String(255), unique=True)
-    slices = relationship("Slice", secondary=dashboard_slices, backref="dashboards")
+    slices = relationship(Slice, secondary=dashboard_slices, backref="dashboards")

Review comment:
       Remove quotes for consistency.

##########
File path: tests/superset_test_config.py
##########
@@ -50,6 +50,7 @@
 SQL_MAX_ROW = 666
 SQLLAB_CTAS_NO_LIMIT = True  # SQL_MAX_ROW will not take affect for the CTA queries
 FEATURE_FLAGS = {
+    **FEATURE_FLAGS,

Review comment:
       So we can override the configs from `~/.superset/config.py`

##########
File path: superset/dashboards/dao.py
##########
@@ -108,7 +108,7 @@ def set_dash_metadata(
                 and obj["meta"]["chartId"]
             ):
                 chart_id = obj["meta"]["chartId"]
-                obj["meta"]["uuid"] = uuid_map[chart_id]
+                obj["meta"]["uuid"] = uuid_map.get(chart_id)

Review comment:
       This will throw 500 errors when chart don't have uuid, which could happen when chart is deleted but the dashboard cache is not purged for some reason.

##########
File path: superset/charts/commands/delete.py
##########
@@ -44,6 +45,7 @@ def __init__(self, user: User, model_id: int):
     def run(self) -> Model:
         self.validate()
         try:
+            Dashboard.clear_cache_for_slice(slice_id=self._model_id)

Review comment:
       Can't use `before_delete` event for `Slice` because the association record in the relationship table `datasource_slice` is deleted before the `Slice` object is deleted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org