Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2019/03/14 05:00:28 UTC

[GitHub] [incubator-superset] betodealmeida opened a new pull request #7032: Fetch charts with GET to benefit from browser cache and conditional requests

URL: https://github.com/apache/incubator-superset/pull/7032
 
 
   This is a small PR that does a lot. It changes the initial request for charts (in Explore and in dashboards) to a `GET` request, greatly improving the loading speed of dashboards. It also moves the caching to the HTTP layer, allowing us to benefit from `Expires` and `ETag` headers for conditional requests.
   
   # The problem
   
   This diagram compares the current flow ("before") with the one implemented by this PR ("after"):
   
   <img width="512" alt="Cache" src="https://user-images.githubusercontent.com/1534870/54331676-b53d3d80-4623-11e9-86ed-6e05c315c3d5.png">
   
   ## Before
   
   Let's assume Superset is configured with a **1 hour cache**, and that the data changes less often than that (e.g., daily):
   
   1. User "A" requests a chart from Superset by sending a `POST` request with the payload.
   2. Superset computes the query and sends it to the DB.
   3. DB returns a dataframe.
   4. Superset caches the dataframe.
   5. Superset serializes the payload and sends it back to user "A".
   6. User "A" refreshes the dashboard.
   7. Superset finds the dataframe cached.
   8. Superset serializes the payload and sends it back to user "A".
   9. Superset cache expires after 1 hour.
   10. User "A" refreshes the dashboard.
   11. Superset computes the query and sends it to the DB.
   12. DB returns the exact same dataframe.
   13. Superset caches the dataframe again.
   14. Superset serializes the payload and sends it back to user "A".
   
   There are a few inefficiencies here:
   
   - The browser cache is never used, because charts are fetched with `POST` requests.
   - Superset needs to serialize the payload even on a cache hit.
   - Data is transferred to the browser even if it hasn't changed.
   
   ## After
   
   1. User "A" requests a chart from Superset **by sending a `GET` request with the chart id**.
   2. Superset computes the query and sends it to the DB.
   3. DB returns a dataframe.
   4. Superset serializes the dataframe and **caches the HTTP response**.
   5. Superset sends the payload to user "A", with an `Expires` header set 1 hour in the future and an `ETag` header containing a hash of the payload.
   6. The browser stores the response in its native cache, and `SupersetClient` also caches it in the [Cache interface](https://developer.mozilla.org/en-US/docs/Web/API/Cache).
   7. The user refreshes the dashboard.
   8. Because of the `Expires` header and the use of `GET`, the **data is read directly from the native browser cache**.
   9. Superset cache expires after 1 hour.
   10. User "A" refreshes the dashboard. The **native cache is not used, since `Expires` is now in the past**. `SupersetClient` looks for a cached response in the Cache interface, and if one is found, extracts its `ETag`.
   11. The **browser requests the chart with an `If-None-Match` header**, containing the hash of the cached response (its `ETag`).
   12. Superset computes the query and sends it to the DB.
   13. DB returns the exact same dataframe.
   14. Superset serializes the dataframe and **caches the HTTP response**.
   15. Superset sees that the new `ETag` matches the `If-None-Match` header and returns a `304 Not Modified` response with no body.
   16. The browser fetches the cached response from the Cache interface.
   17. The browser uses the cached response.
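   The "after" flow can be modeled end to end in a few lines. This is a simplified sketch, not the actual Superset or `SupersetClient` code: `make_etag`, `chart_response`, and `RevalidatingClient` are hypothetical names, a single dict stands in for both the browser's native cache and the Cache interface, SHA-256 is an assumed choice of hash, and the client refreshes the stored headers on a `304` the way an HTTP cache would.

```python
import hashlib
import json
from email.utils import formatdate, parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)

CACHE_TIMEOUT = 3600  # seconds; the 1-hour cache from the example


def make_etag(body: bytes) -> str:
    """Hypothetical strong ETag: the quoted SHA-256 hash of the payload."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def chart_response(body: bytes, if_none_match: Optional[str],
                   now: float) -> Response:
    """Server side (steps 4-5 and 14-15): add Expires/ETag, 304 on a match."""
    etag = make_etag(body)
    headers = {"ETag": etag,
               "Expires": formatdate(now + CACHE_TIMEOUT, usegmt=True)}
    if if_none_match and etag in [t.strip() for t in if_none_match.split(",")]:
        return 304, headers, b""  # 304 Not Modified carries no body
    return 200, headers, body


class RevalidatingClient:
    """Client side; a dict stands in for the native cache and Cache interface."""

    def __init__(self, send: Callable[[str, Optional[str]], Response],
                 clock: Callable[[], float]) -> None:
        self.send = send  # performs the GET; 2nd argument is If-None-Match
        self.clock = clock
        self.cache: Dict[str, Response] = {}
        self.requests = 0

    def get(self, url: str) -> bytes:
        cached = self.cache.get(url)
        if cached is not None:
            expires = parsedate_to_datetime(cached[1]["Expires"]).timestamp()
            if self.clock() < expires:
                return cached[2]  # step 8: still fresh, no request at all
        self.requests += 1
        etag = cached[1]["ETag"] if cached is not None else None
        status, headers, body = self.send(url, etag)  # step 11
        if status == 304 and cached is not None:
            # Steps 16-17: reuse the cached body, keep the refreshed headers.
            self.cache[url] = (200, headers, cached[2])
            return cached[2]
        self.cache[url] = (status, headers, body)
        return body
```

   After the hour passes, the only traffic is a request carrying the old `ETag` and an empty `304` reply; the unchanged body never crosses the network again.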
   
   # Notes
   
   - The `GET` request is done only the first time a chart is mounted. Forcing a refresh on a dashboard or clicking "Run Query" in the Explore view performs a `POST` request, which bypasses the cache and caches the new response. I tested the Explore view and dashboards with filters, and all subsequent interactions use `POST` requests.
   
   - Since we're caching the HTTP response, we need to verify that the user has permission to read the cached response. This is done by passing a `check_perms` function to the decorator that caches the responses.
   
   - The [fetch API](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch) has no support for conditional requests with ETags, so we need to add explicit support in `SupersetClient`. I have a separate PR for that.
   
   - **There is one small downside to this approach**. While the `Expires` date is still in the future, the browser will not request cached charts again unless the user explicitly refreshes a dashboard or clicks "Run Query" in the Explore view. If the data is bad, users will see it until the response expires or they deliberately refresh the chart. In the current workflow we could, in theory, purge the cache in this case, since it lives only on the server side. This is a hypothetical scenario, and we could work around it by notifying dashboards that one or more charts have bad data and should be refreshed.
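   The first two notes above (a `GET` reads the response cache, a `POST` bypasses it and stores the fresh response, and permissions are checked before any cached response is served) can be sketched as a decorator. Everything here is hypothetical except the idea of passing `check_perms`: the decorator name `cached_response`, the `(user, chart_id)` signature assumed for `check_perms`, `PermissionDenied`, and keying the cache by chart id alone (a real key would also need to cover the form data).

```python
import functools
from typing import Callable, Dict, Tuple

Response = Tuple[int, bytes]  # (status, body); simplified for the sketch


class PermissionDenied(Exception):
    """Hypothetical error raised when check_perms rejects a request."""


def cached_response(check_perms: Callable[[str, str], bool]):
    """Hypothetical decorator that caches whole HTTP responses per chart."""
    store: Dict[str, Response] = {}

    def decorator(view: Callable[[str, str, str], Response]):
        @functools.wraps(view)
        def wrapper(method: str, user: str, chart_id: str) -> Response:
            # Check permissions on every request, including cache hits:
            # the cached response may have been produced for another user.
            if not check_perms(user, chart_id):
                raise PermissionDenied(f"{user} may not read chart {chart_id}")
            if method == "GET" and chart_id in store:
                return store[chart_id]  # served from the response cache
            response = view(method, user, chart_id)  # POST always recomputes
            store[chart_id] = response  # ...and the new response is cached
            return response
        return wrapper
    return decorator
```

   Because the check runs before the cache lookup, caching whole responses does not widen access: a user without permission is rejected even when a cached copy exists.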
