Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2019/12/03 10:44:56 UTC
[GitHub] [incubator-superset] alexandrejuma opened a new issue #8731: Advanced Superset cache management

alexandrejuma opened a new issue #8731: Advanced Superset cache management
URL: https://github.com/apache/incubator-superset/issues/8731
 
 
   **Is your feature request related to a problem? Please describe.**
   
   Superset cache management appears to follow a traditional time-based strategy: cached entries go stale after a timeout, and a proactive, time-based cache warm-up mechanism pre-caches results so that they are already available when users open a dashboard or slice.
   
   We have a use case where we pre-process a number of heavy, near-real-time aggregations using Kafka Streams. We would like to push these results directly to our Redis cluster, which would be the cache store for Superset, so that Superset no longer needs to proactively query the data source to refresh the cache.
   
   We know that storing the pre-processed results in a fast serving layer that Superset supports, and letting Superset update its cache from there, is feasible even with short cache timeout periods. However, there is always some query delay no matter how fast the serving layer is, and the stream-processing applications that produce and store the results are not synchronized in time with the Superset cache mechanism, which refreshes its data on an interval.
   
   Our requirement is to have our monitoring solution updated every minute (aggregations will have 5-minute buckets in a sliding window updated every minute).
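
To make the timing requirement concrete, here is a minimal Python sketch of a 5-minute sliding window that advances every minute. The function name `sliding_window_sums` and the per-minute input layout are assumptions made for illustration only; the real aggregation would run in Kafka Streams, not in Python.

```python
def sliding_window_sums(per_minute, window=5):
    """Return {minute: sum of the values in the `window` minutes ending at it}.

    `per_minute` maps an integer minute index to the value observed in that
    minute (hypothetical layout). One result is emitted per minute, so
    consecutive windows overlap by `window - 1` minutes.
    """
    results = {}
    for minute in sorted(per_minute):
        start = minute - window + 1
        results[minute] = sum(
            value for m, value in per_minute.items() if start <= m <= minute
        )
    return results


# Six minutes of made-up event counts.
counts = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}
windows = sliding_window_sums(counts)
print(windows[4], windows[5])
```

Each minute produces a fresh 5-minute total, which is the value we would want visible in the dashboard within that same minute.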
   
   **Describe the solution you'd like**
   
   - Be able to manage the Superset cache directly (i.e., push pre-processed results directly to the cache)
   - Be able to push notifications to the Superset cache manager so that it refreshes its data on demand (instead of relying only on the time-based cache staling/refresh mechanism)
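
The two requests above could be sketched roughly as follows. This is not Superset's API: `InMemoryCache` is a hypothetical stand-in for the Redis store, and `push_result`, `request_refresh`, and the key `ops:errors_5m` are names invented for this example. How Superset derives its real cache keys is not covered here.

```python
import json
import time


class InMemoryCache:
    """Hypothetical stand-in for the Redis store: key -> (payload, expiry)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def set(self, key, payload, timeout):
        self._entries[key] = (payload, self._clock() + timeout)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        payload, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # time-based staling still applies
            return None
        return payload


def push_result(cache, key, rows, timeout=120):
    """First request: the stream processor writes finished results itself."""
    cache.set(key, json.dumps(rows), timeout)


def request_refresh(cache, key, run_query, timeout=120):
    """Second request: an external notification triggers an immediate refresh."""
    cache.set(key, json.dumps(run_query()), timeout)


cache = InMemoryCache()
push_result(cache, "ops:errors_5m", [{"bucket": "10:40", "errors": 7}])
print(cache.get("ops:errors_5m"))
```

In the first case the data source is never queried; in the second, Superset still runs the query, but when the producer says new data is ready rather than on a fixed interval.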
   
   **Describe alternatives you've considered**
   
   Because I'm not sure whether any of this makes sense for the Superset roadmap, we're also working on the following alternatives:
   - Testing Druid ingesting Kafka-bound raw data without any pre-processing, leveraging standard Druid ingestion-level pre-aggregation
   - Testing Druid ingesting Kafka-bound pre-processed aggregations (with Kafka Streams)
   
   We'd then leverage the Superset -> Druid integration and the regular cache mechanism to provide results. Assuming Superset can keep serving valid cached results while the warm-up mechanism proactively updates the cache (i.e., it doesn't block cache hits while it's working), this should also be transparent to the user in terms of loading times.
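
The behavior we are assuming here, serving the existing entry while a refresh is pending, can be illustrated with a small sketch. The class `StaleWhileRefreshCache` and its `pending` set are hypothetical and only illustrate the idea; this is not a description of how Superset's warm-up is implemented.

```python
import time


class StaleWhileRefreshCache:
    """Serve the cached value even when it is stale, and record the key so a
    background worker (not shown) can refresh it without blocking readers."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # function that queries the data source
        self._ttl = ttl        # seconds before an entry counts as stale
        self._clock = clock
        self._entries = {}     # key -> (value, fetched_at)
        self.pending = set()   # keys waiting for a background refresh

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            # Cold miss: the only case where a reader waits on the source.
            return self.refresh(key)
        value, fetched_at = entry
        if self._clock() - fetched_at >= self._ttl:
            self.pending.add(key)  # stale, but still returned immediately
        return value

    def refresh(self, key):
        """Reload one key from the data source; meant for the warm-up worker."""
        value = self._loader(key)
        self._entries[key] = (value, self._clock())
        self.pending.discard(key)
        return value
```

With this pattern, a user request only reaches the data source on the very first load of a key; every later request is answered from the cache, at the cost of occasionally showing a value that is one refresh cycle old.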
   
   I think the objective is to avoid any user request going directly to the data source (in this case Druid).
   
   **Additional context**
   
   Our solution, besides having a very short refresh interval (every 1m), has to be able to load a large number of visualizations simultaneously (think of an operations center) and with high concurrency (lots of users). We're thinking of using Superset's embeddable visualizations in a third-party application and leveraging its caching mechanism to support a blazing-fast experience.
