You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2019/11/23 02:26:44 UTC

[GitHub] [incubator-superset] john-bodley opened a new issue #8639: [SIP-26] Paranoid Deletes

john-bodley opened a new issue #8639: [SIP-26] Paranoid Deletes
URL: https://github.com/apache/incubator-superset/issues/8639
 
 
   ## [SIP] Proposal for Paranoid Deletes
   
   ### Motivation
   
   Currently at Airbnb we have a vast number of entities in Superset. Our deployment is approaching 200k  charts (both manually and procedurally generated), 10k dashboards, 80k registered tables (both physical and virtual), and 2.5k Druid datasources. 
   
   In a recent analysis of a specific Druid NoSQL (native) cluster, from a sample of ~ 5k charts only 34% of charts rendered, i.e., returned a 200 status code from the `/supserset/slice_json` route.  The following chart the renderability of charts as a function of last saved, which shows that a chart's viability often decays over time due to creep in the datasource metadata and the saved chart parameters.  
   
   ![Screen Shot 2019-11-22 at 5 50 14 PM](https://user-images.githubusercontent.com/4567245/69471248-193b0880-0d52-11ea-920a-01bea6511628.png)
   
   Ideally we would like to have a mechanism to clean up obsolete resources (charts, dashboards, or datasources) in a somewhat paranoid manner, i.e., using soft deletes. This should help improve the perceived reliability and quality of Superset assets.  
   
   ### Proposed Change
   
   The proposed solution was originally mentioned by @etr2460 but I thought it was worthwhile formalizing this as a SIP. This borrows an idea from [Ruby](https://github.com/rubysherpas/paranoia) where we first soft delete records my marking them as deleted (with an associated timestamp) before performing a hard delete (deleting the record _n_-days later). Users could be prompted that their charts were being deleted and they can take action to undelete it if they take corrective action to fix the any errors or mark as non-deletable (or similar).
   
   There's actually a Python package [sqla-paranoid](https://github.com/jeanphix/sqla-paranoid) which brings transparent soft deletes to SQLAlchemy which we could use or replicate. The TL;DR is this would add a `deleted_at` (or `deleted_on` for consistency) column which would track soft deleted records. Records which are soft deleted wouldn't show up in the CRUD views by default unless the filter was enabled (not unlike how SQL Lab Views are ignored by default in the `tablemodelview`).   
   
   Records could be marked using a hook, trigger, or cron as deletable based on various criterion using cascading context:
   
   #### Charts 
   
   - Returns an error. 
   - Has not been viewed for _n_-days.
   
   #### Dashboards
   
   - Contains no charts. 
   
   ##### Tables/Datasources 
   
   - Not referenced by any charts.
   
   ### New or Changed Public Interfaces
   
   We would need to updated the data model and leverage `sql-paranoid` (or similar) for enabling the soft-deletes. We would also need to update the FAB views to handle filtering/exclusion of soft deleted records. Finally we would need to implement triggers or similar to i) soft delete records, and ii) hard delete records. 
   
   ### New dependencies
   
   The only new dependency would be `sqla-paranoid` (no public license) if we decided not to write this ourself. Note the package only contains several hundred lines of codes. 
   
   ### Migration Plan and Compatibility
   
   We would need to update the schema to include the `deleted_at` (or `deleted_on`) column for certain tables. Note I think we only need this for charts, dashboards, and datasources (the cascade deletes should handle the cleanup of columns and metrics). 
   
   ### Rejected Alternatives
   
   None. 
   
   to: @etr2460 @mistercrunch @villebro @willbarrett 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org