Posted to notifications@superset.apache.org by "john-bodley (via GitHub)" <gi...@apache.org> on 2023/06/22 20:34:27 UTC

[GitHub] [superset] john-bodley opened a new pull request, #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

john-bodley opened a new pull request, #24488:
URL: https://github.com/apache/superset/pull/24488

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] craig-rueda commented on pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "craig-rueda (via GitHub)" <gi...@apache.org>.
craig-rueda commented on PR #24488:
URL: https://github.com/apache/superset/pull/24488#issuecomment-1604572286

   Totally agree with this philosophy, except for situations where you really want to leave FK data hanging. IMO, the DB should keep things clean wherever possible and should enforce referential integrity (kinda the point of FKs in the first place :D ). Delegating cleanup to the DB keeps app-tier logic cleaner and less buggy, as it's less concerned with cascades and whatnot.


[GitHub] [superset] craig-rueda commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "craig-rueda (via GitHub)" <gi...@apache.org>.
craig-rueda commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1244023822


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   Yea, either way. It's more of a stylistic thing IMO. Keeping the DAO layer as decoupled as possible has worked well for me in the past, hence my comments :).



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239092173


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -189,11 +189,6 @@ def test_extra_cache_keys(self, flask_g):
         self.assertTrue(table3.has_extra_cache_key_calls(query_obj))
         assert extra_cache_keys == ["abc"]
 
-        # Cleanup

Review Comment:
   CI was failing for PostgreSQL (though not MySQL or SQLite—which is a little perplexing) with a SQLAlchemy error, which would indicate that we're trying to delete objects which previously weren't committed to the database. Given how these tables are constructed,
   
   ```python
   table3 = SqlaTable(
       table_name="test_has_no_extra_cache_keys_table",
       sql=query,
       database=get_example_database(),
   )
   ```
   
   and there's no clear add/merge call, it seems that SQLAlchemy's viewpoint is right, i.e., there's no cleanup required. If I'm right, then this is a win for SIP-92. I suspect that throughout the code and tests we're doing numerous database operations inefficiently, with superfluous commits, not rolling back test state, etc.
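
   For context, here's a minimal, self-contained sketch of that failure mode (the `Widget` model is purely illustrative, not Superset's): deleting an instance that was never added to the session fails, because SQLAlchemy still considers it transient and sees nothing to clean up.

   ```python
   from sqlalchemy import Column, Integer, String, create_engine
   from sqlalchemy.exc import InvalidRequestError
   from sqlalchemy.orm import Session, declarative_base

   Base = declarative_base()


   class Widget(Base):  # hypothetical stand-in for SqlaTable
       __tablename__ = "widgets"
       id = Column(Integer, primary_key=True)
       name = Column(String(50))


   engine = create_engine("sqlite://")
   Base.metadata.create_all(engine)

   with Session(engine) as session:
       widget = Widget(name="never persisted")  # constructed, but never session.add()-ed
       try:
           session.delete(widget)  # SQLAlchemy refuses: the instance is transient
       except InvalidRequestError as exc:
           print(exc)  # Instance '<Widget ...>' is not persisted
   ```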



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239084491


##########
superset/datasets/commands/delete.py:
##########
@@ -43,19 +41,13 @@ def __init__(self, model_id: int):
 
     def run(self) -> Model:
         self.validate()
-        self._model = cast(SqlaTable, self._model)
+        assert self._model
+
         try:
-            # Even though SQLAlchemy should in theory delete rows from the association

Review Comment:
   @betodealmeida this should address the problem you were trying to circumvent in https://github.com/apache/superset/pull/23414.



[GitHub] [superset] codecov[bot] commented on pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #24488:
URL: https://github.com/apache/superset/pull/24488#issuecomment-1603363482

   ## [Codecov](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#24488](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ebfa61c) into [master](https://app.codecov.io/gh/apache/superset/commit/2a4ef5cccf3d3f8f2f7ccfc8c590f8f2fde31ab0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (2a4ef5c) will **decrease** coverage by `11.00%`.
   > The diff coverage is `50.00%`.
   
   > :exclamation: Current head ebfa61c differs from pull request most recent head 81224e0. Consider uploading reports for the commit 81224e0 to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #24488       +/-   ##
   ===========================================
   - Coverage   69.03%   58.03%   -11.00%     
   ===========================================
     Files        1901     1901               
     Lines       74006    73994       -12     
     Branches     8115     8115               
   ===========================================
   - Hits        51088    42942     -8146     
   - Misses      20807    28941     +8134     
     Partials     2111     2111               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.80% <50.00%> (+0.01%)` | :arrow_up: |
   | python | `60.61% <50.00%> (-22.88%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `54.67% <50.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [superset/daos/dataset.py](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3VwZXJzZXQvZGFvcy9kYXRhc2V0LnB5) | `50.38% <ø> (-40.86%)` | :arrow_down: |
   | [superset/datasets/commands/delete.py](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvZGVsZXRlLnB5) | `48.38% <25.00%> (-48.92%)` | :arrow_down: |
   | [superset/connectors/sqla/models.py](https://app.codecov.io/gh/apache/superset/pull/24488?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL21vZGVscy5weQ==) | `74.85% <100.00%> (-16.53%)` | :arrow_down: |
   
   ... and [298 files with indirect coverage changes](https://app.codecov.io/gh/apache/superset/pull/24488/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1242937469


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   @craig-rueda I generally agree, but how does this differ from the DAO asking "please delete these records", which triggers the SQLAlchemy ORM to handle post-deletion events (per the [after_delete](https://docs.sqlalchemy.org/en/20/orm/events.html#sqlalchemy.orm.MapperEvents.after_delete) event handler), which in turn invokes the security manager?
   
   Note we typically use these SQLAlchemy events because of shortcomings with either our data model and/or constructs which can't be modeled by our database schema.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239187318


##########
superset/utils/core.py:
##########
@@ -815,12 +815,15 @@ def __exit__(  # pylint: disable=redefined-outer-name,redefined-builtin
 
 def pessimistic_connection_handling(some_engine: Engine) -> None:
     @event.listens_for(some_engine, "engine_connect")
-    def ping_connection(connection: Connection, branch: bool) -> None:
+    def on_connect(connection: Connection, branch: bool) -> None:
         if branch:
             # 'branch' refers to a sub-connection of a connection,
             # we don't want to bother pinging on these.
             return
 
+        if connection.dialect.name == "sqlite":

Review Comment:
   See [here](https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#foreign-key-support) for why this is needed. I do wonder if we should drop support for SQLite given it's not supported in production. We already skip a bunch of tests, i.e., 
   
   ```python
   if backend() == "sqlite":
       return
   ```
   
   possibly because of how the foreign key constraints have no impact on the underlying table.
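
   For reference, the recipe from those docs looks roughly like the following (the engine URL is illustrative); without the pragma, SQLite parses foreign key DDL but never enforces it:

   ```python
   from sqlalchemy import create_engine, event

   engine = create_engine("sqlite:///example.db")  # illustrative URL


   @event.listens_for(engine, "connect")
   def set_sqlite_pragma(dbapi_connection, connection_record):
       # SQLite disables foreign key enforcement by default; enable it per connection.
       cursor = dbapi_connection.cursor()
       cursor.execute("PRAGMA foreign_keys=ON")
       cursor.close()
   ```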



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239092173


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -189,11 +189,6 @@ def test_extra_cache_keys(self, flask_g):
         self.assertTrue(table3.has_extra_cache_key_calls(query_obj))
         assert extra_cache_keys == ["abc"]
 
-        # Cleanup

Review Comment:
   CI was failing for PostgreSQL (though not MySQL or SQLite—which is a little perplexing) with a SQLAlchemy error, which would indicate that we're trying to delete objects which previously weren't committed to the database. Given how these tables are constructed,
   
   ```python
   table3 = SqlaTable(
       table_name="test_has_no_extra_cache_keys_table",
       sql=query,
       database=get_example_database(),
   )
   ```
   
   and there's no clear add/commit call, it seems that SQLAlchemy's viewpoint is right, i.e., there's no cleanup required. If I'm right, then this is a win for SIP-92. I suspect that throughout the code and tests we're doing numerous database operations inefficiently, with superfluous commits, not rolling back test state, etc.



[GitHub] [superset] craig-rueda commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "craig-rueda (via GitHub)" <gi...@apache.org>.
craig-rueda commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1242480042


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   Ideally, DAOs should be super light, only concerned with interfacing with the DB. They shouldn't have dependencies upwards (to the mid-tier), which can lead to circular dependencies as basically all mid-tier logic will at some point need a DAO. This can obviously be worked around using some late-binding technique. 
   
   From a semantic POV, the command is just asking to "please delete these records". Calling the SM to do additional work here is a side effect, which ultimately makes this code less reusable for lower-level operations.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239092173


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -189,11 +189,6 @@ def test_extra_cache_keys(self, flask_g):
         self.assertTrue(table3.has_extra_cache_key_calls(query_obj))
         assert extra_cache_keys == ["abc"]
 
-        # Cleanup

Review Comment:
   CI was failing for PostgreSQL with a SQLAlchemy error, which would indicate that we're trying to delete objects which previously weren't committed to the database. Given how these tables are constructed,
   
   ```python
   table3 = SqlaTable(
       table_name="test_has_no_extra_cache_keys_table",
       sql=query,
       database=get_example_database(),
   )
   ```
   
   and there's no clear add/merge call, it seems that SQLAlchemy's viewpoint is right, i.e., there's no cleanup required. If I'm right, then this is a win for SIP-92. I suspect that throughout the code and tests we're doing numerous database operations inefficiently, with superfluous commits, not rolling back test state, etc.



[GitHub] [superset] craig-rueda commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "craig-rueda (via GitHub)" <gi...@apache.org>.
craig-rueda commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1240076650


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   Ideally, we wouldn't reference the SM from the DAO layer as it's more of a mid-tier concern. This belongs in the CMD.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239084491


##########
superset/datasets/commands/delete.py:
##########
@@ -43,19 +41,13 @@ def __init__(self, model_id: int):
 
     def run(self) -> Model:
         self.validate()
-        self._model = cast(SqlaTable, self._model)
+        assert self._model
+
         try:
-            # Even though SQLAlchemy should in theory delete rows from the association

Review Comment:
   @betodealmeida this should address the problem you were trying to circumvent in https://github.com/apache/superset/pull/23414.
   
   Fun fact: I wondered whether I wrote this comment, as it's somewhat in the style I write, but I realized I don't typically use the term "let's".



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1242937469


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   @craig-rueda I generally agree, but how does this differ from the DAO asking "please delete these records", which triggers the SQLAlchemy ORM to handle post-deletion events (per the [after_delete](https://docs.sqlalchemy.org/en/20/orm/events.html#sqlalchemy.orm.MapperEvents.after_delete) event handler), which in turn invokes the security manager?
   
   Note we typically use these SQLAlchemy events because of shortcomings with either our data model and/or constructs which can't be modeled by our database schema.
   
   I guess what I’m saying is I would expect the DAO to behave in the same way whether it was deleting a single entity or bulk deleting multiple entities, and explicitly calling the `after_delete` callback (which is only invoked for the former) ensures that the behavior is consistent.
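
   To illustrate the distinction with a self-contained sketch (the `Dataset` model and handler below are illustrative, not Superset's actual code): the `after_delete` mapper event fires once per ORM-deleted instance, whereas a bulk `Query.delete()` bypasses it entirely.

   ```python
   from sqlalchemy import Column, Integer, String, create_engine, event
   from sqlalchemy.orm import Session, declarative_base

   Base = declarative_base()


   class Dataset(Base):  # hypothetical stand-in for SqlaTable
       __tablename__ = "datasets"
       id = Column(Integer, primary_key=True)
       name = Column(String(250))


   @event.listens_for(Dataset, "after_delete")
   def dataset_after_delete(mapper, connection, target):
       # Fired for ORM unit-of-work deletes only; bulk query deletes skip this hook.
       print(f"cleaning up after dataset {target.id}")


   engine = create_engine("sqlite://")
   Base.metadata.create_all(engine)

   with Session(engine) as session:
       session.add_all([Dataset(name="a"), Dataset(name="b")])
       session.commit()

       session.delete(session.get(Dataset, 1))                  # triggers after_delete
       session.query(Dataset).filter(Dataset.id == 2).delete()  # does NOT trigger it
       session.commit()
   ```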



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239092173


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -189,11 +189,6 @@ def test_extra_cache_keys(self, flask_g):
         self.assertTrue(table3.has_extra_cache_key_calls(query_obj))
         assert extra_cache_keys == ["abc"]
 
-        # Cleanup

Review Comment:
   CI was failing with a SQLAlchemy error, which would indicate that we're trying to delete objects which previously weren't committed to the database. Given how these tables are constructed,
   
   ```python
   table3 = SqlaTable(
       table_name="test_has_no_extra_cache_keys_table",
       sql=query,
       database=get_example_database(),
   )
   ```
   
   and there's no clear add/merge call, it seems that SQLAlchemy's viewpoint is right, i.e., there's no cleanup required. If I'm right, then this is a win for SIP-92. I suspect that throughout the code and tests we're doing numerous database operations inefficiently, with superfluous commits, not rolling back test state, etc.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239138419


##########
superset/datasets/commands/bulk_delete.py:
##########
@@ -40,35 +39,10 @@ def __init__(self, model_ids: list[int]):
 
     def run(self) -> None:
         self.validate()
-        if not self._models:
-            return None
+        assert self._models
+
         try:
             DatasetDAO.bulk_delete(self._models)
-            for model in self._models:

Review Comment:
   This is logic taken from https://github.com/apache/superset/pull/24466. Previously, a test was failing because the relationship no longer existed in Python after the commit. See [here](https://docs.sqlalchemy.org/en/20/orm/cascades.html#notes-on-delete-deleting-objects-referenced-from-collections-and-scalar-relationships) for details. The issue was that:
   
   ```python
   DatasetDAO.bulk_delete(self._models)
   ```
   
   includes a `db.session.commit()`, after which all the attributes have expired. I'm not really sure why this worked before, but the TL;DR is that this logic has been moved inside the DAO and is executed prior to the commit.
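
   As a self-contained illustration of that expiry behaviour (the model is illustrative, not Superset's): with the default `expire_on_commit=True`, `commit()` expires every loaded attribute, so anything that still needs the in-memory state has to run before the commit (or trigger a fresh SELECT afterwards).

   ```python
   from sqlalchemy import Column, Integer, String, create_engine, inspect
   from sqlalchemy.orm import Session, declarative_base

   Base = declarative_base()


   class Tag(Base):  # hypothetical model for illustration
       __tablename__ = "tags"
       id = Column(Integer, primary_key=True)
       name = Column(String(50))


   engine = create_engine("sqlite://")
   Base.metadata.create_all(engine)

   session = Session(engine)  # expire_on_commit=True is the default
   tag = Tag(name="example")
   session.add(tag)
   session.commit()

   state = inspect(tag)
   print(state.expired)         # True: commit() expired the loaded attributes
   print("name" in state.dict)  # False: no value is held in memory any more

   print(tag.name)              # attribute access triggers a refresh SELECT -> 'example'
   print(state.expired)         # False: the instance has been re-loaded
   ```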



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1240047263


##########
superset/utils/core.py:
##########
@@ -849,6 +850,24 @@ def ping_connection(connection: Connection, branch: bool) -> None:
             # restore 'close with result'
             connection.should_close_with_result = save_should_close_with_result
 
+        if some_engine.dialect.name == "sqlite":
+
+            @event.listens_for(some_engine, "connect")
+            def set_sqlite_pragma(  # pylint: disable=unused-argument

Review Comment:
   Hopefully (if approved) this will remove the need for [this](https://github.com/search?q=repo%3Aapache%2Fsuperset+backend%28%29+%3D%3D+%22sqlite%22&type=code) logic in a number of tests.



[GitHub] [superset] craig-rueda commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "craig-rueda (via GitHub)" <gi...@apache.org>.
craig-rueda commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1240077021


##########
superset/migrations/versions/2023-06-22_13-39_6fbe660cac39_add_on_delete_cascade_for_tables_references.py:
##########
@@ -0,0 +1,94 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""add on delete cascade for tables references
+
+Revision ID: 6fbe660cac39
+Revises: 83e1abbe777f
+Create Date: 2023-06-22 13:39:47.989373
+
+"""
+from __future__ import annotations
+
+# revision identifiers, used by Alembic.
+revision = "6fbe660cac39"
+down_revision = "83e1abbe777f"
+
+import sqlalchemy as sa
+from alembic import op
+
+from superset.utils.core import generic_find_fk_constraint_name
+
+
+def migrate(ondelete: str | None) -> None:
+    """
+    Redefine the foreign key constraints, via a successive DROP and ADD, for all tables
+    related to the `tables` table to include the ON DELETE construct for cascading
+    purposes.
+
+    :param ondelete: If set, emit ON DELETE <value> when issuing DDL for this constraint
+    """
+
+    bind = op.get_bind()
+    insp = sa.engine.reflection.Inspector.from_engine(bind)
+
+    conv = {
+        "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
+    }
+
+    for table in ("sql_metrics", "table_columns"):

Review Comment:
   👍 



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239082684


##########
superset/daos/dataset.py:
##########
@@ -364,17 +364,6 @@ def create_metric(
     @staticmethod
     def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data

Review Comment:
   I'm not sure if this was required, because bulk deleting a table (using the `delete()` method) bypasses the ORM, which is where the cascade logic resides; but given we've adopted the "shift left" mentality it's something of a moot point, i.e., the database will handle the deletion irrespective of how the records are deleted (SQLAlchemy ORM, raw connection, MySQL/PostgreSQL CLI, etc.).
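
   As a self-contained sketch of that "shift left" behaviour (the table names are illustrative, not Superset's actual schema): once the FK is declared with `ondelete="CASCADE"`, the database removes the child rows itself no matter how the parent row is deleted. The SQLite pragma is only there because, as noted elsewhere in this PR, SQLite doesn't enforce foreign keys by default.

   ```python
   import sqlalchemy as sa
   from sqlalchemy import event

   metadata = sa.MetaData()

   parent = sa.Table(
       "parent_tables",
       metadata,
       sa.Column("id", sa.Integer, primary_key=True),
   )

   child_columns = sa.Table(
       "child_columns",
       metadata,
       sa.Column("id", sa.Integer, primary_key=True),
       sa.Column(
           "table_id",
           sa.Integer,
           sa.ForeignKey("parent_tables.id", ondelete="CASCADE"),
           nullable=False,
       ),
   )

   engine = sa.create_engine("sqlite://")


   @event.listens_for(engine, "connect")
   def enable_sqlite_fks(dbapi_connection, connection_record):
       # SQLite only honours ON DELETE CASCADE with foreign key enforcement on.
       dbapi_connection.execute("PRAGMA foreign_keys=ON")


   metadata.create_all(engine)

   with engine.begin() as conn:
       conn.execute(parent.insert().values(id=1))
       conn.execute(child_columns.insert().values(id=1, table_id=1))

       # Deleting the parent is enough; the database cascades to the child row.
       conn.execute(parent.delete().where(parent.c.id == 1))
       count = conn.execute(
           sa.select(sa.func.count()).select_from(child_columns)
       ).scalar()
       print(count)  # 0
   ```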



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239257301


##########
superset/utils/core.py:
##########
@@ -849,6 +851,19 @@ def ping_connection(connection: Connection, branch: bool) -> None:
             # restore 'close with result'
             connection.should_close_with_result = save_should_close_with_result
 
+        if some_engine.dialect.name == "sqlite":

Review Comment:
   I wonder if this is why a number of tests currently have,
   
   ```python
   if backend() == "sqlite":
       return
   ```



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239297169


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -430,7 +425,7 @@ def test_multiple_sql_statements_raises_exception(self):
         }
 
         table = SqlaTable(
-            table_name="test_has_extra_cache_keys_table",
+            table_name="test_multiple_sql_statements",

Review Comment:
   See my comment above. SQLite was complaining that said table already existed when I removed the "cleanup", though neither MySQL nor PostgreSQL did. I'm not entirely sure what's going on here, but it does seem like the table name was copypasta from the other test, and thus the easiest solution was to rename the table in accordance with the test.
   
   I sense that with this refactoring work it's a case of two steps forward, one step back at a time.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1240114939


##########
superset/daos/dataset.py:
##########
@@ -361,25 +362,24 @@ def create_metric(
         """
         return DatasetMetricDAO.create(properties, commit=commit)
 
-    @staticmethod
-    def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
+    @classmethod
+    def bulk_delete(
+        cls, models: Optional[list[SqlaTable]], commit: bool = True
+    ) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data
-        if models:
-            for model in models:
-                model.owners = []
-                db.session.merge(model)
-            db.session.query(SqlMetric).filter(SqlMetric.table_id.in_(item_ids)).delete(
-                synchronize_session="fetch"
-            )
-            db.session.query(TableColumn).filter(
-                TableColumn.table_id.in_(item_ids)
-            ).delete(synchronize_session="fetch")
         # bulk delete itself
         try:
             db.session.query(SqlaTable).filter(SqlaTable.id.in_(item_ids)).delete(
                 synchronize_session="fetch"
             )
+
+            if models:
+                connection = db.session.connection()
+                mapper = next(iter(cls.model_cls.registry.mappers))  # type: ignore
+
+                for model in models:
+                    security_manager.dataset_after_delete(mapper, connection, model)

Review Comment:
   @craig-rueda could you help educate me as to why this is? Granted, the DAO is only an abstraction over the database (and not the security manager, which in turn interfaces with the database), but isn't this the interface we expect commands, APIs, etc. to interface with? Note this logic would be handled by the `after_delete` event listener if we were using the ORM, as opposed to bulk queries, to delete the assets.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239082684


##########
superset/daos/dataset.py:
##########
@@ -364,17 +364,6 @@ def create_metric(
     @staticmethod
     def bulk_delete(models: Optional[list[SqlaTable]], commit: bool = True) -> None:
         item_ids = [model.id for model in models] if models else []
-        # bulk delete, first delete related data

Review Comment:
   This is no longer needed as the database will handle said logic.



[GitHub] [superset] john-bodley commented on a diff in pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #24488:
URL: https://github.com/apache/superset/pull/24488#discussion_r1239092173


##########
tests/integration_tests/sqla_models_tests.py:
##########
@@ -189,11 +189,6 @@ def test_extra_cache_keys(self, flask_g):
         self.assertTrue(table3.has_extra_cache_key_calls(query_obj))
         assert extra_cache_keys == ["abc"]
 
-        # Cleanup

Review Comment:
   CI was failing with a SQLAlchemy error, which would indicate that we're trying to delete objects which previously weren't committed to the database. Given how these tables are constructed,
   
   ```python
   table3 = SqlaTable(
       table_name="test_has_no_extra_cache_keys_table",
       sql=query,
       database=get_example_database(),
   )
   ```
   
   and there's no clear add/merge call, it seems that SQLAlchemy's viewpoint is right, i.e., there's no cleanup required. If I'm right, then this is a win for SIP-92. I suspect that throughout the code and tests we're doing numerous database operations inefficiently, with superfluous commits, not rolling back test state, etc.



[GitHub] [superset] john-bodley merged pull request #24488: chore(dao): Add explicit ON DELETE CASCADE when deleting datasets

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley merged PR #24488:
URL: https://github.com/apache/superset/pull/24488

