You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by "john-bodley (via GitHub)" <gi...@apache.org> on 2023/02/02 08:12:34 UTC

[GitHub] [superset] john-bodley opened a new pull request, #22957: chore(datasets): Refactor DatasetDAO with bulk operations

john-bodley opened a new pull request, #22957:
URL: https://github.com/apache/superset/pull/22957

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   
   Somewhat akin to https://github.com/apache/superset/pull/22413, this PR optimizes the PUT operation for the RESTful API `/api/v1/dataset/{pk}` endpoint.
   
   Though updates are sparse in nature, i.e., per [here](https://github.com/apache/superset/blob/21a2e7bc91125ffab2ddc62d6cee7a7e9a51840d/superset/dao/base.py#L166-L167) where only the specified properties are updated, from a dataset perspective all the columns and metrics must be specified—even if updating a subset—otherwise the undefined (unseen) columns and metrics will be deleted per [here](https://github.com/apache/superset/blob/21a2e7bc91125ffab2ddc62d6cee7a7e9a51840d/superset/datasets/dao.py#L221-L222) and [here](https://github.com/apache/superset/blob/21a2e7bc91125ffab2ddc62d6cee7a7e9a51840d/superset/datasets/dao.py#L262-L263) respectively.
   
   For large datasets (like Airbnb's uber virtual dataset—comprising of thousands of columns and metrics) these updates are rather expensive given that the create, update, and delete operations work on a single entity and thus the PUT operation for the RESTful API `/api/v1/dataset/{pk}` endpoint (even for sparely defined entities) the request timed out after five minutes.
   
   The proposed fix is to use bulk logic for the create, update, and delete operations which reduces the request to well under 60 seconds.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   
   CI, i.e., existing unit/integration tests.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on a diff in pull request #22957: chore(datasets): Refactor DatasetDAO with bulk operations

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on code in PR #22957:
URL: https://github.com/apache/superset/pull/22957#discussion_r1094170324


##########
superset/datasets/dao.py:
##########
@@ -241,26 +252,36 @@ def update_metrics(
         then we delete.
         """
 
-        metric_by_id = {metric.id: metric for metric in model.metrics}
-        seen = set()
-
-        for properties in property_metrics:
-            if "id" in properties:
-                seen.add(properties["id"])
+        metrics_by_id = {metric.id: metric for metric in model.metrics}

Review Comment:
   metrics (plural) and not metric (singular).



##########
superset/datasets/dao.py:
##########
@@ -241,26 +252,36 @@ def update_metrics(
         then we delete.
         """
 
-        metric_by_id = {metric.id: metric for metric in model.metrics}
-        seen = set()
-
-        for properties in property_metrics:
-            if "id" in properties:
-                seen.add(properties["id"])
+        metrics_by_id = {metric.id: metric for metric in model.metrics}
+
+        property_metrics_by_id = {
+            properties["id"]: properties
+            for properties in property_metrics
+            if "id" in properties
+        }
+
+        db.session.bulk_insert_mappings(
+            SqlMetric,
+            [
+                {**properties, "table_id": model.id}
+                for properties in property_metrics
+                if not "id" in properties
+            ],
+        )
 
-                DatasetDAO.update_metric(
-                    metric_by_id[properties["id"]],
-                    properties,
-                    commit=False,
-                )
-            else:
-                DatasetDAO.create_metric(
-                    {**properties, "table_id": model.id},
-                    commit=False,
-                )
+        db.session.bulk_update_mappings(
+            SqlMetric,
+            [
+                {**metrics_by_id[properties["id"]].__dict__, **properties}
+                for properties in property_metrics_by_id.values()
+            ],
+        )
 
-        for id_ in {obj.id for obj in model.metrics} - seen:
-            DatasetDAO.delete_column(metric_by_id[id_], commit=False)
+        db.session.query(SqlMetric).filter(
+            SqlMetric.id.in_(
+                {metric.id for metric in model.metrics} - property_metrics_by_id.keys()
+            )
+        ).delete(synchronize_session="fetch")

Review Comment:
   Using the same `synchronize_session` as the `bulk_delete` method for deleting multiple datasets.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #22957: chore(datasets): Refactor DatasetDAO update to leverage bulk logic for create, update, and delete operations

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #22957:
URL: https://github.com/apache/superset/pull/22957#issuecomment-1413327649

   # [Codecov](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22957](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0021148) into [master](https://codecov.io/gh/apache/superset/commit/e13ebb6134edbeed42a8e4cf227a416a4dd8b484?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e13ebb6) will **decrease** coverage by `9.63%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #22957      +/-   ##
   ==========================================
   - Coverage   65.81%   56.19%   -9.63%     
   ==========================================
     Files        1876     1876              
     Lines       72084    72072      -12     
     Branches     7857     7857              
   ==========================================
   - Hits        47444    40502    -6942     
   - Misses      22626    29556    +6930     
     Partials     2014     2014              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `52.64% <0.00%> (+0.01%)` | :arrow_up: |
   | python | `58.81% <0.00%> (-20.00%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `52.44% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/datasets/dao.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvZGFvLnB5) | `48.14% <0.00%> (-46.41%)` | :arrow_down: |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/tags/core.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdGFncy9jb3JlLnB5) | `4.54% <0.00%> (-95.46%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-90.91%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-87.88%)` | :arrow_down: |
   | [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-84.00%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | [superset/datasets/commands/create.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvY3JlYXRlLnB5) | `30.61% <0.00%> (-69.39%)` | :arrow_down: |
   | [superset/datasets/commands/update.py](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvdXBkYXRlLnB5) | `25.00% <0.00%> (-69.05%)` | :arrow_down: |
   | ... and [337 more](https://codecov.io/gh/apache/superset/pull/22957?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley merged pull request #22957: chore(datasets): Refactor DatasetDAO update to leverage bulk logic for create, update, and delete operations

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley merged PR #22957:
URL: https://github.com/apache/superset/pull/22957


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org