You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@superset.apache.org by jo...@apache.org on 2023/06/20 05:11:28 UTC

[superset] 01/01: docs: Add tips regarding authoring DAOs

This is an automated email from the ASF dual-hosted git repository.

johnbodley pushed a commit to branch john-bodley--docs-daos
in repository https://gitbox.apache.org/repos/asf/superset.git

commit 012fe60191182b5083fb9b6ae212dbbcb4b73d8c
Author: John Bodley <45...@users.noreply.github.com>
AuthorDate: Mon Jun 19 22:11:22 2023 -0700

    docs: Add tips regarding authoring DAOs
---
 CONTRIBUTING.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e36363a7a5..1a5a26c933 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -96,6 +96,7 @@ little bit helps, and credit will always be given.
     - [Merging DB migrations](#merging-db-migrations)
     - [SQL Lab Async](#sql-lab-async)
     - [Async Chart Queries](#async-chart-queries)
+    - [Data Access Object (DAO)](#data-access-object-dao)
   - [Chart Parameters](#chart-parameters)
     - [Datasource \& Chart Type](#datasource--chart-type)
     - [Time](#time)
@@ -1382,6 +1383,24 @@ The following configuration settings are available for async queries (see config
 
 More information on the async query feature can be found in [SIP-39](https://github.com/apache/superset/issues/9190).
 
+### Data Access Object (DAO)
+
+A Data Access Object (DAO) is a pattern that provides an abstract interface to the SQLAlchemy Object Relational Mapper (ORM). The DAOs are critical as they form the building block of the application which are wrapped by the associated commands and RESTful API endpoints. 
+
+Currently there are numerous inconsistencies and violation of the DRY principal within the codebase as it relates to DAOs and ORMs—unnecessary commits, non-ACID transactions, etc.—which makes the code unncessarily complex and convoluted. Addressing the underlying issues with the DAOs _should_ help simplify the downstream operations and improve the developer experience.
+
+To ensure consistency the following rules should be adhered to:
+
+1. All database operations (including testing) shoudl be defined within a DAO, i.e., there should not be any explicit `db.session.add`, `db.session.merge`, etc. calls outside of a DAO. 
+2. A DAO should use `create`, `update`, `delete`, `upsert` terms—typical database operations which ensure consistency with commands—rather than action based terms like `save`, `saveas`, `override`, etc.
+3. Sessions should be managed via a [context manager](https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#begin-once) which auto-commits on success and rolls back on failure, i.e., there should be no explicit `db.session.commit` or `db.session.rollback` calls within the DAO.
+4. There should be a single atomic transaction representing the entirety of the operation, i.e., when creating a dataset with associated columns and metrics either all the changes succeed when the transaction is commited, or all the changes are undone when the transaction is rolled back. SQLAlchemy supportes [nested transactions](https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#nested-transaction) via the `begin_nested` method which can be nested—inline with how DAOs are invoked.
+5. The database layer should adopt a "shift left" mentality i.e., uniqueness/foreign key constraints, relationships, cascades, etc. should all be defined in the underlying database model where possible. Note there are times where this isn’t possible, i.e., MySQL and PosgreSQL treat` NULL` values differently from a uniqueness perspective. If modeled correctly the ORM should implicitly delete all associations and thus one should not need to preemptively undefine relationships, etc.
+6. Avoid validation logic, i.e., ask for forgiveness rather than permission. This is akin to a try/except block where the exception is the exception rather than the rule. Thus instead of than first checking whether something is defined before adding, simply try adding it (and per 5) and rely on the database to verify wheter the model is acceptable. This removes potential race conditions, reduces the number of database operations, and reduces the code footprint.
+7. Provide bulk create, update, delete, etc. methods (if possible) for performance reasons.
+8. By default updates should be sparse in nature, i.e., only those attributes which are explicitly defined are updated.
+9. Tests should leverage nested transaction which should be rolled back on teardown. This is cleaner than deleting objects. See [here](https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#joining-a-session-into-an-external-transaction-such-as-for-test-suites) for details.
+
 ## Chart Parameters
 
 Chart parameters are stored as a JSON encoded string the `slices.params` column and are often referenced throughout the code as form-data. Currently the form-data is neither versioned nor typed as thus is somewhat free-formed. Note in the future there may be merit in using something like [JSON Schema](https://json-schema.org/) to both annotate and validate the JSON object in addition to using a Mypy `TypedDict` (introduced in Python 3.8) for typing the form-data in the backend. This sect [...]