You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/06 21:27:46 UTC

[GitHub] [superset] john-bodley opened a new issue, #20630: [WIP] [SIP-88] Proposal for restructuring the Python code base

john-bodley opened a new issue, #20630:
URL: https://github.com/apache/superset/issues/20630

   *Please make sure you are familiar with the SIP process documented*
   (here)[https://github.com/apache/superset/issues/5602]. The SIP number should be the next number after the latest SIP listed [here](https://github.com/apache/superset/issues?q=is%3Aissue+label%3Asip).
   
   ## [SIP-88] Proposal for restructuring the Python code base
   
   ### Motivation
   
   Description of the problem to be solved.
   
   ### Proposed Change
   
   Describe how the feature will be implemented, or the problem will be solved. If possible, include mocks, screenshots, or screencasts (even if from different tools).
   
   
   ```
   superset/
   ├─ blueprints/
   │  ├─ api/
   |  |  ├─ legacy/         # Previously defined in superset/connectors/*/views.py, superset/views/, etc.
   |  |  ├─ v1/
   |  |  |  ├─ datasets.py  # Previously superset/datasets/api.py
   |  |  |  ├─ ...
   ├─ daos/
   │  ├─ datasets/          # Previously superset/datasets/
   |  |  ├─ models.py       # Previously defined in superset/connectors/sqla/models.py
   |  |  ├─ ...
   ```
   
   ### New or Changed Public Interfaces
   
   Describe any new additions to the model, views or `REST` endpoints. Describe any changes to existing visualizations, dashboards and React components. Describe changes that affect the Superset CLI and how Superset is deployed.
   
   ### New dependencies
   
   Describe any `npm`/`PyPI` packages that are required. Are they actively maintained? What are their licenses?
   
   ### Migration Plan and Compatibility
   
   Describe any database migrations that are necessary, or updates to stored URLs.
   
   ### Rejected Alternatives
   
   Describe alternative approaches that were considered and rejected.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
ktmud commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1296418873

   I like this! Changing `superset/{datasets,dashboard...}/{api,command,dao}` to `superset/{api,commands,daos}/{dataset,dashboard,...}` would at least reduce the amount of top-level folders and make it easier to navigate for people who are just working on the APIs. And more often than not, you'd probably work on APIs utilizing multiple models rather than working on both models and the API of a single entity type at the same time.
   
   I also like that the core SQL engine gets its own folder. It's long overdue.
   
   One thing I'm not sure about is can we make it faster to find all models that correspond to a db table? I assume they will all be moved to `dao/...`, but is the nesting under `dao/{entity_type}/models.py` too deep still?
   
   Another thing that has been bothering me is---there seems to be too many boilerplates and parameters passing around between the API, the commands and the DAO. Also, what is the boundary between a command and a DAO method anyway? Currently it seems a very thick validation layer can be added to either the [DAO](https://github.com/apache/superset/blob/bc435e08d01b87efcf8774f29a7078cee8988e39/superset/datasets/dao.py) or a [command](https://github.com/apache/superset/blob/4d192e6e4d74157c1eb8fed63df7ddaee4c8ecf7/superset/reports/commands/alert.py).
   
   So I wonder if we can further simplify things down to something like this:
   
   ```
   superset
   ├── api 
   │   ├── legacy
   │   └── v1
   ├── views
   │   └── legacy 
   │       ├── core.py
   │       └── ...
   ├── tasks
   │   └── ...
   ├── commands
   │   ├── datasets
   │   ├── dashboards
   │   │   ├── create_dashboard.py
   │   │   ├── get_dashboard_charts.py
   │   │   └── ...
   │   ├── reports
   │   └── ...
   ├── engine
   │   ├── specs
   │   ├── query
   │   │   ├── builder
   │   │   ├── executor
   │   │   └── results
   ├── models
   │   ├── base.py
   │   ├── chart.py
   │   ├── dashboard.py
   │   ├── dataset.py
   │   ├── query.py
   │   ├── report.py
   │   └── ...
   └── utils
   ```
   
   A couple of things to note:
   
   1. Input validation that does not require pulling data from db ([e.g.](https://github.com/apache/superset/blob/4d192e6e4d74157c1eb8fed63df7ddaee4c8ecf7/superset/reports/commands/base.py#L56-L66)) would stay with the API---ideally done by the API framework directly via custom validators (using  Marshmallow or [any](https://docs.python-cerberus.org/en/stable/) [other](https://pydantic-docs.helpmanual.io/usage/validators/) validation library).
   2. `commands` will include any data operations (query or mutation) that require more than one model. Classes in `models` should try their best to not import any other models unless the dependency is strictly one directional (i.e., it's a 1-many relationship instead of many-to-many, e.g., dataset -> columns).
   3. Validations before data writes should happen privately within the data writer. There should be no need to export the validation methods---so we can avoid passing parameters around for too many layers.
   4. `models` contains only classes that directly corresponds to a db table--and the base classes and mixins that support such classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on issue #20630: [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
john-bodley commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1334198622

   @ktmud I spoke with @hughhhh  and @rusackas briefly about this and they were in agreement with you about keeping the concept of views and thus I've updated the directory schematic. As it currently stands views contain both API endpoints as well as non-API endpoints, i.e., those which render HTML templates, and thus the non-API endpoints would be housed under the `superset/blueprints/views` subfolder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-92] Proposal for restructuring the Python code base [superset]

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1939507265

   @rusackas I'm not sure, though I have started working with custom Pylint checkers. I suspect the "enforcement" likely happens implicitly with developers following the new patterns. That said the state of our current backend code is rather troubling as we have things strewn all over the place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] craig-rueda commented on issue #20630: [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
craig-rueda commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1387532458

   Thanks for this, @john-bodley! I +1 to @ktmud 's structure. I have noticed recently that models are mixed all around the code and should be considered top-level. Ideally they would be somewhat isolated and should not depend directly on layers above, so this change would be great. I'd be glad to help out with the refactor :)
   
   **My view of what each layer "does" (open to discussion):**
   
   ```
   MVC Views / API
   ...
   ...
   Commands
   ...
   ...
   DAO
   ...
   ...
   Models
   ```
   
   ### MVC Views / API
   * Responsible for handling interactions with external actors (REST, views, etc)
   * Should perform validation from an API point of view
   * Translates errors emitted from mid-tier
   
   ### Commands
   * Where the business logic lives!
   * Performs its own validations and potentially raises when things don't look right
   * Might chain to other commands
   * Interfaces with DAO's in order to query the DB
   
   ### DAOs
   * Interfaces with the DB (via SQLA in our case)
   * Exposes low-level methods like `get_by_id()`, etc.
   * SHOULD NOT CONTAIN BUSINESS LOGIC (keeps things reusable and non-crosscutting)
   * Centralizes query logic, making mid-tier easy to test
   
   ### Models
   * Basically `py` dataclasses :)
   * Utilized by upper tiers to represent persisted data
   * **Note that quite a lot of "business" logic has crept into this layer (see: https://github.com/apache/superset/blob/master/superset/models/core.py#L379)**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on issue #20630: [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
john-bodley commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1334107939

   @ktmud I spoke with @hughhhh  and @rusackas briefly about this and they were in agreement with you about keeping the concept of views and thus I've updated the directory schematic. As it currently stands views contain both API endpoints as well as non-API endpoints and thus the non-API endpoints would be housed under the `superset/blueprints/views` subfolder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-92] Proposal for restructuring the Python code base [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1932498489

   @john-bodley is there a way to provide linting rules around this so we can (a) enforce it, and (b) track the tech debt here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
john-bodley commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1297462242

   @ktmud thanks the the comments. I believe views—which are part of the MVC model; alongside models—are likely deemed legacy, i.e., should be migrated to RESTful API endpoints, and thus likely shouldn't be explicitly called out.
   
   Regarding models, I'm not overly versed with the DAO pattern, though per [this](https://levelup.gitconnected.com/structuring-a-large-production-flask-application-7a0066a65447) post DAOs and models could/should coexist and thus having a top-level `superset/models/` directory seems viable. The one aspect I'm not entirely sure about is whether the existing model definitions are actually DAO-esque in nature and would need to be refactored. As an example the [Dashboard](https://github.com/apache/superset/blob/master/superset/models/dashboard.py) model contains numerous properties/methods which reach beyond what the [original intent](https://flask-sqlalchemy.palletsprojects.com/en/3.0.x/models/) of a Flask-SQLAlchemy model is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
john-bodley commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1298820953

   @ktmud I've updated the original description which explicitly calls out the models. Regarding views—per [here](https://flask.palletsprojects.com/en/2.2.x/views)—I alway perceived these as solely defining the legacy API as they merely contain routes hence why they're grouped under the `superset/blueprints/api/legacy` folder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on issue #20630: [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
ktmud commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1334201653

   Cool. Thanks for the update!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] michael-s-molina commented on issue #20630: [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
michael-s-molina commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1349882777

   Thanks for the SIP @john-bodley. I went through a similar process when writing [SIP-61 - Improve the organization of front-end folders](https://github.com/apache/superset/issues/13632). When researching best practices for organizing projects, the concept of a feature-based organization appeared in many articles that described how large codebases were organized. You'll find many of the reasoning behind this model in SIP-61 and its references. The basic idea is that files related to a feature should belong together in a structured way. This allows you to easily switch between feature implementations, facilitates the use of feature flags, and also promotes better-defined dependencies. Maybe we can apply some of the concepts here too 😉 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] ktmud commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

Posted by GitBox <gi...@apache.org>.
ktmud commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1297603192

   The reason I wanted to keep a `views` folder is exactly to call-out the legacy code so they can be more proactively cleaned up. The `views` folder can be removed once everything is migrated to client-side rendering.
   
   > The one aspect I'm not entirely sure about is whether the existing model definitions are actually DAO-esque in nature and would need to be refactored
   
   Yes, I was thinking of the same thing. `models` should just be a thin layer of SQLAlchemy's declaritive mapping + some shared classes & mixins. We can have an explicit  DAO layer, or, just stay simple and put more things in "commands" instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-92] Proposal for restructuring the Python code base [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1745666366

   Closing the issue since the vote has [passed](https://lists.apache.org/thread/0m5gt1tnlodtbcodjfg5zvvrr9mryjhh). @john-bodley please move the issue to the "Implemented/Done" column when you think that's appropriate :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-92] Proposal for restructuring the Python code base [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas closed issue #20630: [SIP-92] Proposal for restructuring the Python code base
URL: https://github.com/apache/superset/issues/20630


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org