You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/17 19:48:55 UTC

[GitHub] [airflow] jedcunningham opened a new issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

jedcunningham opened a new issue #18334:
URL: https://github.com/apache/airflow/issues/18334


   ### Body
   
   Everything in `airflow.models` should be a sqlalchemy model, and the stuff in `airflow.models.crypto.*` isn't.
   
   ### Committer
   
   - [X] I acknowledge that I am a maintainer/committer of the Apache Airflow project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] msumit commented on issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

Posted by GitBox <gi...@apache.org>.
msumit commented on issue #18334:
URL: https://github.com/apache/airflow/issues/18334#issuecomment-922226836


   The concept of `DataClasses` isn't new, so I won't recommend putting a hardcore requirement of keeping only database backed classes in the model's folder. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18334:
URL: https://github.com/apache/airflow/issues/18334#issuecomment-922252101


   Yeah.... DataClasses are fine, but I think I agree with @jedcunningham  it would be great to eventually have a separate Database-related package that keeps only those for the future. 
   
   The "model" package is not best in this context, I think more appropriate will be 'sqlachemy' or 'db' or smth like that. Maybe this is really what the problem is :). And I think we should have a bit different criteria here than whether something is a DataClass type or not.
   
   Some more context: I can imagine that in order to implement the true workload isolation and multi-tenancy in the future (which is something we likely have to implement sooner or later), we will have to separate the raw DB access through SQLAlchemy with real "Internal API" of Airflow that can be used by workers (without accessing the DB directly). This likely mean that we should be very strict about which of  Airflow classes can (and should be) accessed by "workers" (this is really now all about isolating workers - all other parts are pretty much done for multi-tenancy). 
   
   This likely means that things like BaseOperator (and maybe crypto - depending where we will take care about decrypting Fernet-encrypted fields) should be moved to a separate package, together with some other "worker available internal APIs" - variables, connections, context, etc. ). For those I think we should have some non-DB-related "facades" (which should not even require importing SQLAlchemy to work). 
   
   And having a separate package where we have all non-SQL Alchemy related APIs and DataClasses clearly separated from SQL-Alchemy ones is - I believe a prerequisite for that. Separating those via separate packages, sounds like good idea.
   
   Actually 'crypto.*' is an interesting one and I think where it should be depends on the decision we make on where to do decryption/encryption.
   
   Even if it is not an SQLAlchemy based, it is actually deeply connected with the DB and we could make a choice (in the future) that all Fernet decryption is done outside of workers. If we decide to implement an "internal API" server that will provide APIs for the workers to communicate, decryption could be done there. In this case Crypto actually would make sense to be in the `sqlalchemy`-related package (even if it is not SQLAlchemy related). But if we decide that's better to decrypt those in-workers and transmit encrypted values to the workers (which we might well do as well - both approaches have its pros and cons) - then Crypto will belong to the "Internal API" part.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #18334:
URL: https://github.com/apache/airflow/issues/18334#issuecomment-922252101


   Yeah.... DataClasses are fine, but I think I agree with @jedcunningham  it would be great to eventually have a separate Database-related package that keeps only those for the future. 
   
   The "model" package is not best name in this context, I think more appropriate will be 'sqlachemy' or 'db' or smth like that. Maybe this is really what the problem is :). And I think we should have a bit different criteria here than whether something is a DataClass type or not.
   
   Some more context: I can imagine that in order to implement the true workload isolation and multi-tenancy in the future (which is something we likely have to implement sooner or later), we will have to separate the raw DB access through SQLAlchemy with real "Internal API" of Airflow that can be used by workers (without accessing the DB directly). This likely mean that we should be very strict about which of  Airflow classes can (and should be) accessed by "workers" (this is really now all about isolating workers - all other parts are pretty much done for multi-tenancy). 
   
   This likely means that things like BaseOperator (and maybe crypto - depending where we will take care about decrypting Fernet-encrypted fields) should be moved to a separate package, together with some other "worker available internal APIs" - variables, connections, context, etc. ). For those I think we should have some non-DB-related "facades" (which should not even require importing SQLAlchemy to work). 
   
   And having a separate package where we have all non-SQL Alchemy related APIs and DataClasses clearly separated from SQL-Alchemy ones is - I believe a prerequisite for that. Separating those via separate packages, sounds like good idea.
   
   Actually 'crypto.*' is an interesting one and I think where it should be depends on the decision we make on where to do decryption/encryption.
   
   Even if it is not an SQLAlchemy based, it is actually deeply connected with the DB and we could make a choice (in the future) that all Fernet decryption is done outside of workers. If we decide to implement an "internal API" server that will provide APIs for the workers to communicate, decryption could be done there. In this case Crypto actually would make sense to be in the `sqlalchemy`-related package (even if it is not SQLAlchemy related). But if we decide that's better to decrypt those in-workers and transmit encrypted values to the workers (which we might well do as well - both approaches have its pros and cons) - then Crypto will belong to the "Internal API" part.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18334:
URL: https://github.com/apache/airflow/issues/18334#issuecomment-922253682


   Agree on restructuring to separate classes by their availablity in different Airflow processes. It’s difficult to keep track what is available in what context, and creates observable difficulties from contributors touching anything but the very surface level of Airflow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #18334: Move `airflow.models.crypto.*` out of `airflow.models`

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18334:
URL: https://github.com/apache/airflow/issues/18334#issuecomment-922051916


   Well. BaseOperator isn't an sqlalchemy model either :) . But a good point nevertheless
   
   It's one of the last remainder of Airlfow 1.10


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org