You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/15 06:23:11 UTC

[GitHub] [airflow] thoralf-gutierrez opened a new issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

thoralf-gutierrez opened a new issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315
 
 
   **Apache Airflow version**: 1.10.10
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: VMs
   - **OS**: Ubuntu 16.04
   - **Kernel**: Linux {host} 4.15.0-88-generic #88~16.04.1-Ubuntu SMP Wed Feb 12 04:19:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
   - **Install tools**: pip
   - **Others**:
     - OS is configured in UTC
     - UI is configured to show UTC
     - default_timezone in airflow config set to UTC
     - DB is MySQL and configured in **PT**
   
   **What happened**:
   
   I noticed that execution_dates and other datetimes were off in the ModelView parts of the RBAC UI (e.g. `/dagrun/list/`). Even though all DAGs were set to UTC schedules, execution_dates showed 5pm (which is what midnight UTC looks when shown in PT).
   
   It seems that the MySQL timezone leaked and caused the UI to show wrong datetimes. Though this should not happen as datetimes are encoded as UTC in the databases and other parts of the UI were not showing this error.
   
   This also impacts links to log files in the task instance list view, which end up no where since the datetime in the URL is now wrong too.
   
   **What you expected to happen**:
   
   I would expect the datetimes in ModelView pages to also be aligned with UTC.
   
   **How to reproduce it**:
   
   1. Install Airflow on a OS that is set to UTC
   2. Set it up with a MySQL database that is configured in some other timezone
   3. Run some example DAGs to get some dag runs in the database
   4. Go to the `/dagrun/list` or `/taskinstance/list`, notice the wrong datetimes
   
   **Anything else we need to know**:
   
   Other things I looked at
   * Other parts of the UI, such as the home page and tooltips on the tree view showed the expected execution_dates at midnight
   * I double checked the datetimes in the database, and they were all the correct timestamps: after setting my session timezone to UTC in MySQL, execution_dates were on midnight
   
   
   --> Something seemed to be wrong specifically for these ModelViews.
   
   I went down the rabbit hole and I think I figured it out (at least enough to propose a fix).
   
   1. The `/last_dagruns` API returns **correct datetimes**, and it queries the database using a session that is provided by airflow with the `@provide_session` decorator. This was my beacon to understand what went wrong in the other views.
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/views.py#L474-L483
   
   2. The `/dagrun/list` view returns **bad datetimes**, and it queries the database with some very different path. This is the view
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/views.py#L2502-L2505
   
   It extends `AirflowModelView` which extends `ModelView` where we finally find the `/list/` route
   
   https://github.com/dpgaspar/Flask-AppBuilder/blob/0f46427ce02eedddbd9375339d3d921c2e42494e/flask_appbuilder/views.py#L547-L554
   
   That view ends up querying its `self.datamodel`
   
   https://github.com/dpgaspar/Flask-AppBuilder/blob/2a6099d6d45a12308904b96bcad6ad4277e266af/flask_appbuilder/baseviews.py#L1021-L1027
   
   using `self.datamodel.query()` where `self.datamodel` is the `datamodel = AirflowModelView.CustomSQLAInterface(models.DagRun)` that was set in our `DagRunModelView`.
   
   This `datamodel` object is a `SQLAInterface` and this is the query function that gets called: 
   
   https://github.com/dpgaspar/Flask-AppBuilder/blob/0f46427ce02eedddbd9375339d3d921c2e42494e/flask_appbuilder/models/sqla/interface.py#L140
   
   Note that it expects the datamodel / SQLAInterface to have a `self.session`, which we did not provide! Somehow, it was still provided because it _is_ able to query DAG runs, but I inspected the session object and it is very different from the sessions we create in Airflow, i.e. very different from the one used in 1. for `/last_dagruns` that gave us correct datetimes.
   
   If we do provide our own session to the `datamodel` object, then the problem goes away, datetimes are back to correct. The constructor even expects a session but for some reason we forgot to provide it.
   
   So my proposed fix is [here](https://github.com/apache/airflow/blob/1.10.10/airflow/www_rbac/utils.py#L446-L455)
   
   ```python
   class CustomSQLAInterface(SQLAInterface):
       """
       FAB does not know how to handle columns with leading underscores because
       they are not supported by WTForm. This hack will remove the leading
       '_' from the key to lookup the column names.
       """
       @provide_session  // <------ this is the fix
       def __init__(self, obj, session=None):
           super(CustomSQLAInterface, self).__init__(obj, session=session)
   ```
   
   I have tested this locally and it works.
   
   If someone can give this a thumbs up, I'd be happy to propose a PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] boring-cyborg[bot] commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-613841095
 
 
   Thanks for opening your first issue here! Be sure to follow the issue template!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb edited a comment on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-613958215
 
 
   That doesn't feel like the right fix -- it would create the session too early (and hold open/use a connection?) but maybe that doesn't. Hmmmm.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-613958215
 
 
   That doesn't feel like the right fix -- it would create the session too early. Hmmmm.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] thoralf-gutierrez edited a comment on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
thoralf-gutierrez edited a comment on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-614145127
 
 
   Ah yes that's where the bad session comes from.
   
   If I print the details of the `db.session` that is used in the constructor of the app builder, i.e. the one that gives **incorrect** datetimes that is defined here
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/app.py#L103-L108
   
   I get :
   ```
   db.session = <sqlalchemy.orm.scoping.scoped_session object at 0x7f4037339d30>
   db.session.__dict__ = {
     'session_factory': sessionmaker(
       class_='SignallingSession',
       db=<SQLA engine=mysql+mysqldb://user:***@host/airflow_db?charset=utf8>,
       query_cls=<class 'flask_sqlalchemy.BaseQuery'>,
       bind=None,
       autoflush=True,
       autocommit=False,
       expire_on_commit=True),
     'registry': <sqlalchemy.util._collections.ScopedRegistry object at 0x7f4037339e10>
   }
   ```
   
   If I print the details of the session in `/last_dagruns`, i.e. the one that gives the **correct** datetimes provided here
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/views.py#L474-L483
   
   I get:
   
   ```
   session = <sqlalchemy.orm.session.Session object at 0x7f40360ad828>
   session.__dict__ = {
     '_identity_cls': <class 'sqlalchemy.orm.identity.WeakInstanceDict'>,
     'identity_map': <sqlalchemy.orm.identity.WeakInstanceDict object at 0x7f40360ad160>,
     '_new': {},
     '_deleted': {},
     'bind': Engine(mysql+mysqldb://user:***@host/airflow_db),
     '_Session__binds': {},
     '_flushing': False,
     '_warn_on_events': False,
     'transaction': <sqlalchemy.orm.session.SessionTransaction object at 0x7f403609e080>,
     'hash_key': 9,
     'autoflush': False,
     'autocommit': False,
     'expire_on_commit': False,
     'enable_baked_queries': True,
     '_enable_transaction_accounting': True,
     'twophase': False,
     '_query_cls': <class 'sqlalchemy.orm.query.Query'>,
     'dispatch': <sqlalchemy.event.base.SessionEventsDispatch object at 0x7f403628ca98>
   }
   ```
   
   This is the type difference I mentioned.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] thoralf-gutierrez commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
thoralf-gutierrez commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-614145127
 
 
   Ah yes that's right.
   
   If I print the details of the `db.session` that is used in the constructor of the app builder, i.e. the one that gives **incorrect** datetimes that is defined here
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/app.py#L103-L108
   
   I get :
   ```
   db.session = <sqlalchemy.orm.scoping.scoped_session object at 0x7f4037339d30>
   db.session.__dict__ = {
     'session_factory': sessionmaker(
       class_='SignallingSession',
       db=<SQLA engine=mysql+mysqldb://user:***@host/airflow_db?charset=utf8>,
       query_cls=<class 'flask_sqlalchemy.BaseQuery'>,
       bind=None,
       autoflush=True,
       autocommit=False,
       expire_on_commit=True),
     'registry': <sqlalchemy.util._collections.ScopedRegistry object at 0x7f4037339e10>
   }
   ```
   
   If I print the details of the session in `/last_dagruns`, i.e. the one that gives the **correct** datetimes provided here
   
   https://github.com/apache/airflow/blob/96697180d79bfc90f6964a8e99f9dd441789177c/airflow/www_rbac/views.py#L474-L483
   
   I get:
   
   ```
   session = <sqlalchemy.orm.session.Session object at 0x7f40360ad828>
   session.__dict__ = {
     '_identity_cls': <class 'sqlalchemy.orm.identity.WeakInstanceDict'>,
     'identity_map': <sqlalchemy.orm.identity.WeakInstanceDict object at 0x7f40360ad160>,
     '_new': {},
     '_deleted': {},
     'bind': Engine(mysql+mysqldb://user:***@host/airflow_db),
     '_Session__binds': {},
     '_flushing': False,
     '_warn_on_events': False,
     'transaction': <sqlalchemy.orm.session.SessionTransaction object at 0x7f403609e080>,
     'hash_key': 9,
     'autoflush': False,
     'autocommit': False,
     'expire_on_commit': False,
     'enable_baked_queries': True,
     '_enable_transaction_accounting': True,
     'twophase': False,
     '_query_cls': <class 'sqlalchemy.orm.query.Query'>,
     'dispatch': <sqlalchemy.event.base.SessionEventsDispatch object at 0x7f403628ca98>
   }
   ```
   
   This is the type difference I mentioned.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8315: Wrong datetimes in RBAC ModelViews caused by wrong session type
URL: https://github.com/apache/airflow/issues/8315#issuecomment-613960078
 
 
   https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/base.py#L339-L344 which ultimately comes from airflow/www/app.py -- I think adding it there is better.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services