You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/27 18:24:19 UTC

[GitHub] [airflow] blag opened a new pull request #21167: Add a session backend to store session data in the database

blag opened a new pull request #21167:
URL: https://github.com/apache/airflow/pull/21167


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   This PR re-implements modified portions of [Flask-Session](https://github.com/fengsp/flask-session) to store session data in the database.
   
   TODO:
   * [ ] Tests
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham closed pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham closed pull request #21167:
URL: https://github.com/apache/airflow/pull/21167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r794734820



##########
File path: airflow/migrations/versions/c381b21cb7e4_add_session_table_to_db.py
##########
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add session table to db
+
+Revision ID: c381b21cb7e4
+Revises: e655c0453f75
+Create Date: 2022-01-25 13:56:35.069429
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.engine.reflection import Inspector
+
+# revision identifiers, used by Alembic.
+revision = 'c381b21cb7e4'
+down_revision = 'e655c0453f75'
+branch_labels = None
+depends_on = None
+
+TABLE_NAME = 'sessions'
+
+
+def upgrade():
+    """Apply add session table to db"""
+    conn = op.get_bind()
+    inspector = Inspector.from_engine(conn)

Review comment:
       Interesting, thanks for pointing me to that. I'll remove that then and have the upgrade assume that the table doesn't exist.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1027637286


   @jedcunningham Some of the code your wanted changes to is the vendored in code. I intentionally made as minimal modifications to that code as I needed, with the intent to possibly make a PR to upstream Flask-Session, get that merged, and migrate to using that (and removing our vendored code). The more and more in-depth modifications we make to the vendored code, the more difficult it will be to switch in the future.
   
   I'm happy to make the changes you noted if you'd still like me to (please let me know), I just wanted to bring that to your attention first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r798837969



##########
File path: airflow/www/extensions/init_session.py
##########
@@ -15,33 +15,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from flask import request, session as flask_session
-from flask.sessions import SecureCookieSessionInterface
+from flask import session as builtin_flask_session
 
+from airflow.configuration import conf
+from airflow.www.session import AirflowDatabaseSessionInterface, AirflowSecureCookieSessionInterface
 
-class AirflowSessionInterface(SecureCookieSessionInterface):
-    """
-    Airflow cookie session interface.
-    Modifications of sessions should be done here because
-    the change here is global.
-    """
 
-    def save_session(self, *args, **kwargs):
-        """Prevent creating session from REST API requests."""
-        if request.blueprint == '/api/v1':
-            return None
-        return super().save_session(*args, **kwargs)
-
-
-def init_permanent_session(app):
-    """Make session permanent to allows us to store data"""
-
-    def make_session_permanent():
-        flask_session.permanent = True
-
-    app.before_request(make_session_permanent)
-
-
-def init_airflow_session_interface(app):
+def init_airflow_session_interface(app, db):
     """Set airflow session interface"""
-    app.session_interface = AirflowSessionInterface()
+    config = app.config.copy()
+    selected_backend = conf.get('webserver', 'SESSION_BACKEND', fallback='database')
+
+    if selected_backend == 'securecookie':
+        app.session_interface = AirflowSecureCookieSessionInterface()
+        if config.get('SESSION_PERMANENT', True):
+
+            def make_session_permanent():
+                builtin_flask_session.permanent = True
+
+            app.before_request(make_session_permanent)
+    elif selected_backend == 'database' or selected_backend is None:
+        app.session_interface = AirflowDatabaseSessionInterface(
+            db,
+            config.get('SESSION_SQLALCHEMY_TABLE', 'sessions'),
+            config.get('SESSION_KEY_PREFIX', 'session:'),
+            config.get('SESSION_USE_SIGNER', True),
+            config.get('SESSION_PERMANENT', True),
+        )

Review comment:
       I'm not sure how well this will fly, but I'd like to leave these undocumented - for now.
   
   The upstream Flask-Session package supported these options, presumably because some users needed them. But every option we add is more branches and more code to possibly break and debug, so I'd like to keep the number of configurable options to a minimum.
   
   So for these, I'd like to keep these available as escape hatches **for the interim** as we switch (most? all?) users over to database-backed sessions. If none of our users need these options then we can remove them and simply hard code the values. If we do have users who need these options, then we can explicitly document (and support) them later.
   
   But if we really want to surface these options to users or if we want to hard code them out of the gate, I'm happy to make that change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r801994382



##########
File path: airflow/_vendor/flask_session/sessions.py
##########
@@ -0,0 +1,153 @@
+# ~~~~~~~~~~~~~~~~~~~~~~
+#
+# Server-side Sessions and SessionInterfaces.
+#
+# Originally copied from flask_session.sessions
+# Flask-Session
+# Copyright (C) 2014 by Shipeng Feng and other contributors
+# BSD, see LICENSE for more details.
+
+from datetime import datetime
+import json
+from uuid import uuid4
+
+from flask.sessions import SessionInterface as FlaskSessionInterface
+from flask.sessions import SessionMixin as FlaskSessionMixin
+from werkzeug.datastructures import CallbackDict
+from itsdangerous import Signer, BadSignature, want_bytes
+
+from .db_models import Session as SessionModel
+
+
+class SQLAlchemySession(CallbackDict, FlaskSessionMixin):
+    """Baseclass for server-side based sessions."""
+
+    def __init__(self, initial=None, sid=None, permanent=None):
+        def on_update(self):
+            self.modified = True
+        super().__init__(initial, on_update)
+        self.sid = sid
+        if permanent:
+            self.permanent = permanent
+        self.modified = False
+
+
+class SessionInterface(FlaskSessionInterface):
+    def _generate_sid(self):
+        return str(uuid4())
+
+    def _get_signer(self, app):
+        if not app.secret_key:
+            return None
+        # The salt is combined with app.secret_key, and is used to "distinguish
+        # signatures in different contexts"
+        return Signer(app.secret_key, salt='airflow',
+                      key_derivation='hmac')
+
+
+class SQLAlchemySessionInterface(SessionInterface):
+    """Uses the Flask-SQLAlchemy from a flask app as a session backend.
+
+    .. versionadded:: 0.2
+
+    :param db: A Flask-SQLAlchemy instance.
+    :param table: The table name you want to use.
+    :param key_prefix: A prefix that is added to all store keys.
+    :param use_signer: Whether to sign the session id cookie or not.
+    :param permanent: Whether to use permanent session or not.
+    """
+
+    serializer = json
+    session_class = SQLAlchemySession
+    sql_session_model = SessionModel
+
+    def __init__(self, app, db, table, key_prefix, use_signer=False,
+                 permanent=True):
+        if db is None:
+            from flask_sqlalchemy import SQLAlchemy
+            db = SQLAlchemy(app)
+        self.db = db
+        self.key_prefix = key_prefix
+        self.use_signer = use_signer
+        self.permanent = permanent
+        self.has_same_site_capability = hasattr(self, "get_cookie_samesite")
+
+        self.sql_session_model = SessionModel
+
+    def open_session(self, app, request):
+        sid = request.cookies.get(app.session_cookie_name)
+        if not sid:
+            sid = self._generate_sid()
+            return self.session_class(sid=sid, permanent=self.permanent)
+        if self.use_signer:
+            signer = self._get_signer(app)
+            if signer is None:
+                return None
+            try:
+                sid_as_bytes = signer.unsign(sid)
+                sid = sid_as_bytes.decode()
+            except BadSignature:
+                sid = self._generate_sid()
+                return self.session_class(sid=sid, permanent=self.permanent)
+
+        store_id = self.key_prefix + sid
+        db_session = self.db.session()

Review comment:
       We might want to use `provide_session` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r797156794



##########
File path: airflow/_vendor/flask_session/sessions.py
##########
@@ -0,0 +1,146 @@
+# ~~~~~~~~~~~~~~~~~~~~~~
+#
+# Server-side Sessions and SessionInterfaces.
+#
+# Originally copied from flask_session.sessions
+# Flask-Session
+# Copyright (C) 2014 by Shipeng Feng and other contributors
+# BSD, see LICENSE for more details.
+
+from datetime import datetime
+from uuid import uuid4
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle

Review comment:
       ```suggestion
   import pickle
   ```
   
   Let's keep it simple.

##########
File path: docs/apache-airflow/migrations-ref.rst
##########
@@ -23,7 +23,9 @@ Here's the list of all the Database Migrations that are executed via when you ru
 +--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
 | Revision ID                    | Revises ID       | Airflow Version | Description                                                                           |
 +--------------------------------+------------------+-----------------+---------------------------------------------------------------------------------------+
-| ``e655c0453f75`` (head)        | ``587bdf053233`` | ``2.3.0``       | Add ``map_index`` column to TaskInstance to identify task-mapping, and a ``task_map`` |
+| ``c381b21cb7e4`` (head)        | ``e655c0453f75`` | ``2.3.0``       | Create a ``sessions`` table to store web session data                                 |

Review comment:
       This will need to be moved into the right spot for 2.2.4 (e.g. a different "revises id"), but I'm most likely going to pull another revision back to 2.2.4 tomorrow so hold off for now. At least this comment will remind us we need to do it.

##########
File path: airflow/_vendor/flask_session/db_models.py
##########
@@ -0,0 +1,35 @@
+# ~~~~~~~~~~~~~~~~~~~~~~
+#
+# Server-side Sessions and SessionInterfaces.
+#
+# Originally copied from flask_session.sessions
+# Flask-Session
+# Copyright (C) 2014 by Shipeng Feng and other contributors
+# BSD, see LICENSE for more details.
+
+from sqlalchemy import (
+    Column,
+    DateTime,
+    Integer,
+    LargeBinary,
+    String,
+)
+from airflow.models.base import Base
+from airflow.utils.log.logging_mixin import LoggingMixin
+
+
+class Session(Base, LoggingMixin):
+    __tablename__ = 'sessions'
+
+    id = Column(Integer, primary_key=True)
+    session_id = Column(String(255), unique=True)
+    data = Column(LargeBinary)

Review comment:
       If we can do json, then we probably need to adjust this column type?

##########
File path: airflow/www/extensions/init_session.py
##########
@@ -15,33 +15,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from flask import request, session as flask_session
-from flask.sessions import SecureCookieSessionInterface
+from flask import session as builtin_flask_session
 
+from airflow.configuration import conf
+from airflow.www.session import AirflowDatabaseSessionInterface, AirflowSecureCookieSessionInterface
 
-class AirflowSessionInterface(SecureCookieSessionInterface):
-    """
-    Airflow cookie session interface.
-    Modifications of sessions should be done here because
-    the change here is global.
-    """
 
-    def save_session(self, *args, **kwargs):
-        """Prevent creating session from REST API requests."""
-        if request.blueprint == '/api/v1':
-            return None
-        return super().save_session(*args, **kwargs)
-
-
-def init_permanent_session(app):
-    """Make session permanent to allows us to store data"""
-
-    def make_session_permanent():
-        flask_session.permanent = True
-
-    app.before_request(make_session_permanent)
-
-
-def init_airflow_session_interface(app):
+def init_airflow_session_interface(app, db):
     """Set airflow session interface"""
-    app.session_interface = AirflowSessionInterface()
+    config = app.config.copy()
+    selected_backend = conf.get('webserver', 'SESSION_BACKEND', fallback='database')

Review comment:
       We don't need a fallback - defaults are loaded automatically.
   
   (You'd use fallbacks if it's possible for the config to not be in the default, e.g. if a provider could be installed on an earlier version of core airflow)
   
   ```suggestion
       selected_backend = conf.get('webserver', 'SESSION_BACKEND')
   ```

##########
File path: airflow/www/extensions/init_session.py
##########
@@ -15,33 +15,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from flask import request, session as flask_session
-from flask.sessions import SecureCookieSessionInterface
+from flask import session as builtin_flask_session
 
+from airflow.configuration import conf
+from airflow.www.session import AirflowDatabaseSessionInterface, AirflowSecureCookieSessionInterface
 
-class AirflowSessionInterface(SecureCookieSessionInterface):
-    """
-    Airflow cookie session interface.
-    Modifications of sessions should be done here because
-    the change here is global.
-    """
 
-    def save_session(self, *args, **kwargs):
-        """Prevent creating session from REST API requests."""
-        if request.blueprint == '/api/v1':
-            return None
-        return super().save_session(*args, **kwargs)
-
-
-def init_permanent_session(app):
-    """Make session permanent to allows us to store data"""
-
-    def make_session_permanent():
-        flask_session.permanent = True
-
-    app.before_request(make_session_permanent)
-
-
-def init_airflow_session_interface(app):
+def init_airflow_session_interface(app, db):
     """Set airflow session interface"""
-    app.session_interface = AirflowSessionInterface()
+    config = app.config.copy()
+    selected_backend = conf.get('webserver', 'SESSION_BACKEND', fallback='database')
+
+    if selected_backend == 'securecookie':
+        app.session_interface = AirflowSecureCookieSessionInterface()
+        if config.get('SESSION_PERMANENT', True):
+
+            def make_session_permanent():
+                builtin_flask_session.permanent = True
+
+            app.before_request(make_session_permanent)
+    elif selected_backend == 'database' or selected_backend is None:
+        app.session_interface = AirflowDatabaseSessionInterface(
+            db,
+            config.get('SESSION_SQLALCHEMY_TABLE', 'sessions'),
+            config.get('SESSION_KEY_PREFIX', 'session:'),
+            config.get('SESSION_USE_SIGNER', True),
+            config.get('SESSION_PERMANENT', True),
+        )

Review comment:
       We should probably raise if we get something else?

##########
File path: airflow/config_templates/config.yml
##########
@@ -1016,6 +1016,13 @@
       type: string
       example: ~
       default: ""
+    - name: web_server_session_backend
+      description: |
+        The type of backend used to store web session data, can be 'database' or 'securecookie'
+      version_added: ~

Review comment:
       ```suggestion
         version_added: 2.2.4
   ```

##########
File path: airflow/www/extensions/init_session.py
##########
@@ -15,33 +15,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from flask import request, session as flask_session
-from flask.sessions import SecureCookieSessionInterface
+from flask import session as builtin_flask_session
 
+from airflow.configuration import conf
+from airflow.www.session import AirflowDatabaseSessionInterface, AirflowSecureCookieSessionInterface
 
-class AirflowSessionInterface(SecureCookieSessionInterface):
-    """
-    Airflow cookie session interface.
-    Modifications of sessions should be done here because
-    the change here is global.
-    """
 
-    def save_session(self, *args, **kwargs):
-        """Prevent creating session from REST API requests."""
-        if request.blueprint == '/api/v1':
-            return None
-        return super().save_session(*args, **kwargs)
-
-
-def init_permanent_session(app):
-    """Make session permanent to allows us to store data"""
-
-    def make_session_permanent():
-        flask_session.permanent = True
-
-    app.before_request(make_session_permanent)
-
-
-def init_airflow_session_interface(app):
+def init_airflow_session_interface(app, db):
     """Set airflow session interface"""
-    app.session_interface = AirflowSessionInterface()
+    config = app.config.copy()
+    selected_backend = conf.get('webserver', 'SESSION_BACKEND', fallback='database')
+
+    if selected_backend == 'securecookie':
+        app.session_interface = AirflowSecureCookieSessionInterface()
+        if config.get('SESSION_PERMANENT', True):
+
+            def make_session_permanent():
+                builtin_flask_session.permanent = True
+
+            app.before_request(make_session_permanent)
+    elif selected_backend == 'database' or selected_backend is None:

Review comment:
       ```suggestion
       elif selected_backend == 'database':
   ```

##########
File path: airflow/_vendor/flask_session/sessions.py
##########
@@ -0,0 +1,146 @@
+# ~~~~~~~~~~~~~~~~~~~~~~
+#
+# Server-side Sessions and SessionInterfaces.
+#
+# Originally copied from flask_session.sessions
+# Flask-Session
+# Copyright (C) 2014 by Shipeng Feng and other contributors
+# BSD, see LICENSE for more details.
+
+from datetime import datetime
+from uuid import uuid4
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+
+from flask.sessions import SessionInterface as FlaskSessionInterface
+from flask.sessions import SessionMixin as FlaskSessionMixin
+from werkzeug.datastructures import CallbackDict
+from itsdangerous import Signer, BadSignature, want_bytes
+
+from .db_models import Session as SessionModel
+
+
+class SQLAlchemySession(CallbackDict, FlaskSessionMixin):
+    """Baseclass for server-side based sessions."""
+
+    def __init__(self, initial=None, sid=None, permanent=None):
+        def on_update(self):
+            self.modified = True
+        super().__init__(initial, on_update)
+        self.sid = sid
+        if permanent:
+            self.permanent = permanent
+        self.modified = False
+
+
+class SessionInterface(FlaskSessionInterface):
+    def _generate_sid(self):
+        return str(uuid4())
+
+    def _get_signer(self, app):
+        if not app.secret_key:
+            return None
+        # The salt is combined with app.secret_key, and is used to "distinguish
+        # signatures in different contexts"
+        return Signer(app.secret_key, salt='airflow',
+                      key_derivation='hmac')
+
+
+class SQLAlchemySessionInterface(SessionInterface):
+    """Uses the Flask-SQLAlchemy from a flask app as a session backend.
+
+    .. versionadded:: 0.2
+
+    :param db: A Flask-SQLAlchemy instance.
+    :param table: The table name you want to use.
+    :param key_prefix: A prefix that is added to all store keys.
+    :param use_signer: Whether to sign the session id cookie or not.
+    :param permanent: Whether to use permanent session or not.
+    """
+
+    serializer = pickle

Review comment:
       In fact, can we serialize with json instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1024300856


   What's the motivation for that change ?  It's quite a big change and you have not written why you want to implement it - this should be explained as part of the commit message - providing the context etc. Otherwise reviewer does not know what to review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1031803799


   > If we're trying to be able to easily understand what changes we've made to the vendored code, should we merge it in first with no changes? then make PRs on top of that which modify its behavior?
   
   In a case or two previous to this we have merged the PR _without_ squashing (manually on the command line).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1040603598


   Superseded by #21478.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r799874678



##########
File path: airflow/www/extensions/init_session.py
##########
@@ -15,33 +15,30 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from flask import request, session as flask_session
-from flask.sessions import SecureCookieSessionInterface
+from flask import session as builtin_flask_session
 
+from airflow.configuration import conf
+from airflow.www.session import AirflowDatabaseSessionInterface, AirflowSecureCookieSessionInterface
 
-class AirflowSessionInterface(SecureCookieSessionInterface):
-    """
-    Airflow cookie session interface.
-    Modifications of sessions should be done here because
-    the change here is global.
-    """
 
-    def save_session(self, *args, **kwargs):
-        """Prevent creating session from REST API requests."""
-        if request.blueprint == '/api/v1':
-            return None
-        return super().save_session(*args, **kwargs)
-
-
-def init_permanent_session(app):
-    """Make session permanent to allows us to store data"""
-
-    def make_session_permanent():
-        flask_session.permanent = True
-
-    app.before_request(make_session_permanent)
-
-
-def init_airflow_session_interface(app):
+def init_airflow_session_interface(app, db):
     """Set airflow session interface"""
-    app.session_interface = AirflowSessionInterface()
+    config = app.config.copy()
+    selected_backend = conf.get('webserver', 'SESSION_BACKEND', fallback='database')
+
+    if selected_backend == 'securecookie':
+        app.session_interface = AirflowSecureCookieSessionInterface()
+        if config.get('SESSION_PERMANENT', True):
+
+            def make_session_permanent():
+                builtin_flask_session.permanent = True
+
+            app.before_request(make_session_permanent)
+    elif selected_backend == 'database' or selected_backend is None:
+        app.session_interface = AirflowDatabaseSessionInterface(
+            db,
+            config.get('SESSION_SQLALCHEMY_TABLE', 'sessions'),
+            config.get('SESSION_KEY_PREFIX', 'session:'),
+            config.get('SESSION_USE_SIGNER', True),
+            config.get('SESSION_PERMANENT', True),
+        )

Review comment:
       I think we should:
   - hard code the table name (otherwise we'd need to use it in the migration too, and honestly I don't think it's worth it)
   - remove the `SESSION_KEY_PREFIX` (we wont ever share a backend)
   - force `SESSION_USE_SIGNER` to True (we already require a secret_key)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1040603598


   Superseded by #21478.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1031743721


   > @jedcunningham Some of the code your wanted changes to is the vendored in code. I intentionally made as minimal modifications to that code as I needed, with the intent to possibly make a PR to upstream Flask-Session, get that merged, and migrate to using that (and removing our vendored code). The more and more in-depth modifications we make to the vendored code, the more difficult it will be to switch in the future.
   I'm happy to make the changes you noted if you'd still like me to (please let me know), I just wanted to bring that to your attention first.
   
   On this topic, thinking out loud...
   
   If we're trying to be able to easily understand what changes we've made to the vendored code, should we merge it in first with no changes?  then make PRs on top of that which modify its behavior?
   Then we'd be able to tell exactly what we did to it (instead of vendoring and changing in one commit / pr).
   And it would make the PRs easier to review too.
   
   Alternatively I don't think we do submodules 'round here but is this a valid case for it?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r797313338



##########
File path: airflow/_vendor/flask_session/sessions.py
##########
@@ -0,0 +1,146 @@
+# ~~~~~~~~~~~~~~~~~~~~~~
+#
+# Server-side Sessions and SessionInterfaces.
+#
+# Originally copied from flask_session.sessions
+# Flask-Session
+# Copyright (C) 2014 by Shipeng Feng and other contributors
+# BSD, see LICENSE for more details.
+
+from datetime import datetime
+from uuid import uuid4
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+
+from flask.sessions import SessionInterface as FlaskSessionInterface
+from flask.sessions import SessionMixin as FlaskSessionMixin
+from werkzeug.datastructures import CallbackDict
+from itsdangerous import Signer, BadSignature, want_bytes
+
+from .db_models import Session as SessionModel
+
+
+class SQLAlchemySession(CallbackDict, FlaskSessionMixin):
+    """Baseclass for server-side based sessions."""
+
+    def __init__(self, initial=None, sid=None, permanent=None):
+        def on_update(self):
+            self.modified = True
+        super().__init__(initial, on_update)
+        self.sid = sid
+        if permanent:
+            self.permanent = permanent
+        self.modified = False
+
+
+class SessionInterface(FlaskSessionInterface):
+    def _generate_sid(self):
+        return str(uuid4())
+
+    def _get_signer(self, app):
+        if not app.secret_key:
+            return None
+        # The salt is combined with app.secret_key, and is used to "distinguish
+        # signatures in different contexts"
+        return Signer(app.secret_key, salt='airflow',
+                      key_derivation='hmac')
+
+
+class SQLAlchemySessionInterface(SessionInterface):
+    """Uses the Flask-SQLAlchemy from a flask app as a session backend.
+
+    .. versionadded:: 0.2
+
+    :param db: A Flask-SQLAlchemy instance.
+    :param table: The table name you want to use.
+    :param key_prefix: A prefix that is added to all store keys.
+    :param use_signer: Whether to sign the session id cookie or not.
+    :param permanent: Whether to use permanent session or not.
+    """
+
+    serializer = pickle

Review comment:
       Can we guarantee that all data being stored in the session is JSON-serializable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1025945178


   We need to add it to [NOTICE](https://github.com/apache/airflow/blob/main/NOTICE) as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1031743721


   > @jedcunningham Some of the code your wanted changes to is the vendored in code. I intentionally made as minimal modifications to that code as I needed, with the intent to possibly make a PR to upstream Flask-Session, get that merged, and migrate to using that (and removing our vendored code). The more and more in-depth modifications we make to the vendored code, the more difficult it will be to switch in the future.
   I'm happy to make the changes you noted if you'd still like me to (please let me know), I just wanted to bring that to your attention first.
   
   On this topic, thinking out loud...
   
   If we're trying to be able to easily understand what changes we've made to the vendored code, should we merge it in first with no changes?  then make PRs on top of that which modify its behavior?
   Then we'd be able to tell exactly what we did to it (instead of vendoring and changing in one commit / pr).
   
   Alternatively I don't think we do submodules 'round here but is this a valid case for it?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham closed pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
jedcunningham closed pull request #21167:
URL: https://github.com/apache/airflow/pull/21167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r795469484



##########
File path: airflow/models/__init__.py
##########
@@ -29,6 +29,7 @@
 from airflow.models.pool import Pool
 from airflow.models.renderedtifields import RenderedTaskInstanceFields
 from airflow.models.sensorinstance import SensorInstance  # noqa: F401
+from airflow.models.session import Session

Review comment:
       Rather than doing this (and adding `airflow/models/session.py`) import it airflow.utils.db -- that's enough to get it loaded and reset and doing `reset db`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r794932051



##########
File path: airflow/migrations/versions/c381b21cb7e4_add_session_table_to_db.py
##########
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add session table to db
+
+Revision ID: c381b21cb7e4
+Revises: e655c0453f75
+Create Date: 2022-01-25 13:56:35.069429
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.engine.reflection import Inspector
+
+# revision identifiers, used by Alembic.
+revision = 'c381b21cb7e4'
+down_revision = 'e655c0453f75'
+branch_labels = None
+depends_on = None
+
+TABLE_NAME = 'sessions'
+
+
+def upgrade():
+    """Apply add session table to db"""
+    conn = op.get_bind()
+    inspector = Inspector.from_engine(conn)

Review comment:
       Marking this as resolved since I believe I fixed the issue. Please review again and let me know if I need to unresolve it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#discussion_r794630722



##########
File path: airflow/migrations/versions/c381b21cb7e4_add_session_table_to_db.py
##########
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add session table to db
+
+Revision ID: c381b21cb7e4
+Revises: e655c0453f75
+Create Date: 2022-01-25 13:56:35.069429
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy.engine.reflection import Inspector
+
+# revision identifiers, used by Alembic.
+revision = 'c381b21cb7e4'
+down_revision = 'e655c0453f75'
+branch_labels = None
+depends_on = None
+
+TABLE_NAME = 'sessions'
+
+
+def upgrade():
+    """Apply add session table to db"""
+    conn = op.get_bind()
+    inspector = Inspector.from_engine(conn)

Review comment:
       We are trying to implement running migrations offline starting from 2.0.0 migration head and this would be a blocker.
   PR: https://github.com/apache/airflow/pull/20962




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk removed a comment on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
potiuk removed a comment on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1024300856


   What's the motivation for that change ?  It's quite a big change and you have not written why you want to implement it - this should be explained as part of the commit message - providing the context etc. Otherwise reviewer does not know what to review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] blag commented on pull request #21167: Add a session backend to store session data in the database

Posted by GitBox <gi...@apache.org>.
blag commented on pull request #21167:
URL: https://github.com/apache/airflow/pull/21167#issuecomment-1031870760


   @dstandish That's a good call, but it's looking less and less likely that our changes are going to be palatable to the maintainer of Flask-Session. If the changes we need won't ever be merged into Flask-Session, then our modified vendored code just becomes...our code, with an additional copyright line/header.
   
   A "middle path" of this would be to start maintaining our own fork of Flask-Session, merge in the changes that we need, maybe merge in a few more unmerged PRs from Flask-Session, and have our fork become the friendlier (than Flask-Session) session package for Flask. But I don't want to sign the company up for that responsibility without some consensus and blessing, and I don't really want to maintain my own personal fork of Flask-Session specifically for Airflow.
   
   I'm happy to do whatever it takes to get this merged, but I will need some guidance on how to get there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org