You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/17 10:06:26 UTC

[GitHub] [airflow] potiuk opened a new pull request, #24519: Get rid of TimedJSONWebSignatureSerializer

potiuk opened a new pull request, #24519:
URL: https://github.com/apache/airflow/pull/24519

   The TimedJSONWebSignatureSerializer has been deprecated from the
   itsdangerous library and they recommended to use dedicated
   libraries for it.
   
   https://github.com/pallets/itsdangerous/issues/129
   
   Since we are going to move to FAB 4+ with #22397 where newer version of
   itsdangerous is used, we need to switch to another library.
   
   We are already using PyJWT so the choice is obvious.
   
   Additionally to switching, the following improvements were done:
   
   * the use of JWT claims has been fixed to follow JWT standard.
     We were using "iat" header wrongly. The specification of JWT only
     expects the header to be there and be valid UTC timestamp, but the
     claim does not impact maturity of the signature - the signature
     is valid if iat is in the future.
     Instead "nbf" - "not before" claim should be used to verify if the
     request is not coming from the future. We now require all claims
     to be present in the request.
   
   * rather than using salt/signing_context we switched to standard
     JWT "audience" claim (same end result)
   
   * we have now much better diagnostics on the server side of the
     reason why request is forbidden - explicit error messages
     are printed in server logs and details of the exception. This
     is secure, we do not spill the information about the reason
     to the client, it's only available in server logs, so there is
     no risk attacker could use it.
   
   * the JWTSigner is "use-agnostic". We should be able to use the
     same class for any other signatures (Internal API from AIP-44)
     with just different audience
   
   * Short, 5 seconds default clock skew is allowed, to account for
     systems that have "almost" synchronized time
   
   * more tests addded with proper time freezing testing both
     expiry and immaturity of the request
   
   This change is not a breaking one because the JWT authentication
   details are not "public API" - but in case someone reverse engineered
   our claims and implemented their own log file retrieval, we
   should add a change in our changelog - therefore newsfragment
   is added.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragement file, named `{pr_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159439234

   BTW. Interesting dicussion here https://bugs.python.org/issue46200


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900793627


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,87 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.docs import get_docs_url
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")

Review Comment:
   Hmmm. That made me think actually - since in production we always print warnings as they come - WHY do we not want to use f-strings in warnings? I understand in debug logs for sure, and possibly (here not certain) in info. But there is no benefit of switching to %s in warnings, because they will be anyhow rended always when printed. 
   
   Do you know some sources that tell otherwise @mik-laj ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159414691

   I also added `algorithm` constructor argument, as we will likely use different algorithm (symmetric RS) in the future for Internal API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900790476


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,87 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.docs import get_docs_url
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")
+                abort(403)
+            payload = signer.verify_token(auth)
+            token_filename = payload.get("filename")
+            request_filename = request.view_args['filename']
+            if token_filename is None:
+                logger.warning("The payload does not contain 'filename' key: %s.", payload)
+                abort(403)
+            if token_filename != request_filename:
+                logger.warning(

Review Comment:
   F-strings in logger. 



##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,87 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.docs import get_docs_url
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")

Review Comment:
   F-strings in loggers. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900782570


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,86 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")
+                abort(403)
+            payload = signer.verify_token(auth)
+            token_filename = payload.get("filename")
+            request_filename = request.view_args['filename']
+            if token_filename is None:
+                logger.warning(f"The payload does not contain 'filename' key: {payload}")
+                abort(403)
+            if token_filename != request_filename:
+                logger.warning(
+                    f"The payload log_relative_path key is different than the one in token:"
+                    f"Request path: {request_filename}. Token path: {token_filename}"
+                )
+                abort(403)
+        except InvalidAudienceError:
+            logger.warning("Invalid audience for the request", exc_info=True)
             abort(403)
-
-        if filename != request.view_args['filename']:
+        except InvalidSignatureError:
+            logger.warning("The signature of the request was wrong", exc_info=True)
             abort(403)
-
-        # Validate the `iat` and `exp` are within `max_request_age` of now.
-        now = int(time.time())
-        if abs(now - issued_at) > max_request_age:
+        except ImmatureSignatureError:
+            logger.warning("The signature of the request was sent from the future", exc_info=True)
             abort(403)
-        if abs(now - expires_at) > max_request_age:
+        except ExpiredSignatureError:
+            logger.warning(
+                "The signature of the request has expired. Make sure that all components "
+                "in your system have synchronized clocks. "
+                "See more at https://airflow.apache.org/docs/apache-airflow/"

Review Comment:
   Ah. I keep on forgetting we have it :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900796932


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,87 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.docs import get_docs_url
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")

Review Comment:
   Ok. I answered myself. Actually, this is also better for any external logging systems (sentry etc.) as they might for example group the same message based on messages which are the same, or it could be useful in cases lije our secret masker.
   
   BTW. I really wish in the future somethng like warnigf, infof will get into the standard library: https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159553963

   All should be cool @mik-laj :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159438626

   Right. Fixed and reviewed all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900793627


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,87 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.docs import get_docs_url
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")

Review Comment:
   Hmmm. That made me think actually - since in production we always print warnings as they come - WHY do we not want to use f-strings in warnings? I understand in debug logs for sure, and possibly (here not certain) in info. But there is no benefit of switching to %s in warnings, because they will be anyhow render string always when printed. 
   
   Do you know some sources that tell otherwise @mik-laj ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159413355

   All fixed @mik-laj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #24519:
URL: https://github.com/apache/airflow/pull/24519


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1158721449

   This is first change from (likely) a series of changes that will bring us closer to migrating to FAB 4+.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900742043


##########
airflow/utils/jwt_signer.py:
##########
@@ -0,0 +1,82 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import logging
+from datetime import datetime, timedelta
+from typing import Any, Dict
+
+import jwt
+
+logger = logging.getLogger(__name__)

Review Comment:
   Is it needed?



##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,86 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")
+                abort(403)
+            payload = signer.verify_token(auth)
+            token_filename = payload.get("filename")
+            request_filename = request.view_args['filename']
+            if token_filename is None:
+                logger.warning(f"The payload does not contain 'filename' key: {payload}")
+                abort(403)
+            if token_filename != request_filename:
+                logger.warning(
+                    f"The payload log_relative_path key is different than the one in token:"
+                    f"Request path: {request_filename}. Token path: {token_filename}"
+                )
+                abort(403)
+        except InvalidAudienceError:
+            logger.warning("Invalid audience for the request", exc_info=True)
             abort(403)
-
-        if filename != request.view_args['filename']:
+        except InvalidSignatureError:
+            logger.warning("The signature of the request was wrong", exc_info=True)
             abort(403)
-
-        # Validate the `iat` and `exp` are within `max_request_age` of now.
-        now = int(time.time())
-        if abs(now - issued_at) > max_request_age:
+        except ImmatureSignatureError:
+            logger.warning("The signature of the request was sent from the future", exc_info=True)
             abort(403)
-        if abs(now - expires_at) > max_request_age:
+        except ExpiredSignatureError:
+            logger.warning(
+                "The signature of the request has expired. Make sure that all components "
+                "in your system have synchronized clocks. "
+                "See more at https://airflow.apache.org/docs/apache-airflow/"

Review Comment:
   Can you use get_docs_url function?
   https://github.com/apache/airflow/blob/b692517ce3aafb276e9d23570e9734c30a5f3d1f/airflow/utils/docs.py#L28



##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,86 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")
+                abort(403)
+            payload = signer.verify_token(auth)
+            token_filename = payload.get("filename")
+            request_filename = request.view_args['filename']
+            if token_filename is None:
+                logger.warning(f"The payload does not contain 'filename' key: {payload}")

Review Comment:
   Please don't use f-strings in loggers. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900780424


##########
airflow/utils/serve_logs.py:
##########
@@ -16,55 +16,86 @@
 # under the License.
 
 """Serve logs process"""
+import logging
 import os
-import time
 
 import gunicorn.app.base
 from flask import Flask, abort, request, send_from_directory
-from itsdangerous import TimedJSONWebSignatureSerializer
+from jwt.exceptions import (
+    ExpiredSignatureError,
+    ImmatureSignatureError,
+    InvalidAudienceError,
+    InvalidIssuedAtError,
+    InvalidSignatureError,
+)
 from setproctitle import setproctitle
 
 from airflow.configuration import conf
+from airflow.utils.jwt_signer import JWTSigner
+
+logger = logging.getLogger(__name__)
 
 
 def create_app():
     flask_app = Flask(__name__, static_folder=None)
-    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
+    expiration_time_in_seconds = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
     log_directory = os.path.expanduser(conf.get('logging', 'BASE_LOG_FOLDER'))
 
-    signer = TimedJSONWebSignatureSerializer(
+    signer = JWTSigner(
         secret_key=conf.get('webserver', 'secret_key'),
-        algorithm_name='HS512',
-        expires_in=max_request_age,
-        # This isn't really a "salt", more of a signing context
-        salt='task-instance-logs',
+        expiration_time_in_seconds=expiration_time_in_seconds,
+        audience="task-instance-logs",
     )
 
     # Prevent direct access to the logs port
     @flask_app.before_request
     def validate_pre_signed_url():
         try:
-            auth = request.headers['Authorization']
-
-            # We don't actually care about the payload, just that the signature
-            # was valid and the `exp` claim is correct
-            filename, headers = signer.loads(auth, return_header=True)
-
-            issued_at = int(headers['iat'])
-            expires_at = int(headers['exp'])
-        except Exception:
+            auth = request.headers.get('Authorization')
+            if auth is None:
+                logger.warning(f"The Authorization header is missing: {request.headers}")
+                abort(403)
+            payload = signer.verify_token(auth)
+            token_filename = payload.get("filename")
+            request_filename = request.view_args['filename']
+            if token_filename is None:
+                logger.warning(f"The payload does not contain 'filename' key: {payload}")

Review Comment:
   Right



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24519:
URL: https://github.com/apache/airflow/pull/24519#discussion_r900780330


##########
airflow/utils/jwt_signer.py:
##########
@@ -0,0 +1,82 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import logging
+from datetime import datetime, timedelta
+from typing import Any, Dict
+
+import jwt
+
+logger = logging.getLogger(__name__)

Review Comment:
   Good point. I got rid of logging from the signer so we can remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #24519: Get rid of TimedJSONWebSignatureSerializer

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24519:
URL: https://github.com/apache/airflow/pull/24519#issuecomment-1159563508

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org