You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/19 17:02:56 UTC

[GitHub] [airflow] potiuk opened a new pull request #21685: Add extra information about time synchronization needed

potiuk opened a new pull request #21685:
URL: https://github.com/apache/airflow/pull/21685


   When you have several machines running Airflow and their time
   is not synchronized, you might get very weird behaviour - some of
   the log retrieval actions might randomly return "forbidden" error
   because of expiring token. This is extremely difficult to
   diagnose and figure out (some of our users spent days on
   investigating that). Explicitly stating the requirement in the
   forbidden error and whenever secret_key parameter is
   mentioned, should help our users to diagnose it more easily
   (and save maintainers from unnecessary questions and discussions
   in Slack :))
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046066935


   Actually - thanks @eladkal ! I was thinking for a while about a new idea for talk / blog post and your comment gave me fantastic idea. I am going to do it :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21685:
URL: https://github.com/apache/airflow/pull/21685


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046064875


   And when you see mistakes like that, it's also super easy to do search-replace if the content is the same :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046063591






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046066143


   Just one comment - tt's actually very similar reasoning why Tests shoudl be DAMP not DRY: https://testing.googleblog.com/2019/12/testing-on-toilet-tests-too-dry-make.html 
   
   I agree that also in docs  DRY is important but when faced with the dillema DRY or IMMEDIATELY HELPFUL WITHOUT FOLLOWING MORE LINKS (when you see warnings and watchouts) I favour the other. 
   
   BTW.  That gave me an idea - I need to come up with some catchy phrase like DAMP. I think this might be the next content of my blog post and presentation for the summit even :D.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046106835


   For me documentation is not different that managing DWH. You need one source of truth otherwise more often people will consume wrong/non update information but I also agree with what you said. Duplication can sometimes be OK (for example in intro/quick guide <-> full extensive guide). Ideally we should have the ability to embed text snippets so the content is written only once and the reader doesn't have to click a link nor is even aware that the content is taken from somewhere else. Something to inspire to...
   
   > Actually - thanks @eladkal ! I was thinking for a while about a new idea for talk / blog post and your comment gave me fantastic idea. I am going to do it :)
   
   Waiting to read it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046066143


   Just one comment - tt's actually very similar reasoning why Tests shoudl be DAMP rather than DRY (DAMP should be favoured): https://testing.googleblog.com/2019/12/testing-on-toilet-tests-too-dry-make.html 
   
   I agree that also in docs  DRY is important but when faced with the dillema DRY or IMMEDIATELY HELPFUL WITHOUT FOLLOWING MORE LINKS (when you see warnings and watchouts) I favour the other. 
   
   BTW.  That gave me an idea - I need to come up with some catchy phrase like DAMP. I think this might be the next content of my blog post and presentation for the summit even :D.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#discussion_r810515314



##########
File path: airflow/config_templates/config.yml
##########
@@ -1085,6 +1085,10 @@
         Secret key used to run your flask app. It should be as random as possible. However, when running
         more than 1 instances of webserver, make sure all of them use the same ``secret_key`` otherwise
         one of them will error with "CSRF session token is missing".
+        The webserver key is also used to authorize requests to Celery workers when loga are retrieved.

Review comment:
       ```suggestion
           The webserver key is also used to authorize requests to Celery workers when logs are retrieved.
   ```

##########
File path: docs/helm-chart/production-guide.rst
##########
@@ -111,6 +111,11 @@ Example to create a Kubernetes Secret from ``kubectl``:
 
     kubectl create secret generic my-webserver-secret --from-literal="webserver-secret-key=$(python3 -c 'import secrets; print(secrets.token_hex(16))')"
 
+The webserver key is also used to authorize requests to Celery workers when loga are retrieved. The token

Review comment:
       ```suggestion
   The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
   ```

##########
File path: docs/apache-airflow/howto/set-config.rst
##########
@@ -121,3 +121,8 @@ the example below.
     does not require all, some configurations need to be same otherwise they would not
     work as expected. A good example for that is :ref:`secret_key<config:webserver__secret_key>` which
     should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
+
+    The webserver key is also used to authorize requests to Celery workers when loga are retrieved. The token

Review comment:
       ```suggestion
       The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
   ```

##########
File path: docs/apache-airflow/upgrading-from-1-10/index.rst
##########
@@ -339,6 +339,11 @@ the only supported UI.
     this via any configuration mechanism. The 1.10.15 bridge-release modifies this feature
     to use randomly generated secret keys instead of an insecure default and may break existing
     deployments that rely on the default.
+    The webserver key is also used to authorize requests to Celery workers when loga are retrieved. The token

Review comment:
       ```suggestion
       The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
   ```

##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -559,6 +559,10 @@ reload_on_plugin_change = False
 # Secret key used to run your flask app. It should be as random as possible. However, when running
 # more than 1 instances of webserver, make sure all of them use the same ``secret_key`` otherwise
 # one of them will error with "CSRF session token is missing".
+# The webserver key is also used to authorize requests to Celery workers when loga are retrieved.

Review comment:
       ```suggestion
   # The webserver key is also used to authorize requests to Celery workers when logs are retrieved.
   ```

##########
File path: docs/apache-airflow/configurations-ref.rst
##########
@@ -27,6 +27,11 @@ does not require all, some configurations need to be same otherwise they would n
 work as expected. A good example for that is :ref:`secret_key<config:webserver__secret_key>` which
 should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
 
+The webserver key is also used to authorize requests to Celery workers when loga are retrieved. The token

Review comment:
       ```suggestion
   The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21685: Add extra information about time synchronization needed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21685:
URL: https://github.com/apache/airflow/pull/21685#issuecomment-1046064722


   I deliberetely put repeated content in this case. The problem with content that is "linked" from many places is that most users will not follow "see also" or "look for details". I believe some level of redundancy in "watch outs" and warnings like this is not only good but necessary - especially that (unlike code) those warnings will not get refactored or updated after they are merged. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org