You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 08:21:59 UTC

[GitHub] [airflow] itayB opened a new pull request #20801: changing execution_date to run_id

itayB opened a new pull request #20801:
URL: https://github.com/apache/airflow/pull/20801


   related: [#19593](https://github.com/apache/airflow/pull/19593/files)
   
   Align test of the configuration files to work with `run_id` (rather than `execution_date`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010655404


   I read this early because we are planning to change the log filename config in 2.3 (to accomodate AIP-42). Probably want to follow #19625 for this as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB edited a comment on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB edited a comment on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010995914


   I'm just not sure if it's a bug or misconfiguration.
   I'm working with remote logging and Kubernetes executor.
   Airflow webserver's log page is still sending log [requests](https://github.com/apache/airflow/blob/main/airflow/www/views.py#L1292) with `execution_date` rather than `run_id`. Inspection via browser show:
   ```
   https://airflow-audience-webserver.use1.dynamicyield.com/get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10:50:00+00:00&try_number=1&metadata={"end_of_log":false,"last_log_timestamp":"2022-01-12T11:48:42.713060+00:00","offset":"0"}
   ```
   and in the web-server side:
   ```
   10.1.113.221 - - [12/Jan/2022:12:05:06 +0000] "GET /get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00&try_number=1&metadata=%7B%22end_of_log%22%3Afalse%2C%22last_log_timestamp%22%3A%222022-01-12T12%3A05%3A03.678921%2B00%3A00%22%2C%22offset%22%3A%220%22%7D HTTP/1.1" 200 116 "https://airflow-audience-webserver.dev-use1.dynamicyield.com/log?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
   ```
   
   On the other hand, `execution_date` info was dropped from (Kubernetes) executors - it was marked as a label in the pod before v2.2.x. `run_id` label was added instead.
   
   - Is there a bug in the web client that doesn't contain/send `run_id` in the log requests?
   - Is there a configuration change to make it send the new `run_id`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010655404


   I read this early because we are planning to change the log filename config in 2.3 (to accomodate AIP-42). Probably want to follow #19625 for this as well.
   
   Regarding the change, the update note added in #19593 targets people accessing the Airflow database directly (as in using SQL or the SQLAlchemy ORM), not using the `execution_date` value in general. That value is provided by an additional backward compatibility abstraction and doesn’t need to change. We will probably still change it at some point because using `execution_date` to identify DAG runs is generally not a good idea, but it’s not necessary (yet) so we’re taking it slow to be safe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1017867594


   I fixed my issue by moving to use the logs in JSON format (which was [challenging](https://stackoverflow.com/questions/70787712/airflow-wrong-log-id-format) as well), instead of trying to build the `log_id` in myself (as some of the information is missing now) and thus - I'm closing this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010164042


   On second though, maybe you can? The configs are written out to a plain file on _installation time_, so the change only affects new installations; old installations would continue to use whatever value they have in their configs and work as expected.
   
   There's another problem I just thought of about using `run_id` though. A run ID can contain (literally) any characters, and not all characters are safe to be used as a filename (e.g. things would probably not work well if `/` is used). So we at least need some additional escaping logic to make the string filename-safe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010995914


   I'm just not sure if it's a bug or misconfiguration.
   I'm working with remote logging and Kubernetes executor.
   Airflow webserver's log page is still sending log requests with `execution_date` rather than `run_id`. Inspection via browser show:
   ```
   https://airflow-audience-webserver.use1.dynamicyield.com/get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10:50:00+00:00&try_number=1&metadata={"end_of_log":false,"last_log_timestamp":"2022-01-12T11:48:42.713060+00:00","offset":"0"}
   ```
   and in the web-server side:
   ```
   10.1.113.221 - - [12/Jan/2022:12:05:06 +0000] "GET /get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00&try_number=1&metadata=%7B%22end_of_log%22%3Afalse%2C%22last_log_timestamp%22%3A%222022-01-12T12%3A05%3A03.678921%2B00%3A00%22%2C%22offset%22%3A%220%22%7D HTTP/1.1" 200 116 "https://airflow-audience-webserver.dev-use1.dynamicyield.com/log?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
   ```
   
   On the other hand, `execution_date` info was dropped from (Kubernetes) executors - it was marked as a label in the pod before v2.2.x. `run_id` label was added instead.
   
   - Is there a bug in the web client that doesn't contain/send `run_id` in the log requests?
   - Is there a configuration change to make it send the new `run_id`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB closed pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB closed pull request #20801:
URL: https://github.com/apache/airflow/pull/20801


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB edited a comment on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB edited a comment on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010995914


   I'm just not sure if it's a bug or misconfiguration.
   I'm working with remote logging and Kubernetes executor.
   Airflow webserver's log page is still sending log [requests](https://github.com/apache/airflow/blob/main/airflow/www/views.py#L1292) with `execution_date` rather than `run_id`. Inspection via browser show:
   ```
   https://airflow-audience-webserver.use1.dynamicyield.com/get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10:50:00+00:00&try_number=1&metadata={"end_of_log":false,"last_log_timestamp":"2022-01-12T11:48:42.713060+00:00","offset":"0"}
   ```
   and in the web-server side:
   ```
   10.1.113.221 - - [12/Jan/2022:12:05:06 +0000] "GET /get_logs_with_metadata?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00&try_number=1&metadata=%7B%22end_of_log%22%3Afalse%2C%22last_log_timestamp%22%3A%222022-01-12T12%3A05%3A03.678921%2B00%3A00%22%2C%22offset%22%3A%220%22%7D HTTP/1.1" 200 116 "https://airflow-audience-webserver.dev-use1.dynamicyield.com/log?dag_id=sparkjobs-8764248&task_id=daily&execution_date=2022-01-11T10%3A50%3A00%2B00%3A00" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
   ```
   
   On the other hand, `execution_date` info was dropped from (Kubernetes) executors - it was marked as a label in the pod before v2.2.x. `run_id` label was added instead.
   Bottom line, there's no way to retrieve remote logs to Airflow's UI
   
   - Is there a bug in the web client that doesn't contain/send `run_id` in the log requests?
   - Is there a configuration change to make it send the new `run_id`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010150283


   Unfortunately you can't just do this because it'd basically break all log file lookup in all existing Airflow installations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itayB commented on pull request #20801: changing execution_date to run_id

Posted by GitBox <gi...@apache.org>.
itayB commented on pull request #20801:
URL: https://github.com/apache/airflow/pull/20801#issuecomment-1010253719


   thanks @uranusjr!
   I opened this PR as draft, hoping no one would review it yet, but thanks for taking a look :)
   I've noticed that the `execution_date` label was removed from the Kubernetes executor pod in v2.2.x and as a result I lost my remote logs (via Airflow UI). More info is available in my [SO](https://stackoverflow.com/questions/70658599/remote-logging-doesnt-work-after-upgrading-airflow-version) question.
   I saw that the [documentation](https://airflow.apache.org/docs/apache-airflow-providers-elasticsearch/stable/logging/index.html) changed 4 months ago, specifically [this line](https://github.com/apache/airflow/blame/main/docs/apache-airflow-providers-elasticsearch/logging/index.rst#L41), so I thought that the configuration files should be changed accordingly as well (but I'm still testing it locally).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org