Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/26 13:19:00 UTC
[GitHub] [airflow] EricGao888 opened a new issue #21127: Fail to download task log if there are Chinese characters in dag_id
EricGao888 opened a new issue #21127:
URL: https://github.com/apache/airflow/issues/21127
### Apache Airflow version
main (development)
### What happened
If a DAG's dag_id contains Chinese characters, downloading the log of any task in that DAG leads to an 'Internal Server Error' page.
![image](https://user-images.githubusercontent.com/34905992/151167538-59898b5c-8978-4b76-b732-bdfeff2afba8.png)
![image](https://user-images.githubusercontent.com/34905992/151167566-bb3627db-20fc-4614-a4fb-02b8ba8607c4.png)
### What you expected to happen
Here is the webserver log for this bug, produced in standalone mode:
webserver | [2022-01-26 18:29:15 +0800] [48511] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver | self.handle_request(listener, req, client, addr)
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver | resp.write(item)
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver | self.send_headers()
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver | return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:15 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
webserver | [2022-01-26 18:29:21 +0800] [48508] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver | self.handle_request(listener, req, client, addr)
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver | resp.write(item)
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver | self.send_headers()
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver | return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:21 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
triggerer | [2022-01-26 18:29:43,927] {triggerer_job.py:250} INFO - 0 triggers currently running
### How to reproduce
* I've tested Airflow v2.2.0 with the Celery executor, the Airflow development version (main) in standalone mode, and Airflow v1.10.12 with the Celery executor. The bug exists in all three versions.
* To reproduce, create a DAG whose dag_id contains Chinese characters, such as '测试'. After triggering the DAG, try to download the log file of any of its tasks from the Tree View or Graph View page. You will be redirected to an 'Internal Server Error' page.
### Operating System
macOS Catalina, CentOS 7
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other
### Deployment details
_No response_
### Anything else
* Following the error log produced by the webserver, I checked line 322 of `/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py` and found `util.write(self.sock, util.to_bytestring(header_str, "latin-1"))`.
* After changing `latin-1` to `utf-8`, the bug was fixed. The whole function is shown below; the commented-out line is the one I added.
* ```python
  def send_headers(self):
      if self.headers_sent:
          return
      tosend = self.default_headers()
      tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])
      header_str = "%s\r\n" % "".join(tosend)
      util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
      # util.write(self.sock, util.to_bytestring(header_str, "utf-8"))
      self.headers_sent = True
  ```
* However, `gunicorn/http/wsgi.py` is not part of the Airflow code base, and I haven't figured out how to fix this without modifying it. Is there a better way to fix it?
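The failure can be reproduced without gunicorn or Airflow. The sketch below copies the header-joining lines of `send_headers` above and runs gunicorn's latin-1 encoding step on a hypothetical `Content-Disposition` header. The filename layout `测试/sleep/<timestamp>/1.log` is an illustrative assumption, not necessarily what Airflow emits. Switching gunicorn to `utf-8` is probably not a proper fix: PEP 3333 requires WSGI response header values to contain only code points representable in ISO-8859-1 (latin-1).

```python
def build_header_str(headers):
    """Join (name, value) pairs the way gunicorn's send_headers does (minus default headers)."""
    tosend = ["%s: %s\r\n" % (k, v) for k, v in headers]
    return "%s\r\n" % "".join(tosend)


def is_latin1_encodable(text):
    """Return True if gunicorn's latin-1 encoding step would succeed for this text."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical headers; the filename layout is only illustrative.
headers = [
    ("Content-Type", "text/plain"),
    ("Content-Disposition", "attachment; filename=测试/sleep/2022-01-25T09:23:42.145023+00:00/1.log"),
]
header_str = build_header_str(headers)

print(is_latin1_encodable(header_str))  # False: "测试" has no latin-1 code points
print(is_latin1_encodable(build_header_str([("Content-Disposition", "attachment; filename=test.log")])))  # True
```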
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] eladkal commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045983659
I guess this might change in the future; there is a good discussion in https://github.com/apache/airflow/issues/18010#issuecomment-912820115
Separating the ID from the display name in the UI will probably happen in a future release.
[GitHub] [airflow] potiuk closed issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #21127:
URL: https://github.com/apache/airflow/issues/21127
[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791
According to the code comment linked below, dag_id should be ASCII: https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200
@potiuk @ashb do we even support non-ASCII dag_ids? Is this issue valid?
[GitHub] [airflow] potiuk commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045750643
Yep. Actually you are right.
```python
import re

KEY_REGEX = re.compile(r'^[\w.-]+$')
GROUP_KEY_REGEX = re.compile(r'^[\w-]+$')
```
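One detail is worth checking here. For `str` patterns in Python 3, `\w` is Unicode-aware by default, so both regexes quoted above accept Chinese characters. That may explain why a DAG with `dag_id='测试'` is accepted at all. The sketch below compares them with an ASCII-only variant. `ASCII_KEY_REGEX` is a hypothetical illustration, not Airflow code; the `re.ASCII` flag restricts `\w` to `[a-zA-Z0-9_]`.

```python
import re

# The two patterns quoted above.
KEY_REGEX = re.compile(r'^[\w.-]+$')
GROUP_KEY_REGEX = re.compile(r'^[\w-]+$')

# Hypothetical ASCII-only variant (not part of Airflow): re.ASCII limits \w to [a-zA-Z0-9_].
ASCII_KEY_REGEX = re.compile(r'^[\w.-]+$', re.ASCII)

for candidate in ["my_dag.v1", "测试"]:
    print(
        candidate,
        bool(KEY_REGEX.match(candidate)),
        bool(GROUP_KEY_REGEX.match(candidate)),
        bool(ASCII_KEY_REGEX.match(candidate)),
    )
# my_dag.v1 True False True
# 测试 True True False
```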
[GitHub] [airflow] boring-cyborg[bot] commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1022189612
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030913682
Feel free @aa3pankaj
[GitHub] [airflow] aa3pankaj commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
aa3pankaj commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
@EricGao888
We can fix this in our own code.
Because we put the attachment filename in the response headers, gunicorn tries to encode it with "latin-1" and fails when it contains Chinese characters.
So, before sending the response, we can do something like:
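```python
import urllib.parse

# Before: attachment_filename = task_log_reader.render_log_filename(ti, try_number, session=session)
# After: percent-encode the filename so the header value contains only ASCII characters
attachment_filename = urllib.parse.quote(task_log_reader.render_log_filename(ti, try_number, session=session))
```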
**I can raise a PR with this fix; you can assign this issue to me.**
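The following minimal, standard-library-only sketch shows what the proposed `urllib.parse.quote` call does to such a filename. The filename string is a hypothetical rendered log path. The Chinese characters become the percent-escapes of their UTF-8 bytes, the same `%E6%B5%8B%E8%AF%95` seen in the request URL in the webserver log, so the header value becomes pure ASCII and passes gunicorn's latin-1 step. `quote` leaves `/` unescaped by default (`safe='/'`). A client may also save the file under the literal percent-encoded name.

```python
import urllib.parse

# Hypothetical rendered log filename for dag_id "测试", task_id "sleep", try 1.
filename = "测试/sleep/2022-01-25T09:23:42.145023+00:00/1.log"

# Percent-encode the UTF-8 bytes of every character outside letters, digits, "_.-~" and "/".
quoted = urllib.parse.quote(filename)
header_value = "attachment; filename=" + quoted

print(quoted)
# %E6%B5%8B%E8%AF%95/sleep/2022-01-25T09%3A23%3A42.145023%2B00%3A00/1.log
print(header_value.encode("latin-1"))  # now succeeds: the value is pure ASCII
```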
[GitHub] [airflow] aa3pankaj commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
aa3pankaj commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791
According to https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200, dag_id should be ASCII.
@potiuk @ashb do we even support non-ASCII dag_ids? Is this issue valid?
[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id
Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862
@EricGao888
We can fix this in our own code.
Because we put the attachment filename in the response headers, gunicorn tries to encode it with "latin-1" and fails when it contains Chinese characters.
So, before sending the response, we can do something like:
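```python
import urllib.parse

# Before: attachment_filename = task_log_reader.render_log_filename(ti, try_number, session=session)
# After: percent-encode the filename so the header value contains only ASCII characters
attachment_filename = urllib.parse.quote(task_log_reader.render_log_filename(ti, try_number, session=session))
```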
**I can raise a PR with this fix; you can assign this issue to me.**
cc: @potiuk
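This thread does not propose the following alternative; it is sketched here as an assumption. It keeps a readable name. RFC 6266 lets `Content-Disposition` carry both an ASCII `filename` fallback and a `filename*` parameter in the RFC 5987 `UTF-8''<percent-encoded>` form. Clients that implement RFC 6266 can use `filename*` to restore the original characters. The helper name `content_disposition`, the `?` replacement in the fallback, and the flat filename `测试_sleep_1.log` are all hypothetical.

```python
import urllib.parse


def content_disposition(filename):
    """Hypothetical helper: build an RFC 6266 Content-Disposition header value.

    `filename` holds an ASCII fallback for older clients; `filename*` carries the
    real name as UTF-8, percent-encoded per RFC 5987.
    """
    # ASCII fallback: replace non-ASCII characters with "?" and escape quoted-string specials.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # safe="" also percent-encodes "/", which may not appear unencoded in filename*.
    encoded = urllib.parse.quote(filename, safe="")
    return "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (fallback, encoded)


value = content_disposition("测试_sleep_1.log")
print(value)
# attachment; filename="??_sleep_1.log"; filename*=UTF-8''%E6%B5%8B%E8%AF%95_sleep_1.log
print(value.encode("latin-1") == value.encode("ascii"))  # True: safe for gunicorn's latin-1 step
```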