Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/26 13:19:00 UTC

[GitHub] [airflow] EricGao888 opened a new issue #21127: Fail to download task log if there are Chinese characters in dag_id

EricGao888 opened a new issue #21127:
URL: https://github.com/apache/airflow/issues/21127


   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   If a DAG's dag_id contains Chinese characters, downloading the log of any task in that DAG leads to an 'Internal Server Error' page.
   ![image](https://user-images.githubusercontent.com/34905992/151167538-59898b5c-8978-4b76-b732-bdfeff2afba8.png)
   ![image](https://user-images.githubusercontent.com/34905992/151167566-bb3627db-20fc-4614-a4fb-02b8ba8607c4.png)
   
   
   
   ### What you expected to happen
   
   Here is the webserver log for the bug, produced in standalone mode:
   
   webserver | [2022-01-26 18:29:15 +0800] [48511] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
    webserver | Traceback (most recent call last):
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
    webserver | self.handle_request(listener, req, client, addr)
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
    webserver | resp.write(item)
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
    webserver | self.send_headers()
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
    webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
    webserver | return value.encode(encoding)
    webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
    webserver | 127.0.0.1 - - [26/Jan/2022:18:29:15 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
    webserver | [2022-01-26 18:29:21 +0800] [48508] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
    webserver | Traceback (most recent call last):
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
    webserver | self.handle_request(listener, req, client, addr)
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
    webserver | resp.write(item)
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
    webserver | self.send_headers()
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
    webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    webserver | File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
    webserver | return value.encode(encoding)
    webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
    webserver | 127.0.0.1 - - [26/Jan/2022:18:29:21 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
    triggerer | [2022-01-26 18:29:43,927] {triggerer_job.py:250} INFO - 0 triggers currently running
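
The failure in the traceback can be reproduced without gunicorn or Airflow. The following is a minimal sketch, assuming the offending header is a `Content-Disposition` value that carries the dag_id (the header name and the file name layout are illustrative assumptions, not taken from the log). It shows that Latin-1 cannot represent '测试' and that exactly two characters are reported as unencodable, which is consistent with the two-position span `161-162` in the error above.

```python
# Hypothetical header line; the real header built by the webserver is not shown in the log.
header = "Content-Disposition: attachment; filename=测试_sleep.log\r\n"

def try_encode(value: str, encoding: str):
    """Return (encoded bytes, None) on success or (None, error) on failure."""
    try:
        return value.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc

# Latin-1 only covers code points 0-255, so the two CJK characters fail.
encoded, error = try_encode(header, "latin-1")
print(error)

# error.start is inclusive and error.end is exclusive.
bad_chars = header[error.start:error.end]

# UTF-8 can encode the same string, which is why the local gunicorn patch "worked".
utf8_bytes, utf8_error = try_encode(header, "utf-8")
```

Encoding as UTF-8 avoids the exception, but it means patching gunicorn itself, which is the concern raised in the report.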
   
   
   ### How to reproduce
   
   * I tested Airflow v2.2.0 with the Celery executor, the Airflow dev version in standalone mode, and Airflow v1.10.12 with the Celery executor. The bug exists in all three versions.
   * To reproduce, create a DAG whose dag_id contains Chinese characters, such as '测试'. After triggering the DAG, try to download the log file of any of its tasks from the tree view or graph view page; you will be redirected to an 'Internal Server Error' page.
   
   ### Operating System
   
   macOS Catalina, CentOS 7
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   * Following the error log produced by the webserver, I checked `/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py` line 322 and saw `util.write(self.sock, util.to_bytestring(header_str, "latin-1"))`.
   * After changing `latin-1` to `utf-8`, the bug was fixed. The whole function is shown below; the commented-out line is the one I added.
   * ```
     def send_headers(self):
         if self.headers_sent:
             return
         tosend = self.default_headers()
         tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])
   
         header_str = "%s\r\n" % "".join(tosend)
         util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
         # util.write(self.sock, util.to_bytestring(header_str, "utf-8"))
         self.headers_sent = True
     ```
   * However, `gunicorn/http/wsgi.py` is not part of the Airflow code, and I haven't figured out how to fix this without changing that file. Is there a better way to fix it?
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045983659


   I guess this might change in the future. There is a good discussion in https://github.com/apache/airflow/issues/18010#issuecomment-912820115
   The idea of separating the id from the display name in the UI will probably happen in future releases.





[GitHub] [airflow] potiuk closed issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #21127:
URL: https://github.com/apache/airflow/issues/21127


   





[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791


   As per the code comment below, dag_id should be ASCII: https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200
   
   @potiuk @ashb Do we even support non-ASCII dag_id values? Is this issue valid?
   





[GitHub] [airflow] potiuk commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045750643


   Yep. Actually you are right. 
   
   ```
   KEY_REGEX = re.compile(r'^[\w.-]+$')
   GROUP_KEY_REGEX = re.compile(r'^[\w-]+$')
   ```
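
It may be worth checking what these patterns accept in Python 3. The sketch below copies `KEY_REGEX` from the snippet above and assumes it is applied to `str` values with default flags; under that assumption `\w` matches Unicode word characters, so a dag_id such as '测试' passes the pattern. The `ASCII_KEY_REGEX` variant is hypothetical, shown only for contrast, and is not Airflow code.

```python
import re

KEY_REGEX = re.compile(r'^[\w.-]+$')  # copied from the comment above

# Hypothetical ASCII-only variant, for comparison only (not part of Airflow).
ASCII_KEY_REGEX = re.compile(r'^[\w.-]+$', re.ASCII)

def is_accepted(pattern: re.Pattern, key: str) -> bool:
    """True if the whole key matches the given pattern."""
    return pattern.match(key) is not None

# Print, for each sample key, whether each pattern accepts it.
for key in ("my_dag.v1", "测试", "bad key!"):
    print(key, is_accepted(KEY_REGEX, key), is_accepted(ASCII_KEY_REGEX, key))
```

If that assumption holds, the pattern alone does not enforce the ASCII restriction described in the code comment, so a non-ASCII dag_id can reach the log download code path.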





[GitHub] [airflow] boring-cyborg[bot] commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1022189612


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030913682


   Feel free @aa3pankaj 





[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791


   As per https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200,
   dag_id should be ASCII.
   
   @potiuk @ashb Do we even support non-ASCII dag_id values? Is this issue valid?
   





[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791


   As per the comment below, dag_id should be ASCII: https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200
   
   @potiuk @ashb Do we even support non-ASCII dag_id values? Is this issue valid?
   





[GitHub] [airflow] aa3pankaj commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862


   @EricGao888 
   We can fix this in our own code.
   Because we add the attachment filename to the headers, gunicorn tries to encode it with "latin-1" and fails when it contains Chinese characters.
   So, before sending the response, we can do something like:
   ```
   # attachment_filename = task_log_reader.render_log_filename(ti, try_number, session=session)
   attachment_filename = urllib.parse.quote(task_log_reader.render_log_filename(ti, try_number, session=session))
   ```
   
   **I can raise PR with this fix, you can assign this issue to me.**
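
The effect of the proposed `urllib.parse.quote` call can be checked in isolation. This is a rough sketch, not the Airflow change itself: `raw_filename` stands in for whatever `render_log_filename` returns (its real layout is an assumption here), and `content_disposition` is a hypothetical helper. Percent-encoding leaves only ASCII in the header value, so the Latin-1 encoding step in gunicorn should succeed.

```python
from urllib.parse import quote, unquote

# Stand-in for the rendered log filename; the real layout is assumed.
raw_filename = "测试/sleep/2022-01-25T09:23:42.145023+00:00/1.log"

def content_disposition(filename: str) -> str:
    """Hypothetical helper: build an attachment header value with a percent-encoded name."""
    # quote() encodes the string as UTF-8 and escapes every byte outside its safe set.
    return f"attachment; filename={quote(filename)}"

header_value = content_disposition(raw_filename)

# The value is pure ASCII now, so encoding it as Latin-1 does not raise.
header_bytes = header_value.encode("latin-1")
print(header_value)

# The original name is recoverable by the client with unquote().
round_trip = unquote(quote(raw_filename))
```

Note that `quote('测试')` yields `%E6%B5%8B%E8%AF%95`, the same form the dag_id takes in the request URLs in the webserver log. A client that does not decode the name will save the file under the percent-encoded name; the `filename*=UTF-8''...` form from RFC 6266 is one standard way to declare the encoding explicitly.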





[GitHub] [airflow] aa3pankaj commented on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj commented on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1045329791


   As per https://github.com/apache/airflow/blob/78490f86bfab195cc98a7b700c26c24a1aa2eb17/airflow/models/dag.py#L200, dag_id should be ASCII.
   
   @potiuk @ashb Do we even support non-ASCII dag_id values? Is this issue valid?
   





[GitHub] [airflow] aa3pankaj edited a comment on issue #21127: Fail to download task log if there are Chinese characters in dag_id

Posted by GitBox <gi...@apache.org>.
aa3pankaj edited a comment on issue #21127:
URL: https://github.com/apache/airflow/issues/21127#issuecomment-1030673862


   @EricGao888 
   We can fix this in our own code.
   Because we add the attachment filename to the headers, gunicorn tries to encode it with "latin-1" and fails when it contains Chinese characters.
   So, before sending the response, we can do something like:
   ```
   # attachment_filename = task_log_reader.render_log_filename(ti, try_number, session=session)
   attachment_filename = urllib.parse.quote(task_log_reader.render_log_filename(ti, try_number, session=session))
   ```
   
   **I can raise PR with this fix, you can assign this issue to me.**
   
   cc: @potiuk 

