Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/30 15:32:52 UTC

[GitHub] [airflow] potiuk commented on pull request #13984: Fixed reading from zip package to default to text.

potiuk commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly handle the encoding of Python source files. This is wrong not only for the "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases, although it would require slightly changing the interface of the `open_maybe_zipped` function.
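A minimal sketch of what a changed interface could look like. It assumes that the caller only ever reads Python sources and that a file inside an archive is addressed as `<archive>.zip/<path inside archive>`. `read_python_source` is a hypothetical name, not Airflow's API. It returns decoded text instead of a file object, and both branches detect the encoding with the standard-library `tokenize` helpers discussed further down in this comment.

```python
import io
import os
import tokenize
import zipfile


def read_python_source(fileloc: str) -> str:
    """Hypothetical helper: return the text of a Python source file that may
    live inside a zip archive (e.g. "dags.zip/pkg/my_dag.py"), decoded with the
    encoding declared by its BOM or PEP 263 cookie (UTF-8 by default)."""
    marker = ".zip" + os.sep
    idx = fileloc.find(marker)
    if idx != -1:
        archive = fileloc[: idx + len(".zip")]
        if zipfile.is_zipfile(archive):
            # Zip member names always use "/" regardless of the platform.
            member = fileloc[idx + len(marker):].replace(os.sep, "/")
            with zipfile.ZipFile(archive) as zf:
                raw = zf.read(member)
            buffer = io.BytesIO(raw)
            # Same detection tokenize.open() performs, applied to in-memory bytes.
            encoding, _ = tokenize.detect_encoding(buffer.readline)
            buffer.seek(0)
            return io.TextIOWrapper(buffer, encoding=encoding).read()
    # Non-zipped case: tokenize.open() detects the encoding itself.
    with tokenize.open(fileloc) as f:
        return f.read()
```

Returning a string rather than an open file object keeps the zipped and non-zipped branches symmetrical. A real change would still need to check every existing caller of `open_maybe_zipped`.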
   
   In Python 3 the default source encoding is UTF-8, and I guess that covers the vast majority of cases, but a file can declare a different encoding, as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . Such declarations are rarely used in Python 3, but there are still cases where they are useful. Moreover, different Python files can be encoded with different encodings, yet we seem to always use the same (default) encoding, as returned by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper).
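A small illustration of the problem, using hypothetical file contents: a source file that declares Latin-1 through a PEP 263 cookie cannot be decoded as UTF-8, but decodes cleanly once the declared encoding is honoured. The locale default that text-mode `open()` falls back to is platform dependent, so it is only printed here, not relied upon.

```python
import locale
from typing import Optional

# Hypothetical DAG file contents that declare Latin-1 via a PEP 263 coding cookie.
SOURCE = "# -*- coding: latin-1 -*-\nowner = 'J\u00fcrgen'\n".encode("latin-1")


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # What text-mode open() uses when no encoding is passed (varies by platform).
    print("locale default:", locale.getpreferredencoding(False))
    # Ignoring the cookie and assuming UTF-8 fails on the Latin-1 byte 0xFC.
    print("decodes as utf-8:", try_decode(SOURCE, "utf-8") is not None)  # False
    # Honouring the declared encoding recovers the text.
    print("decodes as latin-1:", try_decode(SOURCE, "latin-1") is not None)  # True
```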
   
   However, I believe this function is only used to read Python sources, and the standard library already provides a way to detect the encoding of Python source files. The `tokenize` module has two functions that can be used (both available since Python 3.2):
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   Both read the BOM of a file (if present) or follow PEP 263 to detect the file encoding. I think it would not be too complex to use them to reliably detect the encoding of Python files.
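A quick sketch of how `tokenize.detect_encoding` behaves on in-memory bytes. It takes a `readline` callable returning bytes and returns the encoding name together with the lines it consumed (at most two). `detect` is just a hypothetical wrapper for the demo.

```python
import io
import tokenize


def detect(data: bytes) -> str:
    """Return the encoding tokenize.detect_encoding() reports for raw source bytes."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# No BOM and no cookie: falls back to UTF-8.
print(detect(b"x = 1\n"))  # utf-8
# A UTF-8 BOM is reported as "utf-8-sig", so decoding with it strips the BOM.
print(detect(b"\xef\xbb\xbfx = 1\n"))  # utf-8-sig
# A PEP 263 cookie on the first or second line is honoured and its name normalised.
print(detect(b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n"))  # iso-8859-1
```

`tokenize.open(path)` performs the same detection on a real file and returns a text-mode file object already decoded with the detected encoding.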
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org