You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/30 13:50:41 UTC

[GitHub] [airflow] levahim opened a new pull request #13984: Fixed reading from zip package to default to text.

levahim opened a new pull request #13984:
URL: https://github.com/apache/airflow/pull/13984


   This is a resubmission of PR #13962. The original PR did not include corresponding unit test adjustments and therefore broke the build. This PR fixes the problem.
   
   Original PR #13962 description for reference:
   
   The `open_maybe_zipped` function returns different file-like objects depending on whether it's called for a plain file or for a file in a zip archive. The problem is that by default `io.open` (used for plain files) returns file in text mode and subsequent `read` on it returns strings. `ZipFile` on the other hand by default returns a binary file and subsequent `read` on it returns bytes.
   The returned value for `open_maybe_zipped` should be the same regardless whether it's a zip or a plain file--it should be in text mode. Returning binaries for zip packages causes problems. For example, when saving DAG code is turned on, the `DagCode` model tries to save DAG source code in the metadata database. SQLAlchemy throws an error for DAGs that come from a zip package, because tries to save binary value in a string column.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770249089


   Unfortunately not (GA has no such possibility). It is indeed traansient error, But it is not blocking for merge. I would love to hear what others think (@kaxil, @ashb whether detection of the encoding of python file is worth a separate issue/PR) before merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#discussion_r567517515



##########
File path: tests/utils/test_dag_processing.py
##########
@@ -586,13 +586,16 @@ def test_open_maybe_zipped_normal_file_with_zip_in_name(self):
 
     @mock.patch("zipfile.is_zipfile")
     @mock.patch("zipfile.ZipFile")
-    def test_open_maybe_zipped_archive(self, mocked_zip_file, mocked_is_zipfile):
+    @mock.patch("io.TextIOWrapper")
+    def test_open_maybe_zipped_archive(self, mocked_text_io_wrapper, mocked_zip_file, mocked_is_zipfile):

Review comment:
       Yes. Please move it also. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly include Python encoding. And it is currently (potentially) wrong not only for "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases. It would likely require to change the interface slightly of the open_maybe_zippped function. 
   
   In Python 3 default encoding is utf-8, and I guess it covers vast majority of cases, but there might be different encodings specified as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . They are rarely used in Python 3 but still, there are cases when it can be useful. Moreover, different python files can be encoded with different encoding and we seem to use always the same encoding (default) as defined by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
   
   However, this function is only used to read python sources I believe, and there is a way in Python 3 to detect the encoding for Python source files. It is there in the standard library: 
   
   There are those two functions that can be used (added in Python 3.2): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read BOM of a file (if present) or follow PEP 263 (if not) to detect the file encoding. I think it would not be too complex to use those to reliably detect encoding of python files.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770253554


   Speaking of the tests: we are testing the implementation which is fragile.
   
   If possible we should instead test that when reading from it we get strings, not bytes (and mock as little as possible - we already have actual zip files in the tests for for this purpose)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly include Python encoding. And it is currently (potentially) wrong not only for "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases. It would likely require to change the interface slightly of the open_maybe_zippped function. 
   
   In Python 3 default encoding is utf-8, and I guess it covers vast majority of cases, but there might be different encodings specified as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . They are rarely used in Python 3 but still, there are cases when it can be useful. Moreover, different python files can be encoded with different encoding and we seem to use always the same encoding (default) as defined by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
   
   However, this function is only used to read python sources I believe, and there is a way in Python 3 to detect the encoding for Python source files. It is there in the standard library: 
   
   There are those two functions that can be used (added in Python 3.2): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read BOM of a file (if present) or follow PEP362 (if not) to detect the file encoding. I think it would not be too complex to use those to reliably detect encoding of python files.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly include Python encoding. And it is wrong not only for "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases. It would require to change the interface slightly of the open_maybe_zippped function. 
   
   In Python 3 default encoding is utf-8, and I guess it covers vast majority of cases, but there might be different encodings specified as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . They are rarely used in Python 3 but still, there are cases when it can be useful. Moreover, different python files can be encoded with different encoding and we seem to use always the same encoding (default) as defined by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
   
   However, this function is only used to read python sources I believe, and there is a way in Python 3 to detect the encoding for Python source files. It is there in the standard library: 
   
   There are those two functions that can be used (added in Python 3.2): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read BOM of a file (if present) or follow PEP362 to detect the file encoding. I think it would not be too complex to use those to reliably detect encoding of python files.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-773229399


   
   > _That_ I agree with. In this PR I tried to match the way things are done in the existing code and fix the specific issue at focus with minimal changes. If the elders agree, I'll certainly be happy to change the test implementation. @ashb , I'm new to the codebase, can you recommend exact files (a zip and a non-zip Python source code-like content) to use that you say already exist in the tests?
   
   @levahim `tests/dags/test_zip.zip` was the zip file I was thinking of, and anything else in that dir for a .py folder.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#discussion_r567301520



##########
File path: tests/utils/test_dag_processing.py
##########
@@ -586,13 +586,16 @@ def test_open_maybe_zipped_normal_file_with_zip_in_name(self):
 
     @mock.patch("zipfile.is_zipfile")
     @mock.patch("zipfile.ZipFile")
-    def test_open_maybe_zipped_archive(self, mocked_zip_file, mocked_is_zipfile):
+    @mock.patch("io.TextIOWrapper")
+    def test_open_maybe_zipped_archive(self, mocked_text_io_wrapper, mocked_zip_file, mocked_is_zipfile):

Review comment:
       Can you move these test cases to `test/utils/test_file.py`?  During the review, I checked that the file was missing and hence I merged the change that didn't work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#discussion_r567517515



##########
File path: tests/utils/test_dag_processing.py
##########
@@ -586,13 +586,16 @@ def test_open_maybe_zipped_normal_file_with_zip_in_name(self):
 
     @mock.patch("zipfile.is_zipfile")
     @mock.patch("zipfile.ZipFile")
-    def test_open_maybe_zipped_archive(self, mocked_zip_file, mocked_is_zipfile):
+    @mock.patch("io.TextIOWrapper")
+    def test_open_maybe_zipped_archive(self, mocked_text_io_wrapper, mocked_zip_file, mocked_is_zipfile):

Review comment:
       Yes. Please move it also. YOu can moce these classes in this PR, but you'd better do it as a separate PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770253975


   As for encoding: so long as we document it, I'm perfectly happy only supporting utf8 source files - it's unlikely any modern system will use anything else, so isn't worth us supporting it.
   
   If someone wants it they can open a PR in future.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] levahim commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
levahim commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770388456


   > Speaking of the tests: we are testing the implementation which is fragile.
   > 
   > If possible we should instead test that when reading from it we get strings, not bytes (and mock as little as possible - we already have actual zip files in the tests for for this purpose)
   
   _That_ I agree with. In this PR I tried to match the way things are done in the existing code and fix the specific issue at focus with minimal changes. If the elders agree, I'll certainly be happy to change the test implementation. @ashb , I'm new to the codebase, can you recommend exact files (a zip and a non-zip Python source code-like content) to use that you say already exist in the tests?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] levahim commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
levahim commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770248248


   It appears that the CI Build failed for a reason unrelated to the PR. Seems to be a temporary issue with the failed test itself. Is it possible to re-run the failed check?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770244768


   > @potiuk, I'd say it's a separate issue. The scope of this fix is just to make sure that when the DAG source code is read from a zipped package, it is returned in exactly the same way as for a non-zipped file. In both cases it uses the Python's default encoding and with this fix the outcome of calling `open_maybe_zipped` is identical for both cases.
   > I'd suggest a separate issue is created and tracked for what you are describing. In the meantime, this fix unblocks all the folks who want to use zipped packages and have `store_dag_code` option enabled.
   
   Happy to hear what others think, but since testing is about the same piece of code and potentially resulting in errors in the same plce. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13984:
URL: https://github.com/apache/airflow/pull/13984


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770498592


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770244768






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770253975


   As for encoding: so long as we document it, I'm perfectly happy only supporting utf8 source files - it's unlikely any modern system will use anything else, so isn't worth us supporting it.
   
   If someone wants it they can open a PR in future.
   
   I'm not against it, but also agree that it should be a separate PR to this one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly include Python encoding. And it is currently (potentially) wrong not only for "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases. It would likely require to change the interface slightly of the open_maybe_zippped function. 
   
   In Python 3 default encoding is utf-8, and I guess it covers vast majority of cases, but there might be different encodings specified as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . They are rarely used in Python 3 but still, there are cases when it can be useful. Moreover, different python files can be encoded with different encoding and we seem to use always the same encoding (default) as defined by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
   
   However, this function is only used to read python sources I believe, and there is a way in Python 3 to detect the encoding for Python source files. It is there in the standard library: 
   
   There are those two functions that can be used (added in Python 3.2): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read BOM of a file (if present) or follow PEP362 to detect the file encoding. I think it would not be too complex to use those to reliably detect encoding of python files.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-773229399


   
   > _That_ I agree with. In this PR I tried to match the way things are done in the existing code and fix the specific issue at focus with minimal changes. If the elders agree, I'll certainly be happy to change the test implementation. @ashb , I'm new to the codebase, can you recommend exact files (a zip and a non-zip Python source code-like content) to use that you say already exist in the tests?
   
   @levahim `tests/dags/test_zip.zip` was the zip file I was thinking of, and anything else in that dir for a .py folder.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] levahim commented on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
levahim commented on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770243950


   @potiuk, I'd say it's a separate issue. The scope of this fix is just to make sure that when the DAG source code is read from a zipped package, it is returned in exactly the same way as for a non-zipped file. In both cases it uses the Python's default encoding and with this fix the outcome of calling `open_maybe_zipped` is identical for both cases.
   I'd suggest a separate issue is created and tracked for what you are describing. In the meantime, this fix unblocks all the folks who want to use zipped packages and have `store_dag_code` option enabled.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770244768


   > @potiuk, I'd say it's a separate issue. The scope of this fix is just to make sure that when the DAG source code is read from a zipped package, it is returned in exactly the same way as for a non-zipped file. In both cases it uses the Python's default encoding and with this fix the outcome of calling `open_maybe_zipped` is identical for both cases.
   > I'd suggest a separate issue is created and tracked for what you are describing. In the meantime, this fix unblocks all the folks who want to use zipped packages and have `store_dag_code` option enabled.
   
   Happy to hear what others think, but since testing is about the same piece of code and potentially resulting in errors in the same place. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#discussion_r567301520



##########
File path: tests/utils/test_dag_processing.py
##########
@@ -586,13 +586,16 @@ def test_open_maybe_zipped_normal_file_with_zip_in_name(self):
 
     @mock.patch("zipfile.is_zipfile")
     @mock.patch("zipfile.ZipFile")
-    def test_open_maybe_zipped_archive(self, mocked_zip_file, mocked_is_zipfile):
+    @mock.patch("io.TextIOWrapper")
+    def test_open_maybe_zipped_archive(self, mocked_text_io_wrapper, mocked_zip_file, mocked_is_zipfile):

Review comment:
       Can you move this test case to `test/utils/test_file.py`?  During the review, I checked that the file was missing and hence I merged the change that didn't work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#issuecomment-770229702


   I think the solution is not complete, as it does not properly include Python encoding. And it is currently (potentially) wrong not only for "zipped" case but also for the "non-zipped" case. Maybe there is a chance to fix it for both cases. It would require to change the interface slightly of the open_maybe_zippped function. 
   
   In Python 3 default encoding is utf-8, and I guess it covers vast majority of cases, but there might be different encodings specified as defined by PEP 263: https://www.python.org/dev/peps/pep-0263/ . They are rarely used in Python 3 but still, there are cases when it can be useful. Moreover, different python files can be encoded with different encoding and we seem to use always the same encoding (default) as defined by `locale.getpreferredencoding(False)` (see https://docs.python.org/3/library/io.html#io.TextIOWrapper). 
   
   However, this function is only used to read python sources I believe, and there is a way in Python 3 to detect the encoding for Python source files. It is there in the standard library: 
   
   There are those two functions that can be used (added in Python 3.2): 
   
   * https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
   * https://docs.python.org/3/library/tokenize.html#tokenize.open 
   
   They both read BOM of a file (if present) or follow PEP362 to detect the file encoding. I think it would not be too complex to use those to reliably detect encoding of python files.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] levahim commented on a change in pull request #13984: Fixed reading from zip package to default to text.

Posted by GitBox <gi...@apache.org>.
levahim commented on a change in pull request #13984:
URL: https://github.com/apache/airflow/pull/13984#discussion_r567432534



##########
File path: tests/utils/test_dag_processing.py
##########
@@ -586,13 +586,16 @@ def test_open_maybe_zipped_normal_file_with_zip_in_name(self):
 
     @mock.patch("zipfile.is_zipfile")
     @mock.patch("zipfile.ZipFile")
-    def test_open_maybe_zipped_archive(self, mocked_zip_file, mocked_is_zipfile):
+    @mock.patch("io.TextIOWrapper")
+    def test_open_maybe_zipped_archive(self, mocked_text_io_wrapper, mocked_zip_file, mocked_is_zipfile):

Review comment:
       Would you like to move `TestCorrectMaybeZipped` to the new `tests/utils/test_file.py` module as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org