Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/10 09:19:51 UTC

[GitHub] [airflow] lvjg opened a new issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

lvjg opened a new issue #18133:
URL: https://github.com/apache/airflow/issues/18133


   ### Apache Airflow version
   
   main (development)
   
   ### Operating System
   
   centos7
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   ![image](https://user-images.githubusercontent.com/2542162/132831151-30222d59-6604-4df6-9ff6-7c7cd8cebaec.png)
   
   
   ### What you expected to happen
   
   `dagcode._get_code_from_file` should open `fileloc` with `encoding="utf-8"`.
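
A minimal sketch of the requested behaviour, not Airflow's actual implementation: `read_source_utf8` and `demo` are hypothetical names introduced only for illustration. It shows a file containing Chinese characters being read with an explicit UTF-8 encoding, so the result does not depend on the locale of the process.

```python
import tempfile
from pathlib import Path


def read_source_utf8(fileloc: str) -> str:
    """Hypothetical helper: read a DAG file as UTF-8 regardless of the locale."""
    with open(fileloc, encoding="utf-8") as source:
        return source.read()


def demo() -> str:
    # Write a small DAG-like file containing Chinese characters as UTF-8 bytes.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_dag.py"
        path.write_bytes('# 中文注释\ndag_id = "示例"\n'.encode("utf-8"))
        return read_source_utf8(str(path))
```

Passing the encoding explicitly is what removes the dependency on the environment; without it, `open()` falls back to the platform default.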
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
       @classmethod
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
       @classmethod
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)





[GitHub] [airflow] potiuk commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916787541


   I had just started to answer when @uranusjr did :). To repeat: as a workaround for now, you need to set the encoding properly via the LANG* variables in your environment (for all components):
   https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python
   
   Airflow uses the system's default encoding. We are working on hard-coding UTF-8 for other parts in #17965; this might be a nice one to add for parsing the files as well.
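
To check which default encoding a given component's environment actually hands to Python, something like the following sketch may help. The function names and the `SAMPLE` constant are assumptions made for illustration; only standard-library calls are used.

```python
import locale
import sys


def default_encodings() -> dict:
    """Report the encodings Python picks up from the environment."""
    return {
        # Used by open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


def decodes_with(data: bytes, encoding: str) -> bool:
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 bytes for two Chinese characters, as they would appear in a DAG file.
SAMPLE = "中文".encode("utf-8")
```

Running `default_encodings()` in the environment of each component shows whether the preferred encoding is an ASCII-like one; `decodes_with(SAMPLE, "ascii")` returning False mirrors the UnicodeDecodeError from the report. Besides the LANG* variables mentioned above, Python's own `PYTHONUTF8=1` setting is another option that might apply, depending on the Python version in use.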





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
       @classmethod
       def code(cls, fileloc) -> str:
           """Returns source code for this DagCode object.
   
           :return: source code as string
           """
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
       @classmethod
       def code(cls, fileloc) -> str:
           """Returns source code for this DagCode object.
   
           :return: source code as string
           """
           return cls._get_code_from_db(fileloc)





[GitHub] [airflow] Narendra-Neerukonda commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
       @classmethod
       def code(cls, fileloc) -> str:
           """Returns source code for this DagCode object.
   
           :return: source code as string
           """
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
       @classmethod
       def code(cls, fileloc) -> str:
           """Returns source code for this DagCode object.
   
           :return: source code as string
           """
           return cls._get_code_from_db(fileloc)





[GitHub] [airflow] Narendra-Neerukonda commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917588618


   I looked into the source code that loads DAGs from files. It uses importlib in airflow.models.dagbag.DagBag:_load_modules_from_file/_load_modules_from_zip, called from airflow.models.dagbag.DagBag:process_file.
   
   I'm not aware of which encoding importlib uses internally. But in this specific issue, the user's DAG was loaded properly; the only problem was displaying the code in the UI (so I assume importlib uses UTF-8 or something compatible internally, ref: [PEP-3120](https://www.python.org/dev/peps/pep-3120/)). Please let me know if there's any other place to look into.
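
As a rough illustration of the PEP 3120 default, the standard library exposes the same source-encoding detection the import machinery relies on. The sketch below is an assumption-laden example, not Airflow code: `read_like_the_importer`, `demo`, and `SOURCE` are hypothetical names, while `tokenize.open` and `importlib.util.decode_source` are standard-library functions.

```python
import importlib.util
import tempfile
import tokenize
from pathlib import Path

# DAG-like source text with Chinese characters (illustrative only).
SOURCE = '# 中文注释\ndag_id = "示例"\n'


def read_like_the_importer(path: str) -> str:
    """Read a source file using Python's own source-encoding detection."""
    # tokenize.open() honours a BOM or PEP 263 coding cookie, else UTF-8.
    with tokenize.open(path) as source:
        return source.read()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_dag.py"
        path.write_bytes(SOURCE.encode("utf-8"))
        from_file = read_like_the_importer(str(path))
        # decode_source() applies the same detection to raw bytes.
        from_bytes = importlib.util.decode_source(path.read_bytes())
    return from_file, from_bytes
```

Neither call consults the locale, which would be consistent with the DAG importing fine while a plain `open()` without an encoding argument fails on the same file.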





[GitHub] [airflow] uranusjr commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917391026


   I believe Airflow also uses the platform encoding when initially reading the source code to store it in the database. This would likely still be an issue if that’s the case.





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
   
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
   
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)
   
   





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917588618


   I looked into the source code that loads DAGs from files. It uses importlib in airflow.models.dagbag.DagBag:_load_modules_from_file/_load_modules_from_zip, called from airflow.models.dagbag.DagBag:process_file.
   
   I'm not aware of which encoding importlib uses internally. But in this specific issue, the user's DAG was loaded properly; the only problem was displaying the code in the UI (so I assume importlib uses UTF-8 or something compatible internally, referring to [PEP-3120](https://www.python.org/dev/peps/pep-3120/)).
   Please let me know if there's any other location in the codebase to look into.





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
   
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
   
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)
   
   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916761145


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
   
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
   
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)
   
   





[GitHub] [airflow] potiuk commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917589798


   Looks like importlib indeed always uses UTF-8 unless otherwise specified: https://docs.python.org/3/library/importlib.html#importlib.resources.open_text





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733


   In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
   
       def code(cls, fileloc) -> str:
           if STORE_DAG_CODE:
               return cls._get_code_from_db(fileloc)
           else:
               return cls._get_code_from_file(fileloc)
   
   However, in the current main branch, the store_dag_code option seems to have been removed and the code is loaded directly from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
   
       def code(cls, fileloc) -> str:
           return cls._get_code_from_db(fileloc)





[GitHub] [airflow] lvjg commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
lvjg commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916845288


   thanks!!!





[GitHub] [airflow] uranusjr commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916786534


   You probably need this: https://stackoverflow.com/a/27931669/1376863
   
   Although explicitly using UTF-8 is probably a correct thing to do anyway. IIRC the source code read here is only used for UI display? Always using UTF-8 would be OK if that’s the case.





[GitHub] [airflow] lvjg commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code

Posted by GitBox <gi...@apache.org>.
lvjg commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916845161


   thanks~ 
   
   > You probably need this: https://stackoverflow.com/a/27931669/1376863
   > 
   > Although explicitly using UTF-8 is probably a correct thing to do anyway. IIRC the source code read here is only used for UI display? Always using UTF-8 would be OK if that’s the case.
   
   Solved.
   Still, it would be better if we could set the encoding explicitly.

