Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/10 09:19:51 UTC
[GitHub] [airflow] lvjg opened a new issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
lvjg opened a new issue #18133:
URL: https://github.com/apache/airflow/issues/18133
### Apache Airflow version
main (development)
### Operating System
centos7
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other
### Deployment details
_No response_
### What happened
![image](https://user-images.githubusercontent.com/2542162/132831151-30222d59-6604-4df6-9ff6-7c7cd8cebaec.png)
### What you expected to happen
dagcode._get_code_from_file should open fileloc with encoding="utf-8"
### How to reproduce
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917351733
In Airflow 2.1.3, there is a store_dag_code option in core which, if set to True, stores the DAG code in the DB so that the UI retrieves it from there (snippet below from 2.1.3):
```
@classmethod
def code(cls, fileloc) -> str:
    if STORE_DAG_CODE:
        return cls._get_code_from_db(fileloc)
    else:
        return cls._get_code_from_file(fileloc)
```
However, in the current main branch, the store_dag_code option seems to have been removed and the code is always loaded from the DB (snippet below from main). So I'm not sure whether the issue will still occur in future releases.
```
@classmethod
def code(cls, fileloc) -> str:
    return cls._get_code_from_db(fileloc)
```
[GitHub] [airflow] potiuk commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916787541
I had just started to answer when @uranusjr did :) So, to repeat: as a solution for now, you need to set the encoding properly via the LANG* variables in your environment (for all components):
https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python
Airflow will use the default encoding from the system. We are working on hard-coding UTF-8 for other parts in #17965; this might be a nice one to add for parsing the files as well.
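To check which default a given component would actually pick up, a small sketch like the one below may help. It starts a child Python process with extra environment variables and prints the encoding that `open()` would use when no `encoding` argument is passed. The helper name `child_preferred_encoding` is made up for illustration, and `PYTHONUTF8=1` (Python's UTF-8 mode, available since Python 3.7) is an alternative that is not mentioned in this thread.

```python
import os
import subprocess
import sys


def child_preferred_encoding(extra_env):
    """Start a child interpreter with extra environment variables and
    return the encoding it would use for open() without an explicit encoding."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# What a process started from the current environment would use.
print("current environment:", child_preferred_encoding({}))

# Python's UTF-8 mode forces UTF-8 regardless of the locale settings.
utf8_mode = child_preferred_encoding({"PYTHONUTF8": "1"})
print("with PYTHONUTF8=1:", utf8_mode)
```

The same helper can be called with a locale variable instead, for example `{"LANG": "en_US.UTF-8"}`, provided that locale is installed on the machine; the result then depends on the system.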
[GitHub] [airflow] Narendra-Neerukonda commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917588618
I looked into the source code where DAGs are loaded from files. It uses importlib in airflow.models.dagbag.DagBag:_load_modules_from_file/_load_modules_from_zip, called from airflow.models.dagbag.DagBag:process_file.
I'm not aware of the encoding importlib uses internally. But in the case of this specific issue, the user's DAG was loaded properly; the only problem was displaying the code in the UI (so I assume it uses UTF-8 or something compatible internally; ref: [PEP-3120](https://www.python.org/dev/peps/pep-3120/)). Please let me know if there's any other place to look into.
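A quick way to see how Python decodes source bytes, independent of the locale, is sketched below. It uses the standard library's `tokenize.detect_encoding` and `importlib.util.decode_source`; the sample sources are made up, and this only illustrates the standard-library behaviour, not the exact code path Airflow takes.

```python
import importlib.util
import io
import tokenize

# Source with Chinese characters and no encoding declaration, saved as UTF-8.
utf8_source = '# 中文注释\ngreeting = "你好"\n'
utf8_bytes = utf8_source.encode("utf-8")

# Without a BOM or coding cookie, the detected source encoding is UTF-8 (PEP 3120).
detected, _ = tokenize.detect_encoding(io.BytesIO(utf8_bytes).readline)
print(detected)

# decode_source applies the same detection and returns the decoded text.
decoded = importlib.util.decode_source(utf8_bytes)

# A PEP 263 coding cookie overrides the UTF-8 default.
cookie_bytes = '# -*- coding: latin-1 -*-\nname = "café"\n'.encode("latin-1")
cookie_decoded = importlib.util.decode_source(cookie_bytes)
```

None of these calls consult the locale, which would be consistent with the DAG importing fine while a plain `open()` without an explicit encoding fails on the same file.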
[GitHub] [airflow] uranusjr commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917391026
I believe Airflow also uses the platform encoding when initially reading the source code to store it in the database. If that’s the case, this would likely still be an issue.
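A minimal sketch of the failure mode, assuming the file is read with a plain `open()`: when no encoding is given, Python falls back to the locale's preferred encoding, and a non-UTF-8 default cannot decode Chinese characters saved as UTF-8. The helper `read_source` and the temporary file are stand-ins for illustration and are not Airflow code; ASCII is forced here to simulate a C/POSIX locale.

```python
import locale
import os
import tempfile


def read_source(path, encoding=None):
    """Read a text file; encoding=None means the locale's preferred encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


# A stand-in DAG file containing Chinese characters, saved as UTF-8.
source = '# 中文注释\nprint("你好")\n'
fd, dag_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(source)

print("locale default:", locale.getpreferredencoding(False))

# Force ASCII to simulate a platform whose default encoding is not UTF-8.
try:
    read_source(dag_path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Reading with an explicit UTF-8 encoding works regardless of the locale.
utf8_text = read_source(dag_path, encoding="utf-8")
os.remove(dag_path)
```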
[GitHub] [airflow] boring-cyborg[bot] commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916761145
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-917589798
Looks like importlib indeed always uses UTF-8 unless otherwise specified: https://docs.python.org/3/library/importlib.html#importlib.resources.open_text
[GitHub] [airflow] lvjg commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
lvjg commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916845288
thanks!!!
[GitHub] [airflow] uranusjr commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916786534
You probably need this: https://stackoverflow.com/a/27931669/1376863
Although explicitly using UTF-8 is probably a correct thing to do anyway. IIRC the source code read here is only used for UI display? Always using UTF-8 would be OK if that’s the case.
[GitHub] [airflow] lvjg commented on issue #18133: bulk_sync_to_db got UnicodeDecodeError when Chinese characters in dag code
Posted by GitBox <gi...@apache.org>.
lvjg commented on issue #18133:
URL: https://github.com/apache/airflow/issues/18133#issuecomment-916845161
thanks~
> You probably need this: https://stackoverflow.com/a/27931669/1376863
>
> Although explicitly using UTF-8 is probably a correct thing to do anyway. IIRC the source code read here is only used for UI display? Always using UTF-8 would be OK if that’s the case.
solved...
Still, it would be better if we could set the encoding explicitly...