Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/09 07:34:44 UTC
[GitHub] [airflow] Avphy opened a new issue, #28798: Processor unable to parse DAG with non english characters
Avphy opened a new issue, #28798:
URL: https://github.com/apache/airflow/issues/28798
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
Airflow version: 2.3.2
OS: Linux
When the DAG processor tries to load DAG files containing non-English characters (for example, Arabic), it throws the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 4999-5001: character maps to <undefined>"
A similar issue was reported in https://github.com/apache/airflow/issues/10954, but it was closed due to a lack of response from the owner.
```
Process DagFileProcessor9-Process:
Traceback (most recent call last):
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 155, in _run_file_processor
result: Tuple[int, int] = dag_file_processor.process_file(
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 660, in process_file
dagbag.sync_to_db()
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 615, in sync_to_db
for attempt in run_with_db_retries(logger=self.log):
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 382, in __iter__
do = self.iter(retry_state=retry_state)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 349, in iter
return fut.result()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 629, in sync_to_db
DAG.bulk_write_to_db(self.dags.values(), session=session)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 68, in wrapper
return func(*args, **kwargs)
File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dag.py", line 2474, in bulk_write_to_db
session.flush()
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3255, in flush
self._flush(objects)
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3395, in _flush
transaction.rollback(_capture_exception=True)
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3355, in _flush
flush_context.execute()
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 453, in execute
rec.execute(self)
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 627, in execute
util.preloaded.orm_persistence.save_obj(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 242, in save_obj
_emit_insert_statements(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1094, in _emit_insert_statements
c = connection._execute_20(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1520, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 313, in _execute_on_connection
return connection._execute_clauseelement(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1389, in _execute_clauseelement
ret = self._execute_context(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1748, in _execute_context
self._handle_dbapi_exception(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1933, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
self.dialect.do_execute(
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
cursor.execute(statement, parameters)
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/MySQLdb/cursors.py", line 199, in execute
args = tuple(map(db.literal, args))
File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/MySQLdb/connections.py", line 275, in literal
s = self.string_literal(o.encode(self.encoding))
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/encodings/cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 4999-5001: character maps to <undefined>
```
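The last two frames of the traceback show the string being encoded with Python's cp1252 codec before it is sent to MySQL. As a minimal sketch, the same failure can be checked outside Airflow with the task_id from the reproduction steps. The helper `can_encode` is a hypothetical name used only for this illustration and is not part of Airflow or MySQLdb.

```python
# Minimal sketch: check the failing encode step outside Airflow.
# The string is the task_id from the report; `can_encode` is a
# hypothetical helper, not an Airflow or MySQLdb function.
task_id = "OPERATOR_عمل1672935219115_937734575_"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp1252 is a single-byte Western European codec with no Arabic letters.
print(can_encode(task_id, "cp1252"))
# UTF-8 can represent every Unicode code point.
print(can_encode(task_id, "utf-8"))
```

If the first check fails and the second succeeds, the DAG file itself is fine and the problem lies in the encoding chosen for the database connection.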
### What you think should happen instead
_No response_
### How to reproduce
Add this Operator in DAG file:
```
OPERATOR_عمل1672935219115_937734575_ = PythonOperator(
    task_id='OPERATOR_عمل1672935219115_937734575_',
    params={
        "op_id": "LOOP_NODE_عمل_PIPELINE_1672935219115"
    },
    dag=dag,
)
```
### Operating System
Linux
### Versions of Apache Airflow Providers
2.3.2
### Deployment
Docker-Compose
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #28798: Processor unable to parse DAG with non english characters
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375214889
This looks like a database configuration issue. Did you configure the database’s character set and collation correctly? https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database
[GitHub] [airflow] notatallshaw commented on issue #28798: Processor unable to parse DAG with non english characters
Posted by GitBox <gi...@apache.org>.
notatallshaw commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375914137
Have you set a charset in your metadata connection string?
Your traceback suggests that MySQLdb is encoding the values it sends to your database using cp1252 rather than the expected utf8.
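A minimal sketch of what setting a charset in the connection string could look like, assuming the `mysql+mysqldb` SQLAlchemy driver that appears in the traceback. The helper name `with_charset`, the credentials, the host, and the database name are placeholders, not values from the report. Whether this alone resolves the error is not confirmed anywhere in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(conn_url: str, charset: str = "utf8mb4") -> str:
    """Return conn_url with its `charset` query parameter set.

    Hypothetical helper for illustration; it only rewrites the URL string
    and does not open a database connection.
    """
    parts = urlsplit(conn_url)
    query = dict(parse_qsl(parts.query))
    query["charset"] = charset  # overrides any existing charset value
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder credentials, host, and database name (assumptions).
url = with_charset("mysql+mysqldb://airflow_user:airflow_pass@mysql:3306/airflow_db")
print(url)
```

The resulting string is what would go into the metadata database connection setting (`sql_alchemy_conn`) in place of a URL without a charset.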
[GitHub] [airflow] potiuk closed issue #28798: Processor unable to parse DAG with non english characters
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #28798: Processor unable to parse DAG with non english characters
URL: https://github.com/apache/airflow/issues/28798
[GitHub] [airflow] Avphy commented on issue #28798: Processor unable to parse DAG with non english characters
Posted by GitBox <gi...@apache.org>.
Avphy commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375744238
I followed the steps given on this page but I am still getting the same error:
```
mysql> SELECT SCHEMA_NAME 'database', default_character_set_name 'charset', DEFAULT_COLLATION_NAME 'collation' FROM information_schema.SCHEMATA;
+--------------------+---------+--------------------+
| database           | charset | collation          |
+--------------------+---------+--------------------+
| mysql              | utf8mb4 | utf8mb4_0900_ai_ci |
| information_schema | utf8mb3 | utf8mb3_general_ci |
| performance_schema | utf8mb4 | utf8mb4_0900_ai_ci |
| sys                | utf8mb4 | utf8mb4_0900_ai_ci |
```
And I set the value below in airflow.cfg:
```
sql_engine_collation_for_ids=utf8mb3_bin
```
[GitHub] [airflow] potiuk commented on issue #28798: Processor unable to parse DAG with non english characters
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1376355211
MySQL has **MULTIPLE** ways of setting the charset, and you need to set all of them properly: client, server, table, and column. This has caused multiple problems in the past, to the point that we even considered dropping MySQL support for that very reason.
You need to understand all the charset settings for MySQL and manage your SQL DB (and unfortunately it is your job, as deployment manager, to make sure it is all good).
I think starting from https://dev.mysql.com/doc/refman/8.0/en/charset.html and reading all the chapters is a good starting point.
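As a rough sketch of where to look, the snippet below collects one inspection query per level at which MySQL keeps a charset setting. The function name `charset_checks` and the schema name `airflow_db` are assumptions made for this illustration. The statements use standard MySQL `SHOW VARIABLES` and `information_schema` views, and are meant to be pasted into a MySQL client by hand rather than executed with untrusted input.

```python
def charset_checks(schema: str = "airflow_db") -> dict:
    """Map each charset level to a MySQL statement that inspects it.

    Hypothetical helper: `schema` is a placeholder for the Airflow
    metadata database name and is interpolated without escaping.
    """
    return {
        # character_set_client, character_set_connection, character_set_server, ...
        "client/server": "SHOW VARIABLES LIKE 'character_set_%'",
        "database": (
            "SELECT default_character_set_name, default_collation_name "
            "FROM information_schema.SCHEMATA "
            f"WHERE schema_name = '{schema}'"
        ),
        "table": (
            "SELECT table_name, table_collation "
            "FROM information_schema.TABLES "
            f"WHERE table_schema = '{schema}'"
        ),
        "column": (
            "SELECT table_name, column_name, character_set_name "
            "FROM information_schema.COLUMNS "
            f"WHERE table_schema = '{schema}' "
            "AND character_set_name IS NOT NULL"
        ),
    }


for level, statement in charset_checks().items():
    print(f"-- {level}\n{statement};\n")
```

Any level that reports latin1 instead of a UTF-8 charset is a candidate cause for the encode error in the traceback.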
Converting it into discussion, because that is certainly not an Airflow issue.