Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/09 07:34:44 UTC

[GitHub] [airflow] Avphy opened a new issue, #28798: Processor unable to parse DAG with non english characters

Avphy opened a new issue, #28798:
URL: https://github.com/apache/airflow/issues/28798

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Airflow version: 2.3.2
   OS: Linux
   
   When the DAG processor handles a DAG file that contains non-English characters (for example, Arabic), it fails while syncing the DAG to the metadata database with the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 4999-5001: character maps to <undefined>".
   
   A similar issue was reported in https://github.com/apache/airflow/issues/10954, but it was closed due to lack of response from its owner.
   
   ```
    Process DagFileProcessor9-Process:
    Traceback (most recent call last):
      File "/opt/rh/rh-python38/root/usr/lib64/python3.8/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/opt/rh/rh-python38/root/usr/lib64/python3.8/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 155, in _run_file_processor
        result: Tuple[int, int] = dag_file_processor.process_file(
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
        return func(*args, session=session, **kwargs)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 660, in process_file
        dagbag.sync_to_db()
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
        return func(*args, session=session, **kwargs)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 615, in sync_to_db
        for attempt in run_with_db_retries(logger=self.log):
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 382, in __iter__
        do = self.iter(retry_state=retry_state)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 349, in iter
        return fut.result()
      File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 437, in result
        return self.__get_result()
      File "/opt/rh/rh-python38/root/usr/lib64/python3.8/concurrent/futures/_base.py", line 389, in __get_result
        raise self._exception
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 629, in sync_to_db
        DAG.bulk_write_to_db(self.dags.values(), session=session)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 68, in wrapper
        return func(*args, **kwargs)
      File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/airflow/models/dag.py", line 2474, in bulk_write_to_db
        session.flush()
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3255, in flush
        self._flush(objects)
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3395, in _flush
        transaction.rollback(_capture_exception=True)
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
        compat.raise_(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
        raise exception
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 3355, in _flush
        flush_context.execute()
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 453, in execute
        rec.execute(self)
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 627, in execute
        util.preloaded.orm_persistence.save_obj(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 242, in save_obj
        _emit_insert_statements(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1094, in _emit_insert_statements
        c = connection._execute_20(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1520, in _execute_20
        return meth(self, args_10style, kwargs_10style, execution_options)
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/sql/elements.py", line 313, in _execute_on_connection
        return connection._execute_clauseelement(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1389, in _execute_clauseelement
        ret = self._execute_context(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1748, in _execute_context
        self._handle_dbapi_exception(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1933, in _handle_dbapi_exception
        util.raise_(exc_info[1], with_traceback=exc_info[2])
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
        raise exception
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
        self.dialect.do_execute(
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
        cursor.execute(statement, parameters)
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/MySQLdb/cursors.py", line 199, in execute
        args = tuple(map(db.literal, args))
      File "/opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/MySQLdb/connections.py", line 275, in literal
        s = self.string_literal(o.encode(self.encoding))
      File "/opt/rh/rh-python38/root/usr/lib64/python3.8/encodings/cp1252.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_table)
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 4999-5001: character maps to <undefined>
   ```
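The last frames show MySQLdb calling `o.encode(self.encoding)` with the `cp1252` codec. As far as we know, mysqlclient maps MySQL's `latin1` connection charset to Python's `cp1252`, which suggests the client connection was negotiated as `latin1` rather than `utf8mb4`. A minimal sketch that reproduces the same codec error in plain Python, with no Airflow or MySQL involved (`try_encode` is a hypothetical helper):

```python
def try_encode(text: str, encoding: str) -> str:
    """Encode ``text`` with ``encoding`` and return "ok" or the error message."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return "ok"


# The Arabic word used inside the task_id in this report.
ARABIC = "عمل"

if __name__ == "__main__":
    # cp1252 has no Arabic letters, so this reports a 'charmap' codec error.
    print(try_encode(ARABIC, "cp1252"))
    # UTF-8 (the encoding behind MySQL's utf8mb4) can encode any Unicode text.
    print(try_encode(ARABIC, "utf-8"))
```

The first call reports the same `'charmap' codec can't encode` error as the traceback; the UTF-8 call succeeds.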
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
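   Add this operator to a DAG file:
   
   ```
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.python import PythonOperator

   # Hypothetical DAG id; any DAG containing the task below should reproduce the error.
   with DAG("arabic_task_repro", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
       OPERATOR_عمل1672935219115_937734575_ = PythonOperator(
           task_id="OPERATOR_عمل1672935219115_937734575_",
           python_callable=lambda: None,  # hypothetical placeholder no-op callable
           params={"op_id": "LOOP_NODE_عمل_PIPELINE_1672935219115"},
       )
   ```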
   
   
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   2.3.2
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #28798: Processor unable to parse DAG with non english characters

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375214889

   This looks like a database configuration issue. Did you configure the database’s character set and collation correctly? https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database
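The linked Airflow guide creates the metadata database with `CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci`, using `airflow_db` as the example name. A minimal sketch for checking that, assuming rows shaped like the `information_schema.SCHEMATA` query Avphy posts later in the thread, `(database, charset, collation)`; the helper name and the default `airflow_db` are illustrative assumptions:

```python
from typing import Iterable, Optional, Tuple

# (database, charset, collation), as returned by the SCHEMATA query in this thread.
SchemaRow = Tuple[str, str, str]


def check_airflow_schema(
    rows: Iterable[SchemaRow], airflow_db: str = "airflow_db"
) -> Optional[str]:
    """Return a description of the problem, or None if the schema uses utf8mb4."""
    for name, charset, collation in rows:
        if name == airflow_db:
            if charset != "utf8mb4":
                return f"{name} uses {charset} ({collation}), expected utf8mb4"
            return None
    # The Airflow schema is not in the result set at all.
    return f"schema {airflow_db!r} not found"
```

Note that a correct database-level charset does not by itself fix the client side; the connection charset is configured separately.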


[GitHub] [airflow] notatallshaw commented on issue #28798: Processor unable to parse DAG with non english characters

Posted by GitBox <gi...@apache.org>.
notatallshaw commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375914137

   Have you set a charset in your metadata connection string?
   
   Your traceback suggests that MySQLdb is encoding the values it sends to your database as cp1252 rather than the expected utf8.
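Airflow reads the metadata connection string from `sql_alchemy_conn` (in the `[database]` section since Airflow 2.3), and SQLAlchemy's MySQL dialect passes a `charset` query parameter through to the driver, e.g. `mysql+mysqldb://user:pass@host:3306/airflow_db?charset=utf8mb4`. A minimal sketch that adds or replaces that parameter using only the standard library; the function name and the credentials in the example are hypothetical placeholders:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_mysql_charset(conn: str, charset: str = "utf8mb4") -> str:
    """Return ``conn`` with its ``charset`` query parameter set to ``charset``.

    Other query parameters are kept; an existing ``charset`` value is replaced.
    """
    parts = urlsplit(conn)
    # dict() keeps parameter order; assigning an existing key keeps its position.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["charset"] = charset
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `with_mysql_charset("mysql+mysqldb://airflow:secret@localhost:3306/airflow_db")` returns the same URL with `?charset=utf8mb4` appended.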


[GitHub] [airflow] potiuk closed issue #28798: Processor unable to parse DAG with non english characters

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #28798: Processor unable to parse DAG with non english characters
URL: https://github.com/apache/airflow/issues/28798


[GitHub] [airflow] Avphy commented on issue #28798: Processor unable to parse DAG with non english characters

Posted by GitBox <gi...@apache.org>.
Avphy commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1375744238

   I followed the steps on that page but am still getting the same error:
   
   ```
   mysql> SELECT SCHEMA_NAME 'database', default_character_set_name 'charset', DEFAULT_COLLATION_NAME 'collation' FROM information_schema.SCHEMATA;
   +--------------------+---------+--------------------+
   | database           | charset | collation          |
   +--------------------+---------+--------------------+
   | mysql              | utf8mb4 | utf8mb4_0900_ai_ci |
   | information_schema | utf8mb3 | utf8mb3_general_ci |
   | performance_schema | utf8mb4 | utf8mb4_0900_ai_ci |
   | sys                | utf8mb4 | utf8mb4_0900_ai_ci |
   ```
   
   I also set the following value in airflow.cfg:
   ```
   sql_engine_collation_for_ids=utf8mb3_bin
   ```
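As we understand the Airflow configuration reference, `sql_engine_collation_for_ids` only sets the collation of ID columns (such as `dag_id` and `task_id`) when tables are created; the encoding MySQLdb uses on the wire comes from the connection, i.e. `sql_alchemy_conn`. Both options live in the `[database]` section in Airflow 2.3. A minimal sketch that reads both values from an `airflow.cfg` text (the helper name is hypothetical):

```python
import configparser
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def inspect_db_settings(cfg_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (connection charset, collation for ids) from the [database] section.

    Either value is None when it is not set.
    """
    # interpolation=None: airflow.cfg contains literal '%' in log format strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    conn = parser.get("database", "sql_alchemy_conn", fallback="")
    charset = parse_qs(urlsplit(conn).query).get("charset", [None])[0]
    collation = parser.get("database", "sql_engine_collation_for_ids", fallback=None)
    return charset, collation
```

This only inspects the file. In Docker-Compose deployments the connection is often set through the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable, which takes precedence over `airflow.cfg`.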


[GitHub] [airflow] potiuk commented on issue #28798: Processor unable to parse DAG with non english characters

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28798:
URL: https://github.com/apache/airflow/issues/28798#issuecomment-1376355211

   MySQL has **MULTIPLE** places where the charset is set, and you need to set all of them properly: client, server, table, and column. This has caused multiple problems in the past, to the point that we even considered dropping MySQL support for that very reason. 
   You need to understand all of MySQL's charset settings and manage your database accordingly; unfortunately, making sure it is all configured correctly is your job as the deployment manager.
   
   I think starting from https://dev.mysql.com/doc/refman/8.0/en/charset.html and reading all of its chapters is a good starting point.
   
   Converting this into a discussion, because it is certainly not an Airflow issue. 
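The client- and server-level settings mentioned above can be listed with `SHOW VARIABLES LIKE 'character_set_%'` (table and column charsets are reported separately in `information_schema.TABLES` and `information_schema.COLUMNS`). A minimal sketch that flags the relevant variables not set to `utf8mb4`, assuming `(Variable_name, Value)` rows as a DB-API cursor returns them; the names `CHECKED` and `mismatched_charsets` are hypothetical:

```python
from typing import Dict, Iterable, Tuple

QUERY = "SHOW VARIABLES LIKE 'character_set_%'"

# character_set_system (always utf8mb3 in MySQL 8.0) and
# character_set_filesystem (usually "binary") are deliberately not checked.
CHECKED = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_server",
    "character_set_database",
)


def mismatched_charsets(
    rows: Iterable[Tuple[str, str]], expected: str = "utf8mb4"
) -> Dict[str, str]:
    """Return {variable: value} for checked variables that differ from ``expected``."""
    values = dict(rows)
    return {
        name: values[name]
        for name in CHECKED
        if name in values and values[name] != expected
    }
```

Usage: `cursor.execute(QUERY)` then `mismatched_charsets(cursor.fetchall())`. The client-side values are per session, so the check is only meaningful when run over a connection opened with the same connection string Airflow uses.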
