Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/20 17:30:53 UTC

[GitHub] [airflow] ahuynh3 opened a new issue #12515: DAG serialization JSONDecodeError

ahuynh3 opened a new issue #12515:
URL: https://github.com/apache/airflow/issues/12515


   
   **Apache Airflow version**: 1.10.12
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**: MySQL (RDS) metadata backend
   
   **What happened**:
   
   We recently turned on DAG serialization and noticed that clicking on a large DAG in the UI produces an error:
   
   <details>
     <summary>Traceback</summary>
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 2447, in wsgi_app
       response = self.full_dispatch_request()
     File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1952, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1821, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/usr/local/lib/python3.7/dist-packages/flask/_compat.py", line 39, in reraise
       raise value
     File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1950, in full_dispatch_request
       rv = self.dispatch_request()
     File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1936, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 121, in wrapper
       return f(self, *args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
       return f(self, *args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 92, in view_func
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 56, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/views.py", line 1407, in tree
       dag = dagbag.get_dag(dag_id)
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 136, in get_dag
       self._add_dag_from_db(dag_id=dag_id)
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 191, in _add_dag_from_db
       row = SerializedDagModel.get(dag_id)
     File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/serialized_dag.py", line 217, in get
       row = session.query(cls).filter(cls.dag_id == dag_id).one_or_none()
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3459, in one_or_none
       ret = list(self)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 100, in instances
       cursor.close()
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
       with_traceback=exc_tb,
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in instances
       rows = [proc(row) for row in fetch]
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in <listcomp>
       rows = [proc(row) for row in fetch]
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 588, in _instance
       populators,
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 725, in _populate_full
       dict_[key] = getter(row)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/type_api.py", line 1278, in process
       return process_value(impl_processor(value), dialect)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/sqltypes.py", line 2454, in process
       return json_deserializer(value)
     File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
       return _default_decoder.decode(s)
     File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
       obj, end = self.raw_decode(s, idx=_w(s, 0).end())
     File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
       obj, end = self.scan_once(s, idx)
   json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 16275 (char 16274)
   ```
   </details>
   
   We’ve determined that in the `serialized_dag` table on MySQL, the `data` column has type `TEXT`, which holds at most 64KB (65,535 bytes), but some of our DAGs are larger than that, so the stored JSON gets truncated. We worked around this by running the following manually against the `serialized_dag` table and then waiting for the table to be repopulated:
   
   ```
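   -- Keep a copy of the table before changing it.
   CREATE TABLE serialized_dag_backup AS SELECT * FROM serialized_dag;
   
   -- Widen the column: MEDIUMTEXT holds up to 16MB instead of TEXT's 64KB.
   ALTER TABLE serialized_dag MODIFY data MEDIUMTEXT;
   
   -- Rows that are exactly 65535 bytes long hit the TEXT limit and were truncated.
   SELECT * FROM serialized_dag
   WHERE LENGTH(data) = 65535;
   
   -- Delete the truncated rows so that they can be written again in full.
   DELETE FROM serialized_dag
   WHERE LENGTH(data) = 65535;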
   ```
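
To illustrate why a row of exactly 65535 bytes fails to load, here is a minimal Python sketch. It assumes the server silently cuts an over-long value down to the `TEXT` limit, as the 65535-byte rows suggest. The helper names and the fake DAG payload are hypothetical and are not taken from Airflow.

```python
import json

TEXT_MAX_BYTES = 65_535  # MySQL TEXT limit: 2**16 - 1 bytes


def truncate_like_text_column(payload: str, limit: int = TEXT_MAX_BYTES) -> str:
    """Mimic an insert that silently cuts a value down to the column limit."""
    return payload.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical stand-in for a large serialized DAG: many tasks with long fields.
fake_dag = {
    "dag": {
        "tasks": [{"task_id": f"task_{i}", "doc": "x" * 200} for i in range(500)]
    }
}
serialized = json.dumps(fake_dag)

# What the TEXT column would hold after truncation.
stored = truncate_like_text_column(serialized)

try:
    json.loads(stored)
except json.JSONDecodeError as err:
    print(f"stored {len(stored)} of {len(serialized)} characters; decode failed: {err.msg}")
```

Because the top-level JSON object never gets its closing brace, any value cut at the limit fails to parse, whichever character the cut happens to land on.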
   
   **What you expected to happen**:
   
   We should be able to click on the DAG in the UI without an error.
   
   **How to reproduce it**:
   With a MySQL metadata backend, create a DAG with code that is larger than 64KB and enable DAG serialization. Then attempt to click on that DAG in the UI.
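
When putting together a reproduction, it may help to check whether a payload is actually over the `TEXT` limit. The following is a rough sketch: the size limits are MySQL's documented maxima for its `TEXT` family, while the function name `smallest_text_type` is a hypothetical helper and not part of Airflow.

```python
# Maximum sizes in bytes of the column types in MySQL's TEXT family.
MYSQL_TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(payload: str) -> str:
    """Return the smallest MySQL TEXT type that can hold payload untruncated."""
    size = len(payload.encode("utf-8"))  # the limits are in bytes, not characters
    for name, limit in MYSQL_TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"payload of {size} bytes does not fit in any TEXT type")


# A payload one byte over the TEXT limit needs MEDIUMTEXT.
print(smallest_text_type("a" * 65_535))
print(smallest_text_type("a" * 65_536))
```

The size is measured on the UTF-8 encoding because the limits apply to bytes, so a payload with multi-byte characters can exceed the limit at fewer than 65,535 characters.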
   
   
   **Anything else we need to know**: N/A


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil closed issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #12515:
URL: https://github.com/apache/airflow/issues/12515


   





[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731345903


   5.6.43





[GitHub] [airflow] boring-cyborg[bot] commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731302065


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731354721


   Ah nice catch -- seems like we can resolve this issue fairly easily by upgrading our MySQL version. Feel free to close this issue out if you don't think it's worth fixing for 5.6.x. Thanks again!





[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731361173


   Cool, yeah. With Airflow 2.0 around the corner, I think it would be worth upgrading to at least 5.7, or even better 8.0 if you want to run multiple Schedulers in 2.0 ;)





[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731344966


   @ahuynh3 What version of MySQL do you use?





[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731342045


   > @kaxil Here's the issue described [in this Slack thread](https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1605839981357300)
   
   Thanks @ahuynh3 





[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731347431


   That is why: MySQL 5.7+/MariaDB 10.2.3 has JSON support, so on 5.6 the `data` column falls back to `TEXT`.
   
   https://github.com/apache/airflow/blob/20843ff89ddbdac45f7ecf9913c4e38685089eb4/airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py#L42-L44
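
One way such a fallback could work is to probe the server for a JSON function and pick the column type from the result. The sketch below is an illustration under that assumption and is not a copy of the linked migration. `choose_data_column_type`, `run_sql`, and the two fake servers are hypothetical names. `JSON_VALID` is a function available in MySQL 5.7+ and MariaDB 10.2.3+.

```python
from typing import Callable


def choose_data_column_type(run_sql: Callable[[str], object]) -> str:
    """Return "JSON" if the server accepts a JSON function call, else "TEXT"."""
    try:
        run_sql("SELECT JSON_VALID(1)")
    except Exception:
        # A real implementation would catch the driver's specific error types.
        return "TEXT"
    return "JSON"


def fake_server_with_json(sql: str) -> object:
    """Hypothetical stand-in for a server that knows JSON_VALID."""
    return [(1,)]


def fake_server_without_json(sql: str) -> object:
    """Hypothetical stand-in for a server such as MySQL 5.6."""
    raise RuntimeError("FUNCTION JSON_VALID does not exist")


print(choose_data_column_type(fake_server_with_json))
print(choose_data_column_type(fake_server_without_json))
```

Probing for the function avoids parsing version strings, which differ between MySQL and MariaDB.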





[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731302447


   @kaxil Here's the issue described [in this Slack thread](https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1605839981357300)





[GitHub] [airflow] ahuynh3 edited a comment on issue #12515: DAG serialization JSONDecodeError

Posted by GitBox <gi...@apache.org>.
ahuynh3 edited a comment on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731345903


   @kaxil 5.6.43

