Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/20 17:30:53 UTC
[GitHub] [airflow] ahuynh3 opened a new issue #12515: DAG serialization JSONDecodeError
ahuynh3 opened a new issue #12515:
URL: https://github.com/apache/airflow/issues/12515
**Apache Airflow version**: 1.10.12
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
**Environment**:
- **Cloud provider or hardware configuration**: AWS
- **OS** (e.g. from /etc/os-release):
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**: MySQL (RDS) metadata backend
**What happened**:
We recently turned on DAG serialization and noticed that clicking on large DAGs in the UI produces an error:
<details>
<summary>Traceback</summary>
```
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.7/dist-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 121, in wrapper
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 92, in view_func
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/decorators.py", line 56, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/airflow/www_rbac/views.py", line 1407, in tree
dag = dagbag.get_dag(dag_id)
File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 136, in get_dag
self._add_dag_from_db(dag_id=dag_id)
File "/usr/local/lib/python3.7/dist-packages/airflow/models/dagbag.py", line 191, in _add_dag_from_db
row = SerializedDagModel.get(dag_id)
File "/usr/local/lib/python3.7/dist-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/airflow/models/serialized_dag.py", line 217, in get
row = session.query(cls).filter(cls.dag_id == dag_id).one_or_none()
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3459, in one_or_none
ret = list(self)
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 100, in instances
cursor.close()
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in instances
rows = [proc(row) for row in fetch]
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 80, in <listcomp>
rows = [proc(row) for row in fetch]
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 588, in _instance
populators,
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/loading.py", line 725, in _populate_full
dict_[key] = getter(row)
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/type_api.py", line 1278, in process
return process_value(impl_processor(value), dialect)
File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/sqltypes.py", line 2454, in process
return json_deserializer(value)
File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 16275 (char 16274)
```
</details>
We’ve determined that the cause is the `data` column of the `serialized_dag` table: with MySQL its type is `TEXT`, which holds at most 64KB (65,535 bytes), but some of our DAG code is larger than that, so the stored JSON gets cut off. We worked around this by running the following manually on the `serialized_dag` table, then waiting for the table to be repopulated:
```
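-- Keep a copy of the table before changing it.
CREATE TABLE serialized_dag_backup AS SELECT * FROM serialized_dag;
-- MEDIUMTEXT holds up to 16MB, versus 64KB for TEXT.
ALTER TABLE serialized_dag MODIFY data MEDIUMTEXT;
-- Rows sitting exactly at the TEXT limit were presumably truncated:
-- inspect them, then delete them so they get re-serialized.
SELECT * FROM serialized_dag
WHERE LENGTH(data) = 65535;
DELETE FROM serialized_dag
WHERE LENGTH(data) = 65535;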
```
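To illustrate why a cut-off value ends in a `JSONDecodeError`, here is a minimal sketch that uses only the standard library. It assumes the server silently truncates oversized values (which is what the `LENGTH(data) = 65535` rows suggest); `store_in_text_column` and `fake_dag` are made-up names for illustration and do not reflect Airflow's real serialization layout.

```python
import json

TEXT_MAX_BYTES = 65535  # maximum length of a MySQL TEXT column, in bytes


def store_in_text_column(payload: str, limit: int = TEXT_MAX_BYTES) -> str:
    """Mimic a server that silently cuts values longer than the column allows."""
    raw = payload.encode("utf-8")[:limit]
    # Drop a partial multi-byte character at the cut, if there is one.
    return raw.decode("utf-8", errors="ignore")


# Hypothetical stand-in for a serialized DAG (assumption: not Airflow's format).
fake_dag = {
    "dag_id": "big_dag",
    "tasks": [
        {"task_id": f"task_{i}", "command": "echo " + "x" * 40}
        for i in range(2000)
    ],
}
serialized = json.dumps(fake_dag)
stored = store_in_text_column(serialized)

try:
    json.loads(stored)
    decode_error = None
except json.JSONDecodeError as exc:
    # The truncated document is no longer valid JSON.
    decode_error = exc

print(len(serialized), len(stored), type(decode_error).__name__)
```

The exact error message depends on where the cut lands; in the traceback above it happened to fall inside a string.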
**What you expected to happen**:
We should be able to click on the DAG in the UI without an error.
**How to reproduce it**:
With a MySQL metadata backend, create a DAG with code that is larger than 64KB and enable DAG serialization. Then attempt to click on that DAG in the UI.
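One possible way to produce such a DAG is to generate the file. The sketch below is only an assumption about what a suitable reproduction could look like: `build_large_dag_source`, the DAG id and the task count are hypothetical, and the imports in the generated file follow the Airflow 1.10 module layout. What ultimately matters is the size of the serialized JSON, which should grow with the number of tasks.

```python
def build_large_dag_source(dag_id: str = "large_dag", n_tasks: int = 1500) -> str:
    """Return the source of a DAG file with `n_tasks` chained dummy tasks."""
    lines = [
        "from datetime import datetime",
        "",
        "from airflow import DAG",
        "from airflow.operators.dummy_operator import DummyOperator",
        "",
        f"dag = DAG(dag_id={dag_id!r}, start_date=datetime(2020, 1, 1), "
        "schedule_interval=None)",
        "",
    ]
    for i in range(n_tasks):
        lines.append(f"task_{i} = DummyOperator(task_id='task_{i}', dag=dag)")
        if i:
            # Chain each task after the previous one.
            lines.append(f"task_{i - 1} >> task_{i}")
    return "\n".join(lines) + "\n"


source = build_large_dag_source()
# Size of the generated file in bytes; compare it against the 65535-byte limit.
print(len(source.encode("utf-8")))
```

Writing `source` to a file in the DAGs folder and opening the DAG in the UI should then exercise the same code path as in the traceback.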
**Anything else we need to know**: N/A
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil closed issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
kaxil closed issue #12515:
URL: https://github.com/apache/airflow/issues/12515
----------------------------------------------------------------
[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731345903
5.6.43
----------------------------------------------------------------
[GitHub] [airflow] boring-cyborg[bot] commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731302065
Thanks for opening your first issue here! Be sure to follow the issue template!
----------------------------------------------------------------
[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731354721
Ah nice catch -- seems like we can resolve this issue fairly easily by upgrading our MySQL version. Feel free to close this issue out if you don't think it's worth fixing for 5.6.x. Thanks again!
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731361173
Cool, yeah. With Airflow 2.0 around the corner, I think it would be worth upgrading to at least 5.7, or even better 8.0 if you want to run multiple Schedulers in 2.0 ;)
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731344966
@ahuynh3 What version of MySQL do you use?
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731342045
> @kaxil Here's the issue described [in this Slack thread](https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1605839981357300)
Thanks @ahuynh3
----------------------------------------------------------------
[GitHub] [airflow] kaxil commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731347431
That is why. MySQL 5.7+ / MariaDB 10.2.3+ have JSON support, and the migration only falls back to `TEXT` when it is missing:
https://github.com/apache/airflow/blob/20843ff89ddbdac45f7ecf9913c4e38685089eb4/airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py#L42-L44
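A rough sketch of the kind of capability probe the linked migration lines appear to perform: try a JSON function and fall back to a text column if the server rejects it. This is not the Airflow code itself; `pick_data_column_type` and the two fake servers are hypothetical, and a real migration would go through SQLAlchemy and catch the driver's specific errors.

```python
from typing import Callable


def pick_data_column_type(run_sql: Callable[[str], object]) -> str:
    """Return "JSON" if the server answers a JSON_VALID probe, else "TEXT"."""
    try:
        run_sql("SELECT JSON_VALID(1)")
    except Exception:  # assumption: any failure means no JSON support
        return "TEXT"
    return "JSON"


def server_with_json(sql: str) -> int:
    """Stand-in for a server that knows JSON_VALID (e.g. MySQL 5.7+)."""
    return 0


def server_without_json(sql: str) -> int:
    """Stand-in for an older server, such as MySQL 5.6."""
    raise RuntimeError("FUNCTION JSON_VALID does not exist")


print(pick_data_column_type(server_with_json))     # JSON
print(pick_data_column_type(server_without_json))  # TEXT
```

Under this reading, a 5.6.x server ends up with the `TEXT` column and its 64KB limit, which matches the behaviour reported in this issue.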
----------------------------------------------------------------
[GitHub] [airflow] ahuynh3 commented on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
ahuynh3 commented on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731302447
@kaxil Here's the issue described [in this Slack thread](https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1605839981357300)
----------------------------------------------------------------
[GitHub] [airflow] ahuynh3 edited a comment on issue #12515: DAG serialization JSONDecodeError
Posted by GitBox <gi...@apache.org>.
ahuynh3 edited a comment on issue #12515:
URL: https://github.com/apache/airflow/issues/12515#issuecomment-731345903
@kaxil 5.6.43
----------------------------------------------------------------