Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/30 20:09:20 UTC

[GitHub] [airflow] yuzeh opened a new issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

yuzeh opened a new issue #13393:
URL: https://github.com/apache/airflow/issues/13393


   **Apache Airflow version**: 2.0.0
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A
   
   **Environment**: docker-compose
   
   - **Cloud provider or hardware configuration**: local / gcp
   - **OS** (e.g. from /etc/os-release): Ubuntu 20 on WSL 2 (local) / Ubuntu 18 (gcp)
   - **Kernel** (e.g. `uname -a`): 4.19.128-microsoft-standard (local) / 5.4.0-1032-gcp (gcp)
   - **Install tools**: pip (for airflow), apt (for dependencies)
   - **Others**:
   
   **What happened**:
   We cannot upgrade to Airflow 2.0 while retaining our existing database (which we have used since Airflow 1.10.11).
   
   After running `airflow db upgrade` and then running `airflow scheduler`, the scheduler reports the following error and crashes.
   
   This error doesn't show up when we set up Airflow 2.0 from a fresh db, so I'm inclined to believe something in our Airflow DB is corrupted.
   
   ```
   webserver_1        | Traceback (most recent call last):
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
   webserver_1        |     file_path = pytzdata.tz_path(timezone)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pytzdata/__init__.py", line 74, in tz_path
   webserver_1        |     raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
   webserver_1        | pytzdata.exceptions.TimezoneNotFound: Timezone tzlocal() not found at /usr/local/lib/python3.7/site-packages/pytzdata/zoneinfo/tzlocal()
   webserver_1        |
   webserver_1        | During handling of the above exception, another exception occurred:
   webserver_1        |
   webserver_1        | Traceback (most recent call last):
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
   webserver_1        |     self._run_scheduler_loop()
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
   webserver_1        |     num_queued_tis = self._do_scheduling(session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1474, in _do_scheduling
   webserver_1        |     self._create_dag_runs(query.all(), session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1557, in _create_dag_runs
   webserver_1        |     dag = self.dagbag.get_dag(dag_model.dag_id, session=session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 62, in wrapper
   webserver_1        |     return func(*args, **kwargs)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 171, in get_dag
   webserver_1        |     self._add_dag_from_db(dag_id=dag_id, session=session)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 229, in _add_dag_from_db
   webserver_1        |     dag = row.dag
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/models/serialized_dag.py", line 167, in dag
   webserver_1        |     dag = SerializedDAG.from_dict(self.data)  # type: Any
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 719, in from_dict
   webserver_1        |     return cls.deserialize_dag(serialized_obj['dag'])
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 655, in deserialize_dag
   webserver_1        |     v = cls._deserialize_timezone(v)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/__init__.py", line 37, in timezone
   webserver_1        |     tz = _Timezone(name, extended=extended)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/timezone.py", line 40, in __init__
   webserver_1        |     tz = read(name, extend=extended)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
   webserver_1        |     return Reader(extend=extend).read_for(name)
   webserver_1        |   File "/usr/local/lib/python3.7/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
   webserver_1        |     raise InvalidTimezone(timezone)
   webserver_1        | pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "tzlocal()"
   webserver_1        | [2020-12-30 08:55:39,257] {{settings.py:52}} INFO - Configured default timezone Timezone('UTC')
   ```
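
   The root cause visible in the traceback is that the serialized DAG carries the literal string `tzlocal()` where an IANA timezone name is expected, and pendulum refuses to resolve it. A minimal sketch of that failure, assuming pendulum 2.x (which matches the file paths in the traceback above):

   ```python
   # Minimal sketch: pendulum cannot resolve the literal string "tzlocal()"
   # as a timezone name, which is apparently what the serialized DAG contains.
   import pendulum

   try:
       pendulum.timezone("tzlocal()")  # same lookup the deserializer performs, per the traceback
   except Exception as exc:
       # pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "tzlocal()"
       print(f"{type(exc).__name__}: {exc}")

   print(pendulum.timezone("UTC"))  # a real IANA name resolves fine
   ```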
   
   **How to reproduce it**:
   
   I've been able to reproduce this on two systems by using the same DB backup, but cannot share the DB backup as it contains confidential information.
   
   Any thoughts on how to develop a minimal reproducible test case would be appreciated!
   
   **Anything else we need to know**:
   
   We use some of the maintenance DAGs from this repo (https://github.com/teamclairvoyant/airflow-maintenance-dags), which edit the Airflow metadata DB directly. I suspect that one of these DAGs may have done something that corrupted our Airflow DB.
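
   One way to check for that kind of corruption is to look for serialized DAGs whose JSON payload embeds the bad `tzlocal()` string. A rough sketch only: it assumes the Postgres backend mentioned later in the thread, psycopg2 installed, the Airflow 2.0 `serialized_dag` schema, and placeholder connection details.

   ```python
   # Hedged sketch: list serialized DAGs whose payload contains the literal
   # "tzlocal()" string. The DSN below is a placeholder; adjust to your setup.
   import psycopg2

   conn = psycopg2.connect("dbname=airflow user=airflow password=airflow host=localhost")
   with conn, conn.cursor() as cur:
       cur.execute(
           "SELECT dag_id FROM serialized_dag WHERE data::text LIKE %s",
           ("%tzlocal()%",),
       )
       for (dag_id,) in cur.fetchall():
           print(dag_id)
   conn.close()
   ```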


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-752746108


   Do you use subdags? This is likely a problem with the data in the serialized_dag table. The way to track this down would be to try deleting the rows from that table one by one until the problem goes away.
   
   That would at least help narrow down the problem.
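
   For reference, a rough sketch of that one-by-one elimination, meant to be run only against a disposable copy of the metadata DB (psycopg2 and the connection details are assumptions; restarting the scheduler between deletions is manual):

   ```python
   # Hedged sketch: delete one serialized_dag row at a time, re-run the
   # scheduler, and check whether the crash disappears. Use a copy of the DB.
   import psycopg2

   conn = psycopg2.connect("dbname=airflow_copy user=airflow host=localhost")

   with conn, conn.cursor() as cur:
       cur.execute("SELECT dag_id FROM serialized_dag ORDER BY dag_id")
       dag_ids = [row[0] for row in cur.fetchall()]

   for dag_id in dag_ids:
       with conn, conn.cursor() as cur:
           cur.execute("DELETE FROM serialized_dag WHERE dag_id = %s", (dag_id,))
       input(f"deleted {dag_id}; re-run the scheduler, then press Enter to continue...")

   conn.close()
   ```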



[GitHub] [airflow] github-actions[bot] commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-818324060


   This issue has been closed because it has not received a response from the issue author.



[GitHub] [airflow] github-actions[bot] closed issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #13393:
URL: https://github.com/apache/airflow/issues/13393


   



[GitHub] [airflow] yuzeh commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
yuzeh commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-752784943


   Hi @ashb!
   
   1. Scheduler / Webserver - Our `docker-compose` service is called "webserver", but it launches a scheduler process in the background before starting the webserver process.
   2. Database - Postgres 9.6
   3. Local time - `/etc/adjtime` doesn't exist. I'm not sure how to configure local time, but it seems we don't have one configured.
   4. We don't use subdags at all. (Would removing rows one by one still work? I can certainly do that.)
   
   Thank you so much!



[GitHub] [airflow] boring-cyborg[bot] commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-752743383


   Thanks for opening your first issue here! Be sure to follow the issue template!
   



[GitHub] [airflow] github-actions[bot] commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-813121791


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if there is no further activity from the issue author.



[GitHub] [airflow] ashb commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-754568155


   @yuzeh You'll need to work out which of the DAGs in your DB is the culprit, i.e. take a copy of your DB and manually delete some rows from the dag and serialized_dag tables until you track down which DAG is causing the problem.
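
   A small sketch of how to drive that against a copy rather than the live database (the connection string and database name are placeholders; `AIRFLOW__CORE__SQL_ALCHEMY_CONN` is the Airflow 2.0 setting for the metadata connection, and the single-run flag is used to keep each check short):

   ```python
   # Hedged sketch: run a throwaway scheduler against a *copy* of the
   # metadata DB after each manual deletion, so the live DB is never touched.
   import os
   import subprocess

   os.environ["AIRFLOW__CORE__SQL_ALCHEMY_CONN"] = (
       "postgresql+psycopg2://airflow:airflow@localhost/airflow_copy"
   )

   # --num-runs 1 limits the scheduler to a single scheduling loop, which is
   # enough to reach the DAG-run creation path in the traceback if it still crashes.
   subprocess.run(["airflow", "scheduler", "--num-runs", "1"])
   ```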



[GitHub] [airflow] ashb commented on issue #13393: Airflow 1.10 -> 2.0 DB upgrade broke

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #13393:
URL: https://github.com/apache/airflow/issues/13393#issuecomment-752745781


   You say the scheduler crashed, but are showing logs from the webserver.
   
   What database are you using?
   
   Do you have a local time configured? What is in `/etc/adjtime`?

