Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/12 18:04:49 UTC

[GitHub] [airflow] Gollum999 opened a new issue, #25698: Backfill mode with mapped tasks: "Failed to populate all mapping metadata"

Gollum999 opened a new issue, #25698:
URL: https://github.com/apache/airflow/issues/25698

   ### Apache Airflow version
   
   2.3.3
   
   ### What happened
   
   I was backfilling some DAGs that use dynamic tasks when I got an exception like the following:
   
   ```
   Traceback (most recent call last):
     File "/opt/conda/envs/production/bin/airflow", line 11, in <module>
       sys.exit(main())
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/__main__.py", line 38, in main
       args.func(args)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 51, in command
       return func(*args, **kwargs)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/cli.py", line 99, in wrapper
       return f(*args, **kwargs)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/cli/commands/dag_command.py", line 107, in dag_backfill
       dag.run(
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/dag.py", line 2288, in run
       job.run()
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/jobs/base_job.py", line 244, in run
       self._execute()
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/session.py", line 71, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/jobs/backfill_job.py", line 847, in _execute
       self._execute_dagruns(
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/session.py", line 68, in wrapper
       return func(*args, **kwargs)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/jobs/backfill_job.py", line 737, in _execute_dagruns
       processed_dag_run_dates = self._process_backfill_task_instances(
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/utils/session.py", line 68, in wrapper
       return func(*args, **kwargs)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/jobs/backfill_job.py", line 612, in _process_backfill_task_instances
       for node, run_id, new_mapped_tis, max_map_index in self._manage_executor_state(
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/jobs/backfill_job.py", line 270, in _manage_executor_state
       new_tis, num_mapped_tis = node.expand_mapped_task(ti.run_id, session=session)
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 614, in expand_mapped_task
       operator.mul, self._resolve_map_lengths(run_id, session=session).values()
     File "/opt/conda/envs/production/lib/python3.9/site-packages/airflow/models/mappedoperator.py", line 600, in _resolve_map_lengths
       raise RuntimeError(f"Failed to populate all mapping metadata; missing: {keys}")
   RuntimeError: Failed to populate all mapping metadata; missing: 'x'
   ```
   
   Digging further, this appears to happen whenever the task that supplies the input to an `.expand` raises an exception. Airflow does not handle this as gracefully as it handles exceptions in "normal" tasks, which can lead to other errors from deeper within Airflow. Because this is not a "typical" failure case, things like `--rerun-failed-tasks` also do not work as expected.
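
As a rough illustration of the failure mode described above, here is a toy model of the length lookup. It is not Airflow's implementation: `resolve_map_lengths` and `total_map_length` are hypothetical stand-ins that only mirror the `operator.mul` product and the error text visible in the traceback, assuming a failed upstream leaves no recorded length for its key.

```python
import operator
from functools import reduce
from typing import Dict, Optional


def resolve_map_lengths(upstream_lengths: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Hypothetical stand-in: return the length recorded for each mapped argument.

    A value of None models an upstream task that raised and never pushed a result.
    """
    missing = [key for key, length in upstream_lengths.items() if length is None]
    if missing:
        keys = ", ".join(repr(key) for key in missing)
        raise RuntimeError(f"Failed to populate all mapping metadata; missing: {keys}")
    return {key: length for key, length in upstream_lengths.items() if length is not None}


def total_map_length(upstream_lengths: Dict[str, Optional[int]]) -> int:
    """Number of mapped task instances: the product of all argument lengths."""
    return reduce(operator.mul, resolve_map_lengths(upstream_lengths).values(), 1)
```

With `{"x": 3}` the product is 3, matching the three-element list returned by `get_tasks` in the reproduction below; with `{"x": None}` the lookup raises instead of reporting an ordinary task failure.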
   
   ### What you think should happen instead
   
   Airflow should fail gracefully if exceptions are raised in dynamic task generators.
   
   ### How to reproduce
   
   ```
   #!/usr/bin/env python3
   
   import datetime
   import logging
   
   from airflow.decorators import dag, task
   
   
   logger = logging.getLogger(__name__)
   
   
   @dag(
       schedule_interval='@daily',
       start_date=datetime.datetime(2022, 8, 12),
       default_args={
           'retries': 5,
           'retry_delay': 5.0,
       },
   )
   def test_backfill():
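       @task
       def get_tasks(ti=None):
           # Upstream of the mapped tasks: raises on the first two tries,
           # then returns the list to expand over.
           logger.info(f'{ti.try_number=}')
           if ti.try_number < 3:
               raise RuntimeError('')
           return ['a', 'b', 'c']
   
       @task
       def do_stuff(x=None, ti=None):
           # Also raises on the first two tries, whether mapped or not.
           logger.info(f'do_stuff: {x=}, {ti.try_number=}')
           if ti.try_number < 3:
               raise RuntimeError('')
   
       # Mapped tasks: the outer expand maps over the inner one, which maps over get_tasks().
       do_stuff.expand(x=do_stuff.expand(x=get_tasks()))
       # Two unmapped do_stuff tasks, chained in sequence.
       do_stuff() >> do_stuff()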
   
   
   dag = test_backfill()
   
   
   if __name__ == '__main__':
       dag.cli()
   ```
   ```
   airflow dags backfill test_backfill -s 2022-08-05 -e 2022-08-07 --rerun-failed-tasks
   ```
   
   ### Operating System
   
   CentOS Stream 8
   
   ### Versions of Apache Airflow Providers
   
   None
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Standalone
   
   ### Anything else
   
   I was able to reproduce this both with SQLite + `SequentialExecutor` and with Postgres + `LocalExecutor`.
   
   I haven't yet been able to reproduce this outside of `backfill` mode.
   
   Possibly related since they mention the same exception text:
   * #23533
   * #23642
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #25698: Backfill mode with mapped tasks: "Failed to populate all mapping metadata"

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #25698:
URL: https://github.com/apache/airflow/issues/25698#issuecomment-1213381464

   After #25661 this may cause a different error. I will need to look into this further (not now).
   
   To clarify, the downstream (mapped task) will never run correctly in any scenario: if the upstream raises an exception, there’s nothing the task can be expanded into. But the scheduler should handle this more gracefully.
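
One way to picture the graceful behaviour asked for above is the following minimal sketch. It assumes the expansion step can see the upstream task's final state; `ExpansionPlan` and `plan_expansion` are hypothetical names for illustration, not part of Airflow, and only the state strings `success` and `upstream_failed` are borrowed from Airflow's task states.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ExpansionPlan:
    """Hypothetical result of deciding how to expand a mapped task."""
    map_indexes: List[int] = field(default_factory=list)
    unexpanded_state: Optional[str] = None


def plan_expansion(upstream_state: str, upstream_result: Optional[Sequence]) -> ExpansionPlan:
    """Expand only when the upstream succeeded; otherwise record a failure state."""
    if upstream_state != "success" or upstream_result is None:
        # Nothing to expand into: mark the task instead of raising from the job loop.
        return ExpansionPlan(unexpanded_state="upstream_failed")
    # One mapped task instance per element of the upstream result.
    return ExpansionPlan(map_indexes=list(range(len(upstream_result))))
```

In this sketch a failed upstream yields a plan with no map indexes and an `upstream_failed` marker, so the caller could treat it like any other failed dependency rather than hitting a `RuntimeError`.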




[GitHub] [airflow] uranusjr closed issue #25698: Backfill mode with mapped tasks: "Failed to populate all mapping metadata"

Posted by GitBox <gi...@apache.org>.
uranusjr closed issue #25698: Backfill mode with mapped tasks: "Failed to populate all mapping metadata"
URL: https://github.com/apache/airflow/issues/25698

