Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/19 15:01:05 UTC

[GitHub] [airflow] Bruschkov opened a new issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)

Bruschkov opened a new issue #20956:
URL: https://github.com/apache/airflow/issues/20956


   ### Apache Airflow version
   
   2.1.1
   
   ### What happened
   
   Scheduler regularly crashes with error messages like this:
   
   ```
   MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'some-ETL-2022-01-19 14:00:00.000000' for key 'dag_id'")                                                                                                                                
                                                                                                                                                                                                                                                                
   The above exception was the direct cause of the following exception:                                                                                                                                                                                         
                                                                                                                                                                                                                                                                
   Traceback (most recent call last):                                                                                                                                                                                                                           
     File "/home/airflow/.local/bin/airflow", line 8, in <module>                                                                                                                                                                                               
       sys.exit(main())                                                                                                                                                                                                                                         
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main                                                                                                                                                              
       args.func(args)                                                                                                                                                                                                                                          
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command                                                                                                                                                     
       return func(*args, **kwargs)                                                                                                                                                                                                                             
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 91, in wrapper                                                                                                                                                          
       return f(*args, **kwargs)                                                                                                                                                                                                                                
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/scheduler_command.py", line 64, in scheduler                                                                                                                                   
       job.run()                                                                                                                                                                                                                                                
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 237, in run                                                                                                                                                         
       self._execute()                                                                                                                                                                                                                                          
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1303, in _execute                                                                                                                                              
       self._run_scheduler_loop()                                                                                                                                                                                                                               
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1396, in _run_scheduler_loop                                                                                                                                   
       num_queued_tis = self._do_scheduling(session)                                                                                                                                                                                                            
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1492, in _do_scheduling                                                                                                                                        
       self._create_dagruns_for_dags(guard, session)                                                                                                                                                                                                            
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/retries.py", line 76, in wrapped_function                                                                                                                                             
       for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):                                                                                                                                                                  
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 390, in __iter__                                                                                                                                                        
       do = self.iter(retry_state=retry_state)                                                                                                                                                                                                                  
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 356, in iter                                                                                                                                                            
       return fut.result()                                                                                                                                                                                                                                      
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result                                                                                                                                                                           
       return self.__get_result()                                                                                                                                                                                                                               
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result                                                                                                                                                                     
       raise self._exception                                                                                                                                                                                                                                    
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/retries.py", line 85, in wrapped_function                                                                                                                                             
       return func(*args, **kwargs)                               
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1583, in _create_dagruns_for_dags                                                                                                                              
       self._create_dag_runs(query.all(), session)                                                                               
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1625, in _create_dag_runs                                                                                                                                      
       run = dag.create_dagrun(                                   
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 67, in wrapper                                                                                                                                                      
       return func(*args, **kwargs)                               
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 1796, in create_dagrun                                                                                                                                                 
       session.flush()                                            
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2523, in flush                                                                                                                                                     
       self._flush(objects)                                       
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush                                                                                                                                                    
       transaction.rollback(_capture_exception=True)                                                                             
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__                                                                                                                                               
       compat.raise_(                                             
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_                                                                                                                                                     
       raise exception                                            
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush                                                                                                                                                    
       flush_context.execute()                                    
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute        
   ...  
   ```
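
The duplicate entry in the message above looks like a dag_id joined to an execution date, which suggests a unique key spanning those two columns. As a rough illustration of that failure mode only, here is a minimal sketch that uses an in-memory SQLite database in place of MariaDB. The simplified `dag_run` table and the helper names `make_demo_db` and `insert_dag_run` are assumptions made for this example, not Airflow's actual schema or code.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with a composite unique key (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dag_run (
            id INTEGER PRIMARY KEY,
            dag_id TEXT NOT NULL,
            execution_date TEXT NOT NULL,
            UNIQUE (dag_id, execution_date)
        )
        """
    )
    return conn


def insert_dag_run(conn, dag_id, execution_date):
    """Insert one row; return True on success, False if the unique key rejects it."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
                (dag_id, execution_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_demo_db()
    print(insert_dag_run(conn, "some-ETL", "2022-01-19 14:00:00"))
    # Second insert with the same (dag_id, execution_date) pair is rejected.
    print(insert_dag_run(conn, "some-ETL", "2022-01-19 14:00:00"))
```

In the sketch the duplicate is caught and reported, whereas in the traceback above the error propagates out of `create_dagrun` and ends the scheduler process.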
   
   ### What you expected to happen
   
   We would expect these errors not to occur. According to https://github.com/apache/airflow/issues/9148 and https://github.com/apache/airflow/issues/13925 this issue should have been fixed a couple of versions ago.
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   kubernetes
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==2.0.0
   apache-airflow-providers-celery==2.0.0
   apache-airflow-providers-cncf-kubernetes==2.0.0
   apache-airflow-providers-docker==2.0.0
   apache-airflow-providers-elasticsearch==2.0.1
   apache-airflow-providers-ftp==2.0.0
   apache-airflow-providers-google==4.0.0
   apache-airflow-providers-grpc==2.0.0
   apache-airflow-providers-hashicorp==2.0.0
   apache-airflow-providers-http==2.0.0
   apache-airflow-providers-imap==2.0.0
   apache-airflow-providers-microsoft-azure==3.0.0
   apache-airflow-providers-mysql==2.0.0
   apache-airflow-providers-odbc==2.0.0
   apache-airflow-providers-postgres==2.0.0
   apache-airflow-providers-redis==2.0.0
   apache-airflow-providers-sendgrid==2.0.0
   apache-airflow-providers-sftp==2.0.0
   apache-airflow-providers-slack==4.0.0
   apache-airflow-providers-sqlite==2.0.0
   apache-airflow-providers-ssh==2.0.0
   
   ```
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   Deployed via https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.0 to a Kubernetes cluster (Kubernetes 1.18).
   
   The backend is MariaDB (10.3.31).
   
   Docker image used as base image: apache/airflow:2.1.1-python3.8
   
   Additional Python dependencies installed:
   ```
   airflow-exporter==1.5.2
   boto3==1.18.58
   s3fs==0.4.*
   pandas==1.3.3
   sqlalchemy==1.3.18
   sqlalchemy-redshift==0.8.2
   smart_open[aws]==2.1.*
   # Use PyMySQL as dialect to fix SSL connection error
   PyMySQL==1.0.2
   ```
   
   Relevant parts of the Airflow configuration:
   ```
   airflow:
     config:
       # [core]
       AIRFLOW__CORE__PARALLELISM: "24"
       AIRFLOW__CORE__DAG_CONCURRENCY: "20"
       AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: "1"
       AIRFLOW__CORE__LOAD_EXAMPLES: "False"
       AIRFLOW__CORE__STORE_SERIALIZED_DAGS: "False"
   ```
   
   ### Anything else
   
   On average, the scheduler restarts between 1 and 10 times per hour with the above error message.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20956:
URL: https://github.com/apache/airflow/issues/20956#issuecomment-1024976715


   It's impossible to reproduce this with the information given. You likely have some duplication in your dag_ids: the same dag_id coming from multiple files. But it's hard to say what the problem is. Converting this into a discussion. If you provide more information on your DAG structure, the logical DAGs you have, and any dynamic DAG generation schemes you use, maybe we can somehow help.
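
One quick way to check for the duplication mentioned above is to scan the DAG folder for files that declare the same dag_id. The following is a minimal sketch under stated assumptions: it only finds dag_ids written as string literals in a `dag_id="..."` keyword argument, so positional or dynamically generated ids will be missed. The function name `find_duplicate_dag_ids` and the regular expression are hypothetical helpers for this example, not part of Airflow.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic: matches dag_id="name" or dag_id='name' with a literal string value.
DAG_ID_PATTERN = re.compile(r"""dag_id\s*=\s*["']([^"']+)["']""")


def find_duplicate_dag_ids(dags_folder):
    """Return {dag_id: [files]} for every literal dag_id found in more than one file."""
    files_by_dag_id = defaultdict(set)
    for path in sorted(Path(dags_folder).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for dag_id in DAG_ID_PATTERN.findall(text):
            files_by_dag_id[dag_id].add(str(path))
    return {
        dag_id: sorted(files)
        for dag_id, files in files_by_dag_id.items()
        if len(files) > 1
    }


if __name__ == "__main__":
    import sys

    # Usage: python find_duplicates.py /path/to/dags
    for dag_id, files in find_duplicate_dag_ids(sys.argv[1]).items():
        print(dag_id, "is defined in:", ", ".join(files))
```

An empty result does not rule out duplicates, since ids built at runtime are invisible to a text scan like this one.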





[GitHub] [airflow] boring-cyborg[bot] commented on issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #20956:
URL: https://github.com/apache/airflow/issues/20956#issuecomment-1016551528


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

