Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/25 09:43:22 UTC

[GitHub] [airflow] parisni opened a new issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

parisni opened a new issue #19822:
URL: https://github.com/apache/airflow/issues/19822


   ### Official Helm Chart version
   
   1.3.0 (latest released)
   
   ### Apache Airflow version
   
   2.2.1
   
   ### Kubernetes Version
   
   1.2.1
   
   ### Helm Chart configuration
   
   ```yaml
   airflow:
   
     env:
       - name: AIRFLOW__SCHEDULER__USE_ROW_LEVEL_LOCKING
         value: "True"
   
   webserver:
     livenessProbe:
       initialDelaySeconds: 100
       timeoutSeconds: 100
       failureThreshold: 20
       periodSeconds: 25
   
     readinessProbe:
       initialDelaySeconds: 100
       timeoutSeconds: 100
       failureThreshold: 20
       periodSeconds: 25
   
   
   scheduler:
     replicas: 2
   
   executor: CeleryExecutor
   ```
   
   ### Docker Image customisations
   
   _No response_
   
   ### What happened
   
   ```
   duplicate key value violates unique constraint "serialized_dag_pkey"
   DETAIL:  Key (dag_id)=(test_dag_454) already exists.    
    scheduler [SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES (%(dag_id)s, %(fileloc)s, %(fileloc_hash)s,
    scheduler [parameters: {'dag_id': 'test_dag_454', 'fileloc': '/opt/airflow/dags/repo/test_dag_454.py', 'fileloc_hash': 14390114031793844, 'data': '{"__vers
    scheduler (Background on this error at: http://sqlalche.me/e/13/gkpj)
   ```
   
   ### What you expected to happen
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org






[GitHub] [airflow] parisni commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979415050


   @potiuk @kaxil 
   See the log in the issue description (not my last comment): it does indeed come from the scheduler.
   
   ```
    duplicate key value violates unique constraint "serialized_dag_pkey"
   DETAIL:  Key (dag_id)=(test_dag_454) already exists.    
    scheduler [SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES (%(dag_id)s, %(fileloc)s, %(fileloc_hash)s,
    scheduler [parameters: {'dag_id': 'test_dag_454', 'fileloc': '/opt/airflow/dags/repo/test_dag_454.py', 'fileloc_hash': 14390114031793844, 'data': '{"__vers
    scheduler (Background on this error at: http://sqlalche.me/e/13/gkpj)
   ```





[GitHub] [airflow] kaxil commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-980213377


   > @potiuk @kaxil See the log in the issue description (not my last comment): it does indeed come from the scheduler.
   > 
   > ```
   >  duplicate key value violates unique constraint "serialized_dag_pkey"
   > DETAIL:  Key (dag_id)=(test_dag_454) already exists.    
   >  scheduler [SQL: INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES (%(dag_id)s, %(fileloc)s, %(fileloc_hash)s,
   >  scheduler [parameters: {'dag_id': 'test_dag_454', 'fileloc': '/opt/airflow/dags/repo/test_dag_454.py', 'fileloc_hash': 14390114031793844, 'data': '{"__vers
   >  scheduler (Background on this error at: http://sqlalche.me/e/13/gkpj)
   > ```
   
   Can you post the entire stack trace, please? Include at least 5-10 lines before that message.
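
One way to collect the requested context is to keep a small rolling window of recent lines while scanning the log, similar to what `grep -B 10` does. The sketch below is only an illustration: the helper name `lines_with_context` and the sample lines are made up for this example, and it assumes the scheduler log is already available as an iterable of text lines.

```python
from collections import deque
from typing import Iterable, Iterator, List


def lines_with_context(lines: Iterable[str], needle: str, before: int = 10) -> Iterator[List[str]]:
    """Yield each line containing `needle`, preceded by up to `before` earlier lines."""
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    for line in lines:
        if needle in line:
            yield [*window, line]
        window.append(line)


# Made-up sample input; a real run would iterate over a saved scheduler log instead.
sample = ["line a", "line b", "line c", "duplicate key value violates ...", "line d"]
chunks = list(lines_with_context(sample, "duplicate key value", before=2))
```

Each yielded chunk can be pasted into the issue as-is, so the lines leading up to the error stay attached to it.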





[GitHub] [airflow] potiuk closed issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #19822:
URL: https://github.com/apache/airflow/issues/19822


   





[GitHub] [airflow] parisni commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979077641


   This still applies with 2.2.2. It does not crash the scheduler, but the schedulers likely try to insert the same DAGs at the same time, resulting in Postgres primary-key errors:
   
   ```
   2021-11-25 10:34:42.496 GMT [152] ERROR:  duplicate key value violates unique constraint "dag_pkey"
   2021-11-25 10:34:42.496 GMT [152] DETAIL:  Key (dag_id)=(test_dag_386) already exists.
   2021-11-25 10:34:42.496 GMT [152] STATEMENT:  INSERT INTO dag (dag_id, root_dag_id, is_paused, is_subdag, is_active, last_parsed_time, last_pickled, last_e
   2021-11-25 10:34:47.649 GMT [152] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"
   2021-11-25 10:34:47.649 GMT [152] DETAIL:  Key (dag_id)=(test_dag_356) already exists.
   2021-11-25 10:34:47.649 GMT [152] STATEMENT:  INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES ('test_dag_35
   2021-11-25 10:34:48.731 GMT [444] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"
   2021-11-25 10:34:48.731 GMT [444] DETAIL:  Key (dag_id)=(test_dag_379) already exists.
   2021-11-25 10:34:48.731 GMT [444] STATEMENT:  INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES ('test_dag_37
   2021-11-25 10:34:50.349 GMT [347] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"
   2021-11-25 10:34:50.349 GMT [347] DETAIL:  Key (dag_id)=(test_dag_379) already exists.
   2021-11-25 10:34:50.349 GMT [347] STATEMENT:  INSERT INTO serialized_dag (dag_id, fileloc, fileloc_hash, data, last_updated, dag_hash) VALUES ('test_dag_37
   ```
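
To see at a glance which constraints are involved in server logs like these, the violations can be tallied per constraint name. This is a rough sketch rather than an Airflow or PostgreSQL tool: `count_violations` is a hypothetical helper, and the sample input is just the four ERROR lines from the excerpt in this comment.

```python
import re
from collections import Counter

# Matches the message PostgreSQL logs for a unique-constraint violation.
DUPLICATE_KEY = re.compile(r'duplicate key value violates unique constraint "([^"]+)"')


def count_violations(log_lines):
    """Return a Counter mapping constraint name to number of violations seen."""
    counts = Counter()
    for line in log_lines:
        match = DUPLICATE_KEY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The ERROR lines from the log excerpt in this comment.
sample = [
    '2021-11-25 10:34:42.496 GMT [152] ERROR:  duplicate key value violates unique constraint "dag_pkey"',
    '2021-11-25 10:34:47.649 GMT [152] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"',
    '2021-11-25 10:34:48.731 GMT [444] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"',
    '2021-11-25 10:34:50.349 GMT [347] ERROR:  duplicate key value violates unique constraint "serialized_dag_pkey"',
]
violations = count_violations(sample)
```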





[GitHub] [airflow] potiuk commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979323697


   > Can you post the logs from your scheduler / processor? I _don't think_ you should see that error in the scheduler/parser logs -- errors in the Postgres logs are expected
   
   Ah ... My bad. I thought the log format was rather strange (but who am I to judge the formats our users choose), but if those are Postgres logs then yes, they are entirely expected.





[GitHub] [airflow] potiuk commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979324896


   I think we can close it, @parisni. If you see the errors in the Airflow logs as well, please comment here and we will re-open it.





[GitHub] [airflow] potiuk commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979199199


   I believe this is sometimes expected. From what I understand, the HA scheduler (the file processor especially) does not guarantee that a file is never processed more than once, but it does guarantee that DAGs are not duplicated in the DB, and that appears to be the case here.
   
   Maybe we should just handle it in a nicer way and log a more meaningful message at INFO level rather than ERROR, @ashb @ephraimbuddy?
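
The behaviour described here can be sketched in a few lines. This is not Airflow's code: it assumes SQLite in place of Postgres, a simplified two-column `serialized_dag` table, two sequential calls standing in for two racing schedulers, and a hypothetical helper named `write_serialized_dag`. It only illustrates that the primary key rejects the second insert and that the loser can report it at INFO level.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dag_writer_sketch")


def write_serialized_dag(conn: sqlite3.Connection, dag_id: str, data: str) -> bool:
    """Try to insert a row; return False if another writer inserted it first."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO serialized_dag (dag_id, data) VALUES (?, ?)",
                (dag_id, data),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key did its job: the row still exists exactly once.
        log.info("DAG %s was already written by another process; skipping", dag_id)
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")

# Two writers handling the same file, simulated here one after the other.
first = write_serialized_dag(conn, "test_dag_454", "{}")
second = write_serialized_dag(conn, "test_dag_454", "{}")
row_count = conn.execute("SELECT COUNT(*) FROM serialized_dag").fetchone()[0]
```

Note that catching the exception on the client side does not silence the database server: Postgres still records the rejected statement at ERROR level in its own log, which is consistent with these messages showing up in the Postgres logs.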





[GitHub] [airflow] kaxil commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979311959


   Can you post the logs from your scheduler / processor? I _don't think_ you should see that error in the scheduler/parser logs -- errors in the Postgres logs are expected.





[GitHub] [airflow] ephraimbuddy commented on issue #19822: Helm schduler HA: duplicate key value violates unique constraint "serialized_dag_pkey"

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #19822:
URL: https://github.com/apache/airflow/issues/19822#issuecomment-979043236


   We need more information here, especially how to reproduce it. Can you try with 2.2.2?

