Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/11/24 21:45:54 UTC

[GitHub] [airflow] dstandish commented on issue #6210: [AIRFLOW-5567] BaseReschedulePokeOperator

dstandish commented on issue #6210: [AIRFLOW-5567] BaseReschedulePokeOperator
URL: https://github.com/apache/airflow/pull/6210#issuecomment-557931240
 
 
   There were concerns about idempotency with @Fokko's XCom change.
   
   I think it makes sense to create a second table, very similar to XCom but designed specifically to support stateful tasks.  It could perhaps be called TaskState.
   This task state should not be pegged to a specific execution date, because execution date only really makes sense for non-stateful tasks, and execution dates can be out of sequence with actual run times.
   I think it might also make sense not to do updates: when state changes, we insert a new record with the current state.  The primary key would be dag id / task id / timestamp.  To get the current state, we fetch the latest record for the dag / task.  We could allow state to be namespaced under task id with a `key` column, as is done with XCom, but I don't think it's necessary.
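   
   A rough sketch of what I mean, using an in-memory SQLite database so it runs on its own.  The table name `task_state`, the `value` column, and the helpers `set_state` / `get_state` are all hypothetical; nothing here is an existing Airflow model or API.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: names are illustrative, not an existing Airflow table.
SCHEMA = """
CREATE TABLE task_state (
    dag_id    TEXT NOT NULL,
    task_id   TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (dag_id, task_id, timestamp)
)
"""


def set_state(conn, dag_id, task_id, value, now=None):
    """Record a state change by inserting a new row; existing rows are never updated."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO task_state (dag_id, task_id, timestamp, value) VALUES (?, ?, ?, ?)",
        # Fixed-width ISO strings in one timezone sort chronologically as text.
        (dag_id, task_id, now.isoformat(timespec="microseconds"), value),
    )


def get_state(conn, dag_id, task_id):
    """Return the most recently written value for the task, or None if there is none."""
    row = conn.execute(
        "SELECT value FROM task_state WHERE dag_id = ? AND task_id = ? "
        "ORDER BY timestamp DESC LIMIT 1",
        (dag_id, task_id),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
set_state(conn, "my_dag", "my_task", "job_id=1",
          datetime(2019, 11, 24, 10, 0, tzinfo=timezone.utc))
set_state(conn, "my_dag", "my_task", "job_id=2",
          datetime(2019, 11, 24, 11, 0, tzinfo=timezone.utc))
current = get_state(conn, "my_dag", "my_task")  # latest row wins; no execution date involved
```

   Note there is no execution date anywhere in the key, so clearing or re-running a task instance would not touch the stored state.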
   
   I previously shared the concern about creating another table that is almost identical to XCom.  But the reality is that XCom is problematic for stateful tasks in a number of ways.  Obviously there is the clearing / idempotency issue.  Additionally, if you use trigger dag, then with XCom your next scheduled run won't get the current state, because XCom sorts by execution_date.
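   
   To illustrate the trigger dag problem, here is a toy model in plain Python.  It assumes the lookup is "latest record whose execution_date is no later than the current run's", which is a simplification for illustration and not XCom's actual query; the records and function names are made up.

```python
from datetime import datetime

# Each record: (execution_date, written_at, value). Values are made up for illustration.
records = [
    # Scheduled run for 2019-11-23, which actually ran just after midnight on the 24th.
    (datetime(2019, 11, 23), datetime(2019, 11, 24, 0, 5), "state from scheduled run"),
    # Manual trigger at noon on the 24th; its execution_date is the trigger time.
    (datetime(2019, 11, 24, 12), datetime(2019, 11, 24, 12, 1), "state from manual trigger"),
]


def latest_by_execution_date(records, current_execution_date):
    """Simplified execution_date-ordered lookup (assumed model, not the real query)."""
    candidates = [r for r in records if r[0] <= current_execution_date]
    return max(candidates, key=lambda r: r[0])[2] if candidates else None


def latest_by_written_at(records):
    """Lookup ordered by when the state was actually written."""
    return max(records, key=lambda r: r[1])[2] if records else None


# The next scheduled run has execution_date 2019-11-24 00:00 but runs on the 25th,
# after the manual trigger has already written newer state.
next_scheduled = datetime(2019, 11, 24)
seen_via_execution_date = latest_by_execution_date(records, next_scheduled)
seen_via_timestamp = latest_by_written_at(records)
```

   Under this model the scheduled run sees the older state when ordering by execution_date, and the manual trigger's state only when ordering by write time.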
   
   WDYT?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services