Posted to commits@airflow.apache.org by "Bence Nagy (JIRA)" <ji...@apache.org> on 2016/05/02 17:58:13 UTC

[jira] [Comment Edited] (AIRFLOW-30) Make preoperators part of the same transaction as the actual operation

    [ https://issues.apache.org/jira/browse/AIRFLOW-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266870#comment-15266870 ] 

Bence Nagy edited comment on AIRFLOW-30 at 5/2/16 3:57 PM:
-----------------------------------------------------------

[~criccomini]
{quote}
what happens if the DAG fails half way through and the transaction is reverted? In such a case, Airflow will show prior operators as having successfully run
{quote}

With my suggestion the transaction would not span multiple operators. {{GenericTransfer}} has a {{preoperator}} argument (see [the docs|http://pythonhosted.org/airflow/code.html#airflow.operators.GenericTransfer]) which just holds a query that's executed before the insertions. I was suggesting that we allow binding the preoperator and the insertions together in one transaction, with all of them being part of the same task instance.

{quote}
In your use cases, are the source and destination DBs part of the same DB cluster?
{quote}

Nope, different clusters.


was (Author: underyx):
[~criccomini]
{quote}
what happens if the DAG fails half way through and the transaction is reverted? In such a case, Airflow will show prior operators as having successfully run
{quote}

With my suggestion the transaction would not span multiple operators — {{GenericTransfer}} has a {{preoperator}} argument (see [the docs|http://pythonhosted.org/airflow/code.html#airflow.operators.GenericTransfer]) which just holds a query that's executed before the insertions. I was suggesting that we allow binding these together, with all being part of the same task instance.

{quote}
In your use cases, are the source and destination DBs part of the same DB cluster?
{quote}

Nope, different clusters.

> Make preoperators part of the same transaction as the actual operation
> ----------------------------------------------------------------------
>
>                 Key: AIRFLOW-30
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-30
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Bence Nagy
>
> All my use cases would work better if each operator executed everything in one transaction. Two examples:
> - I want to {{GenericTransfer}} a set of rows from one DB to another, and I have to create the table first in the destination DB. It would be a lot cleaner not to have empty tables lying around if the insertion later fails for some reason.
> - I want to {{GenericTransfer}} all rows from an entire table periodically to sync it from one DB to another. To do this correctly I want to clear the destination table first to make sure I end up with no duplicate rows, so I'd have a {{DELETE FROM dst_table}} preoperator. If the insertions fail afterwards, I'd end up with no data (in most cases it would be better to fall back to the old data), and even if everything works correctly, I'll have an empty table while the insertions are still executing.
> To fix this, the relevant {{DbApiHook}} methods could support a new kwarg that sets whether they should commit at the end.
> Thoughts?
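
A minimal sketch of the second use case from the description, to make the proposed behavior concrete. It uses Python's built-in {{sqlite3}} module as a stand-in for the destination DB rather than any Airflow hook, and the helper name {{transfer_atomically}} is hypothetical; it is not part of Airflow or {{DbApiHook}}. The only point it illustrates is committing once, after both the preoperator and the insertions, and rolling back both if the insertions fail.

```python
import sqlite3


def transfer_atomically(conn, preoperator, insert_sql, rows):
    """Run the preoperator and all inserts, committing only once at the end.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    cur = conn.cursor()
    try:
        cur.execute(preoperator)           # e.g. clear the destination table
        cur.executemany(insert_sql, rows)  # the actual transfer
        conn.commit()                      # single commit covering both steps
    except Exception:
        conn.rollback()                    # undo the preoperator as well
        raise
    finally:
        cur.close()


# Destination table that already holds data from an earlier sync.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dst_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO dst_table VALUES (1, 'old')")
conn.commit()

insert_sql = "INSERT INTO dst_table VALUES (?, ?)"

# A failing transfer: the second row violates the NOT NULL constraint.
try:
    transfer_atomically(conn, "DELETE FROM dst_table", insert_sql,
                        [(1, "new"), (2, None)])
except sqlite3.IntegrityError:
    pass

# The DELETE was rolled back together with the partial insert.
rows_after_failure = conn.execute(
    "SELECT id, name FROM dst_table ORDER BY id").fetchall()

# A successful transfer replaces the old rows in one step.
transfer_atomically(conn, "DELETE FROM dst_table", insert_sql,
                    [(1, "new"), (2, "also new")])
rows_after_success = conn.execute(
    "SELECT id, name FROM dst_table ORDER BY id").fetchall()
```

In this sketch the old row survives the failed run, and the new rows only become visible once the whole transfer commits. Whether other readers see an empty table in the meantime depends on the destination database's isolation behavior, which this example does not cover.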


