Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/18 15:15:04 UTC

[GitHub] [airflow] turbaszek opened a new issue #10382: Add on_kill method to BigQueryInsertJobOperator

turbaszek opened a new issue #10382:
URL: https://github.com/apache/airflow/issues/10382


   **Description**
   
   Add the possibility to cancel a running job if the operator is killed. This option should probably be configurable because of the operator's idempotency logic.
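
A minimal sketch of the requested behavior, assuming a boolean flag controls whether the job is cancelled on kill. The names `FakeBigQueryHook`, `InsertJobOperatorSketch`, and `cancel_on_kill` are hypothetical stand-ins for illustration, not the Airflow API; a real operator would subclass `BaseOperator` and call an actual BigQuery hook.

```python
from typing import List, Optional


class FakeBigQueryHook:
    """Hypothetical stand-in for a BigQuery hook; it only records cancel requests."""

    def __init__(self) -> None:
        self.cancelled: List[str] = []

    def cancel_job(self, job_id: str) -> None:
        # A real hook would call the BigQuery jobs.cancel API here.
        self.cancelled.append(job_id)


class InsertJobOperatorSketch:
    """Illustrative operator that cancels its job on kill only when configured to."""

    def __init__(self, hook: FakeBigQueryHook, cancel_on_kill: bool = True) -> None:
        self.hook = hook
        self.cancel_on_kill = cancel_on_kill  # assumed flag name
        self.job_id: Optional[str] = None

    def execute(self, job_id: str) -> None:
        # Remember the submitted job so on_kill knows what to cancel.
        self.job_id = job_id

    def on_kill(self) -> None:
        # Do nothing if no job was submitted or cancellation is disabled.
        if self.job_id and self.cancel_on_kill:
            self.hook.cancel_job(self.job_id)
```

Making the flag opt-out keeps the door open for users who rely on re-attaching to a still-running job on retry, which is the idempotency concern mentioned above.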
   
   **Use case / motivation**
   
   Remove dangling jobs.
   
   **Related Issues**
   
   https://github.com/apache/airflow/pull/9590
   https://github.com/apache/airflow/pull/8858
   https://github.com/apache/airflow/pull/6470
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-683631001


   Thanks @jaketf for the response! Do you think this edge case is crucial to handle now, or should we wait and see whether it causes problems in the future?
   
   





[GitHub] [airflow] turbaszek commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-675541455


   @edejong @jaketf @potiuk I would love to hear your opinion on this one 👍 





[GitHub] [airflow] jaketf commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
jaketf commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-675661153


   I think this is straightforward for import / query / copy jobs, as they are all internal to BigQuery and committed atomically.
   
   There may be a corner case with extract (to GCS) jobs. I do not believe export jobs > 1GB are atomic, because the BigQuery export writes sharded GCS files. I imagine that if the job is killed at just the right time, only some portion of those sharded files would be committed to GCS.
   Would our expected `on_kill` behavior be to clean up those files?
   If we were to rerun the same export (with the same destination URIs in the config), those files would likely just be overwritten,
   UNLESS the table has become much smaller or larger between the original (killed) try and the second try, causing the number of shards to change.
   
   For example:
   Original extract commits these files to GCS 
   shard-00-of-5
   shard-01-of-5
   [original extract job killed]
   [we delete a few partitions from the source table]
   [submit a new extract w/ same config]
   shard-00-of-3
   shard-01-of-3
   shard-02-of-3
   
   This will leave the GCS prefix looking like this:
   shard-00-of-3
   shard-00-of-5
   shard-01-of-3
   shard-01-of-5
   shard-02-of-3
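
The leftover files in a scenario like this could be detected with a small check. This is only a sketch: the `shard-NN-of-M` naming is taken from the example in this comment and is not necessarily how BigQuery names export shards, and `stale_shards` is a hypothetical helper that works on a plain list of object names rather than a real GCS listing.

```python
import re
from typing import Iterable, List

# Matches the illustrative naming used in the example, e.g. "shard-01-of-5".
SHARD_RE = re.compile(r"^shard-(\d+)-of-(\d+)$")


def stale_shards(names: Iterable[str], expected_total: int) -> List[str]:
    """Return shard names whose declared total differs from the latest run's total."""
    stale = []
    for name in names:
        match = SHARD_RE.match(name)
        if match and int(match.group(2)) != expected_total:
            stale.append(name)
    return sorted(stale)


# Object names left under the prefix after the killed run and the rerun.
prefix = [
    "shard-00-of-3",
    "shard-00-of-5",
    "shard-01-of-3",
    "shard-01-of-5",
    "shard-02-of-3",
]
print(stale_shards(prefix, expected_total=3))
# prints ['shard-00-of-5', 'shard-01-of-5']
```

Whether `on_kill` should delete such files or leave them for the rerun is exactly the open question raised here.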
   





[GitHub] [airflow] turbaszek commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-691256654


   Closed in #10866





[GitHub] [airflow] turbaszek closed issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
turbaszek closed issue #10382:
URL: https://github.com/apache/airflow/issues/10382









[GitHub] [airflow] turbaszek commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-690523736


   @tszerszen awesome! I will take a look





[GitHub] [airflow] tszerszen commented on issue #10382: Add on_kill method to BigQueryInsertJobOperator

Posted by GitBox <gi...@apache.org>.
tszerszen commented on issue #10382:
URL: https://github.com/apache/airflow/issues/10382#issuecomment-690449025


   @turbaszek 
   Since it's very similar to [#10381](https://github.com/apache/airflow/issues/10381), I created a [PR](https://github.com/apache/airflow/pull/10866) that covers this issue as well. Is that OK?




