Posted to commits@airflow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/18 04:23:01 UTC

[jira] [Commented] (AIRFLOW-2280) Extra argument for comparison with another table in IntervalCheckOperator

    [ https://issues.apache.org/jira/browse/AIRFLOW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723671#comment-16723671 ] 

ASF GitHub Bot commented on AIRFLOW-2280:
-----------------------------------------

stale[bot] closed pull request #3186: [AIRFLOW-2280]Add feature in CheckIntervalOperator
URL: https://github.com/apache/incubator-airflow/pull/3186
 
 
   

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/check_operator.py b/airflow/operators/check_operator.py
index 9994671a70..682a15749c 100644
--- a/airflow/operators/check_operator.py
+++ b/airflow/operators/check_operator.py
@@ -181,6 +181,9 @@ class IntervalCheckOperator(BaseOperator):
 
     :param table: the table name
     :type table: str
+    :param check_with_table: the table name to check against. The default,
+        None, compares within the same table
+    :type check_with_table: str
     :param days_back: number of days between ds and the ds we want to check
         against. Defaults to 7 days
     :type days_back: int
@@ -197,7 +200,7 @@ class IntervalCheckOperator(BaseOperator):
 
     @apply_defaults
     def __init__(
-            self, table, metrics_thresholds,
+            self, table, metrics_thresholds, check_with_table=None,
             date_filter_column='ds', days_back=-7,
             conn_id=None,
             *args, **kwargs):
@@ -208,11 +211,15 @@ def __init__(
         self.date_filter_column = date_filter_column
         self.days_back = -abs(days_back)
         self.conn_id = conn_id
+        if not check_with_table:
+            check_with_table = table
         sqlexp = ', '.join(self.metrics_sorted)
-        sqlt = ("SELECT {sqlexp} FROM {table}"
-                " WHERE {date_filter_column}=").format(**locals())
-        self.sql1 = sqlt + "'{{ ds }}'"
-        self.sql2 = sqlt + "'{{ macros.ds_add(ds, "+str(self.days_back)+") }}'"
+        sqlt1 = ("SELECT {sqlexp} FROM {table}"
+                 " WHERE {date_filter_column}=").format(**locals())
+        self.sql1 = sqlt1 + "'{{ ds }}'"
+        sqlt2 = ("SELECT {sqlexp} FROM {check_with_table}"
+                 " WHERE {date_filter_column}=").format(**locals())
+        self.sql2 = sqlt2 + "'{{ macros.ds_add(ds, " + str(self.days_back) + ") }}'"
 
     def execute(self, context=None):
         hook = self.get_db_hook()
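
The effect of the patch can be sketched outside Airflow as follows. This is a minimal standalone illustration, not the operator itself: `build_interval_check_sql` is a hypothetical helper, the table names in the usage note are made up, and the Jinja expressions are left unrendered, as Airflow would template them later.

```python
def build_interval_check_sql(table, metrics, check_with_table=None,
                             date_filter_column='ds', days_back=-7):
    """Return (sql1, sql2) query templates, mirroring the patched __init__.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # Fall back to the same table when no comparison table is given.
    if not check_with_table:
        check_with_table = table
    # The operator always looks backwards, whatever sign was passed in.
    days_back = -abs(days_back)
    sqlexp = ', '.join(sorted(metrics))
    template = "SELECT {cols} FROM {tbl} WHERE {col}="
    # sql1 reads the current partition of the table under test.
    sql1 = (template.format(cols=sqlexp, tbl=table, col=date_filter_column)
            + "'{{ ds }}'")
    # sql2 reads the older partition of the comparison table.
    sql2 = (template.format(cols=sqlexp, tbl=check_with_table,
                            col=date_filter_column)
            + "'{{ macros.ds_add(ds, " + str(days_back) + ") }}'")
    return sql1, sql2
```

Calling it with `table='tmp.events'` and `check_with_table='prod.events'` yields one query per table; omitting `check_with_table` reproduces the pre-patch behaviour, where both queries read the same table.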


 



> Extra argument for comparison with another table in IntervalCheckOperator
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2280
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2280
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yuyin Yang
>            Assignee: Yuyin Yang
>            Priority: Minor
>
> The current IntervalCheckOperator can only check that the values of metrics given as SQL expressions are within a certain tolerance of the values from days_back earlier in the same table. For example, if I set the metric to COUNT(*), the threshold ratio to 1.5, and days_back=-7, I can compare the current count of this table with the count of the same table 7 days back.
> In practice, however, we would like to first load tables into a tmp dataset, which has an expiration date, and load them into the production dataset only after validation. In this case it makes more sense to compare the current tmp table with the production dataset from days_back, because the temporary table from days_back may no longer exist.
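
The tolerance comparison described above can be sketched as a simple ratio test. This is an illustration under stated assumptions, not the operator's code: it assumes the check passes when the larger of the two values divided by the smaller stays below the threshold, and that metric values are non-negative. The function name and the sample counts are hypothetical.

```python
def within_tolerance(current, reference, threshold):
    """Return True if the max/min ratio of the two values is below threshold.

    Hypothetical helper; assumes non-negative metric values.
    """
    if current == 0 or reference == 0:
        # The ratio is undefined when either value is zero; fail the check.
        return False
    ratio = max(current, reference) / min(current, reference)
    return ratio < threshold


# Current count from the tmp table vs. the production count 7 days back.
print(within_tolerance(1200, 1000, 1.5))  # ratio 1.2, passes
print(within_tolerance(1000, 400, 1.5))   # ratio 2.5, fails
```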


