You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/21 04:58:25 UTC

[GitHub] [airflow] uranusjr opened a new pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

uranusjr opened a new pull request #17122:
URL: https://github.com/apache/airflow/pull/17122


   I tried renaming `DagRun.execution_date` to `schedule_date` and it produced a plethora of errors everywhere, so I’m taking a less drastic attempt on this.
   
   This PR adds a `schedule_date` field to the OpenAPI schema, and allow POST-ing a new DagRun with that field (instead of `execution_date`), but not both of them. Both `schedule_date` and `execution_date` fields are present when GET-ing a DagRun.
   
   Unfortunately this is not currently working; I can’t successfully express the above and the POST tests are currently failing to validate. I have very little experience with OpenAPI specification, so I’m creating this as a draft hoping some can provide some advice.
   
   cc @ephraimbuddy 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r673885918



##########
File path: airflow/api_connexion/endpoints/dag_run_endpoint.py
##########
@@ -242,24 +242,34 @@ def post_dag_run(dag_id, session):
     except ValidationError as err:
         raise BadRequest(detail=str(err))
 
+    try:
+        schedule_date = post_body.pop("schedule_date")
+    except KeyError:
+        schedule_date = post_body.pop("execution_date")
+

Review comment:
       ```suggestion
      schedule_date = post_body.get("schedule_date", post_body["execution_date"])
   ```
   This way we preserve the content of post_body. Or what do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy merged pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
ephraimbuddy merged pull request #17122:
URL: https://github.com/apache/airflow/pull/17122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#issuecomment-901921836


   Tests are failing due to #17728.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r691921974



##########
File path: airflow/models/dag.py
##########
@@ -2097,19 +2097,25 @@ def create_dagrun(
         :param data_interval: Data interval of the DagRun
         :type data_interval: tuple[datetime, datetime] | None
         """
-        if run_id and not run_type:
+        if run_id:  # Infer run_type from run_id if needed.
             if not isinstance(run_id, str):
                 raise ValueError(f"`run_id` expected to be a str is {type(run_id)}")
-            run_type: DagRunType = DagRunType.from_run_id(run_id)
-        elif run_type and execution_date:
+            if not run_type:
+                run_type = DagRunType.from_run_id(run_id)
+        elif run_type and execution_date:  # Generate run_id from run_type and execution_date.
             if not isinstance(run_type, DagRunType):
                 raise ValueError(f"`run_type` expected to be a DagRunType is {type(run_type)}")
             run_id = DagRun.generate_run_id(run_type, execution_date)
-        elif not run_id:
+        else:
             raise AirflowException(
                 "Creating DagRun needs either `run_id` or both `run_type` and `execution_date`"
             )

Review comment:
       This is actually fixing an existing but. Previously, when `run_id`, `run_type`, and `execution_date` are *all* provided, the explicit `run_id` value would be ignored and overwritten by an auto-generated value. This fixes the logic and respect the explicit `run_id` value if it’s provided.

##########
File path: airflow/models/dag.py
##########
@@ -2097,19 +2097,25 @@ def create_dagrun(
         :param data_interval: Data interval of the DagRun
         :type data_interval: tuple[datetime, datetime] | None
         """
-        if run_id and not run_type:
+        if run_id:  # Infer run_type from run_id if needed.
             if not isinstance(run_id, str):
                 raise ValueError(f"`run_id` expected to be a str is {type(run_id)}")
-            run_type: DagRunType = DagRunType.from_run_id(run_id)
-        elif run_type and execution_date:
+            if not run_type:
+                run_type = DagRunType.from_run_id(run_id)
+        elif run_type and execution_date:  # Generate run_id from run_type and execution_date.
             if not isinstance(run_type, DagRunType):
                 raise ValueError(f"`run_type` expected to be a DagRunType is {type(run_type)}")
             run_id = DagRun.generate_run_id(run_type, execution_date)
-        elif not run_id:
+        else:
             raise AirflowException(
                 "Creating DagRun needs either `run_id` or both `run_type` and `execution_date`"
             )

Review comment:
       This is actually fixing an existing bug. Previously, when `run_id`, `run_type`, and `execution_date` are *all* provided, the explicit `run_id` value would be ignored and overwritten by an auto-generated value. This fixes the logic and respect the explicit `run_id` value if it’s provided.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r678088508



##########
File path: airflow/api_connexion/openapi/v1.yaml
##########
@@ -1875,17 +1875,26 @@ components:
         dag_id:
           type: string
           readOnly: true
-        execution_date:
-          description: >
-            The execution date. This is the time when the DAG run should be started according
-            to the DAG definition.
+        schedule_date:
+          type: string
+          nullable: true
+          description: |
+            The logical date (schedule date, previously called execution date). This is the time or interval
+            covered by this DAG run, according to the DAG definition.
 
             The value of this field can be set only when creating the object. If you try to modify the
             field of an existing object, the request fails with an BAD_REQUEST error.
 
             This together with DAG_ID are a unique key.
+          format: date-time
+        execution_date:
           type: string
+          nullable: true
+          description: |
+            The execution date. This is the same as schedule_date, kept for backwards compatibility.
+            A request with both schedule_date and this field will fail with an BAD_REQUEST error.

Review comment:
       You got me here 😆 I was undecided between the two and ended up doing different things in the two places. I’ll change the API spec to match implementation for now (i.e. only fail if values differ).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r674561315



##########
File path: airflow/api_connexion/schemas/dag_run_schema.py
##########
@@ -57,21 +60,43 @@ class Meta:
     run_id = auto_field(data_key='dag_run_id')
     dag_id = auto_field(dump_only=True)
     execution_date = auto_field(validate=validate_istimezone)
+    schedule_date = fields.Method(serialize="get_schedule_date", deserialize="load_schedule_date")
     start_date = auto_field(dump_only=True)
     end_date = auto_field(dump_only=True)
     state = DagStateField(dump_only=True)
     external_trigger = auto_field(default=True, dump_only=True)
     conf = ConfObject()
 
+    @staticmethod
+    def get_schedule_date(obj: DagRun):
+        return obj.execution_date
+
+    def load_schedule_date(self, value: str):
+        # Use the execution_date logic to load schedule_date.
+        return self.execution_date.deserialize(value, "schedule_date", {"schedule_date": value})
+
     @pre_load
     def autogenerate(self, data, **kwargs):
-        """Auto generate run_id and execution_date if they are not loaded"""
-        if "execution_date" not in data.keys():
-            data["execution_date"] = str(timezone.utcnow())
-        if "dag_run_id" not in data.keys():
+        """Auto generate run_id, schedule_date, and execution_date if they are not loaded"""
+        # Try to fill schedule_date and execution_date if the other is provided.
+        schedule_date = data.get("schedule_date", _MISSING)
+        execution_date = data.get("execution_date", _MISSING)
+        if schedule_date is execution_date is _MISSING:  # Both missing.
+            data["schedule_date"] = data["execution_date"] = str(timezone.utcnow())
+        elif schedule_date is _MISSING:  # Only schedule_date missing.
+            data["schedule_date"] = data["execution_date"]
+        elif execution_date is _MISSING:  # Only execution_date missing.
+            data["execution_date"] = data["schedule_date"]
+        elif schedule_date != execution_date:  # Both provided but don't match.
+            raise BadRequest(
+                "schedule_date conflicts with executrion_date",

Review comment:
       ```suggestion
                   "schedule_date conflicts with execution_date",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r674564800



##########
File path: airflow/api_connexion/openapi/v1.yaml
##########
@@ -1875,17 +1875,26 @@ components:
         dag_id:
           type: string
           readOnly: true
-        execution_date:
-          description: >
-            The execution date. This is the time when the DAG run should be started according
-            to the DAG definition.
+        schedule_date:
+          type: string
+          nullable: true
+          description: |
+            The logical date (schedule date, previously called execution date). This is the time or interval
+            covered by this DAG run, according to the DAG definition.
 
             The value of this field can be set only when creating the object. If you try to modify the
             field of an existing object, the request fails with an BAD_REQUEST error.
 
             This together with DAG_ID are a unique key.
+          format: date-time
+        execution_date:
           type: string
+          nullable: true
+          description: |
+            The execution date. This is the same as schedule_date, kept for backwards compatibility.
+            A request with both schedule_date and this field will fail with an BAD_REQUEST error.

Review comment:
       Will it fail when both are supplied or when they have different values?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#issuecomment-890940586


   We should also add a test to ensure a DagRun created from connexion populated data interval correctly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r673915763



##########
File path: airflow/api_connexion/endpoints/dag_run_endpoint.py
##########
@@ -242,24 +242,34 @@ def post_dag_run(dag_id, session):
     except ValidationError as err:
         raise BadRequest(detail=str(err))
 
+    try:
+        schedule_date = post_body.pop("schedule_date")
+    except KeyError:
+        schedule_date = post_body.pop("execution_date")
+

Review comment:
       `DagRun` does not accept `schedule_date` as a keyword argument (yet), so either we need to modify `post_body` or we can’t use `DagRun(..., **post_body)`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r691921974



##########
File path: airflow/models/dag.py
##########
@@ -2097,19 +2097,25 @@ def create_dagrun(
         :param data_interval: Data interval of the DagRun
         :type data_interval: tuple[datetime, datetime] | None
         """
-        if run_id and not run_type:
+        if run_id:  # Infer run_type from run_id if needed.
             if not isinstance(run_id, str):
                 raise ValueError(f"`run_id` expected to be a str is {type(run_id)}")
-            run_type: DagRunType = DagRunType.from_run_id(run_id)
-        elif run_type and execution_date:
+            if not run_type:
+                run_type = DagRunType.from_run_id(run_id)
+        elif run_type and execution_date:  # Generate run_id from run_type and execution_date.
             if not isinstance(run_type, DagRunType):
                 raise ValueError(f"`run_type` expected to be a DagRunType is {type(run_type)}")
             run_id = DagRun.generate_run_id(run_type, execution_date)
-        elif not run_id:
+        else:
             raise AirflowException(
                 "Creating DagRun needs either `run_id` or both `run_type` and `execution_date`"
             )

Review comment:
       This is actually fixing an existing bug. Previously, when `run_id`, `run_type`, and `execution_date` are *all* provided, the explicit `run_id` value would be ignored and overwritten by an auto-generated value. This fixes the logic and respect the explicit `run_id` value if it’s provided. The bug was found when I was adding tests for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#discussion_r691921974



##########
File path: airflow/models/dag.py
##########
@@ -2097,19 +2097,25 @@ def create_dagrun(
         :param data_interval: Data interval of the DagRun
         :type data_interval: tuple[datetime, datetime] | None
         """
-        if run_id and not run_type:
+        if run_id:  # Infer run_type from run_id if needed.
             if not isinstance(run_id, str):
                 raise ValueError(f"`run_id` expected to be a str is {type(run_id)}")
-            run_type: DagRunType = DagRunType.from_run_id(run_id)
-        elif run_type and execution_date:
+            if not run_type:
+                run_type = DagRunType.from_run_id(run_id)
+        elif run_type and execution_date:  # Generate run_id from run_type and execution_date.
             if not isinstance(run_type, DagRunType):
                 raise ValueError(f"`run_type` expected to be a DagRunType is {type(run_type)}")
             run_id = DagRun.generate_run_id(run_type, execution_date)
-        elif not run_id:
+        else:
             raise AirflowException(
                 "Creating DagRun needs either `run_id` or both `run_type` and `execution_date`"
             )

Review comment:
       This is actually fixing an existing bug. Previously, when `run_id`, `run_type`, and `execution_date` are *all* provided, the explicit `run_id` value would be ignored and overwritten by an auto-generated value. This fixes the logic and respect the explicit `run_id` value if it’s provided. The bug was found when I was adding tests for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17122: AIP-39: Add logical_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#issuecomment-902315599


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17122: AIP-39: Add schedule_date to OpenAPI DAGRun schema

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17122:
URL: https://github.com/apache/airflow/pull/17122#issuecomment-888887012


   This now conflicts with #16352, let’s finish that one first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org