You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/23 07:33:09 UTC

[GitHub] [airflow] houqp opened a new pull request #13278: fix dag run type enum query for mysqldb driver

houqp opened a new pull request #13278:
URL: https://github.com/apache/airflow/pull/13278


   By default sqlalchemy pass query params as is to db dialect drivers for
   query execution. This causes inconsistent behavior of query param
   evaluation between different db drivers. For example, MySQLdb will
   convert `DagRunType.SCHEDULED` to string `'DagRunType.SCHEDULED'`
   instead of string `'scheduled'`.
   
   To keep the behavior consistent across DB dialects, this patch
   introduces a new enum aware sqlalchemy type based of
   sqlalchemy.types.String to do the query param conversion within
   sqlalchemy's ORM layer before passing the value down to individual DB
   driver.
   
   see https://github.com/apache/airflow/pull/11621 for relevant
   discussions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r550450264



##########
File path: tests/utils/test_types.py
##########
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow import settings
+from airflow.models import DAG
+from airflow.models.dagrun import DagRun
+from airflow.utils.state import State
+from airflow.utils.types import DagRunType
+from tests.models import DEFAULT_DATE
+
+
+def test_runtype_enum_escape():
+    """
+    Make sure DagRunType.SCHEDULE is converted to string 'scheduled' when
+    referenced in DB query
+    """
+    session = settings.Session()

Review comment:
       A nit, but should we use `create_session` decorator here? I think this is approach we recommend




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548845243



##########
File path: airflow/models/dagrun.py
##########
@@ -83,7 +84,7 @@ class DagRun(Base, LoggingMixin):
     run_id = Column(String(ID_LEN))
     creating_job_id = Column(Integer)
     external_trigger = Column(Boolean, default=True)
-    run_type = Column(String(50), nullable=False)
+    run_type = Column(Enum(DagRunType, native_enum=False, create_constraint=False), nullable=False)

Review comment:
       It doesn't require migration because of `native_enum=False, create_constraint=False`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running during migration
   
   Otherwise we risk to be flooded by "inconsistent" DB issues from our users and we will have to figure out how to fix it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #13278: Fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13278:
URL: https://github.com/apache/airflow/pull/13278


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551615312



##########
File path: airflow/utils/types.py
##########
@@ -24,6 +24,9 @@ class DagRunType(str, enum.Enum):
     SCHEDULED = "scheduled"
     MANUAL = "manual"
 
+    def __str__(self) -> str:  # pylint: disable=invalid-str-returned

Review comment:
       we need this, inheriting from str doesn't override `__str__`, but only `__format__`. If you look into the mysql client code i linked in the discussion, `__str__` is what it calls to serialize the binding params:
   
   For example:
   
   ```
   >>> from  airflow.utils.types import DagRunType
   >>> str(DagRunType.SCHEDULED)
   'DagRunType.SCHEDULED'
   >>> f'{DagRunType.SCHEDULED}'
   'scheduled'
   >>> format(DagRunType.SCHEDULED)
   'scheduled'
   ```
   
   You can also reproduce the problem with the unit test provided in this PR by removing `__str__` override.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755274008


   > Ohhh, this is only a problem for _one_ of the mysql drivers -`mysqldb`. The "default" that you get via `apache-airflow[mysql]` is `mysqlclient` which doesn't have this problem.
   > 
   > That's why I never noticed this problem, nor did the tests fail.
   
   So maybe we should fix it + add the startup code to fix it in scheduler + workers waiting (I guess they also can change the state) to run a query fixing it before they start guarded by some flag in the DB set by the migration ('needs state fixing' ) and reset after scheduler fix is run ?  This is a one-time event only after migration.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551386398



##########
File path: airflow/utils/types.py
##########
@@ -24,6 +24,9 @@ class DagRunType(str, enum.Enum):
     SCHEDULED = "scheduled"
     MANUAL = "manual"
 
+    def __str__(self) -> str:  # pylint: disable=invalid-str-returned

Review comment:
       Do we need this if it also inherits from `str`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-754541946


   @houqp Looks good now -- last thing to add is I think a migration to correct any old values.
   
   Do you have time to add that?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-767105495


   Just rebased, let's merge this one once it passes CI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548275287



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       How did you perform the mysql test? You should be able to reproduce the problem with my newly added test case in breeze. 
   
   I am open to better way to handle this as well, but I think whatever solution we end up with, we should aim for consistent behavior between db drivers.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750036067


   [The Workflow run](https://github.com/apache/airflow/actions/runs/439950289) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750028596


   Why didn't mysql unit tests fail before?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750483833


   @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows has the column populated as string `'scheduled'`, which lead to crashes due to queries returning empty result.
   
   All existing mysql users using mysqlclient driver will run into this issue after performing the upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   I will prepare a test case to guard against this regression going forward.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548275287



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       How did you perform the mysql test? You should be able to reproduce the problem with my newly added test case in breeze. 
   
   I am open to better way to handle this as well, but I think whatever solution we end up with, we should aim for consistent behavior between db drivers.
   
   IMHO, i think it's not a bad idea to use `DagRunType.SCHEDULED.value` through out the code base. This leads to less magic/abstraction, which will make the code easier to follow and maintain. It will also reduce chances of us running into subtle bugs like this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551406309



##########
File path: airflow/utils/types.py
##########
@@ -24,6 +24,9 @@ class DagRunType(str, enum.Enum):
     SCHEDULED = "scheduled"
     MANUAL = "manual"
 
+    def __str__(self) -> str:  # pylint: disable=invalid-str-returned

Review comment:
       On current master
   
   ```
   In [19]: from  airflow.utils.types import DagRunType                                                                                  
   
   In [20]: f'{DagRunType.SCHEDULED}__a'                                                                                                 
   Out[20]: 'scheduled__a'
   ```
   
   Since it inherits from str I don't think we need this method.
   
   (This PR has gone through a number of iterations, firstly removing the `str` subclassing, then adding it back etc, so I'm not 100% sure on where it's at.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750855205


   > All existing 1.x mysql users using mysqlclient driver will run into this issue after performing the 2.0 upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   Not good :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750483833


   @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows have the column populated as string `'scheduled'`, which lead to crashes due to queries returning empty result.
   
   All existing 1.x mysql users using mysqlclient driver will run into this issue after performing the 2.0 upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   I will prepare a test case to guard against this regression going forward.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750028596


   Why don't mysql unit tests fail?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-754512746


   > Do we need to add a migration to correct any old values that might exist?
   
   Probably yes. I admit that I'm super surprised by this issue 👀 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548452559



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       Another way to address this problem is to fix it at db driver layer. i.e. changing mysqlclient to support enum types. If we can get this change merged upstream, then we can at least safely use str subclassed enum type for both postgres and mysql everywhere, but since the conversion is not done within sqlalchemy anymore, we need to double check other db drivers.
   
   However, adding enum support to mysqlclient will take a bit more time. If we don't need to push this fix out sooner, then we can probably wait for that.
   
   Or we can go with custom `EnumString` as a short term fix, then work on mysqlclient path as a follow up.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running
   
   Otherwise we risk to be flooded by "inconsistent" DB issues from our users and we will have to figure out how to fix it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r547757953



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       @ashb i removed the str here to avoid this type being misused in the rest of the code base. For example, if `DagRunType.SCHEDULED` is used for column types that are not declared as `EnumString`, developer will get the incorrect behavior regardless of which db driver is being used, so we can surface the issue sooner. That way, it's less likely to get half working PR merged into master.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548609907



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       i had the same question initially, i am curious about the reason for picking string for column type here. as far as i know, the biggest pain with using enum in postgres was schema evolution, i.e. adding new value to enum. I have heard postgres 9.x has made some improvements around that area.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-751427697


   [The Workflow run](https://github.com/apache/airflow/actions/runs/446530744) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548845243



##########
File path: airflow/models/dagrun.py
##########
@@ -83,7 +84,7 @@ class DagRun(Base, LoggingMixin):
     run_id = Column(String(ID_LEN))
     creating_job_id = Column(Integer)
     external_trigger = Column(Boolean, default=True)
-    run_type = Column(String(50), nullable=False)
+    run_type = Column(Enum(DagRunType, native_enum=False, create_constraint=False), nullable=False)

Review comment:
       It shouldn't require migration because of `native_enum=False, create_constraint=False`. but i can double check tomorrow.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-754282882






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548275287



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       How did you perform the mysql test? You should be able to reproduce the problem with my newly added test case in breeze. 
   
   I am open to better way to handle this as well, but I think whatever solution we end up with, we should aim for consistent behavior between db drivers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r547840374



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       Not a fan of this - these enums are used elsewhere than just db.
   
   Can you explain when this is a problem? Cos it worked for me in mysql




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548275287



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       How did you perform the mysql test? This bug won't cause any error for airflow installations setup from scratch, including existing unit tests. You should be able to reproduce the bug with my newly added test case in breeze. 
   
   I am open to better way to handle this as well, but I think whatever solution we end up with, we should aim for consistent behavior between db drivers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-754896047


   > Do we need to add a migration to correct any old values that might exist?
   
   Sorry, my reply did not get sent out correctly yesterday.
   
   I suggest leaving out the value fix up as a separate quick follow up task so we can get this fix out to new installations sooner.
   
   For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   
   Is there a tool in airflow that are supposed to be ran by users after code deployment has been done?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755218560


   Ohhh, this is only a problem for _one_ of the mysql drivers -`mysqldb`. The "default" that you get via `apache-airflow[mysql]` is `mysqlclient` which doesn't have this problem.
   
   That's why I never noticed this problem, nor did the tests fail.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750793626


   > @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows have the column populated as string `'scheduled'`, which lead to crashes due to queries returning empty result.
   > 
   > All existing 1.x mysql users using mysqlclient driver will run into this issue after performing the 2.0 upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   > 
   > I will prepare a test case to guard against this regression going forward.
   
   Oooh. Fuuu.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755602084


   > Ohhh, this is only a problem for one of the mysql drivers -mysqldb. The "default" that you get via apache-airflow[mysql] is mysqlclient which doesn't have this problem.
   
   This is happening to mysqlclient. But I haven't tested it with pymysql, which might or might not have this problem.
   
   > While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").
   
   That's right, we perform all schema upgrade online without shutting down airflow components. The only time we did stop the world migration was for 2.0 upgrade.
   
   I am a little bit hesitant to add this fix up into scheduler startup code if we can get away without it.
   
   Based on the discussion so far, would it be accessible to add a note in the release note to ask users to run a fix up command after new code has been deployed? If so we can add a fix up command in airflow cli do that.
   
   Alternatively, we can go all in on native enum column type in postgres and mysql, which means we will be able to manage the fix up with alembic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548275287



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       How did you perform the mysql test? You should be able to reproduce the problem with my newly added test case in breeze. 
   
   I am open to better way to handle this as well, but I think whatever solution we end with, we should aim for consistent behavior between db drivers.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750483833


   @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows have the column populated as string `'scheduled'`, which lead to crashes due to queries returning empty result.
   
   All existing 1.x mysql users using mysqlclient driver will run into this issue after performing the upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   I will prepare a test case to guard against this regression going forward.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750483833


   @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows has the column populated as string `scheduled`, which lead to crashes due to queries returning empty result.
   
   All existing mysql users using mysqlclient driver will run into this issue after performing the upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   I will prepare a test case to guard against this regression going forward.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-751336432


   [The Workflow run](https://github.com/apache/airflow/actions/runs/445287198) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755280441


   I still feel migration is the best place.
   
   >For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   
   
   Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   It can happen in a separate PR -- sure but not sure if we should do it outside of alembic


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-754950070


   > Is there a tool in airflow that are supposed to be ran by users after code deployment has been done?
   
   Nope, the best hope we have here is startup code in `airflow scheduler` or similar.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548956485



##########
File path: airflow/models/dagrun.py
##########
@@ -83,7 +84,7 @@ class DagRun(Base, LoggingMixin):
     run_id = Column(String(ID_LEN))
     creating_job_id = Column(Integer)
     external_trigger = Column(Boolean, default=True)
-    run_type = Column(String(50), nullable=False)
+    run_type = Column(Enum(DagRunType, native_enum=False, create_constraint=False), nullable=False)

Review comment:
       ok, turns out sqlalchmy's enum type uses field name instead of field value as db value, so we ended up getting `SCHEDULED` instead of `scheduled` in queries. I have changed it back to `String` column type for now and overwrite `__str__` method to make it work for mysqlclient.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548450671



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       The downside of subclassing `str` is it will lead to inconsistent behavior between db drivers for columns that are not declared as `EnumString` type. My main concern is developer adding new db columns that are supposed to interact with python enum types, but declared it as `String` instead of `EnumString`. In this case, postgres users won't experience any problem, but mysql users will end up with crashes or corrupted data in database.
   
   What's your main concern around lack of str subclassing? Are you worried that developer will get confused between use of `DagRunType.SCHEDULED.value` and `DagRunType.SCHEDULED`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551835900



##########
File path: airflow/utils/types.py
##########
@@ -24,6 +24,9 @@ class DagRunType(str, enum.Enum):
     SCHEDULED = "scheduled"
     MANUAL = "manual"
 
+    def __str__(self) -> str:  # pylint: disable=invalid-str-returned

Review comment:
       Ahhhhh. Gotcha. Nice find.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-767105495


   Just rebased, let's merge this one once it passes CI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548435535



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       Yeah, I never tested upgrading an install using mysql, nor looked in the (mysql) db so missed it.
   
   With your custom EnumString type can't we leave the enum subclass inheriting from `str` too?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755274008


   > Ohhh, this is only a problem for _one_ of the mysql drivers -`mysqldb`. The "default" that you get via `apache-airflow[mysql]` is `mysqlclient` which doesn't have this problem.
   > 
   > That's why I never noticed this problem, nor did the tests fail.
   
   So maybe we should fix it for this driver only + add the startup code to fix it in scheduler + workers waiting (I guess they also can change the state) to run a query fixing it before they start guarded by some flag in the DB set by the migration ('needs state fixing' ) and reset after scheduler fix is run ?  This is a one-time event only after migration.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-753794206


   @ashb would you mind taking a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548845243



##########
File path: airflow/models/dagrun.py
##########
@@ -83,7 +84,7 @@ class DagRun(Base, LoggingMixin):
     run_id = Column(String(ID_LEN))
     creating_job_id = Column(Integer)
     external_trigger = Column(Boolean, default=True)
-    run_type = Column(String(50), nullable=False)
+    run_type = Column(Enum(DagRunType, native_enum=False, create_constraint=False), nullable=False)

Review comment:
       It shouldn't require migration because of `native_enum=False, create_constraint=False`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-750483833


   @ashb because when running the tests, run_type are being written as string `'DagRunType.SCHEDULED'` into the db, so when it's queried using the same value, tests passed. We ran into this issue during upgrade because all of our old db rows have the column populated as string `'scheduled'`, which lead to crashes due to queries returning empty result.
   
   All existing mysql users using mysqlclient driver will run into this issue after performing the upgrade. New 2.0 mysql users might already have db populated with `'DagRunType.SCHEDULED'`, which needs manual fix up.
   
   I will prepare a test case to guard against this regression going forward.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548504008



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       Is there any reason why we do not want to use SqlAlchemy Enum Type? https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Enum 
   
   I believe it will map everything correctly in all dialects




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548844450



##########
File path: airflow/models/dagrun.py
##########
@@ -83,7 +84,7 @@ class DagRun(Base, LoggingMixin):
     run_id = Column(String(ID_LEN))
     creating_job_id = Column(Integer)
     external_trigger = Column(Boolean, default=True)
-    run_type = Column(String(50), nullable=False)
+    run_type = Column(Enum(DagRunType, native_enum=False, create_constraint=False), nullable=False)

Review comment:
       Does this need a db migration?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755274008


   > Ohhh, this is only a problem for _one_ of the mysql drivers -`mysqldb`. The "default" that you get via `apache-airflow[mysql]` is `mysqlclient` which doesn't have this problem.
   > 
   > That's why I never noticed this problem, nor did the tests fail.
   
   So maybe we should fix it for this driver only + add the startup code to fix it in scheduler + workers waiting (I guess they also can change the state) before they start guarded by some flag in the DB set by the migration ('needs state fixing' ) and reset after scheduler fix is run ?  This is a one-time event only after migration.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running during migration
   
   Otherwise we risk to be flooded by "inconsistent DB" issues from our users and we will have to figure out how to fix it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551386256



##########
File path: airflow/models/dagrun.py
##########
@@ -298,7 +298,7 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        return f"{run_type.value}__{execution_date.isoformat()}"

Review comment:
       We don't need this now you've got a `__str__` method on the enum do we?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when old tasks and webserver are running during migration
   
   Otherwise we risk to be flooded by "inconsistent DB" issues from our users and we will have to figure out how to fix it anyway.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow).  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running
   
   Otherwise we risk to be flooded by "inconsistent" DB issues from our users and we will have to figure out how to fix it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755274008


   > Ohhh, this is only a problem for _one_ of the mysql drivers -`mysqldb`. The "default" that you get via `apache-airflow[mysql]` is `mysqlclient` which doesn't have this problem.
   > 
   > That's why I never noticed this problem, nor did the tests fail.
   
   So maybe we should fix it for this driver only + add the startup code to fix it in scheduler + workers waiting (I guess they also can change the state) before they start guarded by some flag in the DB set by the migration ('needs state fixing' ) and reset after scheduler fix is run ?  This is a one-time event only after migration.
   
   This even could be a "feature" of airflow for any kind of future problems like this  in case we find that we need to perform any cleanup before we allow schedulers/workers/webservers to run.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r548841779



##########
File path: airflow/utils/types.py
##########
@@ -16,8 +16,31 @@
 # under the License.
 import enum
 
+from sqlalchemy.types import TypeDecorator, String
 
-class DagRunType(str, enum.Enum):
+
+class EnumString(TypeDecorator):
+    """
+    Declare db column with this type to make the column compatible with string
+    and string based enum values when building the sqlalchemy ORM query. It can
+    be used just like sqlalchemy.types.String, for example:
+
+    ```
+    class Table(Base):
+        __tablename__ = "t"
+        run_type = Column(EnumString(50), nullable=False)
+    ```
+    """
+    impl = String
+
+    def process_bind_param(self, value, dialect):
+        if isinstance(value, enum.Enum):
+            return value.value
+        else:
+            return value
+
+
+class DagRunType(enum.Enum):

Review comment:
       I pushed a new commit to switch to sqlalchemy enum type with native enum and check constraint disabled, which should address all our concerns without requiring a schema migration. I think this strikes a good balance.
   
   @ashb @potiuk @kaxil thoughts? Or do you all prefer to go all in and migrate to native db enum type instead?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how we do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow).  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running
   
   Otherwise we risk to be flooded by "inconsistent" DB issues from our users and we will have to figure out how to fix it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#issuecomment-755300173


   > I still feel migration is the best place.
   > 
   > > For the db fix up, I am thinking that alembic might not be the right tool since it's supposed to be ran before deploying the application code. If we fix up the value through alembic, the existing code could still write invalid values into the db until it has been replaced by new code.
   > 
   > Once a new version is released (i.e 2.0.1) -- first step is to run migration at which point the application code uses the new code -- so I don't think it will have invalid values.. unless I am missing something.
   
   I think this depends on how users do the migration. I think nowhere in Airflow document we have this:
   
   1) shut down all your workers (CeleryExecutor) and wait for all your tasks to complete (KubernetesExecutor)
   2) shut down your webserver
   3) shut down scheduler
   4) perform migration
   5) start your webserver
   6) start your workers
   7) start your scheduler
   
   While this seems a reasonable approach (that's how I would do it), I do not think we provide any tooling nor instructions for it (like a tool to tell "OK all tasks completed, you can now safely migrate Airflow").  
   
   The users might have different expectations (i.e. live migration, or at least not waiting for Kubernetes task Pods to complete in order to do the migration). And I can very easily imagine that if someone uses a KubernetesExecutor, some of the task from the old airflow might still be running, especially in KubernetesExecutor case.
   
   I believe in 2.0 we do even more in the running tasks when they complete (the 'dependent tasks small run after task completes). and I think this part of the code essentially modifies state as well. So I can very easily imagine situation when 'main' airflow is already migrated but some task still running from previous version will modify the state.
   
   Similarly with the webserver - I guess when you trigger tasks, you can also change the state (am I right?) so what happens if you have a new webserver and old db or old webserver and already migrated db? There are different deployment mechanisms and it is hard to make an "atomic" change to your deployment where all components are upgraded precisely at the same time - sometimes it is even impossible.
   
   I think we should either:
   
   a) document and even somehow maybe enforce that migration cannot be run while old tasks are running (and possibly webserver)
   or
   b) handle the case when those tasks are running during migration
   
   Otherwise we risk to be flooded by "inconsistent DB" issues from our users and we will have to figure out how to fix it anyway.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551386700



##########
File path: tests/operators/test_python.py
##########
@@ -436,7 +436,7 @@ def add_number(num: int):
         with self.dag:
             ret = add_number(2)
         self.dag.create_dagrun(
-            run_id=DagRunType.MANUAL,
+            run_id=DagRunType.MANUAL.value,

Review comment:
       Can revert the changes in this file can't we?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13278: fix dag run type enum query for mysqldb driver

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13278:
URL: https://github.com/apache/airflow/pull/13278#discussion_r551392128



##########
File path: airflow/utils/types.py
##########
@@ -24,6 +24,9 @@ class DagRunType(str, enum.Enum):
     SCHEDULED = "scheduled"
     MANUAL = "manual"
 
+    def __str__(self) -> str:  # pylint: disable=invalid-str-returned

Review comment:
       I think this is what the PR actually fixes or intends to do since we were inheriting from str since https://github.com/apache/airflow/pull/11621




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org