You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/02 16:21:41 UTC

[GitHub] [airflow] potiuk opened a new pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

potiuk opened a new pull request #21272:
URL: https://github.com/apache/airflow/pull/21272


   SQL Alchemy 1.4.10 introduces a bug where for PyODBC driver UTCDateTime
   fields get wrongly converted as string and fail to be converted back to
   datetime. It was supposed to be fixed in
   https://github.com/sqlalchemy/sqlalchemy/issues/6366 (released in
   1.4.10) but apparently our case is different.
   
   Opened https://github.com/sqlalchemy/sqlalchemy/issues/7660 to track it
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028837750


   If we replace the `UTCDateTime` class definition with
   
   ```
   UTCDateTime = DateTime
   ```
   
   does the error still show up?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#discussion_r797803156



##########
File path: setup.cfg
##########
@@ -128,7 +128,11 @@ install_requires =
     python-slugify~=5.0
     rich>=9.2.0
     setproctitle>=1.1.8, <2
-    sqlalchemy>=1.3.18
+    # SQL Alchemy 1.4.10 introduces a bug where for PyODBC driver UTCDateTime fields get wrongly converted
+    # as string and fail to be converted back to datetime. It was supposed to be fixed in
+    # https://github.com/sqlalchemy/sqlalchemy/issues/6366 (released in 1.4.12) but apparently our case
+    # is different. Opened https://github.com/sqlalchemy/sqlalchemy/issues/7660  to track it

Review comment:
       ```suggestion
       # is different. Opened https://github.com/sqlalchemy/sqlalchemy/issues/7660 to track it
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028854738


   > If we replace the `UTCDateTime` class definition with
   > 
   > ```
   > UTCDateTime = DateTime
   > ```
   > 
   > does the error still show up?
   
   That's the first thing I checked. Yes. That's why I thought initially and that's why I thought this is not "our" problem. But I could not reproduce it with neither DateTime nor or UTCDateTime in the "MVE" (minimum viable error) script from SQLAlchemy. 
   My hypothesis is that maybe somewhere we hard-code or force different date format than the one that the driver uses, but I have not found it yet. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21272:
URL: https://github.com/apache/airflow/pull/21272


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028957246


   > @potiuk can we merge this please?
   
   It does not seem like it solves all problems (if you look at the errors), but yeah, I think there might partially help at least for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028950264


   @potiuk can we merge this please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1043438207


   2.2.3 will never be updated. MSSQL is not yet supported in 2.2 at all. 2.3.0 is the first time it MAY be supported. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028317786


   The story unfolds.
   
   I tried to reproduce starting from a "minimum" script that `sqlalchemy` maintainers proposed. I modified it to embed our cusomizations (UTCDateTime field and utcnow function) but so far I could not reproduce it.
   
   ```
   import pytz
   import datetime as dt
   import pendulum
   import time
   
   from sqlalchemy import Column
   from sqlalchemy import create_engine
   from sqlalchemy import DateTime
   from sqlalchemy import Integer
   from sqlalchemy import String
   from sqlalchemy.ext.declarative import declarative_base
   from sqlalchemy.orm import Session
   from sqlalchemy.types import TypeDecorator
   
   Base = declarative_base()
   
   # UTC time zone as a tzinfo instance.
   utc = pendulum.tz.timezone('UTC')
   
   class UtcDateTime(TypeDecorator):
       """
       Almost equivalent to :class:`~sqlalchemy.types.DateTime` with
       ``timezone=True`` option, but it differs from that by:
   
       - Never silently take naive :class:`~datetime.datetime`, instead it
         always raise :exc:`ValueError` unless time zone aware value.
       - :class:`~datetime.datetime` value's :attr:`~datetime.datetime.tzinfo`
         is always converted to UTC.
       - Unlike SQLAlchemy's built-in :class:`~sqlalchemy.types.DateTime`,
         it never return naive :class:`~datetime.datetime`, but time zone
         aware value, even with SQLite or MySQL.
       - Always returns DateTime in UTC
   
       """
   
       impl = DateTime(timezone=True)
   
       def process_bind_param(self, value, dialect):
           if value is not None:
               if not isinstance(value, dt.datetime):
                   raise TypeError('expected datetime.datetime, not ' + repr(value))
               elif value.tzinfo is None:
                   raise ValueError('naive datetime is disallowed')
               return value.astimezone(utc)
           return None
   
       def process_result_value(self, value, dialect):
           """
           Processes DateTimes from the DB making sure it is always
           returning UTC. Not using timezone.convert_to_utc as that
           converts to configured TIMEZONE while the DB might be
           running with some other setting. We assume UTC datetimes
           in the database.
           """
           if value is not None:
               if value.tzinfo is None:
                   value = value.replace(tzinfo=utc)
               else:
                   value = value.astimezone(utc)
   
           return value
   
   
   class Project(Base):
       """
       Simple table describing projects.
       """
   
       __tablename__ = "project"
   
       id = Column("project_id", Integer, primary_key=True)
       name = Column(String, nullable=False)
       description = Column(String, nullable=False)
       created = Column(UtcDateTime)
       modified = Column(UtcDateTime)
   
   def utcnow() -> dt.datetime:
       """
       Get the current date and time in UTC
   
       :return:
       """
       result = dt.datetime.utcnow()
       result = result.replace(tzinfo=utc)
   
       return result
   
   e = create_engine(
       "mssql+pyodbc://sa:Airflow123@mssql:1433/airflow?driver=ODBC+Driver+17+for+SQL+Server",
       echo="debug",
   )
   Base.metadata.drop_all(e)
   Base.metadata.create_all(e)
   session = Session(e)
   p1 = Project(
           name="name",
           description="desc",
           created=utcnow() - dt.timedelta(seconds=60),
           modified=utcnow()
       )
   session.add(p1)
   session.commit()
   
   time.sleep(1)
   p1.created= utcnow() - dt.timedelta(seconds=60)
   session.flush()
   ```
   
   Any comments are welcome :)
   
   The script above can be executed in Breeze with `--backend mssql`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1043438207


   2.2.3 will never be updated with that change.
   
   MSSQL is not yet supported in 2.2 at all. 2.3.0 is the first time it MAY be supported. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028964873


   I also updated constraints manualy to include SQLAlchemy == 1.4.9. Hopefully it will contain the "regular" PR errors now. But I think 1.4.9 also has other MSSQL issues that might show-up instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] aagateuip commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
aagateuip commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1043193335


   > I gave myself a little break on searching since yesterday to get a new perspective and I will come back today to look for more clues :). But an ideas/hypotheses/recalls would be most welcome ..
   > 
   > BTW. It actually also crossed my mind "Should we drop MSSQL Idea :) ?". We keep on getting various kinds of troubles with it, t's still experimental, we have not released any version that would be MSSQL compatible, so the harm there is minimal. But I think it would be a little harsh. I spoke with a user on Slack yesterday who was mislead that MsSQL is supported and our website - even if mentioned that this is experimental, does not really say "it's not supported officially uet" (which was a bit surprising https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html ) - only the https://github.com/apache/airflow/blob/main/README.md#requirements is precise with tht.
   
   @potiuk Ran into this issue and with this workaround it worked. Thank you! When do apache/airflow:2.2.3-python3.7 docker images get updated? 
   Also we are planning to use MSSQL server just in case you need a vote to keep MSSQL supported. Thank you!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028317786


   The story unfolds.
   
   I tried to reproduce starting from a "minimum" script that `sqlalchemy` maintainers proposed. I modified it to embed our cusomizations (UTCDateTime field and utcnow function) but so far I could not reproduce it.
   
   ```
   import pytz
   import datetime as dt
   import pendulum
   import time
   
   from sqlalchemy import Column
   from sqlalchemy import create_engine
   from sqlalchemy import DateTime
   from sqlalchemy import Integer
   from sqlalchemy import String
   from sqlalchemy.ext.declarative import declarative_base
   from sqlalchemy.orm import Session
   from sqlalchemy.types import TypeDecorator
   
   Base = declarative_base()
   
   # UTC time zone as a tzinfo instance.
   utc = pendulum.tz.timezone('UTC')
   
   class UtcDateTime(TypeDecorator):
       """
       Almost equivalent to :class:`~sqlalchemy.types.DateTime` with
       ``timezone=True`` option, but it differs from that by:
   
       - Never silently take naive :class:`~datetime.datetime`, instead it
         always raise :exc:`ValueError` unless time zone aware value.
       - :class:`~datetime.datetime` value's :attr:`~datetime.datetime.tzinfo`
         is always converted to UTC.
       - Unlike SQLAlchemy's built-in :class:`~sqlalchemy.types.DateTime`,
         it never return naive :class:`~datetime.datetime`, but time zone
         aware value, even with SQLite or MySQL.
       - Always returns DateTime in UTC
   
       """
   
       impl = DateTime(timezone=True)
   
       def process_bind_param(self, value, dialect):
           if value is not None:
               if not isinstance(value, dt.datetime):
                   raise TypeError('expected datetime.datetime, not ' + repr(value))
               elif value.tzinfo is None:
                   raise ValueError('naive datetime is disallowed')
               return value.astimezone(utc)
           return None
   
       def process_result_value(self, value, dialect):
           """
           Processes DateTimes from the DB making sure it is always
           returning UTC. Not using timezone.convert_to_utc as that
           converts to configured TIMEZONE while the DB might be
           running with some other setting. We assume UTC datetimes
           in the database.
           """
           if value is not None:
               if value.tzinfo is None:
                   value = value.replace(tzinfo=utc)
               else:
                   value = value.astimezone(utc)
   
           return value
   
   
   class Project(Base):
       """
       Simple table describing projects.
       """
   
       __tablename__ = "project"
   
       id = Column("project_id", Integer, primary_key=True)
       name = Column(String, nullable=False)
       description = Column(String, nullable=False)
       created = Column(UtcDateTime)
       modified = Column(UtcDateTime)
   
   def utcnow() -> dt.datetime:
       """
       Get the current date and time in UTC
   
       :return:
       """
       result = dt.datetime.utcnow()
       result = result.replace(tzinfo=utc)
   
       return result
   
   e = create_engine(
       "mssql+pyodbc://sa:Airflow123@mssql:1433/airflow?driver=ODBC+Driver+17+for+SQL+Server",
       echo="debug",
   )
   Base.metadata.drop_all(e)
   Base.metadata.create_all(e)
   session = Session(e)
   p1 = Project(
           name="name",
           description="desc",
           created=utcnow() - dt.timedelta(seconds=60),
           modified=utcnow()
       )
   session.add(p1)
   session.commit()
   
   time.sleep(1)
   p1.created= utcnow() - dt.timedelta(seconds=60)
   session.flush()
   ```
   
   Any comments are welcome :)
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028874271


   I'm not sure if it's relevant (probably not) but I've just noticed this warning:
   
   >   /opt/airflow/airflow/models/serialized_dag.py:120 SAWarning: TypeDecorator UtcDateTime(timezone=True) will not produce a cache key because the ``cache_ok`` attribute is not set to True.  This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions.  Set this attribute to True if this type object's state is safe to use in a cache key, or False to disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21272:
URL: https://github.com/apache/airflow/pull/21272#issuecomment-1028859712


   I gave myself a little break on searching since yesterday to get a new perspective and I will come back today to look for more clues :). But an ideas/hypotheses/recalls would be most welcome ..
   
   BTW. It actually also crossed my mind "Should we drop MSSQL Idea  :) ?". We keep on getting various kinds of troubles with it, t's still experimental, we have not released any version that would be MSSQL compatible, so the harm there is minimal. But I think it would be a little harsh. I spoke with a user on Slack yesterday who was mislead that MsSQL is supported and our website - even if mentioned that this is experimental, does not really say "it's not supported officially uet" (which was a bit surprising https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html ) - only the https://github.com/apache/airflow/blob/main/README.md#requirements is precise with tht. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #21272: Limit SQLAlchemy until MSSQL datetime bug is fixed

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #21272:
URL: https://github.com/apache/airflow/pull/21272


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org