You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "jscheffl (via GitHub)" <gi...@apache.org> on 2023/11/26 17:32:53 UTC

[PR] Remove MSSQL support from Airflow core [airflow]

jscheffl opened a new pull request, #35868:
URL: https://github.com/apache/airflow/pull/35868

   DRAFT PR to remove MSSQL support from Airflow
   
   Open items:
   - Green build
   - Check that breeze works as expected
   - Discuss whether the matix action in Github needs a change for coverage?
   
   Preconditions before merge:
   - Discuss where the MSSQL migration script from PR #35861 will be available and update docs accordingly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "ephraimbuddy (via GitHub)" <gi...@apache.org>.
ephraimbuddy commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1844942742

   I'm not confortable having this in 2.8.0. Let's leave it for 2.9.0 because the change is big and close to the release time. 
   Also, I prefer we don't remove the mssql codes but instead let users know by warning that this is no longer supported if they are using mssql. I understand @potiuk concerns about removing it this way but I feel this approach is harsh. We can start by warning users and then total removal. My opinion though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1829147488

   Thanks @potiuk for the analysis and the hist to history. My main "mis-understanding" was that I did not know that there are multiple installs for the driver. Was wondering if this is a leftover (=garbage). But understood it is a "feature".
   
   I understand (now) that mssql is needed in the docker for the provider package, which is fail. reverted the last two (functional) commits, leaving the driver and the install options for the docker build scripts like before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1872228582

   @potiuk Yes, before re-basing another 10 times, I'd like to get this off the plate and merged. So your advise and prep help might help:
   
   - I assume all feedback consistently points that the migration tool shall not be in the Airflow mono repo. If there would be any kind of other GIT repo that you can create or point to to bring #35861 to then this would be a good pre-requisite. This PR needs to point to migration tool in docs. I do not have a strong opinion where to store the migration tool but woul dbe better also not pointing to my personal account.
   - This PR is quite large, the debate around migration script clean is not settled. I would be OK to revert. Maybe separating into at least two PRs might make discussion easier. (PR1: Removal of support form core + breeze / PR2: Clean up of leftovers in migration tooling)
   - Development advise in regards: Would it be OK to merge this to main "soon" or for better patch maintenance of 2.8-line do we need to keep this removal as a PR open for longer? Otherwise nobody really made a review, and I don't want all the efforts being dropped/wasted as actually I had positive intention as a follow-up of the vote to clean-up MSSQL but missing support for review here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406955218


##########
Dockerfile:
##########
@@ -342,87 +342,6 @@ if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
 fi
 EOF
 
-# The content below is automatically copied from scripts/docker/install_mssql.sh

Review Comment:
   As explained on Slack - we likely do not want to remove this script. This script is to install MSSQL client libraries and pymssql client. Those are not necessarily the same as used by anyone who uses mysql as Airflow Metadata DB.
   
   The pymssql driver installed in this step (together with binary libraries) is used for sure by mssql provider https://github.com/apache/airflow/blob/main/airflow/providers/microsoft/mssql/hooks/mssql.py 
   
   The driver that we recommend to use MSSQL as Metadata DB is a different one (it's `pyodbc` one)
   
   SQL Alchemy might use differnt drivers to communicate with MSSQL https://docs.sqlalchemy.org/en/20/dialects/mssql.html  this is one ofht 
   
   We are testing in CI using `mssql+pyodbc` driver not `mssql+pymssql`. I can't remember the details, but I believe there were issues with using pymssql driver. We might want to take a look at the history of adding MSSQL as core db. See:  https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/backend-mssql.yml
   
   
   So the `pymssql` driver is part of the reference image of ours mostly because we want to be able to support `apache-airflow-microsoft-mssql` provider out-of-the-box as we recognise MSSQL as a database our users might often interact with. The goal of this PR is to remove any `MSSQL as Core Airflow Metadatadb backend` - not to remove pymssql library from the image.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1407674731


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   I am not really sure if we want to remove mssql in all the migrations from the past. 
   
   Maybe yes, Maybe no. I have arguments for both. 
   
   ## The argumets for "removing MSSQL From migrations":
   
   I am leaning towards removing them (so basically what this PR does), but I am a bit torn on that one. On one hand I would love to constraint and limit the "mssql" users to only use Airflow 2.7.3 and never, ever use airflow >= 2.8.0 for anything MSSQL. On the  other hand it feels a little weird. I'd love other's opinion - especially those who worked on the migrations in the past: @ephraimbuddy @dstandish @ashb @kaxil  @eladkal 
   
   ## The argument for "leaving MSSQL in migrations": 
   
   It does not really looks like it is necessary to remove them and does not cost us much. It's a bit sunken cost already and we are not going to maintain those migrations anyway. <aybe at some point in time we might want to merge them somehow and get a new "starting point" migration in the future, but I am not sure what is the value in removing mssql from the  old migrations. 
   
   It feels a bit weird to change the past migrations that are already marked as "I run this migration already" by alembic. I know it SHOULD NOT matter when you have Postgres/MySQL, but it feels a bit wrong to modify the migration scripts, especially that it does not hurt us to leave them there (we generally do not touch/maintain migrations too often).
   
   I see two potential prolems with removing the migrations for mssql from those migrations:
   
   * there is a risk we remove something here accidentally by missing it in review - there are a lot of files to be reviewed and lots of conditions, there is a slight risk we will miss it. But with careful review we can mitigate it.
   
   * there is a side-effect on --from-ref/--to-ref, --from-version/--to-version.  Since we are still supporting MSSQL for airlfow < 2.8.0, it would be nice if Airflow 2.8+ still supports migrations for older versions of airflow even for MSSQL.
   
   I think (but I might be wrong) - just leaving the mssql in migration code will just allow it to be run and it SHOULD work in general. So users will still be able - for example - to run `airflow db migrate --to-version 2.7.3` using Airflow 2.8, 2.9 etc. There will be no need to run those migration with `Airflow 2.7.3` strictly.
   
   ##  My current thinking
   
   But I am bit torn on that one. It also relates to the discussion we had on https://github.com/apache/airflow/pull/35861 and what  we want to tell our users who run MSSQL now.
   
   So far my points there at least were:
   
   * let's constraint the set of tool and environment as much as we can to reduce variability of environment
   * let's provide a tool/script that they can use - but also modify if needed, so that it's not part of "airflow" itself
   
   So if we follow and agree that line of thought - then yes, removing MSSQL from Airflow 2.8.0  and not allowing the MSSQL users to run `--to-version 2.7.3` feels about right. On the othere hand - if we allow for variability, and letting the users to do the migrations in various ways, it's ok to leave it in. 
   
   I wonder what you think?



##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   I am not really sure if we want to remove mssql in all the migrations from the past. 
   
   Maybe yes, Maybe no. I have arguments for both. 
   
   ## The arguments for "removing MSSQL From migrations":
   
   I am leaning towards removing them (so basically what this PR does), but I am a bit torn on that one. On one hand I would love to constraint and limit the "mssql" users to only use Airflow 2.7.3 and never, ever use airflow >= 2.8.0 for anything MSSQL. On the  other hand it feels a little weird. I'd love other's opinion - especially those who worked on the migrations in the past: @ephraimbuddy @dstandish @ashb @kaxil  @eladkal 
   
   ## The argument for "leaving MSSQL in migrations": 
   
   It does not really looks like it is necessary to remove them and does not cost us much. It's a bit sunken cost already and we are not going to maintain those migrations anyway. <aybe at some point in time we might want to merge them somehow and get a new "starting point" migration in the future, but I am not sure what is the value in removing mssql from the  old migrations. 
   
   It feels a bit weird to change the past migrations that are already marked as "I run this migration already" by alembic. I know it SHOULD NOT matter when you have Postgres/MySQL, but it feels a bit wrong to modify the migration scripts, especially that it does not hurt us to leave them there (we generally do not touch/maintain migrations too often).
   
   I see two potential prolems with removing the migrations for mssql from those migrations:
   
   * there is a risk we remove something here accidentally by missing it in review - there are a lot of files to be reviewed and lots of conditions, there is a slight risk we will miss it. But with careful review we can mitigate it.
   
   * there is a side-effect on --from-ref/--to-ref, --from-version/--to-version.  Since we are still supporting MSSQL for airlfow < 2.8.0, it would be nice if Airflow 2.8+ still supports migrations for older versions of airflow even for MSSQL.
   
   I think (but I might be wrong) - just leaving the mssql in migration code will just allow it to be run and it SHOULD work in general. So users will still be able - for example - to run `airflow db migrate --to-version 2.7.3` using Airflow 2.8, 2.9 etc. There will be no need to run those migration with `Airflow 2.7.3` strictly.
   
   ##  My current thinking
   
   But I am bit torn on that one. It also relates to the discussion we had on https://github.com/apache/airflow/pull/35861 and what  we want to tell our users who run MSSQL now.
   
   So far my points there at least were:
   
   * let's constraint the set of tool and environment as much as we can to reduce variability of environment
   * let's provide a tool/script that they can use - but also modify if needed, so that it's not part of "airflow" itself
   
   So if we follow and agree that line of thought - then yes, removing MSSQL from Airflow 2.8.0  and not allowing the MSSQL users to run `--to-version 2.7.3` feels about right. On the othere hand - if we allow for variability, and letting the users to do the migrations in various ways, it's ok to leave it in. 
   
   I wonder what you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1829690314

   > Thanks @potiuk for the analysis and the hist to history. My main "mis-understanding" was that I did not know that there are multiple installs for the driver. Was wondering if this is a leftover (=garbage). But understood it is a "feature".
   > 
   > I understand (now) that mssql is needed in the docker for the provider package, which is fail. reverted the last two (functional) commits, leaving the driver and the install options for the docker build scripts like before.
   
   Yeah. It is not clear indeed :). As @Taragolis mentioned in slack, likely we should split the mssql_sh installation in two. 
   
   Just to add a bit of context here.
   
   The Dockerfiles of ours (especially the production one) are built with "users can build their own custom, optimized builds files using them". There is the whole set of documentation and set of arguments users can use if they want to build their own, even more optimized image using our Dockerfile if they want. Currently we pre-built two images:
   
   * the regular one (AKA "reference" image - where we add libraries needed, Postgres, MySQL, MSSQL clients and most popular (we think) providers
    
   * the "slim" image  - which contains all database clients and airflow - without providers (except the preinstalled ones)
   
   But the users can simply get our Dockerfile and build their own image using build-args https://airflow.apache.org/docs/docker-stack/build.html#customizing-the-image
   
   For now both MSSQL clients we install (pymssql and msodbc) are controlled by a single arg when you want to build custom image with or without it:
   
   https://airflow.apache.org/docs/docker-stack/build-arg-ref.html#image-optimization-options
   
   > INSTALL_MSSQL_CLIENT | true |  Whether MsSQL client should be installed
   
   But likely we should split that into two variables. 
   
   Another small comment/context. For development purpose you can very easily build your own, custom PROD image using these. Without even looking at the variable / build image docs.
   
   The `breeze prod-image build` command have all of those parameters nicely explained and documented and they are even auto-completeable - see "advanced customisation optins". You can even run `--dry-run` with selected parameters and it will spit-out the right `docker build` command with the right build args used converted from the nicely documented breeze `prod-image build` auto-completeable and documented flags.
   
   ![image](https://github.com/apache/airflow/assets/595491/20f98db1-2ba3-4d26-9106-241aa512d212)
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1873001635

   Closing this PR as the attempt is now super-seeded with PRs:
   - #36509
   - #36514
   - #36515
   
   Leaving the cleanup of migration scripts "open for now"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406971850


##########
Dockerfile:
##########
@@ -342,87 +342,6 @@ if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
 fi
 EOF
 
-# The content below is automatically copied from scripts/docker/install_mssql.sh
-COPY <<"EOF" /install_mssql.sh
-#!/usr/bin/env bash
-set -euo pipefail
-
-. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
-
-: "${AIRFLOW_PIP_VERSION:?Should be set}"
-
-: "${INSTALL_MSSQL_CLIENT:?Should be true or false}"
-
-COLOR_BLUE=$'\e[34m'
-readonly COLOR_BLUE
-COLOR_RESET=$'\e[0m'
-readonly COLOR_RESET
-
-function install_mssql_client() {
-    # Install MsSQL client from Microsoft repositories
-    if [[ ${INSTALL_MSSQL_CLIENT:="true"} != "true" ]]; then
-        echo
-        echo "${COLOR_BLUE}Skip installing mssql client${COLOR_RESET}"
-        echo
-        return
-    fi
-    echo
-    echo "${COLOR_BLUE}Installing mssql client${COLOR_RESET}"
-    echo
-    local distro
-    local version
-    distro=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
-    version=$(lsb_release -rs)
-    local driver=msodbcsql18
-    curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
-    curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add - >/dev/null 2>&1
-    curl --silent "https://packages.microsoft.com/config/${distro}/${version}/prod.list" > \
-        /etc/apt/sources.list.d/mssql-release.list
-    apt-get update -yqq
-    apt-get upgrade -yqq
-    ACCEPT_EULA=Y apt-get -yqq install -y --no-install-recommends "${driver}"
-    rm -rf /var/lib/apt/lists/*
-    apt-get autoremove -yqq --purge
-    apt-get clean && rm -rf /var/lib/apt/lists/*

Review Comment:
   I believe this part of the script COULD potentially be removed. This part is there to install msodbcsql library from microsoft which is needed for the mssql+pyodbc driver. However we would need to check if for example ODBC provider can make use of it (likely it can use it, so removing the odbc libraries might be potentially breaking for anyone who relies on it in the "reference" image. We make no promises there, we can remove stuff from reference image (we do not follow semver - image is just a convenience package that people might extend and customise at will. So users are free to add what they are missing in their own images if we remove something, but we try to avoid it regardless.
   
   All the three libraries add some size to the image (139 MB vs. 1.4 GB - uncompressed) so ~10% but mssql is only a part of it (and likely small - maybe 20 MB max - probably much smaller) - so there is little reason why we would like to remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1407674731


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   I am not really sure if we want to remove mssql in all the migrations from the past. 
   
   Maybe yes, Maybe no. I have arguments for both. 
   
   ## The arguments for "removing MSSQL From migrations":
   
   I am leaning towards removing them (so basically what this PR does), but I am a bit torn on that one. On one hand I would love to constraint and limit the "mssql" users to only use Airflow 2.7.3 and never, ever use airflow >= 2.8.0 for anything MSSQL. On the  other hand it feels a little weird. I'd love other's opinion - especially those who worked on the migrations in the past: @ephraimbuddy @dstandish @ashb @kaxil  @eladkal 
   
   ## The argument for "leaving MSSQL in migrations": 
   
   It does not really looks like it is necessary to remove them and does not cost us much. It's a bit sunken cost already and we are not going to maintain those migrations anyway. <aybe at some point in time we might want to merge them somehow and get a new "starting point" migration in the future, but I am not sure what is the value in removing mssql from the  old migrations. 
   
   It feels a bit weird to change the past migrations that are already marked as "I run this migration already" by alembic. I know it SHOULD NOT matter when you have Postgres/MySQL, but it feels a bit wrong to modify the migration scripts, especially that it does not hurt us to leave them there (we generally do not touch/maintain migrations too often).
   
   I see two potential prolems with removing the migrations for mssql from those migrations:
   
   * there is a risk we remove something here accidentally by missing it in review - there are a lot of files to be reviewed and lots of conditions, there is a slight risk we will miss it. But with careful review we can mitigate it.
   
   * there is a side-effect on --from-ref/--to-ref, --from-version/--to-version.  Since we are still supporting MSSQL for airlfow < 2.8.0, it would be nice if Airflow 2.8+ still supports migrations for older versions of airflow even for MSSQL.
   
   I think (but I might be wrong) - just leaving the mssql in migration code will just allow it to be run and it SHOULD work in general. So users will still be able - for example - to run `airflow db migrate --to-version 2.7.3` using Airflow 2.8, 2.9 etc. There will be no need to run those migration with `Airflow 2.7.3` strictly.
   
   ##  My current thinking
   
   But I am bit torn on that one. It also relates to the discussion we had on https://github.com/apache/airflow/pull/35861 and what  we want to tell our users who run MSSQL now.
   
   So far my points there at least were:
   
   * let's constraint the set of tool and environment as much as we can to reduce variability of environment
   * let's provide a tool/script that they can use - but also modify if needed, so that it's not part of "airflow" itself
   
   So if we follow and agree that line of thought - then yes, removing MSSQL from Airflow 2.8.0  and not allowing the MSSQL users to run `--to-version 2.7.3` feels about right. On the other hand - if we allow for variability, and letting the users to do the migrations in various ways, it's ok to leave it in. 
   
   I wonder what you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1827066914

   Should work after rebase 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1408310262


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   waiting to what others say. I think we do not have to rush it. Spoke to @ephraimbuddy today and RC is planned 11th of December. I think we have quite some time to discuss, decide, iterate and merge.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406955218


##########
Dockerfile:
##########
@@ -342,87 +342,6 @@ if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
 fi
 EOF
 
-# The content below is automatically copied from scripts/docker/install_mssql.sh

Review Comment:
   As explained on Slack - we likely do not want to remove this script. This script is to install MSSQL client libraries and pymssql client. Those are not necessarily the same as used by anyone who uses mysql as Airflow Metadata DB.
   
   The pymssql driver installed in this step (together with binary libraries) is used for sure by mssql provider https://github.com/apache/airflow/blob/main/airflow/providers/microsoft/mssql/hooks/mssql.py 
   
   The driver that we recommend to use MSSQL as Metadata DB is a different one (it's `pyodbc` one)
   
   SQL Alchemy might use differnt drivers to communicate with MSSQL https://docs.sqlalchemy.org/en/20/dialects/mssql.html 
   
   We are testing in CI using `mssql+pyodbc` driver not `mssql+pymssql`. I can't remember the details, but I believe there were issues with using pymssql driver. We might want to take a look at the history of adding MSSQL as core db. See:  https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/backend-mssql.yml
   
   
   So the `pymssql` driver is part of the reference image of ours mostly because we want to be able to support `apache-airflow-microsoft-mssql` provider out-of-the-box as we recognise MSSQL as a database our users might often interact with. The goal of this PR is to remove any `MSSQL as Core Airflow Metadatadb backend` - not to remove pymssql library from the image.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1407674731


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   I am not really sure if we want to remove mssql in all the migrations from the past. 
   
   Maybe yes, Maybe no. I have arguments for both. 
   
   ## The argiumets for "removing MSSQL From migrations":
   
   I am leaning towards removing them (so basically what this PR does), but I am a bit torn on that one. On one hand I would love to constraint and limit the "mssql" users to only use Airflow 2.7.3 and never, ever use airflow >= 2.8.0 for anything MSSQL. On the  other hand it feels a little weird. I'd love other's opinion - especially those who worked on the migrations in the past: @ephraimbuddy @dstandish @ashb @kaxil  @eladkal 
   
   ## The argument for "leaving MSSQL in migrations": 
   
   It does not really looks like it is necessary to remove them and does not cost us much. It's a bit sunken cost already and we are not going to maintain those migrations anyway. <aybe at some point in time we might want to merge them somehow and get a new "starting point" migration in the future, but I am not sure what is the value in removing mssql from the  old migrations. 
   
   It feels a bit weird to change the past migrations that are already marked as "I run this migration already" by alembic. I know it SHOULD NOT matter when you have Postgres/MySQL, but it feels a bit wrong to modify the migration scripts, especially that it does not hurt us to leave them there (we generally do not touch/maintain migrations too often).
   
   I see two potential prolems with removing the migrations for mssql from those migrations:
   
   * there is a risk we remove something here accidentally by missing it in review - there are a lot of files to be reviewed and lots of conditions, there is a slight risk we will miss it. But with careful review we can mitigate it.
   
   * there is a side-effect on --from-ref/--to-ref, --from-version/--to-version.  Since we are still supporting MSSQL for airlfow < 2.8.0, it would be nice if Airflow 2.8+ still supports migrations for older versions of airflow even for MSSQL.
   
   I think (but I might be wrong) - just leaving the mssql in migration code will just allow it to be run and it SHOULD work in general. So users will still be able - for example - to run `airflow db migrate --to-version 2.7.3` using Airflow 2.8, 2.9 etc. There will be no need to run those migration with `Airflow 2.7.3` strictly.
   
   ##  My current thinking
   
   But I am bit torn on that one. It also relates to the discussion we had on https://github.com/apache/airflow/pull/35861 and what  we want to tell our users who run MSSQL now.
   
   So far my points there at least were:
   
   * let's constraint the set of tool and environment as much as we can to reduce variability of environment
   * let's provide a tool/script that they can use - but also modify if needed, so that it's not part of "airflow" itself
   
   So if we follow and agree that line of thought - then yes, removing MSSQL from Airflow 2.8.0  and not allowing the MSSQL users to run `--to-version 2.7.3` feels about right. On the othere hand - if we allow for variability, and letting the users to do the migrations in various ways, it's ok to leave it in. 
   
   I wonder what you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1871997004

   Hey @jscheffl -> do you need my help / actions for that one? I am happy to create the separate repo, if needed and we could already arrange the PR in the way to be "ready to marge" when we start thinking about 2.9.0 - we want to get it all ready and prepared quite some time before 2.9.0 so that we do not miss another window of opportunity for MSSQL removal :).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1408255345


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   100% on your argument line - and I really had the same thoughts, just did not write it like this.
   
   Mainly I started removing migrations when I cleaned (future) dead code in airflow/migrations/utils.py and then saw migration scripts failing because common functions missing. Migration code is not 100% separated.
   
   As we are shortly before the release and we need more days in discussing/debating about migrations I'd also be OK to split the PR in two. (with a minimum risk that I need to leave some migration dependency dead code behind).
   And still if somebody want to play with MSSQL there is the 2.7.3 release available, from there still you could up-and downgrade. Leaving traces behind if 2.8.0 does not support MSSQL anyway is just some leftover cleanup. And I would not like to keep all MSSQL dependencies in the future (2.9++?) just because in ancient history we supported MSSQL - rather clean it soon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1846225990

   > Anyway... so let put this PR on pause and merge it for 2.9.0...
   
   Well. I think we have all the building-blocks in place. Yes it might meen few people doing some out-of-hours stuff - but for a good rason. I think if we miss the window of opportunity with announcements and "potentially breaking" changes, then we won't be abe to do much (due to cherry-pickability of some fixes). We have the script to migrate, we can put it separately (few minutes tomorrow mid-day)  - and we are done. IMHO - the risk is small.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1872248350

   > @potiuk Yes, before re-basing another 10 times, I'd like to get this off the plate and merged. So your advise and prep help might help:
   > 
   > * I assume all feedback consistently points that the migration tool shall not be in the Airflow mono repo. If there would be any kind of other GIT repo that you can create or point to to bring [Add a migration script tooling support to migrate off mssql #35861](https://github.com/apache/airflow/pull/35861) to then this would be a good pre-requisite. This PR needs to point to migration tool in docs. I do not have a strong opinion where to store the migration tool but woul dbe better also not pointing to my personal account.
   
   I created https://github.com/apache/airflow-mssql-migration . In about 5 minutes permissions should be set there and you should be able to initiate it an push the migration tooling there (even directly without PR/Approval). We can setup branch protection later when it settles down, for now it's ok to **just** push the changes there (but if you prefer to open PRs it's also fine - happy to review them).
   
   > * This PR is quite large, the debate around migration script clean is not settled. I would be OK to revert. Maybe separating into at least two PRs might make discussion easier. (PR1: Removal of support form core + breeze / PR2: Clean up of leftovers in migration tooling)
   
   Yeah. I would rather do it in reverse:
   
   - Create migration tooling outside (it does not have to be final version - just enough to be "working" - we can iterate and improve it later
   
   - Add link to it for people who want to migrate in PR 1 (we will keep updating it when we know exactly which version we should migrate to and maybe get some clarification about migration process
   
   - Only after we have this set up as base, we should have PR 2 with removing the support. There is a bit of problem potentially for cherry-picking, but I think the probability of difficult change that we would like to cherry-pick that will break MSSQL is very low and we will still have mssql tests running in v2-8-test so as long as @ephraimbuddy is ok with it and we commit to quickly fixing those kind of issues (or not-cherry-picking them in the first place) we can merge it even now.
   
   > * Development advise in regards: Would it be OK to merge this to main "soon" or for better patch maintenance of 2.8-line do we need to keep this removal as a PR open for longer? Otherwise nobody really made a review, and I don't want all the efforts being dropped/wasted as actually I had positive intention as a follow-up of the vote to clean-up MSSQL but missing support for review here.
   
   Yeah. See above, I think with just dropping MSSQL the chance of breaking cherry-picking is rather low (and in that case we would likely not cherry-pick it anyway - as I see it only happening for new features). But I'd love to hear what others say as well (especially @ephraimbuddy ).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl closed pull request #35868: Remove MSSQL support from Airflow core
URL: https://github.com/apache/airflow/pull/35868


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1408517163


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   Yeah. The problem with size here is that it's "all" or "nothing" in case of migrations (and those are the big ones) . All the rest is "2 minutes" review :). 
   
   So yeah, agree we should - I think - quickly - get to conclusions on how we should do it and start reviewing/merging what we agreed about.
   
   CC: everyone else who migh have an opinion :) . Maybe also worth to send it to devlist  to raise "importance" of it - might be in the response of the vote/consensus we had about MSSQL removal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1408255345


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   100% on your argument line - and I really had the same thoughts, just did not write it like this.
   
   Mainly I started removing migrations when I cleaned (future) dead code in airflow/migrations/utils.py and then saw migration scripts failing because common functions missing. Migration code is not 100% separated.
   
   As we are shortly before the release and we need more days in discussing/debating about migrations I'd also be OK to split the PR in two. (with a minimum risk that I need to leave some migration dependency dead code behind).
   And still if somebody want to play qith MSSQL there is the 2.7.3 release available, from there still you could up-and downgrade. Leaving traces behind if 2.8.0 does not support MSSQL anyway is just some leftover cleanup. And I would not like to keep all MSSQL dependencies in the future (2.9++?) just because in ancient history we supported MSSQL - rather clean it soon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1408345648


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   Main risk I see is this PR is quire large, if we merge last-minute, there is no buffer to see if we had a unforeseen side-effect. But I am okay if you see it relaxed :-D



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406978989


##########
Dockerfile:
##########
@@ -342,87 +342,6 @@ if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
 fi
 EOF
 
-# The content below is automatically copied from scripts/docker/install_mssql.sh
-COPY <<"EOF" /install_mssql.sh
-#!/usr/bin/env bash
-set -euo pipefail
-
-. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
-
-: "${AIRFLOW_PIP_VERSION:?Should be set}"
-
-: "${INSTALL_MSSQL_CLIENT:?Should be true or false}"
-
-COLOR_BLUE=$'\e[34m'
-readonly COLOR_BLUE
-COLOR_RESET=$'\e[0m'
-readonly COLOR_RESET
-
-function install_mssql_client() {
-    # Install MsSQL client from Microsoft repositories
-    if [[ ${INSTALL_MSSQL_CLIENT:="true"} != "true" ]]; then
-        echo
-        echo "${COLOR_BLUE}Skip installing mssql client${COLOR_RESET}"
-        echo
-        return
-    fi
-    echo
-    echo "${COLOR_BLUE}Installing mssql client${COLOR_RESET}"
-    echo
-    local distro
-    local version
-    distro=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
-    version=$(lsb_release -rs)
-    local driver=msodbcsql18
-    curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
-    curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add - >/dev/null 2>&1
-    curl --silent "https://packages.microsoft.com/config/${distro}/${version}/prod.list" > \
-        /etc/apt/sources.list.d/mssql-release.list
-    apt-get update -yqq
-    apt-get upgrade -yqq
-    ACCEPT_EULA=Y apt-get -yqq install -y --no-install-recommends "${driver}"
-    rm -rf /var/lib/apt/lists/*
-    apt-get autoremove -yqq --purge
-    apt-get clean && rm -rf /var/lib/apt/lists/*

Review Comment:
   Update: removing both odbs and pymssql makes the layer 88MB so ~ 50 MB less. Not sure how much of that is pymssql and how much mssql odbc. Assuming 50/50 we can save 25 MB by removing msodc which is `< 2%` of the image size. IMHO not worth to remove anything here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406955218


##########
Dockerfile:
##########
@@ -342,87 +342,6 @@ if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
 fi
 EOF
 
-# The content below is automatically copied from scripts/docker/install_mssql.sh

Review Comment:
   As explained on Slack - we likely do not want to remove this script. This script is to install MSSQL client libraries and pymssql client. Those are not necessarily the same as drivers/libraries used by anyone who uses mysql as Airflow Metadata DB.
   
   The pymssql driver installed in this step (together with binary libraries) is used by the `apache-airflow-providers-microsoft-mssql` privider, NOT by Airflow's SQLAlchemy to access Airflow's metadata DB: https://github.com/apache/airflow/blob/main/airflow/providers/microsoft/mssql/hooks/mssql.py 
   
   The driver that we recommend to use MSSQL as Metadata DB is a different one (it's `pyodbc` one)
   
   SQL Alchemy might use differnt drivers to communicate with MSSQL https://docs.sqlalchemy.org/en/20/dialects/mssql.html 
   
   We are testing in CI using `mssql+pyodbc` driver not `mssql+pymssql`. I can't remember the details, but I believe there were issues with using pymssql driver. We might want to take a look at the history of adding MSSQL as core db. See:  https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/backend-mssql.yml
   
   
   So the `pymssql` driver is part of the reference image of ours mostly because we want to be able to support `apache-airflow-microsoft-mssql` provider out-of-the-box as we recognise MSSQL as a database our users might often interact with. The goal of this PR is to remove any `MSSQL as Core Airflow Metadatadb backend` - not to remove pymssql library from the image.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1406958011


##########
Dockerfile:
##########
@@ -1259,29 +1176,19 @@ RUN bash /scripts/docker/install_os_dependencies.sh dev
 
 ARG INSTALL_MYSQL_CLIENT="true"
 ARG INSTALL_MYSQL_CLIENT_TYPE="mysql"
-ARG INSTALL_MSSQL_CLIENT="true"
 ARG INSTALL_POSTGRES_CLIENT="true"
 ARG AIRFLOW_PIP_VERSION
 
 ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
     INSTALL_MYSQL_CLIENT_TYPE=${INSTALL_MYSQL_CLIENT_TYPE} \
-    INSTALL_MSSQL_CLIENT=${INSTALL_MSSQL_CLIENT} \
     INSTALL_POSTGRES_CLIENT=${INSTALL_POSTGRES_CLIENT}
 
-# Only copy mysql/mssql installation scripts for now - so that changing the other
+# Only copy mysql installation scripts for now - so that changing the other
 # scripts which are needed much later will not invalidate the docker layer here
-COPY --from=scripts install_mysql.sh install_mssql.sh install_postgres.sh /scripts/docker/
-
-# THE 3 LINES ARE ONLY NEEDED IN ORDER TO MAKE PYMSSQL BUILD WORK WITH LATEST CYTHON

Review Comment:
   As explained above - I think we do not want to remove it in this PR. We might want to see if we still need to use the Cython workaround to bulld pymssql package on ARM images.
   
   
   Cython compatibility seems to be fixed in https://github.com/pymssql/pymssql/blob/master/ChangeLog.rst#version-228---2023-07-30----mikhail-terekhov
   
   > Compatibility with Cython. Thanks to matusvalo (Matus Valo) (fix #826).
   
   So likely yes - we can remove the workaround: but IMHO  it shoudl be a separate PR reverting changes imple
   mented in https://github.com/apache/airflow/pull/32748 - and we shoudl not remove `pymssql`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1412851278


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   > On the other hand it feels a little weird
   
   Does it? even before we started experimental support we accepted changes to migration scripts to better support MsSQL. It doesn't cost us anything other than few min of review.
   i think we should keep it and also accept PRs improving/fixing issues related to MsSQL.
   
   The removal of MsSQL support from our side just means that we do not test Airflow deployment against MsSQL backend. It doesn't mean that it won't work it just mean that we don't guarantee it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1829892711

   > I understand (now) that mssql is needed in the docker for the provider package, which is fail. reverted the last two (functional) commits, leaving the driver and the install options for the docker build scripts like before.
   
   Actually after https://github.com/apache/airflow/pull/35924 gets merged, we can reconsider removal of this part. After mergint #35924 - the "install MSSQL" will only install the odbc drivers -we do not need any more special case where we had to build pymssql separately at "system" level. So we **could** remove it if we want.
   
   This will cause potential incompatibilities, because the image will not have mssql odbc libraries which were there before, so it **might** break somoene's workflow, however I'd say it's a bit non-core-airflow issue any more and anyone could build their image with those drivers added. 
   
   We would have to add "Changelog" entry explaining that in https://airflow.apache.org/docs/docker-stack/changelog.html but we've been doing similar changes in the past. In this case it's even better because this is the first "bookworm" image  - see the current changelog for 2.8.0 https://github.com/apache/airflow/blob/main/docs/docker-stack/changelog.rst 
   
   And we are going to move "latest" to use Python 3.11 as well as discussed in devlist thread. 
   
   So there are more changes that are  potentially backwards-incompatible coming, removing mssql odbc library, seems like not a big issue.
   
   @Taragolis @jscheffl, others WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1413040413


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   Yeah. It was a "little" weird. But I could go either way.
   
   > The removal of MsSQL support from our side just means that we do not test Airflow deployment against MsSQL backend. It doesn't mean that it won't work it just mean that we don't guarantee it.
   
   I think we need a a bit more than that. 
   
   There are two potential scenarios for our MSSQL current users:
   
   1) Scenario 1 - we do not hard/stop mssql users for migration and it will maybe work and maybe not
   
   If we leave it in "maybe it will work" then our users will continue using airflow until it breaks and willl come to us when it will stop (inevitably) - the whole point is that the "mssql-specific" code in Airflow slows us down, so getting rid of it is a good idea and also it's likely in a few months time, it will catastrophically break MSSQL installation - say somoene already moved to 2.9.0 - still using MsSQL and then it will break in 2.10.0. Now. they have no way to go back (becuase migration scripts will also break for example) and we have a user who has broken installation that they can do nothing about. 
   
   And they will complaint to us, will take our time, will cry for help and workarounds regardless what is the status of MSSQL - and most of all they will loose the trust in Airflow as a good solution for them because it won't work for them for possibly days when they will try to recover somehow.  And this all will be happening 2 years from now when nobody here will even remember what kind of changes they were there implmented and likely many of people who are now in Airflow won't even be around.
   
   And this is actually one of the reasons we want to remove MSSQL - we do not WANT to get issues about MSSQL for 2.8+ 
   
   Is it their fault? Technically yes - they have not read the release notes (I have no stats but there is quite a number of people who don't).
   
   2) Scenario 2 - we forbid MSSQL and remove MSSQL code straight away
   
   We can explicitly forbid migration to 2.8.0 for MSSQL users. that will be much cleaner solution. We will provide them the migration script on 2.7.3 and how to migrate to other database. The users will see it immediatley when they will attempt to migrate - and they will have two choices: a) stay with Airflow 2.7.3 longer b) migrate to other DB.
   
   I think this is far more plausible scenario - both for our users and maintainers (including future ones) of Airflow. The cut is clean, and we know already that (some) people do not read release notes so basically - if we allow MSSQL users to migrate to 2.8.0, then it's all but guaranteed that some of our users will actually get into troubles sooner or later. We are basically accepting the fact that some percentage of our users, rather than stopped from migration - will migrate and eventually get in the troubles.
   
   
   So I'd personally be for even saying:
   
   ```
   if mssql => raise Error ("Migrate to Postgres or MySQL").
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #35868:
URL: https://github.com/apache/airflow/pull/35868#discussion_r1413040413


##########
airflow/migrations/versions/0027_1_10_0_add_time_zone_awareness.py:
##########
@@ -139,11 +139,11 @@ def upgrade():
             type_=mysql.TIMESTAMP(fsp=6),
         )
     else:
-        # sqlite and mssql datetime are fine as is.  Therefore, not converting

Review Comment:
   Yeah. It was a "little" weird. But I could go either way.
   
   > The removal of MsSQL support from our side just means that we do not test Airflow deployment against MsSQL backend. It doesn't mean that it won't work it just mean that we don't guarantee it.
   
   I think we need a a bit more than that. 
   
   There are two potential scenarios for our MSSQL current users:
   
   1) Scenario 1 - we do not hard/stop mssql users for migration and it will maybe work and maybe not
   
   If we leave it in "maybe it will work" then our users will continue using airflow until it breaks and willl come to us when it will stop (inevitably) - the whole point is that the "mssql-specific" code in Airflow slows us down, so getting rid of it is a good idea and also it's likely in a few months time, it will catastrophically break MSSQL installation - say somoene already moved to 2.9.0 - still using MsSQL and then it will break in 2.10.0. Now. they have no way to go back (becuase migration scripts will also break for example) and we have a user who has broken installation that they can do nothing about. 
   
   And they will complain to us, will take our time, will cry for help and workarounds regardless what is the status of MSSQL - and most of all they will loose the trust in Airflow as a good solution for them because it won't work for them for possibly days when they will try to recover somehow.  And this all will be happening 2 years from now when nobody here will even remember what kind of changes they were there implmented and likely many of people who are now in Airflow won't even be around.
   
   And this is actually one of the reasons we want to remove MSSQL - we do not WANT to get issues about MSSQL for 2.8+ 
   
   Is it their fault? Technically yes - they have not read the release notes (I have no stats but there is quite a number of people who don't).
   
   2) Scenario 2 - we forbid MSSQL and remove MSSQL code straight away
   
   We can explicitly forbid migration to 2.8.0 for MSSQL users. that will be much cleaner solution. We will provide them the migration script on 2.7.3 and how to migrate to other database. The users will see it immediatley when they will attempt to migrate - and they will have two choices: a) stay with Airflow 2.7.3 longer b) migrate to other DB.
   
   I think this is far more plausible scenario - both for our users and maintainers (including future ones) of Airflow. The cut is clean, and we know already that (some) people do not read release notes so basically - if we allow MSSQL users to migrate to 2.8.0, then it's all but guaranteed that some of our users will actually get into troubles sooner or later. We are basically accepting the fact that some percentage of our users, rather than stopped from migration - will migrate and eventually get in the troubles.
   
   
   So I'd personally be for even saying:
   
   ```
   if mssql => raise Error ("Migrate to Postgres or MySQL").
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Remove MSSQL support from Airflow core [airflow]

Posted by "jscheffl (via GitHub)" <gi...@apache.org>.
jscheffl commented on PR #35868:
URL: https://github.com/apache/airflow/pull/35868#issuecomment-1846192314

   > I'm not confortable having this in 2.8.0. Let's leave it for 2.9.0 because the change is big and close to the release time. Also, I prefer we don't remove the mssql codes but instead let users know by warning that this is no longer supported if they are using mssql. I understand @potiuk concerns about removing it this way but I feel this approach is harsh. We can start by warning users and then total removal. My opinion though.
   
   I understand the change is large, this is a reason. Raised this a bit late, unfortunately review was stale for a number of days :-(
   
   Anyway, admins should have been warned already, there was a post in the docs already, see: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mssql-database
   
   Anyway... so let put this PR on pause and merge it for 2.9.0...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org