Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2023/08/05 20:04:03 UTC

Re: [LAZY CONSENSUS] Updating constraints for pymssql after Cython 3 "mayhem" for Airflow 2.5 - 2.6

Since there were no objections, I moved the constraints - replacing
pymssql==2.2.7 with 2.2.8 for Airflow 2.5.0 - 2.6.3.

I've done that with the new "breeze release-management
update-constraints" command. The PR for it is
https://github.com/apache/airflow/pull/33144 and it includes instructions
on how we can make such updates in the future (in those rare cases where
we need to).
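
For illustration only, the kind of change applied to each constraints
file can be sketched in Python as below. This is not the code behind the
breeze command; the function name bump_constraint and the sample file
contents (everything other than the pymssql pin) are hypothetical.

```python
import re


def bump_constraint(constraints_text: str, package: str, new_version: str) -> str:
    """Return constraints_text with the `package==<version>` pin set to new_version.

    Hypothetical helper: it assumes one `name==version` pin per line,
    which is how pip constraints files are usually laid out.
    """
    pattern = re.compile(rf"^{re.escape(package)}==\S+$", flags=re.MULTILINE)
    # A callable replacement avoids any backslash processing in the new pin.
    updated, count = pattern.subn(lambda _m: f"{package}=={new_version}", constraints_text)
    if count != 1:
        raise ValueError(f"expected exactly one pin for {package}, found {count}")
    return updated


# Made-up sample; only the pymssql versions come from this thread.
sample = "pymongo==4.4.1\npymssql==2.2.7\npyodbc==4.0.39\n"
result = bump_constraint(sample, "pymssql", "2.2.8")
print(result)
```

Refusing to proceed unless exactly one pin matches is a deliberate choice
in this sketch: a constraints update should touch a single, known line
and nothing else.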

J.

On Mon, Jul 31, 2023 at 2:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Generally, I like the idea of having such a tool, but we need to make it
> pretty general so that when such a tight dependency issue comes up,
> the tool should just be able to pull it off by building the rest of the
> ecosystem while keeping "constraints".
>
> Absolutely. I checked that PyMSSQL has no dependencies of its own, so
> bumping it does not affect the rest of the "environment". This
> would not be possible in more general cases. It will always be a very,
> very rare event and we will only do it when we have a reproducible
> build problem. We NEVER update constraints when
> there is just a new dependency version with a fix. This is made very
> clear here:
> https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#fixing-constraints-at-release-time
>
> > The released “versioned” constraints are mostly fixed when we release Airflow version and we only update them in exceptional circumstances. For example when we find out that the released constraints might prevent Airflow from being installed consistently from the scratch.
> > In normal circumstances, the constraint files are not going to change if new version of Airflow dependencies are released - not even when those versions contain critical security fixes. The process of Airflow releases is designed around upgrading dependencies automatically where applicable but only when we release a new version of Airflow, not for already released versions.
>
> This is the "exceptional circumstance" mentioned above.
>
> > Btw, I think this issue is also for intel machines and not just ARM
> installations. Ref:
> > https://github.com/cython/cython/issues/5541#issuecomment-1641654183
>
> Sort of. To be precise - it is triggered when the binary wheel
> released by the PyMSSQL maintainers for the given architecture/os/cpp
> library version etc. is not used. If you look at
> https://pypi.org/project/pymssql/2.2.7/#files you will see they
> released MANY of those for x86 (but none for ARM). So generally Linux
> + x86 were covered, UNLESS someone had an architecture that did not
> fall into one of the 50 or so binary variants (so yeah, it was a bit of
> a simplification to say "only ARM", but generally speaking it would be a
> very rare case that none of those binaries matched the actual x86
> architecture). Not sure why this one did not match the binaries - they
> seem to be there - maybe the person posting it had a setting that
> forces compilation (--no-binary :all: for example) when pip
> installs packages. BTW, this is also one of the reasons I want to wait
> until the maintainers build and publish all the binary packages
> that are currently missing for 2.2.8:
> https://pypi.org/project/pymssql/2.2.8/#files. Currently there are
> just a few of those (only mac binaries are present there - see the
> screenshots and compare to 2.2.7).
>
> Saying that ARM users are impacted is largely because 100% of them are
> impacted, compared to likely less than a few percent for Intel. The pymssql
> project does not publish ARM binaries at all, so pymssql is always
> compiled locally when installed on ARM.
>
> > Also, not sure if this can be fixed by constraining Cython, see thread:
> > https://discuss.python.org/t/no-way-to-pin-build-dependencies/29833/21
>
> I am not going to constrain Cython - this is not going to work for
> the reasons described there (and in my explanation: build isolation
> does not take constraints into account; they are only taken into
> account when PIP_CONSTRAINT is used, but not when the --constraint
> command-line flag is used. This is why I explained that the workaround
> was mostly "accidental" - PIP_CONSTRAINT working when build isolation
> is on could actually be considered a bug, because it behaves differently
> from the command-line flag). As described above (see my description),
> I am going to bump pymssql to 2.2.8
> instead. I had to wait for the 2.2.8 release by the pymssql maintainers
> in order to do that. The 2.2.8 release is compatible with Cython 3, so
> when someone installs Airflow 2.5.1 with the 2.5.1 constraints, for
> example, and this triggers pymssql compilation, it will pull the 2.2.8
> version.
>
> J.
>
>
>
> On Mon, Jul 31, 2023 at 12:36 PM Amogh Desai <am...@gmail.com> wrote:
> >
> > Generally, I like the idea of having such a tool, but we need to make it
> > pretty general so that when such a tight dependency issue comes up,
> > the tool should just be able to pull it off by building the rest of the
> > ecosystem while keeping "constraints".
> >
> > Btw, I think this issue is also for intel machines and not just ARM
> > installations. Ref:
> > https://github.com/cython/cython/issues/5541#issuecomment-1641654183
> >
> > Also, not sure if this can be fixed by constraining Cython, see thread:
> > https://discuss.python.org/t/no-way-to-pin-build-dependencies/29833/21
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Mon, Jul 31, 2023 at 3:57 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > BTW. I will run the consensus question until Thursday 3rd of August
> > > 2023 1pm noon.
> > >
> > > On Mon, Jul 31, 2023 at 12:20 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > TL;DR: I have a small proposal - I would like to fix constraints for
> > > > all Airflow 2.5.* and 2.6.* by updating pymssql from 2.2.7 to the
> > > > 2.2.8 version (released today).
> > > >
> > > > I still have to wait for the "complete" release. They have not yet
> > > > released Linux binary variants of the packages for 2.2.8, and people
> > > > watching it have flagged that to the maintainers, but I wanted to get
> > > > consensus on it before I start doing it.
> > > >
> > > > Currently users installing the MSSQL provider for their ARM-based Airflow
> > > > are experiencing "build failed" when pymssql is installed. They have
> > > > to use the workaround described here -
> > > > https://github.com/apache/airflow/issues/32672#issuecomment-1647007726
> > > > and the proposal aims to fix it so that the workaround will not be
> > > > needed when using constraints. There are already a few issues about it
> > > > in our repo:
> > > >
> > > > This is one of the extremely rare cases (it has happened 2 times over
> > > > the last 2 years) where our "reproducible installation" stopped working
> > > > for Python versions because of a `pip` tooling update that we have
> > > > no control over, but thanks to the ability to update constraints, we can
> > > > fix it.
> > > >
> > > > If we get consensus I will use that opportunity to add some tooling to
> > > > make it easier to do such updates in the future - it requires creating
> > > > a new branch for every version and moving constraint tags - but this is
> > > > easy to automate. And I will have an excuse to develop a small tool to
> > > > help with that - which we will be able to use in the future in
> > > > similar cases (I've done it manually before).
> > > >
> > > > Some more context:
> > > >
> > > > Two weeks ago, on the 17th of July, Cython released a long-in-the-making
> > > > 3.0.0 version with some backwards-incompatible changes, and while a
> > > > lot of packages have been made compatible, pymssql was one of the
> > > > packages that was not. The issue did not affect x86 users, because
> > > > pre-compiled pymssql binaries are available on PyPI
> > > > https://pypi.org/project/pymssql/2.2.7 but ARM users have problems
> > > > installing it, because it needs to be compiled on the fly for them.
> > > >
> > > > It caused quite a bit of mayhem in the Python ecosystem - especially for
> > > > projects that are not as up-to-date as Airflow is with all our
> > > > dependencies - most of our dependencies are automatically updated in
> > > > the constraints as soon as new versions are released, and many of them
> > > > have binary packages already. So given how big a problem it was
> > > > for some other projects, having just pymssql being problematic is
> > > > quite cool and shows that our approach works :).
> > > >
> > > > Unfortunately we have no control over which version of Cython is used
> > > > when compiling PyMSSQL (this is something specified by the pymssql package
> > > > itself, and new versions of pip use "build isolation", enabled by default,
> > > > so it's only up to the package to decide on the version of
> > > > build tools that are used). There is a "mostly accidental" - I think -
> > > > workaround with the PIP_CONSTRAINT environment variable, but it is rather
> > > > complex-ish to pull off, especially in custom Docker images based on the
> > > > slim images.
> > > >
> > > > I've implemented the workaround for our ARM images last week to make
> > > > them work - so you can see it's quite complex-ish:
> > > > https://github.com/apache/airflow/pull/32748
> > > >
> > > > The 2.2.8 version of pymssql has only one change:
> > > >
> > > > > Version 2.2.8 - 2023-07-30 - Mikhail Terekhov
> > > > > Compatibility with Cython. Thanks to matusvalo (Matus Valo) (fix #826).
> > > >
> > > > Why 2.5+ ?
> > > >
> > > > a) because ARM support for MSSQL was introduced in 2.5.1
> > > > b) because 2.4 used the 2.2.5 version of PyMSSQL and there were a few more
> > > > changes in 2.2.6, so there is a (low) risk it would break something
> > > > else.
> > > >
> > > > Note that we do NOT have to rebuild our images: when pymssql
> > > > 2.2.7 was built before Cython 3.0.0, it is good to go. Since the
> > > > only change in 2.2.8 is to make it build with Cython, we
> > > > do not need to rebuild and re-release our images.
> > > >
> > > > Can we get consensus on it? Does anyone have anything against it?
> > > >
> > > > J.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > For additional commands, e-mail: dev-help@airflow.apache.org
> > >
> > >