You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/02/06 12:32:29 UTC

Re: [DISCUSSION] Upper bounds on our dependencies

I do not think we reached consensus yet on those but I thought maybe just
attempting it and showing how much more consistent and better it would
look,
I have attempted to describe the policies I had in mind and update our
dependencies in

https://github.com/apache/airflow/pull/21356

Ash - I think you had doubts initially, maybe just reviewing and showing
how it could look might be a good idea to clear up any doubts.
Alternatively maybe we can iterate on the PR (and discussion below) if
there is any counter-proposal (I think the current situation where the
policy is basically "random" is not really a policy).

As part of the PR I reviewed ALL our upper-bound limits (there were not too
many as I suspected) and tried to either reverse-engineer where the
limits are from (and document them) or left comment and TODO to attempt to
remove the limit (those should be separate PRs and I am happy to attempt at
least some of those as a follow-up). I also removed all upper limits where
there are no ideas on what and when and whether at all the upper limits
need to be applied because we are already at the latest version and there
are no plans from the libraries to release releases with breaking changes
(at least it's not obvious there are).

Let me know what you think also - others who have not spoken yet - maybe
looking at the PR it will be clearer what I had in mind.

J.


On Tue, Jan 25, 2022 at 10:54 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Actually - if we make a clear policy about what we agree clear policies
> when we do upper-bound and when we don't -  I am super happy to write a
> short documentation for Poetry and PIP tools users on exactly how they can
> install airflow reliably (I can even add a small tool that will take our
> constraints an generate the right installation configuration for
> poetry/piptools.
>
> But without a clear policy on what our approach is, that might not be
> straightforward.
>
> J,
>
>
> On Tue, Jan 25, 2022 at 10:48 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> > I am wondering if this will have a negative impact on installations
>> using tools that do not support constraints files like poetry or
>> pip-tools. These tools are not officially supported, but many users
>> may use them nonetheless, so I think it is worth considering before
>> making any changes that affect them. Unfortunately, I have too little
>> experience with them to say at the same time how your proposal will
>> affect it.
>>
>> I think this is not a change really, just clarification of our policies.
>> I have not counted it but by rough look we already have some ~70% of our
>> dependencies NOT upper bound already.
>>
>> We have no documentation, statement, nothing to explain - why. Why some
>> dependencies are upper bound and some are not. Looks like it is pretty
>> random now.
>>
>> So this is not a "change" but more
>>
>> * "statement on what our policy is" and clear guidelines on how to add
>> future dependencies
>> * possibly "review" the current set and make sure that our current
>> approach follows the policy we define.
>>
>> Re: poetry, piptools - I think we cannot satisfy them if we want to keep
>> our "both application and library" approach.
>> Those tools made an opinionated (and in cases of many projects very
>> reasonable BTW) decision that you are either "library" or "application" but
>> not both - and they propose different approaches for "application" and
>> "library". We are both. None of the solutions proposed by poetry and pip
>> tools work for us. There is no way we can make them work well for us.
>>
>> I really like poetry for one, and I heartily recommend most of "standard"
>> projects to use it as they have some really cool tooling and make it super
>> easy to manage dependencies. We are just different because we are both
>> application and library and poetry does not support that.
>> I think at least - if we clarify which of our dependencies are
>> "application" (and upper-bound them) it will give higher chances for the
>> users that choose to use poetry/pip-tools to install "base" version of
>> airflow - but also will give them a chance to know that they have to
>> manually pin some of the non-application dependencies if they fail for
>> them. Poetry and pip tools users can simply take our constraints and pin
>> them manually when they are installing airflow - this is a solution that I
>> mentioned many times to the users who raised their questions about it.
>>
>> J.
>>
>>
>> On Tue, Jan 25, 2022 at 3:26 AM Kamil Breguła <dz...@gmail.com> wrote:
>>
>>> I am wondering if this will have a negative impact on installations
>>> using tools that do not support constraints files like poetry or
>>> pip-tools. These tools are not officially supported, but many users
>>> may use them nonetheless, so I think it is worth considering before
>>> making any changes that affect them. Unfortunately, I have too little
>>> experience with them to say at the same time how your proposal will
>>> affect it.
>>>
>>> pon., 24 sty 2022 o 23:08 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>>> >
>>> > Hey Ash (and everyone).
>>> >
>>> > I think we are really close to some consensus, but we are coming from
>>> two directions. I do agree there will be some of the deps that are more
>>> important to keep "in-check" than the rest.
>>> > I think it makes perfect sense to distinguish the dependencies we
>>> really "astrongly feel" are important for airflow vs. those that "are
>>> important to our users".
>>> > I am perfectly ok with upper-bounding deps that we really think are
>>> important.
>>> >
>>> > But with >500 deps total for aifrflow (transitive) I think we should
>>> have. shortlist of those we care only - and the rest should be upper-open
>>> by default.
>>> > And "short" is a good name, because those are the dependencies that we
>>> take responsibility of manually upgrading when needed.
>>> >
>>> > I would be happy if we can come up wit the list that is really
>>> important and "core" to airflow - and "really likely" to break things with
>>> new major release. And those are the things we care more as an
>>> "application" that our users care as "libraries". This is my subjective
>>> list (those are all things taht we care :
>>> >
>>> > * sqlalchemy
>>> > * alembic
>>> > * flask (and all flask libs)
>>> > * flask-app-builder
>>> > * werkzeug  (yeah. not surprisingly - including last meetup I spoke
>>> about Airflow, my previous presented mentioned that Werkzeug comes up as
>>> "problematic case" and I was only able to confirm that :).
>>> >
>>> > + all those that we know break things (for example we know a bunch of
>>> google deps <2.0.0 that we know are breaking things already - but we can
>>> easily describe those)
>>> >
>>> > I think if we upper-bound those (and make appropriate comment in our
>>> setup.py/cfg)  this would be pretty "good" setup. And having a
>>> shortlist of those that we want to keep upper-bound makes sense and is
>>> manageable.
>>> >
>>> > WDYT?
>>> >
>>> > J
>>> >
>>> >
>>> > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty
>>> <co...@astronomer.io.invalid> wrote:
>>> >>
>>> >> Ash,
>>> >>
>>> >> I think your code snippet will result in getting the latest version
>>> of pandas even if there is an upper constraint on pandas in apache-airflow.
>>> I simulated with these commands on the latest pip available.
>>> >>
>>> >> ```
>>> >> pip install apache-airflow "pandas<1.4"
>>> >> pip freeze | grep pandas
>>> >> pip install -U pandas
>>> >> pip freeze | grep pandas
>>> >> ```
>>> >>
>>> >> Collin McNulty
>>> >>
>>> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>> >>>
>>> >>> To your inital thoughts:
>>> >>> - lower bounds: yeah, introduce them when we need to, so a new
>>> package could be added with no lower bound, and then a bound placed only
>>> when we discover we need a minium version (include bug fix etc. PR author
>>> can choose wether or not to have a lower bound initially.)
>>> >>> - not have upper bound by default: see below
>>> >>> - only introduce upper bound if breaking change: see below
>>> >>> - specify reason for upper bound. yes, 100% to this
>>> >>>
>>> >>>
>>> >>> I think you have created a strawman argument here to say that the
>>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow
>>> SemVer so 2.0 is going to break _something_. It didn't help us with this
>>> specific problem.
>>> >>>
>>> >>> As you mentioned (possibly only in slack) constraints protect
>>> against initial install only, but not future in-place upgrades. Without an
>>> upper constraint I could do
>>> >>>
>>> >>> ```
>>> >>> pip install --constraint $FILE apache-airflow
>>> >>> pip install -U pandas
>>> >>> ```
>>> >>>
>>> >>> And then airflow could be broken. And being able to upgrade without
>>> constraints files is the only way to apply a security upgrade to a module
>>> (otherwise we'd have to update the constraint files for all supported
>>> versions when ever any of our transitive dependencies has a security
>>> update, not something we are in a position to do).
>>> >>>
>>> >>> I think it also depends how we use the dependency in Airflow -- some
>>> of them are so core that we don't want anything to break, but others (such
>>> as Pandas in this case) where our use of them in Airflow is actually fairly
>>> superficial.
>>> >>>
>>> >>> So I think I would just tone down your middle suggestions slightly
>>> -- Upper limits dependencies can be optional. I think the only place we
>>> really differ is what is "foreseen".
>>> >>>
>>> >>> -ash
>>> >>>
>>> >>> [1]: https://pandas.pydata.org/docs/development/policies.html
>>> >>>
>>> >>>
>>> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <
>>> jarek@potiuk.com> wrote:
>>> >>>
>>> >>> Just to illustrate it better. Here is an extremely good example from
>>> today:
>>> >>>
>>> >>> Our main build started to fail today (
>>> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
>>> because Pandas released a new 1.4.0 version last  night:
>>> https://pypi.org/project/pandas/1.4.0/
>>> >>> The root cause of the failure was that Pandas 1.4.0 requires
>>> SqlAlchemy 1.4.0 or above. But surprisingly -  there is no hard limit in
>>> the released Pandas. What's more - Pandas does not even mention that it
>>> **might** need sqlalchemy at all (and in version >=1.4.0 !).
>>> >>>
>>> >>> What happens is that in runtime, during running our tests we get
>>> this error:
>>> >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version
>>> '1.3.24' currently installed).`
>>> >>>
>>> >>> We already had a limitation (not documented why) on Pandas
>>> >>>
>>> >>> pandas_requirement = 'pandas>=0.17.1, <2.0'
>>> >>>
>>> >>> However this `<2.0` was completely useless in this case, because
>>> it's 1.4 version that broke parts of Airflow.
>>> >>> We assumed that the <2.0 limitation will protect us (apparently) but
>>> it did not. However, the failure in tests protected our constraints from
>>> being upgraded and in our "main" pandas is still 1.3.4. Our users who
>>> follow "use constraints" and those who use Airflow images, are fully
>>> protected against those kinds of problems.
>>> >>>
>>> >>> This particular problem will likely be solved in a few days when
>>> Flask Application Builder 3.4.4rc1 (
>>> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released
>>> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the
>>> only thing that blocked us from getting sqlalchemy 1.4).
>>> >>>
>>> >>> My point is that simply we do not know if any future version of any
>>> dependency will break Airflow. And any reasonable assumptions about that
>>> guessing from "Major" version is IMHO just wild guessing. And also by
>>> adding such limits - we are limiting our users to update to higher version
>>> of dependencies when it is harmless (very similarly as Flask Application
>>> Builder blocked us in this case from migration to sqlalchemy 1.4).
>>> >>>
>>> >>> The temporary fix (until FAB 3.4.4 is released) is here:
>>> >>>
>>> >>> https://github.com/apache/airflow/pull/21045
>>> >>>
>>> >>> I believe after FAB 3.4.4 is released we remove the fix we should
>>> just get rid of the pandas limitation. Constraints of ours protect our
>>> users when they install Airflow "according to our instructions with
>>> constraints" - which is the only official way of installing Airflow. We are
>>> protecting our users from those kind of events, but at the same time we
>>> should relax pretty much all the upper bounds to not block our users in the
>>> future in case they need to move to higher versions of dependencies
>>> released in the future that we have no idea if they will break, or not
>>> Airflow..
>>> >>>
>>> >>> Let me know your thoughts - I think this Pandas case is great to
>>> illustrate my point.
>>> >>>
>>> >>> J.
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <ja...@potiuk.com>
>>> wrote:
>>> >>>>
>>> >>>> Hello everyone,
>>> >>>>
>>> >>>> TL;DR; I think there is one thing about dependencies that we should
>>> have some agreement on to make some common approach. Namely about using
>>> upper bounds on our dependencies.
>>> >>>> Should we set some rules on when we set upper-bounds on the deps ?
>>> What rules should they be?
>>> >>>>
>>> >>>> Currently we use constraints to make sure Airflow in specific
>>> version can be installed using "golden" set of dependencies. Those are part
>>> of our CI, automatically updated and tests which makes it really nice as
>>> they are "fixed" for installation of particular version but they
>>> continuously upgrade (even across major versions) when they pass tests in
>>> CI.
>>> >>>>
>>> >>>> This all happens pretty much automatically by our CI, except the
>>> cases where our dependencies are upper-limited. We occasionally do some
>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
>>> this is more the result of some external request or "occasional" cleanup to
>>> do it - rather than a regular process.
>>> >>>>
>>> >>>> Now, we have some dependencies that are upper-bound for good
>>> reasons and they are documented. We also have a number of dependencies that
>>> are not upper-limited. But I think this pretty inconsistent. Small excerpt
>>> from setup.cfg:
>>> >>>>
>>> >>>>     # Logging is broken with itsdangerous > 2
>>> >>>>     itsdangerous>=1.1.0, <2.0
>>> >>>>     # Jinja2 3.1 will remove the 'autoescape' and 'with'
>>> extensions, which would
>>> >>>>     # break Flask 1.x, so we limit this for future compatibility.
>>> Remove this
>>> >>>>     # when bumping Flask to >=2.
>>> >>>>     jinja2>=2.10.1,<3.1
>>> >>>>     ...
>>> >>>>     # python daemon crashes with 'socket operation on non-socket'
>>> for python 3.8+ in version < 2.2.4
>>> >>>>     # https://pagure.io/python-daemon/issue/34
>>> >>>>     python-daemon>=2.2.4
>>> >>>>     python-dateutil>=2.3, <3
>>> >>>>     python-nvd3~=0.15.0
>>> >>>>
>>> >>>> * itsdangerous is upper limited and the reason is specified in the
>>> comment (though we do not know when we could remove the limit)
>>> >>>>
>>> >>>> * Jinja is upper-limited and we know not only why but also when it
>>> can be removed
>>> >>>> * Python-daemon is not upper-limited but it has a comment why it is
>>> "lower-limited" (which is pretty useless IMHO)
>>> >>>> * python-dateutil is upper-limited but we do not know why
>>> >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any
>>> 0.15.x version but 0.16 could not be installed)
>>> >>>>
>>> >>>> I think there are many inconsistencies and the way we treat
>>> dependency limits is pretty inconsistent - we have no rules agreed.
>>> >>>>
>>> >>>> I would love to discuss how we can standardize it or at least set
>>> some rules we can follow.
>>> >>>>
>>> >>>> My initial thoughts are:
>>> >>>>
>>> >>>> * we can (but do not have to) have lower bounds if we need to
>>> protect and we know about some limitations, but we do not need to document
>>> those. Just set the limit. The nice thing about lower bounds that they do
>>> not "decay" over time. By default latest "eligible" version is installed by
>>> PIP (but of course if other deps have conflicting upper bounds, keeping
>>> lower limits when unnecessary is a problem.
>>> >>>> * by default we should not have upper-bounds. We know it limits our
>>> users and constraints + CI tests are nicely handling the scenario when
>>> things are breaking.
>>> >>>> * we should only introduce upper bounds if we know that there is
>>> breaking change (or that it is very likely and "foreseen" -
>>> betas/rc2/discussions about upcoming breaking changes in the dependencies
>>> >>>> * when we introduce upper-bounds we should always specify why and
>>> what is the condition to remove them
>>> >>>>
>>> >>>> WDYT?
>>> >>>>
>>> >>>> J.
>>> >>>>
>>> >>>>
>>> >>>>
>>>
>>

Re: [DISCUSSION] Upper bounds on our dependencies

Posted by Jarek Potiuk <ja...@potiuk.com>.
Any more comments here or in the PR?

On Sun, Feb 6, 2022 at 1:32 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I do not think we reached consensus yet on those but I thought maybe just
> attempting it and showing how much more consistent and better it would
> look,
> I have attempted to describe the policies I had in mind and update our
> dependencies in
>
> https://github.com/apache/airflow/pull/21356
>
> Ash - I think you had doubts initially, maybe just reviewing and showing
> how it could look might be a good idea to clear up any doubts.
> Alternatively maybe we can iterate on the PR (and discussion below) if
> there is any counter-proposal (I think the current situation where the
> policy is basically "random" is not really a policy).
>
> As part of the PR I reviewed ALL our upper-bound limits (there were not
> too many as I suspected) and tried to either reverse-engineer where the
> limits are from (and document them) or left comment and TODO to attempt to
> remove the limit (those should be separate PRs and I am happy to attempt at
> least some of those as a follow-up). I also removed all upper limits where
> there are no ideas on what and when and whether at all the upper limits
> need to be applied because we are already at the latest version and there
> are no plans from the libraries to release releases with breaking changes
> (at least it's not obvious there are).
>
> Let me know what you think also - others who have not spoken yet - maybe
> looking at the PR it will be clearer what I had in mind.
>
> J.
>
>
> On Tue, Jan 25, 2022 at 10:54 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Actually - if we make a clear policy about what we agree clear policies
>> when we do upper-bound and when we don't -  I am super happy to write a
>> short documentation for Poetry and PIP tools users on exactly how they can
>> install airflow reliably (I can even add a small tool that will take our
>> constraints an generate the right installation configuration for
>> poetry/piptools.
>>
>> But without a clear policy on what our approach is, that might not be
>> straightforward.
>>
>> J,
>>
>>
>> On Tue, Jan 25, 2022 at 10:48 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> > I am wondering if this will have a negative impact on installations
>>> using tools that do not support constraints files like poetry or
>>> pip-tools. These tools are not officially supported, but many users
>>> may use them nonetheless, so I think it is worth considering before
>>> making any changes that affect them. Unfortunately, I have too little
>>> experience with them to say at the same time how your proposal will
>>> affect it.
>>>
>>> I think this is not a change really, just clarification of our policies.
>>> I have not counted it but by rough look we already have some ~70% of our
>>> dependencies NOT upper bound already.
>>>
>>> We have no documentation, statement, nothing to explain - why. Why some
>>> dependencies are upper bound and some are not. Looks like it is pretty
>>> random now.
>>>
>>> So this is not a "change" but more
>>>
>>> * "statement on what our policy is" and clear guidelines on how to add
>>> future dependencies
>>> * possibly "review" the current set and make sure that our current
>>> approach follows the policy we define.
>>>
>>> Re: poetry, piptools - I think we cannot satisfy them if we want to keep
>>> our "both application and library" approach.
>>> Those tools made an opinionated (and in cases of many projects very
>>> reasonable BTW) decision that you are either "library" or "application" but
>>> not both - and they propose different approaches for "application" and
>>> "library". We are both. None of the solutions proposed by poetry and pip
>>> tools work for us. There is no way we can make them work well for us.
>>>
>>> I really like poetry for one, and I heartily recommend most of
>>> "standard" projects to use it as they have some really cool tooling and
>>> make it super easy to manage dependencies. We are just different because we
>>> are both application and library and poetry does not support that.
>>> I think at least - if we clarify which of our dependencies are
>>> "application" (and upper-bound them) it will give higher chances for the
>>> users that choose to use poetry/pip-tools to install "base" version of
>>> airflow - but also will give them a chance to know that they have to
>>> manually pin some of the non-application dependencies if they fail for
>>> them. Poetry and pip tools users can simply take our constraints and pin
>>> them manually when they are installing airflow - this is a solution that I
>>> mentioned many times to the users who raised their questions about it.
>>>
>>> J.
>>>
>>>
>>> On Tue, Jan 25, 2022 at 3:26 AM Kamil Breguła <dz...@gmail.com>
>>> wrote:
>>>
>>>> I am wondering if this will have a negative impact on installations
>>>> using tools that do not support constraints files like poetry or
>>>> pip-tools. These tools are not officially supported, but many users
>>>> may use them nonetheless, so I think it is worth considering before
>>>> making any changes that affect them. Unfortunately, I have too little
>>>> experience with them to say at the same time how your proposal will
>>>> affect it.
>>>>
>>>> pon., 24 sty 2022 o 23:08 Jarek Potiuk <ja...@potiuk.com> napisał(a):
>>>> >
>>>> > Hey Ash (and everyone).
>>>> >
>>>> > I think we are really close to some consensus, but we are coming from
>>>> two directions. I do agree there will be some of the deps that are more
>>>> important to keep "in-check" than the rest.
>>>> > I think it makes perfect sense to distinguish the dependencies we
>>>> really "astrongly feel" are important for airflow vs. those that "are
>>>> important to our users".
>>>> > I am perfectly ok with upper-bounding deps that we really think are
>>>> important.
>>>> >
>>>> > But with >500 deps total for aifrflow (transitive) I think we should
>>>> have. shortlist of those we care only - and the rest should be upper-open
>>>> by default.
>>>> > And "short" is a good name, because those are the dependencies that
>>>> we take responsibility of manually upgrading when needed.
>>>> >
>>>> > I would be happy if we can come up wit the list that is really
>>>> important and "core" to airflow - and "really likely" to break things with
>>>> new major release. And those are the things we care more as an
>>>> "application" that our users care as "libraries". This is my subjective
>>>> list (those are all things taht we care :
>>>> >
>>>> > * sqlalchemy
>>>> > * alembic
>>>> > * flask (and all flask libs)
>>>> > * flask-app-builder
>>>> > * werkzeug  (yeah. not surprisingly - including last meetup I spoke
>>>> about Airflow, my previous presented mentioned that Werkzeug comes up as
>>>> "problematic case" and I was only able to confirm that :).
>>>> >
>>>> > + all those that we know break things (for example we know a bunch of
>>>> google deps <2.0.0 that we know are breaking things already - but we can
>>>> easily describe those)
>>>> >
>>>> > I think if we upper-bound those (and make appropriate comment in our
>>>> setup.py/cfg)  this would be pretty "good" setup. And having a
>>>> shortlist of those that we want to keep upper-bound makes sense and is
>>>> manageable.
>>>> >
>>>> > WDYT?
>>>> >
>>>> > J
>>>> >
>>>> >
>>>> > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty
>>>> <co...@astronomer.io.invalid> wrote:
>>>> >>
>>>> >> Ash,
>>>> >>
>>>> >> I think your code snippet will result in getting the latest version
>>>> of pandas even if there is an upper constraint on pandas in apache-airflow.
>>>> I simulated with these commands on the latest pip available.
>>>> >>
>>>> >> ```
>>>> >> pip install apache-airflow "pandas<1.4"
>>>> >> pip freeze | grep pandas
>>>> >> pip install -U pandas
>>>> >> pip freeze | grep pandas
>>>> >> ```
>>>> >>
>>>> >> Collin McNulty
>>>> >>
>>>> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>> >>>
>>>> >>> To your inital thoughts:
>>>> >>> - lower bounds: yeah, introduce them when we need to, so a new
>>>> package could be added with no lower bound, and then a bound placed only
>>>> when we discover we need a minium version (include bug fix etc. PR author
>>>> can choose wether or not to have a lower bound initially.)
>>>> >>> - not have upper bound by default: see below
>>>> >>> - only introduce upper bound if breaking change: see below
>>>> >>> - specify reason for upper bound. yes, 100% to this
>>>> >>>
>>>> >>>
>>>> >>> I think you have created a strawman argument here to say that the
>>>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow
>>>> SemVer so 2.0 is going to break _something_. It didn't help us with this
>>>> specific problem.
>>>> >>>
>>>> >>> As you mentioned (possibly only in slack) constraints protect
>>>> against initial install only, but not future in-place upgrades. Without an
>>>> upper constraint I could do
>>>> >>>
>>>> >>> ```
>>>> >>> pip install --constraint $FILE apache-airflow
>>>> >>> pip install -U pandas
>>>> >>> ```
>>>> >>>
>>>> >>> And then airflow could be broken. And being able to upgrade without
>>>> constraints files is the only way to apply a security upgrade to a module
>>>> (otherwise we'd have to update the constraint files for all supported
>>>> versions when ever any of our transitive dependencies has a security
>>>> update, not something we are in a position to do).
>>>> >>>
>>>> >>> I think it also depends how we use the dependency in Airflow --
>>>> some of them are so core that we don't want anything to break, but others
>>>> (such as Pandas in this case) where our use of them in Airflow is actually
>>>> fairly superficial.
>>>> >>>
>>>> >>> So I think I would just tone down your middle suggestions slightly
>>>> -- Upper limits dependencies can be optional. I think the only place we
>>>> really differ is what is "foreseen".
>>>> >>>
>>>> >>> -ash
>>>> >>>
>>>> >>> [1]: https://pandas.pydata.org/docs/development/policies.html
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <
>>>> jarek@potiuk.com> wrote:
>>>> >>>
>>>> >>> Just to illustrate it better. Here is an extremely good example
>>>> from today:
>>>> >>>
>>>> >>> Our main build started to fail today (
>>>> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
>>>> because Pandas released a new 1.4.0 version last  night:
>>>> https://pypi.org/project/pandas/1.4.0/
>>>> >>> The root cause of the failure was that Pandas 1.4.0 requires
>>>> SqlAlchemy 1.4.0 or above. But surprisingly -  there is no hard limit in
>>>> the released Pandas. What's more - Pandas does not even mention that it
>>>> **might** need sqlalchemy at all (and in version >=1.4.0 !).
>>>> >>>
>>>> >>> What happens is that in runtime, during running our tests we get
>>>> this error:
>>>> >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version
>>>> '1.3.24' currently installed).`
>>>> >>>
>>>> >>> We already had a limitation (not documented why) on Pandas
>>>> >>>
>>>> >>> pandas_requirement = 'pandas>=0.17.1, <2.0'
>>>> >>>
>>>> >>> However this `<2.0` was completely useless in this case, because
>>>> it's 1.4 version that broke parts of Airflow.
>>>> >>> We assumed that the <2.0 limitation will protect us (apparently)
>>>> but it did not. However, the failure in tests protected our constraints
>>>> from being upgraded and in our "main" pandas is still 1.3.4. Our users who
>>>> follow "use constraints" and those who use Airflow images, are fully
>>>> protected against those kinds of problems.
>>>> >>>
>>>> >>> This particular problem will likely be solved in a few days when
>>>> Flask Application Builder 3.4.4rc1 (
>>>> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released
>>>> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the
>>>> only thing that blocked us from getting sqlalchemy 1.4).
>>>> >>>
>>>> >>> My point is that simply we do not know if any future version of any
>>>> dependency will break Airflow. And any reasonable assumptions about that
>>>> guessing from "Major" version is IMHO just wild guessing. And also by
>>>> adding such limits - we are limiting our users to update to higher version
>>>> of dependencies when it is harmless (very similarly as Flask Application
>>>> Builder blocked us in this case from migration to sqlalchemy 1.4).
>>>> >>>
>>>> >>> The temporary fix (until FAB 3.4.4 is released) is here:
>>>> >>>
>>>> >>> https://github.com/apache/airflow/pull/21045
>>>> >>>
>>>> >>> I believe after FAB 3.4.4 is released we remove the fix we should
>>>> just get rid of the pandas limitation. Constraints of ours protect our
>>>> users when they install Airflow "according to our instructions with
>>>> constraints" - which is the only official way of installing Airflow. We are
>>>> protecting our users from those kind of events, but at the same time we
>>>> should relax pretty much all the upper bounds to not block our users in the
>>>> future in case they need to move to higher versions of dependencies
>>>> released in the future that we have no idea if they will break, or not
>>>> Airflow..
>>>> >>>
>>>> >>> Let me know your thoughts - I think this Pandas case is great to
>>>> illustrate my point.
>>>> >>>
>>>> >>> J.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <ja...@potiuk.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hello everyone,
>>>> >>>>
>>>> >>>> TL;DR; I think there is one thing about dependencies that we
>>>> should have some agreement on to make some common approach. Namely about
>>>> using upper bounds on our dependencies.
>>>> >>>> Should we set some rules on when we set upper-bounds on the deps ?
>>>> What rules should they be?
>>>> >>>>
>>>> >>>> Currently we use constraints to make sure Airflow in specific
>>>> version can be installed using "golden" set of dependencies. Those are part
>>>> of our CI, automatically updated and tests which makes it really nice as
>>>> they are "fixed" for installation of particular version but they
>>>> continuously upgrade (even across major versions) when they pass tests in
>>>> CI.
>>>> >>>>
>>>> >>>> This all happens pretty much automatically by our CI, except the
>>>> cases where our dependencies are upper-limited. We occasionally do some
>>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
>>>> this is more the result of some external request or "occasional" cleanup to
>>>> do it - rather than a regular process.
>>>> >>>>
>>>> >>>> Now, we have some dependencies that are upper-bound for good
>>>> reasons and they are documented. We also have a number of dependencies that
>>>> are not upper-limited. But I think this pretty inconsistent. Small excerpt
>>>> from setup.cfg:
>>>> >>>>
>>>> >>>>     # Logging is broken with itsdangerous > 2
>>>> >>>>     itsdangerous>=1.1.0, <2.0
>>>> >>>>     # Jinja2 3.1 will remove the 'autoescape' and 'with'
>>>> extensions, which would
>>>> >>>>     # break Flask 1.x, so we limit this for future compatibility.
>>>> Remove this
>>>> >>>>     # when bumping Flask to >=2.
>>>> >>>>     jinja2>=2.10.1,<3.1
>>>> >>>>     ...
>>>> >>>>     # python daemon crashes with 'socket operation on non-socket'
>>>> for python 3.8+ in version < 2.2.4
>>>> >>>>     # https://pagure.io/python-daemon/issue/34
>>>> >>>>     python-daemon>=2.2.4
>>>> >>>>     python-dateutil>=2.3, <3
>>>> >>>>     python-nvd3~=0.15.0
>>>> >>>>
>>>> >>>> * itsdangerous is upper limited and the reason is specified in the
>>>> comment (though we do not know when we could remove the limit)
>>>> >>>>
>>>> >>>> * Jinja is upper-limited and we know not only why but also when it
>>>> can be removed
>>>> >>>> * Python-daemon is not upper-limited but it has a comment why it
>>>> is "lower-limited" (which is pretty useless IMHO)
>>>> >>>> * python-dateutil is upper-limited but we do not know why
>>>> >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any
>>>> 0.15.x version but 0.16 could not be installed)
>>>> >>>>
>>>> >>>> I think there are many inconsistencies and the way we treat
>>>> dependency limits is pretty inconsistent - we have no rules agreed.
>>>> >>>>
>>>> >>>> I would love to discuss how we can standardize it or at least set
>>>> some rules we can follow.
>>>> >>>>
>>>> >>>> My initial thoughts are:
>>>> >>>>
>>>> >>>> * we can (but do not have to) have lower bounds if we need to
>>>> protect and we know about some limitations, but we do not need to document
>>>> those. Just set the limit. The nice thing about lower bounds that they do
>>>> not "decay" over time. By default latest "eligible" version is installed by
>>>> PIP (but of course if other deps have conflicting upper bounds, keeping
>>>> lower limits when unnecessary is a problem.
>>>> >>>> * by default we should not have upper-bounds. We know it limits
>>>> our users and constraints + CI tests are nicely handling the scenario when
>>>> things are breaking.
>>>> >>>> * we should only introduce upper bounds if we know that there is
>>>> breaking change (or that it is very likely and "foreseen" -
>>>> betas/rc2/discussions about upcoming breaking changes in the dependencies
>>>> >>>> * when we introduce upper-bounds we should always specify why and
>>>> what is the condition to remove them
>>>> >>>>
>>>> >>>> WDYT?
>>>> >>>>
>>>> >>>> J.
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>>
>>>