Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2018/10/04 15:25:08 UTC

Pinning dependencies for Apache Airflow

TL;DR: A change is coming in the way dependencies/requirements are
specified for Apache Airflow - they will be fixed rather than flexible (==
rather than >=).

This is a follow-up to the Slack discussion we had with Ash and Kaxil -
summarising what we propose to do.

*Problem:*
During the last few weeks we experienced quite a few outages of TravisCI
builds (for all PRs/branches including master) as some of the transitive
dependencies were automatically upgraded. This is because for a number of
dependencies we specify >= rather than == versions.

Whenever there is a new release of such a dependency, it might cause a chain
reaction of upgrades of transitive dependencies, which might then
conflict.

An example was the Flask-AppBuilder vs flask-login transitive dependency on
click. They started to conflict once Flask-AppBuilder released version
1.12.0.

*Diagnosis:*
Transitive dependencies with "flexible" versions (where >= is used instead
of ==) are a cause of "dependency hell". We will sooner or later hit other
cases where non-fixed dependencies cause similar problems with other
transitive dependencies. We need to pin them. Flexible versions cause
problems both for released versions (because they stop working!) and for
development (because they break master builds in TravisCI and prevent people
from installing a development environment from scratch).

*Solution:*

   - Following the old-but-good post
   https://nvie.com/posts/pin-your-packages/ we are going to pin the
   dependencies to specific versions (so basically all dependencies are
   "fixed").
   - We will introduce a mechanism to upgrade dependencies with
   pip-tools (https://github.com/jazzband/pip-tools). We might also take a
   look at pipenv: https://pipenv.readthedocs.io/en/latest/
   - People who would like to upgrade some dependencies for their PRs will
   still be able to do it - but such upgrades will be in their PR, so they
   will go through TravisCI tests, and they will also have to be specified with
   pinned fixed versions (==). Checking that new/changed requirements are
   pinned should be part of the review process.
   - In the release process there will be a point where an upgrade will be
   attempted for all requirements (using pip-tools) so that we are not stuck
   with older releases. This will happen in a controlled PR environment where
   there will be time to fix all dependencies without impacting others and
   likely enough time to "vet" such changes (this can be done for alpha/beta
   releases, for example).
   - As a side effect, the dependency specification will become far simpler
   and more straightforward.
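
The review step mentioned above (making sure new/changed requirements are pinned) could be partly automated. The following is only a minimal sketch, not anything proposed in this thread: the function name `unpinned_requirements` and the sample version numbers are made up for illustration, and the check only understands simple `name==version` lines (lines with environment markers or URLs would be reported as unpinned).

```python
import re

# Matches "name==version" with optional extras, e.g. "flask-login==0.4.1"
# or "apache-airflow[pandas]==1.10.0". Anything else counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=<>!~,;\s]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    result = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            result.append(line)
    return result


if __name__ == "__main__":
    # Illustrative versions only; they are not taken from Airflow's setup.py.
    sample = [
        "flask-appbuilder==1.11.1",
        "click>=6.7",  # flexible: this is the line a reviewer should catch
        "flask-login==0.4.1  # pinned",
    ]
    print(unpinned_requirements(sample))
```

A check like this could run in CI on the requirements file touched by a PR, so that reviewers only need to judge the chosen versions rather than hunt for `>=`.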

Happy to hear community comments on the proposal. I am happy to take the lead
on that, open a JIRA issue and implement it if this is something the
community is happy with.

J.

-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Maxime Beauchemin <ma...@gmail.com>.
Oh good to know! Scrap what I wrote then.

On Fri, Oct 19, 2018 at 9:08 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> echo 'pandas==2.1.3' > constraints.txt
>
> pip install -c constraints.txt apache-airflow[pandas]
>
> That will ignore whatever we specify in setup.py and use 2.1.3.
> https://pip.pypa.io/en/latest/user_guide/#constraints-files
>
> (sorry for the brief message)
>

Re: Pinning dependencies for Apache Airflow

Posted by Ash Berlin-Taylor <as...@apache.org>.
echo 'pandas==2.1.3' > constraints.txt

pip install -c constraints.txt apache-airflow[pandas]

That will ignore whatever we specify in setup.py and use 2.1.3. https://pip.pypa.io/en/latest/user_guide/#constraints-files

(sorry for the brief message) 
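
For anyone who wants to script the two commands above, here is a rough Python sketch. The helper names `write_constraints` and `pip_install_command` are invented for this example; the sketch only writes the constraints file and builds the pip command line, it does not run pip. Whether pip overrides a conflicting pin from setup.py or reports a conflict instead depends on the pip version in use, so treat the "ignore" behaviour as something to verify in your own environment.

```python
import sys
import tempfile
from pathlib import Path


def write_constraints(directory, pins):
    """Write 'name==version' lines to constraints.txt and return its path."""
    path = Path(directory) / "constraints.txt"
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    path.write_text("\n".join(lines) + "\n")
    return path


def pip_install_command(requirement, constraints_path):
    """Build (but do not run) the pip command that applies the constraints."""
    return [sys.executable, "-m", "pip", "install",
            "-c", str(constraints_path), requirement]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        constraints = write_constraints(tmp, {"pandas": "2.1.3"})
        print(constraints.read_text(), end="")
        # The resulting list could be passed to subprocess.run() to install.
        print(" ".join(pip_install_command("apache-airflow[pandas]", constraints)))
```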



Re: Pinning dependencies for Apache Airflow

Posted by Maxime Beauchemin <ma...@gmail.com>.
> releases in pip should have stable (pinned deps)
I think that's an issue. When setup.py (the only reqs that setuptools/pip
knows about) is restrictive, there's no way to change that in your
environment; install will just fail if you deviate (are there any
hacks/solutions around that that I don't know about???). For example, if you
want a specific version of pandas in your env, and Airflow's setup.py has
another version of pandas pinned, you're out of luck. I think the only way
is to fork and make your own build at that point, as you cannot alter
setup.py once it's installed. On the other hand, when a version range is
specified in setup.py, you're free to pin using your own reqs.txt within
the specified version range.

I think pinning in setup.py is just not viable. setup.py should have
version ranges based on semantic versioning expectations (lib>=1.1.2,
<2.0.0). Personally I think we should always have 2 bounds, with the upper
bound based on either 1- the next semantic versioning major release, or 2- a
lower version than prescribed by semver that we know breaks backwards
compatibility features we require.

I think we have consensus around something like pip-tools to generate a
"deterministic" `requirements.txt`. A caveat is we may need 2:
requirements.txt and requirements3.txt for Python 3, as some package
versions can be flagged as only py2 or only py3.

Max
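
To make the two-bound idea concrete, here is a small sketch of what `lib>=1.1.2,<2.0.0` accepts. It is a simplification written for this thread, not how pip evaluates specifiers: it assumes plain dotted integer versions with the same number of components (no pre-releases, no `1.1` vs `1.1.0` equivalence), and the helper names are made up.

```python
def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper, i.e. 'lib>=lower,<upper'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def next_major(version):
    """Upper bound implied by semantic versioning: the next major release."""
    major = parse_version(version)[0]
    return f"{major + 1}.0.0"


if __name__ == "__main__":
    lower = "1.1.2"
    upper = next_major(lower)
    for candidate in ["1.1.1", "1.1.2", "1.12.0", "2.0.0"]:
        print(candidate, in_range(candidate, lower, upper))
```

Comparing tuples of integers rather than strings matters here: as strings, "1.12.0" would sort before "1.2.0".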



Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
I think I might have a proposal that could be acceptable to everyone in the
discussion (hopefully :) ).  Let me summarise what I am leaning towards now:

I think we can have a solution where it will be relatively easy to keep
both "open" and "fixed" requirements (open in setup.py, fixed in
requirements.txt). Possibly we can use pip-tools or poetry (including
poetry-setup <https://github.com/orsinium/poetry-setup>, which seems
to be able to generate setup.py/constraints.txt/requirements.txt from
a poetry setup). Poetry is still "new" so it might not work; in that case we
can try a similar approach with pip-tools or our own custom solution. Here are
the basic assumptions:

   - we can leave master with "open" requirements, which makes it
   potentially unstable because of potentially conflicting dependencies. We
   will also document how to generate a stable set of requirements (hopefully
   automatically) and how to install from master using those. *This
   addresses the needs of people using master for active development with the
   latest libraries.*
   - releases in pip should have stable (pinned deps). Upgrading pinned
   releases to the latest "working" stable set should be part of the release
   process (possibly automated with poetry). We can try it out and decide if
   we want to pin only direct dependencies or also the transitive ones (I
   think including transitive dependencies is a bit more stable). *This way
   we keep long-term "install-ability" of releases and make the job of the
   release maintainer easier*.
   - CI builds will use the stable dependencies from requirements.txt. *This
   way we protect CI from dependency-triggered failures.*
   - we add documentation on how to use the pip --constraint (-c) mechanism
   for anyone who would like to use airflow from PIP rather than sources, but
   would also like to use other (up- or down-graded) versions of specific
   dependencies. *This way we let active developers work with airflow
   and more recent or older releases of its dependencies.*
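
As a rough illustration of "generate a stable set of requirements automatically", the sketch below produces `name==version` lines for whatever is installed in the current environment, similar in spirit to `pip freeze`. It is an assumption-laden sketch, not the pip-tools or poetry workflow discussed here: it uses the standard-library `importlib.metadata` (Python 3.8+), the function name `frozen_requirements` is made up, and it pins everything installed, including transitive dependencies and unrelated packages.

```python
from importlib import metadata


def frozen_requirements(distributions=None):
    """Return sorted 'name==version' lines for the given distributions.

    With no argument, every distribution installed in the current
    environment is used.
    """
    if distributions is None:
        distributions = metadata.distributions()
    pins = {}
    for dist in distributions:
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with broken or missing metadata
            pins[name.lower()] = version  # package names are case-insensitive
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


if __name__ == "__main__":
    # Print the first few pins found in this environment.
    for line in frozen_requirements()[:10]:
        print(line)
```

Running this in a clean virtualenv right after a successful CI run would give a candidate requirements.txt that records exactly what was tested.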

If we can have general consensus that we should try it, I might try to find
some time next week to do some "real work". Rather than implement it and
make a pull request immediately, I am thinking of a Proof Of Concept branch
showing how it would work (artificially going back to older
versions of requirements). I thought about the pre-flaskappbuilder upgrade
state in one commit and the update to the post-flaskappbuilder upgrade state
in a second, explaining the steps I've done to get to it. That would make it
much easier for the community to discuss whether that's the right approach.

Does it sound good?

J.


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by "Daniel (Daniel Lamblin) [BDP - Seoul]" <la...@coupang.com>.
On 10/17/18, 12:24 AM, "William Pursell" <wi...@wepay.com.INVALID> wrote:

    I'm jumping in a bit late here, and perhaps have missed some of the
    discussion, but I haven't seen any mention of the fact that pinning
    versions in setup.py isn't going to solve the problem.  Perhaps it's
    my lack of experience with pip, but currently pip doesn't provide any
    guarantee that the version of a dependency specified in setup.py will
    be the version that winds up being installed.  Is this a known issue
    that is being intentionally ignored because it's hard (and out of
    scope) to solve?  I agree that versions should be pinned in setup.py
    for stable releases, but I think we need to be aware that this won't
    solve the problem.

So the problem is going to be stubborn for the rare user not installing into a clean venv, vm, or docker image, or who is not relying on pypi to host the dependencies unmodified.
https://pip.pypa.io/en/stable/user_guide/#pinned-version-numbers
That doesn't mean it doesn't fix it for the vast majority of users who are trying to install a particular supported stable release. Given that 1.10.0 is the absolute very latest release, it should be supported.

Shouldn’t there be an expectation that installing on a clean system from a supported stable branch will create a stable installation that can run the release?
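
One way to find out whether an existing environment actually matches a set of pins is to compare them against what is installed. The sketch below is only an illustration with made-up names (`find_mismatches`, the `installed_version` parameter) and it compares version strings literally, so "1.0" and "1.0.0" count as different. The click 6.7 vs 7.0 values echo the conflict reported elsewhere in this thread; the paramiko version is invented.

```python
from importlib import metadata


def find_mismatches(pins, installed_version=metadata.version):
    """Compare pinned versions with what is actually installed.

    Returns {name: (pinned, installed)} for every difference; `installed`
    is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            actual = installed_version(name)
        except metadata.PackageNotFoundError:
            actual = None
        if actual != pinned:
            mismatches[name] = (pinned, actual)
    return mismatches


if __name__ == "__main__":
    # Simulated environment, so the example does not depend on what is
    # installed on the machine running it.
    fake_env = {"click": "7.0"}

    def lookup(name):
        if name not in fake_env:
            raise metadata.PackageNotFoundError(name)
        return fake_env[name]

    print(find_mismatches({"click": "6.7", "paramiko": "2.4.2"}, lookup))
```

Called without the second argument, the function queries the real environment through `importlib.metadata.version`.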



Re: Pinning dependencies for Apache Airflow

Posted by William Pursell <wi...@wepay.com.INVALID>.
I'm jumping in a bit late here, and perhaps have missed some of the
discussion, but I haven't seen any mention of the fact that pinning
versions in setup.py isn't going to solve the problem.  Perhaps it's
my lack of experience with pip, but currently pip doesn't provide any
guarantee that the version of a dependency specified in setup.py will
be the version that winds up being installed.  Is this a known issue
that is being intentionally ignored because it's hard (and out of
scope) to solve?  I agree that versions should be pinned in setup.py
for stable releases, but I think we need to be aware that this won't
solve the problem.

On Tue, Oct 16, 2018 at 3:18 AM Daniel (Daniel Lamblin) [BDP - Seoul]
<la...@coupang.com> wrote:
>
> That Slack comment is mine, thanks.
>
> If it's a vote my vote is: please limit the package versions in setup.py for any branch meant to be stable.
>
> To be specific:
> * I don't have an expectation that installing from master is going to work every time. But when it doesn't, I do expect to find that the CI is broken and there's a "red" indicator thereof; even if there was no commit or everyone was on vacation, it should be running a couple of times a day to catch breakage due to dependencies. So, I don't really care if packages in setup.py from master are pinned or just unlimited minimums, though I'd think that any version can be limited to less than the next major number…
> * I do have an expectation that installing v1-10-stable, or v1-8-stable, vX-Y-stable etc. is 100% going to work every time. I do think that its package versions should be the same as those that were used to pass the release check-off process.
>
> It is probably easiest for maintainers (us?) if when prepping a stable branch, the setup.py is modified to specify exactly the == package versions of absolutely everything that passed the test, QA, release process. If what you guys are discussing with pipenv, pip-tools, .lock requirement.txt files etc is integrated with setup.py, then great; otherwise, not good enough (I'll explain).
>
> I hear there's a concern that if, say, sshtunnel vX.Y turns out to be a security nightmare, you will regret pinning v1-10-stable to sshtunnel-vX.Y once it's known, but I disagree that this is a big deal, because it will be A) on the user to know that maybe maintainers didn't update this dependency check overnight, B) on the maintainers to go to each _maintained_ stable branch and bring it up to date with the patched version and redo the QA & release checks, C) on maintainers to mark any stable branch that isn't or can't be updated as "has known security issues", and finally D) on maintainers to have a timeline for marking releases as "unmaintained, probably has security issues we don't even check for".
> I doubt the ASF has a problem with maintaining a stable release with security updates or (as is the current need) fixing a stable release such that it builds. But I don't know exactly the release rules.
>
> I think I understand that what I'm proposing is dandy for the git branches but is an issue for PyPI, because there you do not update a released version, and deleting one breaks things even worse, so… if 1.10.0 is broken and 1.10.1 is the next release in test/development, the fixed 1.10.0 should be released as an update that isn't 1.10.1; something like 1.10.0.1. Note that HOW 1.10.0 BROKE would not have happened with more careful limits on the version, possibly requiring full pinning of exact versions. So that's again in favor of fixing these stable branches and their releases to exact versions that were known good. Now a security patch would still have to become, say, 1.10.0.1. I don't know if it's possible to go back to a released PyPI package and update its readme, description, or any part to mark it as known to contain a security issue or not.
>
> Trying not to go long, but here's the part where I explain why the setup.py has to be fixed to what passed ci, qa, release etc. for a stable branch or packaged release, by explaining what bit me:
> I made a docker image from a local fork of the v1-10-stable branch a couple months ago. A week ago, someone said to me, hey, I want to use the SSHOperator but I get this message about paramiko not being installed. So, look at that, my Dockerfile didn't add `ssh` to the options when running `pip install --no-cache-dir -e "${AIRFLOW_SRC_HOME}/[async,celery,crypto,cgroups,hdfs,hive,jdbc,ldap,mssql,mysql,postgres,s3,slack,statsd]"`, darn-it. This is easy though, let me add that now. You can see how this depends on what's in setup.py. And you might see how this didn't bring up the log message `pkg_resources.ContextualVersionConflict: (Click 7.0 (/usr/local/lib/python3.6/site-packages), Requirement.parse('click==6.7'), {'flask-appbuilder'})` until the replacement webserver tried to start up, glad I didn't touch the scheduler first.
> You might think, oh Daniel, you should pull the existing image you released and just pip install paramiko and pysftp after reading the setup.py file then release commit that and push it.
> Well, because Docker says it’s a best practice to build each release from the Dockerfile instead of interactively adding layers on top, there's a system that checks (kind of) and (usually) stops that idea from working.
> Also, that would require cautious work, doesn't support the simple fix, and doesn't support people who built the release "late".
> What I did end up doing was using pip freeze to figure out exactly what's on my prior working release (surprise, boto3 1.8.6 is there, though master says boto3 <= 1.8.0) and using it as a basis for the Dockerfile prior to that `pip install -e`. It's not quite 100% working yet, but the remaining issues have nothing to do with this discussion.
>
> In summary, I fully expected a stable branch of a package to be installable at any time and to operate the same way it did when it was cut. I'm not sure why there are any votes the other way on that, but I suspect those votes are more about what goes on at master and in CI than in a release, and are thus, to my mind, beside the point.
>
> Thanks,
> -Daniel
>
> On 10/15/18, 9:05 PM, "Jarek Potiuk" <Ja...@polidea.com> wrote:
>
>     Speaking of which - just to show what kind of problems we are talking about
>     - here is a link to a relevant discussion in troubleshooting @ slack from
>     today, where someone tries to install v1.10-stable and needs help.
>     This is exactly the kind of problem I think is important to solve,
>     whichever way we choose to solve it:
>     https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1539573567000100
>
>     I really don't think it's a good idea to put users, especially new Airflow
>     users, in a situation where they need to search through the devlist and
>     upstream commits or ask for help just to be able to install a stable release
>     of Airflow.
>
>     J.
>
>     On Mon, Oct 15, 2018 at 9:29 AM Jarek Potiuk <Ja...@polidea.com>
>     wrote:
>
>     > Sorry for the late reply - I was travelling, was at Cloud Next in London last
>     > week (BTW, there were talks about Composer/Airflow there).
>     >
>     > I see the point, it's indeed very difficult to solve when we want both:
>     > stability of releases and the flexibility of using a released version and
>     > writing code within it. I think some trade-offs need to be made as we won't
>     > solve it all with a one-size-fits-all approach. Answering your question,
>     > George - the value of pinning for release purposes is addressing the
>     > "stability" need.
>     >
>     >    - Due to my background I come from the "stability" side (which is more
>     >    user-focused) - i.e. the main problem that I want to solve is to make sure
>     >    that someone who wants to install airflow afresh and start using it as a
>     >    beginner user can always run 'pip install airflow' and it will get
>     >    installed. For me this is the point when many users may simply get put off
>     >    if it refuses to install out-of-the-box. A few months ago I actually
>     >    evaluated airflow to run an ML pipeline for the startup I was at at that
>     >    time. If back then it had refused to install out-of-the-box, my evaluation
>     >    result would have been 'did not pass the basic criteria'. Luckily it did
>     >    not happen, and we did a more elaborate evaluation then - we did not use
>     >    Airflow eventually, but for other reasons. For us the criterion "it just
>     >    works!" was super important - because we did not have time to deep dive
>     >    into details and find out why things do not work - we had a lot of
>     >    "core/ML/robotics" things to worry about and any hurdles with unstable
>     >    tools would be a major distraction. We really wanted to write several DAGs
>     >    and get them executed in a stable, repeatable way, so that when we install
>     >    it on a production machine in two months it continues to work without any
>     >    extra work.
>     >    - Then there are a lot of concerns from the "flexibility" side (which
>     >    is more the advanced users/developers side). It becomes important when you
>     >    want to actively develop your DAGs (you start using more than just built-in
>     >    operators and start developing a lot more code in DAGs or use
>     >    PythonOperator more and more). Then of course it is important to get the
>     >    "flexible" approach. I argue that in these cases the "active" developers
>     >    might be more inclined to do any tweaking of their environment, as they are
>     >    more advanced, might be more experienced with the dependencies, and would
>     >    be able to downgrade/upgrade dependencies as needed in their virtualenvs.
>     >    Those people should be quite ok with spending a bit more time to get their
>     >    environment tweaked to their needs.
>     >
>     > I was wondering if there is a way to satisfy both. And I have a wild idea:
>     >
>     >    - we have two sets of requirements (easy-upgradeable "stable" ones in
>     >    requirements.txt/poetry and flexible ones with version ranges in setup.py
>     >    (or similar)) - as proposed earlier in this thread
>     >    - we release two flavours of pip-installable airflow: 1.10.1 with
>     >    stable/pinned dependencies and 1.10.1-devel (we can pick another flavour
>     >    name) with flexible dependencies. It's quite common to have devel releases
>     >    in the Linux world - they serve a slightly different purpose (like
>     >    including headers for C/C++ programs) and it's usually an extra package on
>     >    top of the basic one, but the basic idea is similar - if you are a user,
>     >    you install 1.10.1, if you are an active developer, you install
>     >    1.10.1-devel
>     >
>     > What do you think?
>     >
>     > Off-topic a bit: a friend of mine pointed me to this excellent talk by Elm
>     > creator: "The Hard Parts of Open Source" by Evan Czaplicki
>     > <https://www.youtube.com/watch?v=o_4EX4dPppA> and it made me think
>     > differently about the discussion we have :D
>     >
>     > J.
>     >
>     > On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman <wa...@gmail.com>
>     > wrote:
>     >
>     >> It's not upgrading dependencies that I'm worried about, it's downgrading.
>     >> With upgrade conflicts, we can treat the dependency upgrades as a necessary
>     >> aspect of the Airflow upgrade.
>     >>
>     >> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
>     >> LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are placed
>     >> in the annoying situation of either: a) managing our deployments so that we
>     >> install Airflow first, and then upgrade LibraryA and ignore pip's warning
>     >> about incompatible versions, b) keeping the insecure version of LibraryA,
>     >> c) waiting for another Airflow release and accepting all other changes, d)
>     >> maintaining our own fork of Airflow and diverging from mainline.
>     >>
>     >> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
>     >> whatsoever. If we're worried about API changes in the future, there's
>     >> always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
>     >>
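To make the range argument above concrete, here is a minimal Python sketch of how a specifier such as LibraryA>=1.2.3,<2.0 admits the patched 1.2.4 while an exact pin does not. It is a simplification and not how pip does it: it only understands dotted numeric versions, whereas pip follows the full PEP 440 rules. The names `parse_version` and `satisfies` are made up for this illustration.

```python
import operator

# Simplified sketch: handles only dotted numeric versions such as "1.2.4".
# Real installers follow PEP 440 (pre-releases, epochs, wildcards, ...).
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn "1.2.4" into the tuple (1, 2, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(left, right):
    """Pad the shorter tuple with zeros so that 1.3 compares equal to 1.3.0."""
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Two-character operators are tried before one-character ones.
        for symbol in (">=", "<=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                left, right = pad(candidate, bound)
                if not OPERATORS[symbol](left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True
```

With this, `satisfies("1.2.4", ">=1.2.3,<2.0")` is true while `satisfies("1.2.4", "==1.2.3")` is false, which is the situation described above: the exact pin rejects the security fix, the range accepts it.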
>     >> As has been pointed out, because PythonOperator tasks run in the same venv
>     >> as Airflow, it is necessary that users be able to control dependencies for
>     >> their code.
>     >>
>     >> To be clear, it's not always a security risk, but this is not a hypothetical
>     >> issue. We ran into a code incompatibility with psutil that mattered to us
>     >> but had no impact on Airflow (see:
>     >> https://github.com/apache/incubator-airflow/pull/3585) and are currently
>     >> seeing SQLAlchemy held back without any clear need (
>     >> https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
>     >>
>     >> Pinning dependencies for releases will force us (and I expect others) to
>     >> either ignore/work around the pinning or not use Airflow releases. Both of
>     >> those options exactly defeat the point.
>     >>
>     >> If people are on board with pinning / locking all dependencies for CI
>     >> purposes, and we can constrain requirements to ranges for necessary
>     >> compatibility, what is the value of pinning all dependencies for release
>     >> purposes?
>     >>
>     >> --George
>     >>
>     >> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Ja...@polidea.com>
>     >> wrote:
>     >>
>     >> > I am still not convinced that pinning is bad. I re-read the whole
>     >> > mail thread and the thread from 2016
>     >> > <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
>     >> > to read all the arguments, but I stand by pinning.
>     >> >
>     >> > I am - of course - not sure about the graduation argument. I would just
>     >> > imagine it might be the case. I however really think that the situation we
>     >> > are in now is quite volatile. The latest 1.10.0 cannot be clean-installed
>     >> > via pip without manually tweaking and forcing a lower version of
>     >> > flask-appbuilder. Even if you use the constraints file it's pretty
>     >> > cumbersome because you'd have to somehow know that you need to do exactly
>     >> > that (not at all obvious from the error you get). Also it might at any time
>     >> > get worse as other packages get newer versions released. The thing here is
>     >> > that the maintainers of flask-appbuilder did nothing wrong, they simply
>     >> > released a new version with the click dependency version increased
>     >> > (probably for a good reason) and it's airflow's cross-dependency graph
>     >> > which makes it incompatible.
>     >> >
>     >> > I am afraid that if we don't change it, it's all but guaranteed that every
>     >> > single release will at some point in time "deteriorate" and refuse to
>     >> > clean-install. If we want to solve this problem (maybe we don't and we
>     >> > accept it as it is?), I think the only way to solve it is to hard-pin all
>     >> > the requirements, at the very least for releases.
>     >> >
>     >> > Of course we might choose pinning only for releases (and CI builds) and
>     >> > have the compromise that Matt mentioned. I have the worry however (also
>     >> > mentioned in the previous thread) that it will be hard to maintain.
>     >> > Effectively you will have to maintain both in parallel. And the case with
>     >> > constraints is a nice workaround for someone who actually needs a specific
>     >> > (even newer) version of a specific package in their environment.
>     >> >
>     >> > Maybe we should simply give it a try and do a Proof-Of-Concept/experiment,
>     >> > as Fokko also mentioned?
>     >> >
>     >> > We could have a PR with pinning enabled, and maybe ask the people who voice
>     >> > concerns about their environment to give it a try with those pinned
>     >> > versions and see if that makes it difficult for them to either upgrade
>     >> > dependencies and fork apache-airflow or use a pip constraints file?
>     >> >
>     >> > J.
>     >> >
>     >> >
>     >> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:
>     >> >
>     >> > > Erik, the Airflow task execution code itself of course must run somewhere
>     >> > > with Airflow installed, but if the task is making a database query or a
>     >> > > web request or running something in Docker there's separation between the
>     >> > > environments and maybe you don't care about Python dependencies at all
>     >> > > except to get Airflow running. When running Python operators that's not
>     >> > > the case (as you already deal with).
>     >> > >
>     >> > > - Matt
>     >> > >
>     >> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
>     >> > > <EK...@novozymes.com.invalid> wrote:
>     >> > >
>     >> > > > This is maybe a stupid question, but is it even possible to run tasks
>     >> > > > in an environment where Airflow is not installed?
>     >> > > >
>     >> > > >
>     >> > > > Kind regards,
>     >> > > >
>     >> > > > Erik
>     >> > > >
>     >> > > > ________________________________
>     >> > > > From: Matt Davis <ji...@gmail.com>
>     >> > > > Sent: Monday, October 8, 2018 10:13:34 PM
>     >> > > > To: dev@airflow.incubator.apache.org
>     >> > > > Subject: Re: Pinning dependencies for Apache Airflow
>     >> > > >
>     >> > > > It sounds like we can get the best of both worlds with the original
>     >> > > > proposals to have minimal requirements in setup.py and "guaranteed to
>     >> > > > work" complete requirements in a separate file. That way we have
>     >> > > > flexibility for teams that run airflow and tasks in the same environment
>     >> > > > and guidance on a working set of requirements. (Disclaimer: I work on the
>     >> > > > same team as George.)
>     >> > > >
>     >> > > > Thanks,
>     >> > > > Matt
>     >> > > >
>     >> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>     >> > > >
>     >> > > > > Although I think I come down on the side against pinning, my reasons
>     >> > > > > are different.
>     >> > > > >
>     >> > > > > For the two (or more) people who have expressed concern about it, would
>     >> > > > > pip's "Constraints Files" help?
>     >> > > > >
>     >> > > > > https://pip.pypa.io/en/stable/user_guide/#constraints-files
>     >> > > > >
>     >> > > > > For example, you could add "flask-appbuilder==1.11.1" to this file,
>     >> > > > > specify it with `pip install -c constraints.txt apache-airflow`, and
>     >> > > > > then whenever pip attempted to install _any_ version of FAB it would use
>     >> > > > > the exact version from the constraints file.
>     >> > > > >
>     >> > > > > I don't buy the argument about pinning being a requirement for
>     >> > > > > graduation from Incubation fwiw - it's an unavoidable artefact of the
>     >> > > > > open-source world we develop in.
>     >> > > > >
>     >> > > > >
>     >> > > > > https://libraries.io/ offers a (free?) service that will monitor apps'
>     >> > > > > dependencies for being out of date; that might be better than writing
>     >> > > > > our own solution.
>     >> > > > >
>     >> > > > > Pip has for a while now supported a way of saying "this dep is for
>     >> > > > > py2.7 only":
>     >> > > > >
>     >> > > > > > Since version 6.0, pip also supports specifiers containing
>     >> > > > > > environment markers like so:
>     >> > > > > >
>     >> > > > > >    SomeProject ==5.4 ; python_version < '2.7'
>     >> > > > > >    SomeProject; sys_platform == 'win32'
>     >> > > > >
>     >> > > > >
>     >> > > > > Ash
>     >> > > > >
>     >> > > > >
>     >> > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <waksman@gmail.com> wrote:
>     >> > > > > >
>     >> > > > > > As a member of a team that will also have really big problems if
>     >> > > > > > Airflow pins all requirements (for reasons similar to those already
>     >> > > > > > stated), I would like to add a very strong -1 to the idea of pinning
>     >> > > > > > them for all installations.
>     >> > > > > >
>     >> > > > > > In a number of situations on our end, to avoid similar problems with
>     >> > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
>     >> > > > > > https://pypi.org/project/pip-tools/
>     >> > > > > >
>     >> > > > > > I would like to suggest a middle ground of:
>     >> > > > > >
>     >> > > > > > - Have the installation continue to use unpinned (`>=`) with minimum
>     >> > > > > > necessary requirements set
>     >> > > > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
>     >> > > > > > that is used by CI
>     >> > > > > > - - If we need, there can be one file for each incompatible python
>     >> > > > > > version
>     >> > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
>     >> > > > > > compiled requirements file
>     >> > > > > > - Add a CI check that the watermark and original match to ensure no
>     >> > > > > > drift since last compile
>     >> > > > > >
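The watermark idea in the list above could be sketched roughly as follows. Everything here is an assumption made for illustration: the marker format `# requirements-hash:`, the function names, and the idea that the loose requirements from `setup.py` are available as a plain list of strings. It only shows the shape of the check, not a proposed implementation.

```python
import hashlib

# Hypothetical marker line appended to the compiled requirements file.
WATERMARK_PREFIX = "# requirements-hash: "


def requirements_hash(requirements):
    """Hash the loose requirements in an order-independent way."""
    normalized = "\n".join(sorted(r.strip().lower() for r in requirements))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def add_watermark(compiled_text, requirements):
    """Return the compiled file content with the watermark line appended."""
    return (
        compiled_text.rstrip("\n")
        + "\n"
        + WATERMARK_PREFIX
        + requirements_hash(requirements)
        + "\n"
    )


def watermark_matches(compiled_text, requirements):
    """CI check: True if the compiled file was built from these requirements."""
    expected = WATERMARK_PREFIX + requirements_hash(requirements)
    return any(line.strip() == expected for line in compiled_text.splitlines())
```

A CI job would call `watermark_matches` with the current `setup.py` requirements and fail the build when it returns False, which signals that someone changed the requirements without re-running the compile step.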
>     >> > > > > > I am happy to do much of the work for this, if it can help avoid
>     >> > > > > > pinning all of the depends at the installation level.
>     >> > > > > >
>     >> > > > > > --George Leslie-Waksman
>     >> > > > > >
>     >> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
>     >> > > > > > <ma...@gmail.com> wrote:
>     >> > > > > >>
>     >> > > > > >> pip-tools can definitely help here to ship a reference [locked]
>     >> > > > > >> `requirements.txt` that can be used in [all or part of] the CI. It's
>     >> > > > > >> actually kind of important to get CI to fail when a new [backward
>     >> > > > > >> incompatible] lib comes out and breaks things while allowing version
>     >> > > > > >> ranges.
>     >> > > > > >>
>     >> > > > > >> I think there may be challenges around pip-tools and projects that run
>     >> > > > > >> in both python2.7 and python3.6. You sometimes need to have 2
>     >> > > > > >> requirements.txt lock files.
>     >> > > > > >>
>     >> > > > > >> Max
>     >> > > > > >>
>     >> > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>     >> > > > > >> wrote:
>     >> > > > > >>
>     >> > > > > >>> It's a nice one :). However I think when/if we go to pinned
>     >> > > > > >>> dependencies the way poetry/pip-tools do it, this will suddenly be a lot
>     >> > > > > >>> less useful. It will be very easy to track dependency changes (they will
>     >> > > > > >>> always be committed as a change in the .lock file or requirements.txt)
>     >> > > > > >>> and if someone has a problem while upgrading a dependency (always
>     >> > > > > >>> consciously, never accidentally) it will simply fail during the CI build
>     >> > > > > >>> and the change won't get merged/won't break the builds of others in the
>     >> > > > > >>> first place :).
>     >> > > > > >>>
>     >> > > > > >>> J.
>     >> > > > > >>>
>     >> > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd.deng.r@gmail.com> wrote:
>     >> > > > > >>>
>     >> > > > > >>>> Hi folks,
>     >> > > > > >>>>
>     >> > > > > >>>> On top of this discussion, I was thinking we should have the ability to
>     >> > > > > >>>> quickly monitor dependency releases as well. Previously, it happened a
>     >> > > > > >>>> few times that CI kept failing for no apparent reason and it eventually
>     >> > > > > >>>> turned out it was due to a dependency release. But it took us some time,
>     >> > > > > >>>> sometimes a few days, to realise the failure was because of a dependency
>     >> > > > > >>>> release.
>     >> > > > > >>>>
>     >> > > > > >>>> To partially address this, I tried to develop a mini tool to help us
>     >> > > > > >>>> check the latest release of Python packages & the release date-time on
>     >> > > > > >>>> PyPI. So, by comparing it with our CI failure history, we may be able to
>     >> > > > > >>>> troubleshoot faster.
>     >> > > > > >>>>
>     >> > > > > >>>> Output Sample (ordered by upload time in desc order):
>     >> > > > > >>>>
>     >> > > > > >>>> Package Name        Latest Version    Upload Time
>     >> > > > > >>>> awscli              1.16.28           2018-10-05T23:12:45
>     >> > > > > >>>> botocore            1.12.18           2018-10-05T23:12:39
>     >> > > > > >>>> promise             2.2.1             2018-10-04T22:04:18
>     >> > > > > >>>> Keras               2.2.4             2018-10-03T20:59:39
>     >> > > > > >>>> bleach              3.0.0             2018-10-03T16:54:27
>     >> > > > > >>>> Flask-AppBuilder    1.12.0            2018-10-03T09:03:48
>     >> > > > > >>>> ... ...
>     >> > > > > >>>>
>     >> > > > > >>>> It's a minimal tool (not perfect yet but working). I have hosted this
>     >> > > > > >>>> tool at https://github.com/XD-DENG/pypi-release-query .
>     >> > > > > >>>>
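For readers who want to see what such a check involves, here is a rough Python sketch. It is not the code of the tool linked above; it is an independent illustration that assumes PyPI's public JSON endpoint (`https://pypi.org/pypi/<package>/json`) and its `info.name`, `info.version` and `urls[].upload_time` fields. The function names are invented for this example.

```python
import json
from urllib.request import urlopen


def fetch_release_info(package):
    """Download package metadata from PyPI's JSON endpoint (needs network)."""
    with urlopen("https://pypi.org/pypi/{}/json".format(package)) as response:
        return json.load(response)


def latest_upload(payload):
    """Return (version, upload_time) for the current release in `payload`.

    The earliest upload time among the release's files is used as the
    moment the release became visible.
    """
    version = payload["info"]["version"]
    times = [item["upload_time"] for item in payload.get("urls", [])]
    return version, (min(times) if times else None)


def newest_first(payloads):
    """Build (name, version, upload_time) rows, most recent upload first."""
    rows = [(p["info"]["name"],) + latest_upload(p) for p in payloads]
    # ISO 8601 timestamps sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[2] or "", reverse=True)
```

Calling `newest_first(fetch_release_info(name) for name in packages)` would give rows in the same order as the sample table, so a CI failure time can be compared against the most recent uploads.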
>     >> > > > > >>>>
>     >> > > > > >>>> XD
>     >> > > > > >>>>
>     >> > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>     >> > > > > >>>> wrote:
>     >> > > > > >>>>
>     >> > > > > >>>>> Hello Erik,
>     >> > > > > >>>>>
>     >> > > > > >>>>> I understand your concern. It's a hard one to solve in general (i.e.
>     >> > > > > >>>>> dependency-hell). It looks like in this case you treat Airflow as a
>     >> > > > > >>>>> 'library', where for some other people it might be more like an 'end
>     >> > > > > >>>>> product'. If you look at the "pinning" philosophy - "pin everything" is
>     >> > > > > >>>>> good for end products, but not good for libraries. In your case Airflow
>     >> > > > > >>>>> is treated as a bit of both. And it's a perfectly valid case at that
>     >> > > > > >>>>> (with custom python DAGs being a central concept for Airflow).
>     >> > > > > >>>>> However, I think it's not as bad as you think when it comes to exact
>     >> > > > > >>>>> pinning.
>     >> > > > > >>>>>
>     >> > > > > >>>>> I believe - a bit counter-intuitively - that tools like pip-tools/poetry
>     >> > > > > >>>>> with exact pinning result in having your dependencies upgraded more
>     >> > > > > >>>>> often, rather than less - especially in complex systems where
>     >> > > > > >>>>> dependency-hell creeps in. If you look at Airflow's setup.py now - it's
>     >> > > > > >>>>> a bit scary to make any change to it. There is a chance it will blow up
>     >> > > > > >>>>> in your face if you change it. You never know why there is
>     >> > > > > >>>>> 0.3 < ver < 1.0 - and if you change it, whether it will cause a chain
>     >> > > > > >>>>> reaction of conflicts that will ruin your work day.
>     >> > > > > >>>>>
>     >> > > > > >>>>> On the contrary - if you change it to exact pinning in a
>     >> > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
>     >> > > > > >>>>> (and commented) exclusion/avoidance rules in your .in/.toml file, the
>     >> > > > > >>>>> whole setup might be much easier to maintain and upgrade. Every time you
>     >> > > > > >>>>> prepare for a release (or even once in a while for master) one person
>     >> > > > > >>>>> might consciously attempt to upgrade all dependencies to the latest
>     >> > > > > >>>>> ones. It should be almost as easy as letting poetry/pip-tools help with
>     >> > > > > >>>>> figuring out what the latest set of dependencies is that will work
>     >> > > > > >>>>> without conflicts. It should be rather straightforward (I've done it in
>     >> > > > > >>>>> the past for fairly complex systems). What those tools enable is a
>     >> > > > > >>>>> single-shot upgrade of all dependencies.
>     >> > > > > >>>>> After doing it you can make sure that all tests work fine (and fix any
>     >> > > > > >>>>> problems that result from it). And then you test it thoroughly before
>     >> > > > > >>>>> you make the final release. You can do it in a separate PR - with
>     >> > > > > >>>>> automated testing in Travis, which means that you are not disturbing the
>     >> > > > > >>>>> work of others (compilation/building + unit tests are guaranteed to work
>     >> > > > > >>>>> before you merge it) while doing it. It's all conscious rather than
>     >> > > > > >>>>> accidental. A nice side effect of that is that with every release you
>     >> > > > > >>>>> can actually "catch up" with the latest stable versions of many
>     >> > > > > >>>>> libraries in one go. It's better than waiting until someone deliberately
>     >> > > > > >>>>> upgrades to a newer version (while the rest remain terribly out-dated,
>     >> > > > > >>>>> as is the case for Airflow now).
>     >> > > > > >>>>>
>     >> > > > > >>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
>     >> > > > > >>>>> you to catch up faster in many cases. That is at least my experience so
>     >> > > > > >>>>> far.
>     >> > > > > >>>>>
>     >> > > > > >>>>> Additionally, Airflow is an open system - if you have very specific
>     >> > > > > >>>>> needs for requirements, you might actually - in the very same way with
>     >> > > > > >>>>> pip-tools/poetry - upgrade all your dependencies in your local fork of
>     >> > > > > >>>>> Airflow before someone else does it in master/release. Those tools kind
>     >> > > > > >>>>> of democratise dependency management. It should be as easy as
>     >> > > > > >>>>> `pip-compile --upgrade` or `poetry update` and you will get all the
>     >> > > > > >>>>> "non-conflicting" latest dependencies in your local fork (and poetry
>     >> > > > > >>>>> especially seems to do all the heavy lifting of figuring out which
>     >> > > > > >>>>> versions will work). You should be able to test and publish it locally
>     >> > > > > >>>>> as your private package for local installations. You can even mark a
>     >> > > > > >>>>> specific dependency you want to use at a specific version and let
>     >> > > > > >>>>> pip-tools/poetry figure out the exact versions of the other
>     >> > > > > >>>>> requirements. You can even make a PR with such an upgrade eventually to
>     >> > > > > >>>>> get it into master faster. You can even downgrade in a similar way in
>     >> > > > > >>>>> case a newer dependency causes problems for you. Guided by the tools,
>     >> > > > > >>>>> it's much faster than figuring the versions out by yourself.
>     >> > > > > >>>>>
>     >> > > > > >>>>> As long as we have a simple way of managing it and document how to
>     >> > > > > >>>>> upgrade/downgrade dependencies in your own fork, and mention how to
>     >> > > > > >>>>> locally release Airflow as a package, I think your case could be covered
>     >> > > > > >>>>> even better than now. What do you think?
>     >> > > > > >>>>>
>     >> > > > > >>>>> J.
>     >> > > > > >>>>>
>     >> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
>     >> > > > > >>>>> <EK...@novozymes.com.invalid> wrote:
>     >> > > > > >>>>>
>     >> > > > > >>>>>> For us, exact pinning of versions would be problematic. We have DAG
>     >> > > > > >>>>>> code that shares direct and indirect dependencies with Airflow, e.g.
>     >> > > > > >>>>>> lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If
>     >> > > > > >>>>>> our DAG code for some reason needs a newer point release due to a bug
>     >> > > > > >>>>>> that's fixed, then we can't cleanly build a virtual environment
>     >> > > > > >>>>>> containing the fixed version. For us, it's already a problem that
>     >> > > > > >>>>>> Airflow has quite strict (and sometimes old) requirements in setup.py.
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> Erik
>     >> > > > > >>>>>> ________________________________
>     >> > > > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
>     >> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
>     >> > > > > >>>>>> To: dev@airflow.incubator.apache.org
>     >> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> I think one solution to the release approach is to check as part of the
>     >> > > > > >>>>>> automated Travis build if all requirements are pinned with == (even the
>     >> > > > > >>>>>> deep ones) and fail the build in case they are not, for ALL versions
>     >> > > > > >>>>>> (including dev). And of course we should document the approach of
>     >> > > > > >>>>>> releases/upgrades etc. If we do it all the time for development
>     >> > > > > >>>>>> versions (which seems quite doable), then transitively all the releases
>     >> > > > > >>>>>> will also have pinned versions and they will never try to upgrade any
>     >> > > > > >>>>>> of the dependencies. In poetry (similarly in pip-tools with the .in
>     >> > > > > >>>>>> file) it is done by having a .lock file that specifies exact versions
>     >> > > > > >>>>>> of each package, so it can be rather easy to manage (so it's worth
>     >> > > > > >>>>>> trying it out I think :D - seems a bit more friendly than pip-tools).
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> There is a drawback - of course - with manually updating
>     >> the
>     >> > > > module
>     >> > > > > >>>> that
>     >> > > > > >>>>>> you want, but I really see that as an advantage rather than
>     >> > > > drawback
>     >> > > > > >>>>>> especially for users. This way you maintain the property
>     >> that
>     >> > it
>     >> > > > > will
>     >> > > > > >>>>>> always install and work the same way no matter if you
>     >> > installed
>     >> > > it
>     >> > > > > >>>> today
>     >> > > > > >>>>> or
>     >> > > > > >>>>>> two months ago. I think the biggest drawback for
>     >> maintainers
>     >> > is
>     >> > > > that
>     >> > > > > >>>> you
>     >> > > > > >>>>>> need some kind of monitoring of security vulnerabilities
>     >> and
>     >> > > > cannot
>     >> > > > > >>>> rely
>     >> > > > > >>>>> on
>     >> > > > > >>>>>> automated security upgrades. With >= requirements those
>     >> > security
>     >> > > > > >>>> updates
>     >> > > > > >>>>>> might happen automatically without anyone noticing, but to
>     >> be
>     >> > > > honest
>     >> > > > > >>> I
>     >> > > > > >>>>>> don't think such upgrades are guaranteed even in current
>     >> setup
>     >> > > for
>     >> > > > > >>> all
>     >> > > > > >>>>>> security issues for all libraries anyway.
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> Finding the need to upgrade because of security issues can
>     >> be
>     >> > > > quite
>     >> > > > > >>>>>> automated. Even now I noticed Github started to inform
>     >> owners
>     >> > > > about
>     >> > > > > >>>>>> potential security vulnerabilities in used libraries for
>     >> their
>     >> > > > > >>> project.
>     >> > > > > >>>>>> Those notifications can be sent to devlist and turned into
>     >> > JIRA
>     >> > > > > >>> issues
>     >> > > > > >>>>>> followed by minor security-related releases (with only
>     >> few
>     >> > > > library
>     >> > > > > >>>>>> dependencies upgraded).
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> I think it's even easier to automate it if you have pinned
>     >> > > > > >>>> dependencies -
>     >> > > > > >>>>>> because it's generally easy to find applicable
>     >> vulnerabilities
>     >> > > for
>     >> > > > > >>>>> specific
>     >> > > > > >>>>>> versions of libraries by static analysers - when you have
>     >> >=,
>     >> > > you
>     >> > > > > >>> never
>     >> > > > > >>>>>> know which version will be used until you actually perform
>     >> the
>     >> > > > > >>>>>> installation.
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> There is one big advantage for maintainers for "pinned"
>     >> case.
>     >> > > Your
>     >> > > > > >>>> users
>     >> > > > > >>>>>> always have the same dependencies - so when issue is
>     >> raised,
>     >> > you
>     >> > > > can
>     >> > > > > >>>>>> reproduce it more easily. It's hard to know which version
>     >> user
>     >> > > has
>     >> > > > > >>> (as
>     >> > > > > >>>>> the
>     >> > > > > >>>>>> user could install it month ago or yesterday) and even if
>     >> you
>     >> > > find
>     >> > > > > >>> out
>     >> > > > > >>>> by
>     >> > > > > >>>>>> asking the user, you might not be able to reproduce the
>     >> set of
>     >> > > > > >>>>> requirements
>     >> > > > > >>>>>> easily (simply because there are already newer versions of
>     >> the
>     >> > > > > >>>> libraries
>     >> > > > > >>>>>> released and they are used automatically). You can ask the
>     >> > user
>     >> > > to
>     >> > > > > >>> run
>     >> > > > > >>>>> pip
>     >> > > > > >>>>>> --upgrade but that's dangerous and pretty lame ("check the
>     >> > > latest
>     >> > > > > >>>>> version -
>     >> > > > > >>>>>> maybe it fixes your problem ? ") and sometimes not possible
>     >> > > (e.g.
>     >> > > > > >>>> someone
>     >> > > > > >>>>>> has pre-built docker image with dependencies from few
>     >> months
>     >> > ago
>     >> > > > and
>     >> > > > > >>>>> cannot
>     >> > > > > >>>>>> rebuild the image easily).
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> J.
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
>     >> > > ash@apache.org
>     >> > > > >
>     >> > > > > >>>>> wrote:
>     >> > > > > >>>>>>
>     >> > > > > >>>>>>> One thing to point out here.
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a
>     >> > clean
>     >> > > > > >>>>>>> environment it will fail.
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but
>     >> > > > > >>>>>>> flask-appbuilder is >= 1.11.1, so that pulls in 1.12.0
>     >> > > > > >>>>>>> which requires flask-login >= 0.3.
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>> So I do think there is maybe something to be said about
>     >> > pinning
>     >> > > > for
>     >> > > > > >>>>>>> releases. The down side to that is that if there are
>     >> updates
>     >> > > to a
>     >> > > > > >>>>> module
>     >> > > > > >>>>>>> that we want then we have to make a point release to let
>     >> > people
>     >> > > > get
>     >> > > > > >>>> it
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>> Both methods have draw-backs
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>> -ash
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
>     >> > > > > >>> arthur.wiedmer@gmail.com>
>     >> > > > > >>>>>>> wrote:
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> Hi Jarek,
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
>     >> > > > advice.
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> I just want to double check we are talking about pinning
>     >> in
>     >> > > > > >>>>>>>> requirements.txt only.
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> This offers the ability to
>     >> > > > > >>>>>>>> pip install -r requirements.txt
>     >> > > > > >>>>>>>> pip install --no-deps airflow
>     >> > > > > >>>>>>>> For a guaranteed install which works.
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> Several different requirement files can be provided for
>     >> > > specific
>     >> > > > > >>>> use
>     >> > > > > >>>>>>> cases,
>     >> > > > > >>>>>>>> like a stable dev one for instance for people wanting to
>     >> > work
>     >> > > on
>     >> > > > > >>>>>>> operators
>     >> > > > > >>>>>>>> and non-core functions.
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> However, I think we should proactively test in CI against
>     >> > > > > >>> unpinned
>     >> > > > > >>>>>>>> dependencies (though it might be a separate case in the
>     >> > > matrix)
>     >> > > > ,
>     >> > > > > >>>> so
>     >> > > > > >>>>>> that
>     >> > > > > >>>>>>>> we get advance warning if possible that things will
>     >> break.
>     >> > > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
>     >> > > > problem
>     >> > > > > >>>> :)
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
>     >> > > minimum
>     >> > > > > >>>>>> required
>     >> > > > > >>>>>>>> compatibility. The process of pinning in setup.py is
>     >> > extremely
>     >> > > > > >>>>>>> detrimental
>     >> > > > > >>>>>>>> when you have a large number of python libraries
>     >> installed
>     >> > > with
>     >> > > > > >>>>>> different
>     >> > > > > >>>>>>>> pinned versions.
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> Best,
>     >> > > > > >>>>>>>> Arthur
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
>     >> > > > > >>>>>> <ddavydov@twitter.com.invalid
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>> wrote:
>     >> > > > > >>>>>>>>
>     >> > > > > >>>>>>>>> Relevant discussion about this:
>     >> > > > > >>>>>>>>>
>     >> > > > > >>>>>>>>>
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>
>     >> > > > > >>>>>
>     >> > > > > >>>>
>     >> > > > > >>>
>     >> > > > >
>     >> > > >
>     >> > >
>     >> >
>     >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>     >> > > > > >>>>>>>>>
>     >> > > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
>     >> > > > > >>>>>> Jarek.Potiuk@polidea.com>
>     >> > > > > >>>>>>>>> wrote:
>     >> > > > > >>>>>>>>>
>     >> > > > > >>>>>>>>>> TL;DR; A change is coming in the way how
>     >> > > > > >>>> dependencies/requirements
>     >> > > > > >>>>>> are
>     >> > > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed
>     >> rather
>     >> > > than
>     >> > > > > >>>>>> flexible
>     >> > > > > >>>>>>>>> (==
>     >> > > > > >>>>>>>>>> rather than >=).
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> This is follow up after Slack discussion we had with
>     >> Ash
>     >> > and
>     >> > > > > >>>> Kaxil
>     >> > > > > >>>>> -
>     >> > > > > >>>>>>>>>> summarising what we propose we'll do.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> *Problem:*
>     >> > > > > >>>>>>>>>> During last few weeks we experienced quite a few
>     >> downtimes
>     >> > > of
>     >> > > > > >>>>>> TravisCI
>     >> > > > > >>>>>>>>>> builds (for all PRs/branches including master) as some
>     >> of
>     >> > > the
>     >> > > > > >>>>>>> transitive
>     >> > > > > >>>>>>>>>> dependencies were automatically upgraded. This because
>     >> in
>     >> > a
>     >> > > > > >>>> number
>     >> > > > > >>>>> of
>     >> > > > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> Whenever there is a new release of such dependency, it
>     >> > might
>     >> > > > > >>>> cause
>     >> > > > > >>>>>>> chain
>     >> > > > > >>>>>>>>>> reaction with upgrade of transitive dependencies which
>     >> > might
>     >> > > > > >>> get
>     >> > > > > >>>>> into
>     >> > > > > >>>>>>>>>> conflict.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login
>     >> transitive
>     >> > > > > >>>>> dependency
>     >> > > > > >>>>>>> with
>     >> > > > > >>>>>>>>>> click. They started to conflict once AppBuilder has
>     >> > released
>     >> > > > > >>>>> version
>     >> > > > > >>>>>>>>>> 1.12.0.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> *Diagnosis:*
>     >> > > > > >>>>>>>>>> Transitive dependencies with "flexible" versions
>     >> (where >=
>     >> > > is
>     >> > > > > >>>> used
>     >> > > > > >>>>>>>>> instead
>     >> > > > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will
>     >> sooner
>     >> > or
>     >> > > > > >>> later
>     >> > > > > >>>>> hit
>     >> > > > > >>>>>>>>> other
>     >> > > > > >>>>>>>>>> cases where not fixed dependencies cause similar
>     >> problems
>     >> > > with
>     >> > > > > >>>>> other
>     >> > > > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This
>     >> > > causes
>     >> > > > > >>>>>> problems
>     >> > > > > >>>>>>>>> for
>     >> > > > > >>>>>>>>>> both - released versions (cause they stop to work!) and
>     >> > for
>     >> > > > > >>>>>> development
>     >> > > > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent
>     >> > > people
>     >> > > > > >>>> from
>     >> > > > > >>>>>>>>>> installing development environment from the scratch.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> *Solution:*
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>>  - Following the old-but-good post
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>
>     >> > > > > >>>>>
>     >> > > > > >>>>
>     >> > > > > >>>
>     >> > > > >
>     >> > > >
>     >> > >
>     >> >
>     >> https://nvie.com/posts/pin-your-packages/
>     >> > > > > >>>>>> we are going to fix the
>     >> > > > > >>>>>>>>>> pinned
>     >> > > > > >>>>>>>>>>  dependencies to specific versions (so basically all
>     >> > > > > >>>> dependencies
>     >> > > > > >>>>>> are
>     >> > > > > >>>>>>>>>>  "fixed").
>     >> > > > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade
>     >> > > > > >>>> dependencies
>     >> > > > > >>>>>> with
>     >> > > > > >>>>>>>>>>  pip-tools (
>     >> > > > > >>>>>>
>     >> > > > > >>>>>
>     >> > > > > >>>>
>     >> > > > > >>>
>     >> > > > >
>     >> > > >
>     >> > >
>     >> >
>     >> https://github.com/jazzband/pip-tools
>     >> > > > > >>>>> ).
>     >> > > > > >>>>>> We might also
>     >> > > > > >>>>>>>>> take a
>     >> > > > > >>>>>>>>>>  look at pipenv:
>     >> > > > > >>>>>>
>     >> > > > > >>>>>
>     >> > > > > >>>>
>     >> > > > > >>>
>     >> > > > >
>     >> > > >
>     >> > >
>     >> >
>     >> https://pipenv.readthedocs.io/en/latest/
>     >> > > > > >>>>>>>>>>  - People who would like to upgrade some dependencies
>     >> for
>     >> > > > > >>> their
>     >> > > > > >>>>> PRs
>     >> > > > > >>>>>>>>> will
>     >> > > > > >>>>>>>>>>  still be able to do it - but such upgrades will be in
>     >> > their
>     >> > > > > >>> PR
>     >> > > > > >>>>> thus
>     >> > > > > >>>>>>>>> they
>     >> > > > > >>>>>>>>>>  will go through TravisCI tests and they will also
>     >> have to
>     >> > > be
>     >> > > > > >>>>>>> specified
>     >> > > > > >>>>>>>>>> with
>     >> > > > > >>>>>>>>>>  pinned fixed versions (==). This should be part of
>     >> review
>     >> > > > > >>>> process
>     >> > > > > >>>>>> to
>     >> > > > > >>>>>>>>>> make
>     >> > > > > >>>>>>>>>>  sure new/changed requirements are pinned.
>     >> > > > > >>>>>>>>>>  - In release process there will be a point where an
>     >> > upgrade
>     >> > > > > >>>> will
>     >> > > > > >>>>> be
>     >> > > > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so
>     >> that
>     >> > we
>     >> > > > > >>> are
>     >> > > > > >>>>> not
>     >> > > > > >>>>>>>>>> stuck
>     >> > > > > >>>>>>>>>>  with older releases. This will be in controlled PR
>     >> > > > > >>> environment
>     >> > > > > >>>>>> where
>     >> > > > > >>>>>>>>>> there
>     >> > > > > >>>>>>>>>>  will be time to fix all dependencies without impacting
>     >> > > others
>     >> > > > > >>>> and
>     >> > > > > >>>>>>>>> likely
>     >> > > > > >>>>>>>>>>  enough time to "vet" such changes (this can be done
>     >> for
>     >> > > > > >>>>> alpha/beta
>     >> > > > > >>>>>>>>>> releases
>     >> > > > > >>>>>>>>>>  for example).
>     >> > > > > >>>>>>>>>>  - As a side effect dependencies specification will
>     >> become
>     >> > > far
>     >> > > > > >>>>>> simpler
>     >> > > > > >>>>>>>>>>  and straightforward.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> Happy to hear community comments to the proposal. I am
>     >> > happy
>     >> > > > to
>     >> > > > > >>>>> take
>     >> > > > > >>>>>> a
>     >> > > > > >>>>>>>>> lead
>     >> > > > > >>>>>>>>>> on that, open JIRA issue and implement if this is
>     >> > something
>     >> > > > > >>>>> community
>     >> > > > > >>>>>>> is
>     >> > > > > >>>>>>>>>> happy with.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> J.
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> --
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
>     >> > > > > >>>>>>>>>> Mobile: +48 660 796 129
>     >> > > > > >>>>>>>>>>
>     >> > > > > >>>>>>>>>
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>>
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> --
>     >> > > > > >>>>>>
>     >> > > > > >>>>>> *Jarek Potiuk, Principal Software Engineer*
>     >> > > > > >>>>>> Mobile: +48 660 796 129
>     >> > > > > >>>>>>
>     >> > > > > >>>>>
>     >> > > > > >>>>>
>     >> > > > > >>


Re: Pinning dependencies for Apache Airflow

Posted by "Daniel (Daniel Lamblin) [BDP - Seoul]" <la...@coupang.com>.
That Slack comment is mine, thanks.

If it's a vote my vote is: please limit the package versions in setup.py for any branch meant to be stable.

To be specific:
* I don't have an expectation that installing from master is going to work every time. But when it doesn't, I do expect to find that CI is broken, with a "red" indicator to show it. Even if there was no commit or everyone was on vacation, CI should be running a couple of times a day to catch breakage due to dependencies. So I don't really care whether packages in setup.py on master are pinned or just unlimited minimums, though I'd think that any version can be limited to less than the next major number…
* I do have an expectation that installing v1-10-stable, or v1-8-stable, vX-Y-stable etc. is 100% going to work every time. I do think that its package versions should be the same as those that were used to pass the release check-off process.

It is probably easiest for maintainers (us?) if, when prepping a stable branch, setup.py is modified to specify exactly the == package versions of absolutely everything that passed the test, QA, and release process. If what you guys are discussing with pipenv, pip-tools, .lock and requirements.txt files etc. is integrated with setup.py, then great; otherwise, not good enough (I'll explain).
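
A check along these lines could confirm that everything listed for a stable branch really is pinned with ==. This is only a minimal sketch under stated assumptions: `unpinned` is a hypothetical helper name, the regular expression is a simplification and not a full PEP 508 parser, and the sample requirements are just the ones mentioned in this thread.

```python
import re

# Simplified pattern (an assumption, not a full PEP 508 parser):
# a package name, optional [extras], then exactly one "==" specifier.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[A-Za-z0-9][A-Za-z0-9.!+_-]*$"
)


def unpinned(requirements):
    """Return the requirement strings that are not pinned with a single ==."""
    loose = []
    for raw in requirements:
        # Drop environment markers (after ";") and surrounding whitespace.
        spec = raw.split(";", 1)[0].strip()
        if spec and not _PINNED.match(spec):
            loose.append(raw)
    return loose


if __name__ == "__main__":
    sample = ["click==6.7", "flask-appbuilder>=1.11.1", "boto3<=1.8.0", "lxml"]
    # Prints the entries that would have to be pinned before a release.
    print(unpinned(sample))
```

A CI job could fail the build whenever the returned list is non-empty; wildcard pins such as `==1.*` are deliberately treated as unpinned here.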

I hear there's a concern that if, say, sshtunnel vX.Y turns out to be a security nightmare, you will regret pinning v1-10-stable to sshtunnel-vX.Y once it's known. But I disagree that this is a big deal, because it will be A) on the user to know that maybe maintainers didn't update this dependency check overnight, B) on the maintainers to go to each _maintained_ stable branch, bring it up to date with the patched version and redo the QA & release checks, C) on maintainers to mark any stable branch that isn't or can't be updated as "has known security issues", and finally D) on maintainers to have a timeline for marking releases as unmaintained, i.e. probably has security issues that we don't even check for.
I doubt the ASF has a problem with maintaining a stable release with security updates or (as is the current need) fixing a stable release such that it builds. But I don't know exactly the release rules.

I think I understand that what I'm proposing is dandy for the git branches but is an issue for PyPI, because there you do not update a released version, and deleting one breaks things even worse. So if 1.10.0 is broken and 1.10.1 is the next release in test/development, the fixed 1.10.0 should be released as an update that isn't 1.10.1, something like 1.10.0.1. Note that HOW 1.10.0 BROKE would not have happened with more careful limits on the versions, possibly requiring full pinning of exact versions. So that's again in favor of fixing these stable branches and their releases to exact versions that were known good. Now a security patch would still have to become, say, 1.10.0.1. I don't know if it's possible to go back to a released PyPI package and update its readme, description, or any other part to mark it as known to contain a security issue.
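
To illustrate why a number like 1.10.0.1 fits this scheme, here is a small sketch showing that it sorts between 1.10.0 and 1.10.1. It assumes purely numeric, dot-separated versions; `release_tuple` is a hypothetical helper, and real tools follow PEP 440, which also handles pre-releases and treats trailing zeros as equal (which this sketch does not).

```python
def release_tuple(version):
    """Split a dotted numeric version such as "1.10.0.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.10.1", "1.10.0.1", "1.10.0"]

# Tuples compare element by element, so the patched 1.10.0.1 lands after
# 1.10.0 and before 1.10.1.
ordered = sorted(versions, key=release_tuple)
print(ordered)  # ['1.10.0', '1.10.0.1', '1.10.1']
```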

Trying not to go long, but here's the part where I explain why the setup.py has to be fixed to what passed ci, qa, release etc. for a stable branch or packaged release, by explaining what bit me:
I made a docker image from a local fork of the v1-10-stable branch a couple months ago. A week ago, someone said to me, hey, I want to use the SSHOperator but I get this message about paramiko not being installed. So, look at that, my Dockerfile didn't add `ssh` to the options when running `pip install --no-cache-dir -e "${AIRFLOW_SRC_HOME}/[async,celery,crypto,cgroups,hdfs,hive,jdbc,ldap,mssql,mysql,postgres,s3,slack,statsd]"`, darn-it. This is easy though, let me add that now. You can see how this depends on what's in setup.py. And you might see how this didn't bring up the log message `pkg_resources.ContextualVersionConflict: (Click 7.0 (/usr/local/lib/python3.6/site-packages), Requirement.parse('click==6.7'), {'flask-appbuilder'})` until the replacement webserver tried to start up, glad I didn't touch the scheduler first.
You might think, oh Daniel, you should pull the existing image you released and just pip install paramiko and pysftp after reading the setup.py file then release commit that and push it.
Well, because Docker says it’s a best practice to build each release from the Dockerfile instead of interactively adding layers on top, there's a system that checks (kind of) and (usually) stops that idea from working.
Also, that would require cautious work, doesn't support the simple fix, and doesn't support people who built the release "late".
What I did end up doing was using pip freeze to figure out exactly what's on my prior working release (surprise: boto3 1.8.6 is there, though master says boto3 <= 1.8.0) and using it as a basis for the Dockerfile prior to that `pip install -e`. It's not quite 100% working yet, but the remaining issues have nothing to do with this discussion.
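
The pip freeze comparison described above can be sketched roughly as follows. This is an illustration only, not what was actually run: `parse_freeze` and `conflicts` are hypothetical helper names, the input is assumed to be plain `name==version` lines, and package names are only lowercased rather than fully normalized.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style lines ("name==version") into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an "==" pin
        # (for example editable or URL installs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def conflicts(installed, required):
    """Return {name: (installed_version, required_version)} where they differ."""
    return {
        name: (installed[name], wanted)
        for name, wanted in required.items()
        if name in installed and installed[name] != wanted
    }


if __name__ == "__main__":
    installed = parse_freeze("Click==7.0\nboto3==1.8.6\n")
    # flask-appbuilder asked for click==6.7 in the error message quoted earlier.
    print(conflicts(installed, {"click": "6.7"}))
```

Run against the freeze output of a known-good image, a comparison like this would surface a mismatch such as Click 7.0 versus click==6.7 before the webserver is restarted.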

In summary, I fully expected a stable branch of a package to be installable at any time and to operate the same way it did when it was cut. I'm not sure why there are any votes the other way about that, but I suspect those votes are more about what goes on at master and on CI than in a release, and are thus, to my mind, beside the point.

Thanks,
-Daniel

On 10/15/18, 9:05 PM, "Jarek Potiuk" <Ja...@polidea.com> wrote:

    Speaking of which - just to show what kind of problems we are talking about
    - here is a link to a relevant discussion in troubleshooting @ slack from
    today,  where someone tries to install v1.10-stable and needs help.
    This is exactly the kind of problem I think is important to solve,
    whichever way we choose to solve it:
    https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1539573567000100
    
    I really don't think it's a good idea to put users, especially new Airflow
    users, in this situation where they need to search through devlist and
    upstream commits or ask for help just to be able to install a stable release
    of Airflow.
    
    J.
    
    On Mon, Oct 15, 2018 at 9:29 AM Jarek Potiuk <Ja...@polidea.com>
    wrote:
    
    > Sorry for late reply - I was travelling, was at Cloud Next in London last
    > week (BTW. there were talks about Composer/Airflow there).
    >
    > I see the point, it's indeed very difficult to solve when we want both:
    > stability of releases and flexibility of using released version and write
    > the code within it. I think some trade-offs need to be made as we won't
    > solve it all with a one-size-fits-all approach. Answering your question
    > George - the value of pinning for release purpose is addressing "stability"
    > need.
    >
    >    - Due to my background I come from the "stability" side (which is more
    >    user-focused) - i.e. the main problem that I want to solve is to make sure
    >    that someone who wants to install airflow afresh and start using it as a
    >    beginner user, can always run 'pip install airflow' and it will get
    >    installed. For me this is the point when many users may simply get put off
    >    if it refuses to install out-of-the-box. A few months ago I actually
    >    evaluated airflow to run an ML pipeline for the startup I was at at that
    >    time. If back then it had refused to install out-of-the-box, my evaluation
    >    result would have been 'did not pass the basic criteria'. Luckily it did
    >    not happen, and we did a more elaborate evaluation then - we did not use
    >    Airflow eventually but for
    >    other reasons. For us the criteria "it just works!" was super important -
    >    because we did not have time to deep dive into details, find out why things
    >    do not work - we had a lot of "core/ML/robotics" things to worry about and
    >    any hurdles with unstable tools would be a major distraction. We really
    >    wanted to write several DAGs and get them executed in a stable, repeatable
    >    way, and to know that when we install it on a production machine in two
    >    months, it continues to work without any extra work.
    >    - Then there are a lot of concerns from the "flexibility" side (which
    >    is more about advanced users/developers). It becomes important when you
    >    want to actively develop your DAGs (you start using more than just built-in
    >    operators and start developing a lot more code in DAGs, or use
    >    PythonOperator more and more). Then of course it is important to get the
    >    "flexible" approach. I argue that in these cases the "active" developers
    >    might be more inclined to do any tweaking of their environment, as they are
    >    more advanced, might be more experienced with dependencies, and would be
    >    able to downgrade/upgrade dependencies as needed in their virtualenvs.
    >    Those people should be quite ok with spending a bit more time to get their
    >    environment tweaked to their needs.
    >
    > I was thinking if there is a way to satisfy both ? And I have a wild idea:
    >
    >    - we have two sets of requirements (easy-upgradeable "stable" ones in
    >    requirements.txt/poetry and flexible ones with versions in setup.py (or
    >    similar)) - as proposed earlier in this thread
    >    - we release two flavours of pip-installable airflow: 1.10.1 with
    >    stable/pinned dependencies and 1.10.1-devel (we can pick other flavour
    >    name) with flexible dependencies. It's quite common to have devel releases
    >    in Linux world - they serve a bit different purpose (like include headers
    >    for C/C++ programs) and it's usually an extra package on top of the basic
    >    one, but the basic idea is similar - if you are a user, you install 1.10.1,
    >    if you are an active developer, you install 1.10.1-devel
    >
    > What do you think?
    >
    > Off-topic a bit: a friend of mine pointed me to this excellent talk by Elm
    > creator: "The Hard Parts of Open Source" by Evan Czaplicki
    > <https://www.youtube.com/watch?v=o_4EX4dPppA> and it made me think
    > differently about the discussion we have :D
    >
    > J.
    >
    > On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman <wa...@gmail.com>
    > wrote:
    >
    >> It's not upgrading dependencies that I'm worried about, it's downgrading.
    >> With upgrade conflicts, we can treat the dependency upgrades as a
    >> necessary aspect of the Airflow upgrade.
    >>
    >> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
    >> LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are
    >> placed in the annoying situation of either: a) managing our deployments so
    >> that we install Airflow first, and then upgrade LibraryA and ignore pip's
    >> warning about incompatible versions, b) keeping the insecure version of
    >> LibraryA, c) waiting for another Airflow release and accepting all other
    >> changes, d) maintaining our own fork of Airflow and diverging from mainline.
    >>
    >> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
    >> whatsoever. If we're worried about API changes in the future, there's
    >> always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
    >>
    >> As has been pointed out, PythonOperator tasks run in the same venv as
    >> Airflow, so it is necessary that users be able to control dependencies for
    >> their code.
    >>
    >> To be clear, it's not always a security risk but this is not a
    >> hypothetical issue. We ran into a code incompatibility with psutil that
    >> mattered to us but had no impact on Airflow (see:
    >> https://github.com/apache/incubator-airflow/pull/3585) and are currently
    >> seeing SQLAlchemy held back without any clear need (
    >> https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
    >>
    >> Pinning dependencies for releases will force us (and I expect others) to
    >> either: ignore/workaround the pinning, or not use Airflow releases. Both
    >> of those options exactly defeat the point.
    >>
    >> If people are on board with pinning / locking all dependencies for CI
    >> purposes, and we can constrain requirements to ranges for necessary
    >> compatibility, what is the value of pinning all dependencies for release
    >> purposes?
    >>
    >> --George
    >>
    >> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Ja...@polidea.com>
    >> wrote:
    >>
    >> > I am still not convinced that pinning is bad. I re-read the whole
    >> > mail thread and the thread from 2016
    >> > <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
    >> > to read all the arguments, but I stand by pinning.
    >> >
    >> > I am - of course - not sure about the graduation argument. I would just
    >> > imagine it might be the case. I however really think that the situation we
    >> > are in now is quite volatile. The latest 1.10.0 cannot be clean-installed
    >> > via pip without manually tweaking and forcing a lower version of
    >> > flask-appbuilder. Even if you use the constraints file it's pretty
    >> > cumbersome because you'd have to somehow know that you need to do exactly
    >> > that (not at all obvious from the error you get). Also it might at any time
    >> > get worse as other packages get newer versions released. The thing here is
    >> > that the maintainers of flask-appbuilder did nothing wrong, they simply
    >> > released a new version with the click dependency version increased
    >> > (probably for a good reason) and it's airflow's cross-dependency graph
    >> > which makes it incompatible.
    >> >
    >> > I am afraid that if we don't change it, it's all but guaranteed that
    >> every
    >> > single release at some point of time will "deteriorate" and refuse to
    >> > clean-install. If we want to solve this problem (maybe we don't and we
    >> > accept it as it is?), I think the only way to solve it is to hard-pin
    >> all
    >> > the requirements at the very least for releases.
    >> >
    >> > Of course we might choose pinning only for releases (and CI builds) and
    >> > have the compromise that Matt mentioned. I have the worry however (also
    >> > mentioned in the previous thread) that it will be hard to maintain.
    >> > Effectively you will have to maintain both in parallel. And the case
    >> with
    >> > constraints is a nice workaround for someone who actually needs a specific
    >> > (even newer) version of specific package in their environment.
    >> >
    >> > Maybe we should simply give it a try and do Proof-Of-Concept/experiment
    >> as
    >> > also Fokko mentioned?
    >> >
    >> > We could have a PR with pinning enabled, and maybe ask the people who
    >> voice
    >> > concerns about environment give it a try with those pinned versions and
    >> see
    >> > if that makes it difficult for them to either upgrade dependencies and
    >> fork
    >> > apache-airflow or use constraints file of pip?
    >> >
    >> > J.
    >> >
    >> >
    >> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:
    >> >
    >> > > Erik, the Airflow task execution code itself of course must run
    >> somewhere
    >> > > with Airflow installed, but if the task is making a database query or
    >> a
    >> > web
    >> > > request or running something in Docker there's separation between the
    >> > > environments and maybe you don't care about Python dependencies at all
    >> > > except to get Airflow running. When running Python operators that's
    >> not
    >> > the
    >> > > case (as you already deal with).
    >> > >
    >> > > - Matt
    >> > >
    >> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
    >> > > <EK...@novozymes.com.invalid> wrote:
    >> > >
    >> > > > This is maybe a stupid question, but is it even possible to run
    >> tasks
    >> > in
    >> > > > an environment where Airflow is not installed?
    >> > > >
    >> > > >
    >> > > > Kind regards,
    >> > > >
    >> > > > Erik
    >> > > >
    >> > > > ________________________________
    >> > > > From: Matt Davis <ji...@gmail.com>
    >> > > > Sent: Monday, October 8, 2018 10:13:34 PM
    >> > > > To: dev@airflow.incubator.apache.org
    >> > > > Subject: Re: Pinning dependencies for Apache Airflow
    >> > > >
    >> > > > It sounds like we can get the best of both worlds with the original
    >> > > > proposals to have minimal requirements in setup.py and "guaranteed
    >> to
    >> > > work"
    >> > > > complete requirements in a separate file. That way we have
    >> flexibility
    >> > > for
    >> > > > teams that run airflow and tasks in the same environment and
    >> guidance
    >> > on
    >> > > a
    >> > > > working set of requirements. (Disclaimer: I work on the same team as
    >> > > > George.)
    >> > > >
    >> > > > Thanks,
    >> > > > Matt
    >> > > >
    >> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org>
    >> > wrote:
    >> > > >
    >> > > > > Although I think I come down on the side against pinning, my
    >> reasons
    >> > > are
    >> > > > > different.
    >> > > > >
    >> > > > > For the two (or more) people who have expressed concern about it
    >> > would
    >> > > > > pip's "Constraint Files" help:
    >> > > > >
    >> > > > >
    >> > > >
    >> > >
    >> >
    >> https://pip.pypa.io/en/stable/user_guide/#constraints-files
    >> > > > >
    >> > > > > For example, you could add "flask-appbuilder==1.11.1" in to this
    >> > file,
    >> > > > > specify it with `pip install -c constraints.txt apache-airflow`
    >> and
    >> > > then
    >> > > > > whenever pip attempted to install _any_ version of FAB it would use
    >> > the
    >> > > > > exact version from the constraints file.
    >> > > > >
    >> > > > > I don't buy the argument about pinning being a requirement for
    >> > > graduation
    >> > > > > from Incubation fwiw - it's an unavoidable artefact of the
    >> > open-source
    >> > > > > world we develop in.
    >> > > > >
    >> > > > >
    >> > > >
    >> > >
    >> >
    >> https://libraries.io/
    >> > > > offers a (free?) service that will monitor apps
    >> > > > > dependencies for being out of date, might be better than writing
    >> our
    >> > > own
    >> > > > > solution.
    >> > > > >
    >> > > > > Pip has for a while now supported a way of saying "this dep is for
    >> > > py2.7
    >> > > > > only":
    >> > > > >
    >> > > > > > Since version 6.0, pip also supports specifiers containing
    >> > > environment
    >> > > > > markers like so:
    >> > > > > >
    >> > > > > >    SomeProject ==5.4 ; python_version < '2.7'
    >> > > > > >    SomeProject; sys_platform == 'win32'
    >> > > > >
    >> > > > >
    >> > > > > Ash
    >> > > > >
    >> > > > >
    >> > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <
    >> waksman@gmail.com>
    >> > > > > wrote:
    >> > > > > >
    >> > > > > > As a member of a team that will also have really big problems if
    >> > > > > > Airflow pins all requirements (for reasons similar to those
    >> already
    >> > > > > > stated), I would like to add a very strong -1 to the idea of
    >> > pinning
    >> > > > > > them for all installations.
    >> > > > > >
    >> > > > > > In a number of situations on our end, to avoid similar problems
    >> with
    >> > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
    >> > > > > >
    >> > > >
    >> > >
    >> >
    >> https://pypi.org/project/pip-tools/
    >> > > > > >
    >> > > > > > I would like to suggest, a middle ground of:
    >> > > > > >
    >> > > > > > - Have the installation continue to use unpinned (`>=`) with
    >> > minimum
    >> > > > > > necessary requirements set
    >> > > > > > - Include a pip-compiled requirements file
    >> (`requirements-ci.txt`?)
    >> > > > > > that is used by CI
    >> > > > > > - - If we need, there can be one file for each incompatible
    >> python
    >> > > > > version
    >> > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
    >> > > > > > compiled requirements file
    >> > > > > > - Add a CI check that the watermark and original match to
    >> ensure no
    >> > > > > > drift since last compile
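
The watermark idea in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `# setup-requirements-sha256:` marker line and all function names are hypothetical, and the thread does not say how the hash would actually be computed or stored.

```python
import hashlib

# Hypothetical marker line; the thread does not define a watermark format.
WATERMARK_PREFIX = "# setup-requirements-sha256: "


def requirements_watermark(requirements):
    """Return a stable SHA-256 digest of a list of requirement strings."""
    # Sort and normalise so that reordering the list does not change the hash.
    normalized = sorted(r.strip().lower() for r in requirements if r.strip())
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def append_watermark(compiled_text, requirements):
    """Append the watermark line to the text of a compiled requirements file."""
    digest = requirements_watermark(requirements)
    return compiled_text.rstrip("\n") + "\n" + WATERMARK_PREFIX + digest + "\n"


def watermark_matches(compiled_text, requirements):
    """Return True if the compiled file's watermark matches the requirements."""
    expected = requirements_watermark(requirements)
    for line in compiled_text.splitlines():
        if line.startswith(WATERMARK_PREFIX):
            return line[len(WATERMARK_PREFIX):].strip() == expected
    return False  # no watermark found: treat it as drift
```

A CI step could call `watermark_matches` with the contents of the compiled file and the requirement list from `setup.py`, and fail the build when it returns False.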
    >> > > > > >
    >> > > > > > I am happy to do much of the work for this, if it can help avoid
    >> > > > > > pinning all of the depends at the installation level.
    >> > > > > >
    >> > > > > > --George Leslie-Waksman
    >> > > > > >
    >> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
    >> > > > > > <ma...@gmail.com> wrote:
    >> > > > > >>
    >> > > > > >> pip-tools can definitely help here to ship a reference [locked]
    >> > > > > >> `requirements.txt` that can be used in [all or part of] the CI.
    >> > It's
    >> > > > > >> actually kind of important to get CI to fail when a new
    >> [backward
    >> > > > > >> incompatible] lib comes out and break things while allowing
    >> > version
    >> > > > > ranges.
    >> > > > > >>
    >> > > > > >> I think there may be challenges around pip-tools and projects
    >> that
    >> > > run
    >> > > > > in
    >> > > > > >> both python2.7 and python3.6. You sometimes need to have 2
    >> > > > > requirements.txt
    >> > > > > >> lock files.
    >> > > > > >>
    >> > > > > >> Max
    >> > > > > >>
    >> > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <
    >> > > Jarek.Potiuk@polidea.com
    >> > > > >
    >> > > > > >> wrote:
    >> > > > > >>
    >> > > > > >>> It's a nice one :). However I think when/if we go to pinned
    >> > > > > dependencies
    >> > > > > >>> the way poetry/pip-tools do it, this will be suddenly lot-less
    >> > > useful.
    >> > > > > It
    >> > > > > >>> will be very easy to track dependency changes (they will be
    >> > always
    >> > > > > >>> committed as a change in the .lock file or requirements.txt)
    >> and
    >> > if
    >> > > > > someone
    >> > > > > >>> has a problem while upgrading a dependency (always
    >> consciously,
    >> > > never
    >> > > > > >>> accidentally) it will simply fail during CI build and the
    >> change
    >> > > > won't
    >> > > > > get
    >> > > > > >>> merged/won't break the builds of others in the first place :).
    >> > > > > >>>
    >> > > > > >>> J.
    >> > > > > >>>
    >> > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <
    >> > xd.deng.r@gmail.com>
    >> > > > > wrote:
    >> > > > > >>>
    >> > > > > >>>> Hi folks,
    >> > > > > >>>>
    >> > > > > >>>> On top of this discussion, I was thinking we should have the
    >> > > ability
    >> > > > > to
    >> > > > > >>>> quickly monitor dependency release as well. Previously, it
    >> > > happened
    >> > > > > for a
    >> > > > > >>>> few times that CI kept failing for no reason and eventually
    >> > turned
    >> > > > > out it
    >> > > > > >>>> was due to dependency release. But it took us some time,
    >> > > sometimes a
    >> > > > > few
    >> > > > > >>>> days, to realise the failure was because of dependency
    >> release.
    >> > > > > >>>>
    >> > > > > >>>> To partially address this, I tried to develop a mini tool to
    >> > help
    >> > > us
    >> > > > > >>> check
    >> > > > > >>>> the latest release of Python packages & the release
    >> date-time on
    >> > > > PyPi.
    >> > > > > >>> So,
    >> > > > > >>>> by comparing it with our CI failure history, we may be able
    >> to
    >> > > > > >>> troubleshoot
    >> > > > > >>>> faster.
    >> > > > > >>>>
    >> > > > > >>>> Output Sample (ordered by upload time in desc order):
    >> > > > > >>>> Package Name        Latest Version   Upload Time
    >> > > > > >>>> awscli              1.16.28          2018-10-05T23:12:45
    >> > > > > >>>> botocore            1.12.18          2018-10-05T23:12:39
    >> > > > > >>>> promise             2.2.1            2018-10-04T22:04:18
    >> > > > > >>>> Keras               2.2.4            2018-10-03T20:59:39
    >> > > > > >>>> bleach              3.0.0            2018-10-03T16:54:27
    >> > > > > >>>> Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
    >> > > > > >>>> ... ...
    >> > > > > >>>>
    >> > > > > >>>> It's a minimal tool (not perfect yet but working). I have
    >> hosted
    >> > > > this
    >> > > > > >>> tool
    >> > > > > >>>> at
    >> > > >
    >> > >
    >> >
    >> https://github.com/XD-DENG/pypi-release-query
    >> > > > .
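
A release check of the kind described above might look roughly like this. It is a sketch, not the code of the linked tool: it assumes the JSON metadata that PyPI serves at `https://pypi.org/pypi/<package>/json`, and the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def latest_release(payload):
    """Extract (name, latest version, earliest upload time) from a PyPI payload."""
    info = payload["info"]
    # "urls" lists the files of the latest release, each with an upload time.
    upload_times = [f["upload_time"] for f in payload.get("urls", [])]
    return info["name"], info["version"], min(upload_times) if upload_times else None


def fetch_payload(package):
    """Download the JSON metadata PyPI publishes for a package (needs network)."""
    with urlopen("https://pypi.org/pypi/{}/json".format(package)) as response:
        return json.load(response)


def release_table(payloads):
    """Return release rows sorted by upload time, newest first."""
    rows = [latest_release(p) for p in payloads]
    # ISO 8601 timestamps sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[2] or "", reverse=True)
```

Calling `release_table(fetch_payload(name) for name in packages)` would give rows comparable to the sample output, which can then be lined up against the time a CI build first started failing.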
    >> > > > > >>>>
    >> > > > > >>>>
    >> > > > > >>>> XD
    >> > > > > >>>>
    >> > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
    >> > > > > Jarek.Potiuk@polidea.com>
    >> > > > > >>>> wrote:
    >> > > > > >>>>
    >> > > > > >>>>> Hello Erik,
    >> > > > > >>>>>
    >> > > > > >>>>> I understand your concern. It's a hard one to solve in
    >> general
    >> > > > (i.e.
    >> > > > > >>>>> dependency-hell). It looks like in this case you treat
    >> Airflow
    >> > as
    >> > > > > >>>>> 'library', where for some other people it might be more like
    >> > 'end
    >> > > > > >>>> product'.
    >> > > > > >>>>> If you look at the "pinning" philosophy - the "pin
    >> everything"
    >> > is
    >> > > > > good
    >> > > > > >>>> for
    >> > > > > >>>>> end products, but not good for libraries. In the case you
    >> have
    >> > > > > Airflow
    >> > > > > >>> is
    >> > > > > >>>>> treated as a bit of both. And it's perfectly valid case at
    >> that
    >> > > > (with
    >> > > > > >>>>> custom python DAGs being central concept for Airflow).
    >> > > > > >>>>> However, I think it's not as bad as you think when it comes
    >> to
    >> > > > exact
    >> > > > > >>>>> pinning.
    >> > > > > >>>>>
    >> > > > > >>>>> I believe - a bit counter-intuitively - that tools like
    >> > > > > >>> pip-tools/poetry
    >> > > > > >>>>> with exact pinning result in having your dependencies
    >> upgraded
    >> > > more
    >> > > > > >>>> often,
    >> > > > > >>>>> rather than less - especially in complex systems where
    >> > > > > dependency-hell
    >> > > > > >>>>> creeps-in. If you look at Airflow's setup.py now - It's a
    >> bit
    >> > > scary
    >> > > > > to
    >> > > > > >>>> make
    >> > > > > >>>>> any change to it. There is a chance it will blow at your
    >> face
    >> > if
    >> > > > you
    >> > > > > >>>> change
    >> > > > > >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you
    >> > > change
    >> > > > > it,
    >> > > > > >>>>> whether it will cause chain reaction of conflicts that will
    >> > ruin
    >> > > > your
    >> > > > > >>>> work
    >> > > > > >>>>> day.
    >> > > > > >>>>>
    >> > > > > >>>>> On the contrary - if you change it to exact pinning in
    >> > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much
    >> > > > simpler
    >> > > > > >>> (and
    >> > > > > >>>>> commented) exclusion/avoidance rules in your .in/.toml file,
    >> the
    >> > > > whole
    >> > > > > >>>> setup
    >> > > > > >>>>> might be much easier to maintain and upgrade. Every time you
    >> > > > prepare
    >> > > > > >>> for
    >> > > > > >>>>> release (or even once in a while for master) one person
    >> might
    >> > > > > >>> consciously
    >> > > > > >>>>> attempt to upgrade all dependencies to latest ones. It
    >> should
    >> > be
    >> > > > > almost
    >> > > > > >>>> as
    >> > > > > >>>>> easy as letting poetry/pip-tools help with figuring out what
    >> > are
    >> > > > the
    >> > > > > >>>> latest
    >> > > > > >>>>> set of dependencies that will work without conflicts. It
    >> should
    >> > > be
    >> > > > > >>> rather
    >> > > > > >>>>> straightforward (I've done it in the past for fairly complex
    >> > > > > systems).
    >> > > > > >>>> What
    >> > > > > >>>>> those tools enable is - doing single-shot upgrade of all
    >> > > > > dependencies.
    >> > > > > >>>>> After doing it you can make sure that all tests work fine
    >> (and
    >> > > fix
    >> > > > > any
    >> > > > > >>>>> problems that result from it). And then you test it
    >> thoroughly
    >> > > > before
    >> > > > > >>> you
    >> > > > > >>>>> make final release. You can do it in separate PR - with
    >> > automated
    >> > > > > >>> testing
    >> > > > > >>>>> in Travis which means that you are not disturbing work of
    >> > others
    >> > > > > >>>>> (compilation/building + unit tests are guaranteed to work
    >> > before
    >> > > > you
    >> > > > > >>>> merge
    >> > > > > >>>>> it) while doing it. It's all conscious rather than
    >> accidental.
    >> > > Nice
    >> > > > > >>> side
    >> > > > > >>>>> effect of that is that with every release you can actually
    >> > > > "catch-up"
    >> > > > > >>>> with
    >> > > > > >>>>> latest stable versions of many libraries in one go. It's
    >> better
    >> > > > than
    >> > > > > >>>>> waiting until someone deliberately upgrades to newer version
    >> > (and
    >> > > > the
    >> > > > > >>>> rest
    >> > > > > >>>>> remain terribly out-dated as is the case for Airflow now).
    >> > > > > >>>>>
    >> > > > > >>>>> So a bit counterintuitively I think tools like
    >> pip-tools/poetry
    >> > > > help
    >> > > > > >>> you
    >> > > > > >>>> to
    >> > > > > >>>>> catch up faster in many cases. That is at least my
    >> experience
    >> > so
    >> > > > far.
    >> > > > > >>>>>
    >> > > > > >>>>> Additionally, Airflow is an open system - if you have very
    >> > > specific
    >> > > > > >>> needs
    >> > > > > >>>>> for requirements, you might actually - in the very same way
    >> > with
    >> > > > > >>>>> pip-tools/poetry - upgrade all your dependencies in your
    >> local
    >> > > fork
    >> > > > > of
    >> > > > > >>>>> Airflow before someone else does it in master/release. Those
    >> > > tools
    >> > > > > kind
    >> > > > > >>>> of
    >> > > > > >>>>> democratise dependency management. It should be as easy as
    >> > > > > `pip-compile
    >> > > > > >>>>> --upgrade` or `poetry update` and you will get all the
    >> > > > > >>> "non-conflicting"
    >> > > > > >>>>> latest dependencies in your local fork (and poetry
    >> especially
    >> > > seems
    >> > > > > to
    >> > > > > >>> do
    >> > > > > >>>>> all the heavy lifting of figuring out which versions will
    >> > work).
    >> > > > You
    >> > > > > >>>> should
    >> > > > > >>>>> be able to test and publish it locally as your private
    >> package
    >> > > for
    >> > > > > >>> local
    >> > > > > >>>>> installations. You can even mark the specific dependency you
    >> > want
    >> > > > to
    >> > > > > >>> use
    >> > > > > >>>>> specific version and let pip-tools/poetry figure out exact
    >> > > versions
    >> > > > > of
    >> > > > > >>>>> other requirements. You can even make a PR with such upgrade
    >> > > > > eventually
    >> > > > > >>>> to
    >> > > > > >>>>> get it faster in master. You can even downgrade in case
    >> newer
    >> > > > > >>> dependency
    >> > > > > >>>>> causes problems for you in similar way. Guided by the tools,
    >> > it's
    >> > > > > much
    >> > > > > >>>>> faster than figuring the versions out by yourself.
    >> > > > > >>>>>
    >> > > > > >>>>> As long as we have simple way of managing it and document
    >> how
    >> > to
    >> > > > > >>>>> upgrade/downgrade dependencies in your own fork, and mention
    >> > how
    >> > > to
    >> > > > > >>>> locally
    >> > > > > >>>>> release Airflow as a package, I think your case could be
    >> > covered
    >> > > > even
    >> > > > > >>>>> better than now. What do you think ?
    >> > > > > >>>>>
    >> > > > > >>>>> J.
    >> > > > > >>>>>
    >> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
    >> > > > > >>>>> <EK...@novozymes.com.invalid> wrote:
    >> > > > > >>>>>
    >> > > > > >>>>>> For us, exact pinning of versions would be problematic. We
    >> > have
    >> > > > DAG
    >> > > > > >>>> code
    >> > > > > >>>>>> that shares direct and indirect dependencies with Airflow,
    >> > e.g.
    >> > > > > lxml,
    >> > > > > >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and
    >> ldap3.
    >> > > If
    >> > > > > our
    >> > > > > >>>> DAG
    >> > > > > >>>>>> code for some reason needs a newer point release due to a
    >> bug
    >> > > > that's
    >> > > > > >>>>> fixed,
    >> > > > > >>>>>> then we can't cleanly build a virtual environment
    >> containing
    >> > the
    >> > > > > >>> fixed
    >> > > > > >>>>>> version. For us, it's already a problem that Airflow has
    >> quite
    >> > > > > strict
    >> > > > > >>>>> (and
    >> > > > > >>>>>> sometimes old) requirements in setup.py.
    >> > > > > >>>>>>
    >> > > > > >>>>>> Erik
    >> > > > > >>>>>> ________________________________
    >> > > > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
    >> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
    >> > > > > >>>>>> To: dev@airflow.incubator.apache.org
    >> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
    >> > > > > >>>>>>
    >> > > > > >>>>>> I think one solution to release approach is to check as
    >> part
    >> > of
    >> > > > > >>>> automated
    >> > > > > >>>>>> Travis build if all requirements are pinned with == (even
    >> the
    >> > > deep
    >> > > > > >>>> ones)
    >> > > > > >>>>>> and fail the build in case they are not for ALL versions
    >> > > > (including
    >> > > > > >>>>>> dev). And of course we should document the approach of
    >> > > > > >>>> releases/upgrades
    >> > > > > >>>>>> etc. If we do it all the time for development versions
    >> (which
    >> > > > seems
    >> > > > > >>>> quite
    >> > > > > >>>>>> doable), then transitively all the releases will also have
    >> > > pinned
    >> > > > > >>>>> versions
    >> > > > > >>>>>> and they will never try to upgrade any of the
    >> dependencies. In
    >> > > > > poetry
    >> > > > > >>>>>> (similarly in pip-tools with .in file) it is done by
    >> having a
    >> > > > .lock
    >> > > > > >>>> file
    >> > > > > >>>>>> that specifies exact versions of each package so it can be
    >> > > rather
    >> > > > > >>> easy
    >> > > > > >>>> to
    >> > > > > >>>>>> manage (so it's worth trying it out I think  :D  - seems a
    >> bit
    >> > > > more
    >> > > > > >>>>>> friendly than pip-tools).
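
The "fail the build unless everything is pinned with ==" check mentioned above could be approximated with a small script like the one below. It is a sketch under assumptions: it reads a requirements-style file line by line, the function name is hypothetical, and the regular expression is a simplification rather than a full parser for requirement specifiers.

```python
import re

# Matches "name==version", with optional extras and an optional environment
# marker after ";". Wildcard pins such as "==2.*" are deliberately rejected.
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;=*]+\s*(;.*)?$"
)


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to one exact version."""
    offenders = []
    for raw in lines:
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments and pip options such as "-r"
        if not PINNED.match(line):
            offenders.append(line)
    return offenders
```

A CI job could run this over the compiled requirements file and exit with a non-zero status when the returned list is not empty.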
    >> > > > > >>>>>>
    >> > > > > >>>>>> There is a drawback - of course - with manually updating
    >> the
    >> > > > module
    >> > > > > >>>> that
    >> > > > > >>>>>> you want, but I really see that as an advantage rather than
    >> > > > drawback
    >> > > > > >>>>>> especially for users. This way you maintain the property
    >> that
    >> > it
    >> > > > > will
    >> > > > > >>>>>> always install and work the same way no matter if you
    >> > installed
    >> > > it
    >> > > > > >>>> today
    >> > > > > >>>>> or
    >> > > > > >>>>>> two months ago. I think the biggest drawback for
    >> maintainers
    >> > is
    >> > > > that
    >> > > > > >>>> you
    >> > > > > >>>>>> need some kind of monitoring of security vulnerabilities
    >> and
    >> > > > cannot
    >> > > > > >>>> rely
    >> > > > > >>>>> on
    >> > > > > >>>>>> automated security upgrades. With >= requirements those
    >> > security
    >> > > > > >>>> updates
    >> > > > > >>>>>> might happen automatically without anyone noticing, but to
    >> be
    >> > > > honest
    >> > > > > >>> I
    >> > > > > >>>>>> don't think such upgrades are guaranteed even in current
    >> setup
    >> > > for
    >> > > > > >>> all
    >> > > > > >>>>>> security issues for all libraries anyway.
    >> > > > > >>>>>>
    >> > > > > >>>>>> Finding the need to upgrade because of security issues can
    >> be
    >> > > > quite
    >> > > > > >>>>>> automated. Even now I noticed Github started to inform
    >> owners
    >> > > > about
    >> > > > > >>>>>> potential security vulnerabilities in used libraries for
    >> their
    >> > > > > >>> project.
    >> > > > > >>>>>> Those notifications can be sent to devlist and turned into
    >> > JIRA
    >> > > > > >>> issues
    >> > > > > >>>>>> followed by minor security-related releases (with only
    >> few
    >> > > > library
    >> > > > > >>>>>> dependencies upgraded).
    >> > > > > >>>>>>
    >> > > > > >>>>>> I think it's even easier to automate it if you have pinned
    >> > > > > >>>> dependencies -
    >> > > > > >>>>>> because it's generally easy to find applicable
    >> vulnerabilities
    >> > > for
    >> > > > > >>>>> specific
    >> > > > > >>>>>> versions of libraries by static analysers - when you have
    >> >=,
    >> > > you
    >> > > > > >>> never
    >> > > > > >>>>>> know which version will be used until you actually perform
    >> the
    >> > > > > >>>>>> installation.
    >> > > > > >>>>>>
    >> > > > > >>>>>> There is one big advantage for maintainers for "pinned"
    >> case.
    >> > > Your
    >> > > > > >>>> users
    >> > > > > >>>>>> always have the same dependencies - so when issue is
    >> raised,
    >> > you
    >> > > > can
    >> > > > > >>>>>> reproduce it more easily. It's hard to know which version
    >> user
    >> > > has
    >> > > > > >>> (as
    >> > > > > >>>>> the
    >> > > > > >>>>>> user could install it month ago or yesterday) and even if
    >> you
    >> > > find
    >> > > > > >>> out
    >> > > > > >>>> by
    >> > > > > >>>>>> asking the user, you might not be able to reproduce the
    >> set of
    >> > > > > >>>>> requirements
    >> > > > > >>>>>> easily (simply because there are already newer versions of
    >> the
    >> > > > > >>>> libraries
    >> > > > > >>>>>> released and they are used automatically). You can ask the
    >> > user
    >> > > to
    >> > > > > >>> run
    >> > > > > >>>>> pip install
    >> > > > > >>>>>> --upgrade but that's dangerous and pretty lame ("check the
    >> > > latest
    >> > > > > >>>>> version -
    >> > > > > >>>>>> maybe it fixes your problem ? ") and sometimes not possible
    >> > > (e.g.
    >> > > > > >>>> someone
    >> > > > > >>>>>> has pre-built docker image with dependencies from few
    >> months
    >> > ago
    >> > > > and
    >> > > > > >>>>> cannot
    >> > > > > >>>>>> rebuild the image easily).
    >> > > > > >>>>>>
    >> > > > > >>>>>> J.
    >> > > > > >>>>>>
    >> > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
    >> > > ash@apache.org
    >> > > > >
    >> > > > > >>>>> wrote:
    >> > > > > >>>>>>
    >> > > > > >>>>>>> One thing to point out here.
    >> > > > > >>>>>>>
    >> > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a
    >> > clean
    >> > > > > >>>>>>> environment it will fail.
    >> > > > > >>>>>>>
    >> > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder
    >> > > > > >>>>>>> is >= 1.11.1, so that pulls in 1.12.0 which requires
    >> > > > > >>>>>>> flask-login >= 0.3.
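
The conflict described above comes down to a pinned version falling below a minimum that another package requires. A minimal sketch of that comparison follows, assuming purely numeric dotted versions (no pre-release or local version tags); the helper names are hypothetical, and pip's real resolution logic is considerably more involved.

```python
def version_tuple(version):
    """Convert a numeric dotted version such as "0.2.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(pinned, minimum):
    """Return True if the pinned version is at least the required minimum."""
    a, b = version_tuple(pinned), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.3" and "0.3.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

With the numbers from this message, `satisfies_minimum("0.2.1", "0.3")` is False, which is the mismatch between the flask-login pin and the requirement of flask-appbuilder 1.12.0.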
    >> > > > > >>>>>>>
    >> > > > > >>>>>>> So I do think there is maybe something to be said about
    >> > pinning
    >> > > > for
    >> > > > > >>>>>>> releases. The down side to that is that if there are
    >> updates
    >> > > to a
    >> > > > > >>>>> module
    >> > > > > >>>>>>> that we want then we have to make a point release to let
    >> > people
    >> > > > get
    >> > > > > >>>> it
    >> > > > > >>>>>>>
    >> > > > > >>>>>>> Both methods have draw-backs
    >> > > > > >>>>>>>
    >> > > > > >>>>>>> -ash
    >> > > > > >>>>>>>
    >> > > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
    >> > > > > >>> arthur.wiedmer@gmail.com>
    >> > > > > >>>>>>> wrote:
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> Hi Jarek,
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
    >> > > > advice.
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> I just want to double check we are talking about pinning
    >> in
    >> > > > > >>>>>>>> requirements.txt only.
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> This offers the ability to
    >> > > > > >>>>>>>> pip install -r requirements.txt
    >> > > > > >>>>>>>> pip install --no-deps airflow
    >> > > > > >>>>>>>> For a guaranteed install which works.
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> Several different requirement files can be provided for
    >> > > specific
    >> > > > > >>>> use
    >> > > > > >>>>>>> cases,
    >> > > > > >>>>>>>> like a stable dev one for instance for people wanting to
    >> > work
    >> > > on
    >> > > > > >>>>>>> operators
    >> > > > > >>>>>>>> and non-core functions.
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> However, I think we should proactively test in CI against
    >> > > > > >>> unpinned
    >> > > > > >>>>>>>> dependencies (though it might be a separate case in the
    >> > > matrix)
    >> > > > ,
    >> > > > > >>>> so
    >> > > > > >>>>>> that
    >> > > > > >>>>>>>> we get advance warning if possible that things will
    >> break.
    >> > > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
    >> > > > problem
    >> > > > > >>>> :)
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
    >> > > minimum
    >> > > > > >>>>>> required
    >> > > > > >>>>>>>> compatibility. The process of pinning in setup.py is
    >> > extremely
    >> > > > > >>>>>>> detrimental
    >> > > > > >>>>>>>> when you have a large number of python libraries
    >> installed
    >> > > with
    >> > > > > >>>>>> different
    >> > > > > >>>>>>>> pinned versions.
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> Best,
    >> > > > > >>>>>>>> Arthur
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
    >> > > > > >>>>>> <ddavydov@twitter.com.invalid
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>> wrote:
    >> > > > > >>>>>>>>
    >> > > > > >>>>>>>>> Relevant discussion about this:
    >> > > > > >>>>>>>>>
    >> > > > > >>>>>>>>>
    >> > > > > >>>>>>>
    >> > > > > >>>>>>
    >> > > > > >>>>>
    >> > > > > >>>>
    >> > > > > >>>
    >> > > > >
    >> > > >
    >> > >
    >> >
    >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
    >> > > > > >>>>>>>>>
    >> > > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
    >> > > > > >>>>>>>>> wrote:
    >> > > > > >>>>>>>>>
    >> > > > > >>>>>>>>>> TL;DR; A change is coming in the way how dependencies/requirements are
    >> > > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed rather than flexible (==
    >> > > > > >>>>>>>>>> rather than >=).
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> This is follow up after Slack discussion we had with Ash and Kaxil -
    >> > > > > >>>>>>>>>> summarising what we propose we'll do.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> *Problem:*
    >> > > > > >>>>>>>>>> During last few weeks we experienced quite a few downtimes of TravisCI
    >> > > > > >>>>>>>>>> builds (for all PRs/branches including master) as some of the transitive
    >> > > > > >>>>>>>>>> dependencies were automatically upgraded. This because in a number of
    >> > > > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> Whenever there is a new release of such dependency, it might cause chain
    >> > > > > >>>>>>>>>> reaction with upgrade of transitive dependencies which might get into
    >> > > > > >>>>>>>>>> conflict.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive dependency with
    >> > > > > >>>>>>>>>> click. They started to conflict once AppBuilder has released version
    >> > > > > >>>>>>>>>> 1.12.0.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> *Diagnosis:*
    >> > > > > >>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is used instead
    >> > > > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner or later hit other
    >> > > > > >>>>>>>>>> cases where not fixed dependencies cause similar problems with other
    >> > > > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This causes problems for
    >> > > > > >>>>>>>>>> both - released versions (cause they stop to work!) and for development
    >> > > > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent people from
    >> > > > > >>>>>>>>>> installing development environment from the scratch.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> *Solution:*
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>>  - Following the old-but-good post
    >> > > > > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the pinned
    >> > > > > >>>>>>>>>>  dependencies to specific versions (so basically all dependencies are
    >> > > > > >>>>>>>>>>  "fixed").
    >> > > > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies with
    >> > > > > >>>>>>>>>>  pip-tools (https://github.com/jazzband/pip-tools). We might also take a
    >> > > > > >>>>>>>>>>  look at pipenv: https://pipenv.readthedocs.io/en/latest/
    >> > > > > >>>>>>>>>>  - People who would like to upgrade some dependencies for their PRs will
    >> > > > > >>>>>>>>>>  still be able to do it - but such upgrades will be in their PR thus they
    >> > > > > >>>>>>>>>>  will go through TravisCI tests and they will also have to be specified
    >> > > > > >>>>>>>>>>  with pinned fixed versions (==). This should be part of review process to
    >> > > > > >>>>>>>>>>  make sure new/changed requirements are pinned.
    >> > > > > >>>>>>>>>>  - In release process there will be a point where an upgrade will be
    >> > > > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so that we are not stuck
    >> > > > > >>>>>>>>>>  with older releases. This will be in controlled PR environment where there
    >> > > > > >>>>>>>>>>  will be time to fix all dependencies without impacting others and likely
    >> > > > > >>>>>>>>>>  enough time to "vet" such changes (this can be done for alpha/beta
    >> > > > > >>>>>>>>>>  releases for example).
    >> > > > > >>>>>>>>>>  - As a side effect dependencies specification will become far simpler
    >> > > > > >>>>>>>>>>  and straightforward.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> Happy to hear community comments to the proposal. I am happy to take a
    >> > > > > >>>>>>>>>> lead on that, open JIRA issue and implement if this is something community
    >> > > > > >>>>>>>>>> is happy with.
    >> > > > > >>>>>>>>>>
    >> > > > > >>>>>>>>>> J.
    
    
    -- 
    
    *Jarek Potiuk, Principal Software Engineer*
    Mobile: +48 660 796 129
    


Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Speaking of which - just to show what kind of problems we are talking about
- here is a link to a relevant discussion in troubleshooting @ slack from
today, where someone tries to install v1.10-stable and needs help.
These are exactly the kind of problems I think are important to solve,
whichever way we choose to solve them:
https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1539573567000100

I really don't think it's a good idea to put users, especially new Airflow
users, in a situation where they need to search through the devlist and
upstream commits, or ask for help, just to be able to install a stable release
of Airflow.

J.

On Mon, Oct 15, 2018 at 9:29 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Sorry for the late reply - I was travelling and was at Cloud Next in London
> last week (BTW, there were talks about Composer/Airflow there).
>
> I see the point, it's indeed very difficult to solve when we want both:
> stability of releases and the flexibility to use a released version and write
> code within it. I think some trade-offs need to be made as we won't
> solve it all with a one-size-fits-all approach. Answering your question
> George - the value of pinning for release purposes is that it addresses the
> "stability" need.
>
>    - Due to my background I come from the "stability" side (which is more
>    user-focused) - i.e. the main problem that I want to solve is to make sure
>    that someone who wants to install airflow afresh and start using it as a
>    beginner user can always run 'pip install airflow' and it will get
>    installed. For me this is the point when many users may simply get put off
>    if it refuses to install out-of-the-box. A few months ago I actually
>    evaluated airflow to run an ML pipeline for the startup I was at at that
>    time. If back then it had refused to install out-of-the-box, my evaluation
>    result would have been 'did not pass the basic criteria'. Luckily it did
>    not happen, and we did a more elaborate evaluation then - we did not use
>    Airflow eventually, but for other reasons. For us the criterion "it just
>    works!" was super important - because we did not have time to deep dive
>    into details and find out why things do not work - we had a lot of
>    "core/ML/robotics" things to worry about and any hurdles with unstable
>    tools would have been a major distraction. We really wanted to write
>    several DAGs and get them executed in a stable, repeatable way, and to know
>    that when we install it on a production machine in two months, it
>    continues to work without any extra work.
>    - Then there are a lot of concerns from the "flexibility" side (which
>    is more for advanced users/developers). It becomes important when you want
>    to actively develop your DAGs (you start using more than just built-in
>    operators and start developing a lot more code in DAGs, or use
>    PythonOperator more and more). Then of course it is important to get the
>    "flexible" approach. I argue that in these cases the "active" developers
>    might be more inclined to tweak their environment, as they are more
>    advanced, might be more experienced with the dependencies, and would be
>    able to downgrade/upgrade dependencies as needed in their virtualenvs.
>    Those people should be quite OK with spending a bit more time to get their
>    environment tweaked to their needs.
>
> I was wondering if there is a way to satisfy both. And I have a wild idea:
>
>    - we have two sets of requirements (easily upgradeable "stable" ones in
>    requirements.txt/poetry, and flexible ones with version ranges in setup.py
>    or similar) - as proposed earlier in this thread
>    - we release two flavours of pip-installable airflow: 1.10.1 with
>    stable/pinned dependencies and 1.10.1-devel (we can pick another flavour
>    name) with flexible dependencies. It's quite common to have devel releases
>    in the Linux world - they serve a slightly different purpose (like
>    including headers for C/C++ programs) and it's usually an extra package on
>    top of the basic one, but the basic idea is similar - if you are a user,
>    you install 1.10.1; if you are an active developer, you install
>    1.10.1-devel
>
> What do you think?
>
> Off-topic a bit: a friend of mine pointed me to this excellent talk by Elm
> creator: "The Hard Parts of Open Source" by Evan Czaplicki
> <https://www.youtube.com/watch?v=o_4EX4dPppA> and it made me think
> differently about the discussion we have :D
>
> J.
>
> On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman <wa...@gmail.com>
> wrote:
>
>> It's not upgrading dependencies that I'm worried about, it's downgrading.
>> With upgrade conflicts, we can treat the dependency upgrades as a necessary
>> aspect of the Airflow upgrade.
>>
>> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
>> LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are placed
>> in the annoying situation of either: a) managing our deployments so that we
>> install Airflow first, and then upgrade LibraryA and ignore pip's warning
>> about incompatible versions, b) keeping the insecure version of LibraryA,
>> c) waiting for another Airflow release and accepting all other changes, d)
>> maintaining our own fork of Airflow and diverging from mainline.
>>
>> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
>> whatsoever. If we're worried about API changes in the future, there's
>> always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
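
The difference between an exact pin and a bounded range described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the `parse_version` and `satisfies` helpers are hypothetical names, and the tiny specifier grammar is an assumption that covers plain dotted numeric versions only. pip's real behaviour follows PEP 440 and is considerably richer.

```python
import operator

# Comparison operators understood by this toy checker (an assumed subset of
# what pip supports).
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so that tuples compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in ("==", ">=", "<=", ">", "<"):
            if clause.startswith(symbol):
                wanted = parse_version(clause[len(symbol):])
                if not _OPS[symbol](parse_version(version), wanted):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The security fix 1.2.4 is rejected by the exact pin but accepted by the range.
print(satisfies("1.2.4", "==1.2.3"))
print(satisfies("1.2.4", ">=1.2.3,<2.0"))
```

With the exact pin, the only way to pick up 1.2.4 is to override or ignore the declared requirement; with the range, the fixed version is already allowed, while a hypothetical 2.0 with API changes is still excluded.
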
>>
>> As has been pointed out, PythonOperator tasks run in the same venv as
>> Airflow, so it is necessary that users be able to control dependencies for
>> their code.
>>
>> To be clear, it's not always a security risk but this is not a hypothetical
>> issue. We ran into a code incompatibility with psutil that mattered to us
>> but had no impact on Airflow (see:
>> https://github.com/apache/incubator-airflow/pull/3585) and are currently
>> seeing SQLAlchemy held back without any clear need (
>> https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
>>
>> Pinning dependencies for releases will force us (and I expect others) to
>> either: ignore/workaround the pinning, or not use Airflow releases. Both of
>> those options exactly defeat the point.
>>
>> If people are on board with pinning / locking all dependencies for CI
>> purposes, and we can constrain requirements to ranges for necessary
>> compatibility, what is the value of pinning all dependencies for release
>> purposes?
>>
>> --George
>>
>> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>> > I am still not convinced that pinning is bad. I re-read the whole
>> > mail thread and the thread from 2016
>> > <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
>> > to read all the arguments, but I stand by pinning.
>> >
>> > I am - of course - not sure about the graduation argument. I would just
>> > imagine it might be the case. However, I really think that the situation
>> > we are in now is quite volatile. The latest 1.10.0 cannot be
>> > clean-installed via pip without manually tweaking and forcing a lower
>> > version of flask-appbuilder. Even if you use the constraints file it's
>> > pretty cumbersome, because you'd have to somehow know that you need to do
>> > exactly that (not at all obvious from the error you get). Also it might at
>> > any time get worse as other packages get newer versions released. The
>> > thing here is that the maintainers of flask-appbuilder did nothing wrong:
>> > they simply released a new version with the click dependency version
>> > increased (probably for a good reason), and it's airflow's
>> > cross-dependency graph which makes it incompatible.
>> >
>> > I am afraid that if we don't change it, it's all but guaranteed that every
>> > single release at some point in time will "deteriorate" and refuse to
>> > clean-install. If we want to solve this problem (maybe we don't and we
>> > accept it as it is?), I think the only way to solve it is to hard-pin all
>> > the requirements at the very least for releases.
>> >
>> > Of course we might choose pinning only for releases (and CI builds) and
>> > have the compromise that Matt mentioned. I have the worry however (also
>> > mentioned in the previous thread) that it will be hard to maintain.
>> > Effectively you will have to maintain both in parallel. And the case with
>> > constraints is a nice workaround for someone who actually needs a specific
>> > (even newer) version of a specific package in their environment.
>> >
>> > Maybe we should simply give it a try and do a Proof-Of-Concept/experiment,
>> > as Fokko also mentioned?
>> >
>> > We could have a PR with pinning enabled, and maybe ask the people who
>> > voice concerns about their environment to give it a try with those pinned
>> > versions and see if that makes it difficult for them to either upgrade
>> > dependencies and fork apache-airflow or use a pip constraints file?
>> >
>> > J.
>> >
>> >
>> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:
>> >
>> > > Erik, the Airflow task execution code itself of course must run somewhere
>> > > with Airflow installed, but if the task is making a database query or a
>> > > web request or running something in Docker there's separation between the
>> > > environments and maybe you don't care about Python dependencies at all
>> > > except to get Airflow running. When running Python operators that's not
>> > > the case (as you already deal with).
>> > >
>> > > - Matt
>> > >
>> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
>> > > <EK...@novozymes.com.invalid> wrote:
>> > >
>> > > > This is maybe a stupid question, but is it even possible to run tasks
>> > > > in an environment where Airflow is not installed?
>> > > >
>> > > >
>> > > > Kind regards,
>> > > >
>> > > > Erik
>> > > >
>> > > > ________________________________
>> > > > From: Matt Davis <ji...@gmail.com>
>> > > > Sent: Monday, October 8, 2018 10:13:34 PM
>> > > > To: dev@airflow.incubator.apache.org
>> > > > Subject: Re: Pinning dependencies for Apache Airflow
>> > > >
>> > > > It sounds like we can get the best of both worlds with the original
>> > > > proposals to have minimal requirements in setup.py and "guaranteed to
>> > > > work" complete requirements in a separate file. That way we have
>> > > > flexibility for teams that run airflow and tasks in the same environment
>> > > > and guidance on a working set of requirements. (Disclaimer: I work on
>> > > > the same team as George.)
>> > > >
>> > > > Thanks,
>> > > > Matt
>> > > >
>> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org>
>> > wrote:
>> > > >
>> > > > > Although I think I come down on the side against pinning, my reasons
>> > > > > are different.
>> > > > >
>> > > > > For the two (or more) people who have expressed concern about it,
>> > > > > would pip's "Constraint Files" help:
>> > > > >
>> > > > > https://pip.pypa.io/en/stable/user_guide/#constraints-files
>> > > > >
>> > > > > For example, you could add "flask-appbuilder==1.11.1" to this file,
>> > > > > specify it with `pip install -c constraints.txt apache-airflow`, and
>> > > > > then whenever pip attempted to install _any_ version of FAB it would
>> > > > > use the exact version from the constraints file.
>> > > > >
>> > > > > I don't buy the argument about pinning being a requirement for
>> > > > > graduation from Incubation fwiw - it's an unavoidable artefact of the
>> > > > > open-source world we develop in.
>> > > > >
>> > > > > https://libraries.io/ offers a (free?) service that will monitor apps'
>> > > > > dependencies for being out of date, which might be better than writing
>> > > > > our own solution.
>> > > > >
>> > > > > Pip has for a while now supported a way of saying "this dep is for
>> > > > > py2.7 only":
>> > > > >
>> > > > > > Since version 6.0, pip also supports specifiers containing
>> > > > > > environment markers like so:
>> > > > > >
>> > > > > >    SomeProject ==5.4 ; python_version < '2.7'
>> > > > > >    SomeProject; sys_platform == 'win32'
>> > > > >
>> > > > >
>> > > > > Ash
>> > > > >
>> > > > >
>> > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <
>> waksman@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > As a member of a team that will also have really big problems if
>> > > > > > Airflow pins all requirements (for reasons similar to those already
>> > > > > > stated), I would like to add a very strong -1 to the idea of pinning
>> > > > > > them for all installations.
>> > > > > >
>> > > > > > In a number of situations on our end, to avoid similar problems with
>> > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
>> > > > > > https://pypi.org/project/pip-tools/
>> > > > > >
>> > > > > > I would like to suggest a middle ground of:
>> > > > > >
>> > > > > > - Have the installation continue to use unpinned (`>=`) with minimum
>> > > > > > necessary requirements set
>> > > > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
>> > > > > > that is used by CI
>> > > > > > - - If we need, there can be one file for each incompatible python
>> > > > > > version
>> > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
>> > > > > > compiled requirements file
>> > > > > > - Add a CI check that the watermark and original match to ensure no
>> > > > > > drift since last compile
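
The watermark idea in the list above could be sketched roughly as follows. This is a minimal illustration and not an existing Airflow or pip-tools feature: the `# watermark: ` comment format and the three helper names are assumptions made up for the example, and the loose requirements are passed in as a plain list rather than extracted from `setup.py`.

```python
import hashlib

# Hypothetical marker format for the watermark line in the compiled file.
WATERMARK_PREFIX = "# watermark: "


def requirements_watermark(requirements):
    """Hash the loose requirements so that any edit changes the digest."""
    # Sort and strip so that ordering and whitespace do not affect the hash.
    normalised = "\n".join(sorted(r.strip() for r in requirements))
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def add_watermark(compiled_text, requirements):
    """Append the watermark line to the text of a compiled requirements file."""
    digest = requirements_watermark(requirements)
    return compiled_text.rstrip("\n") + "\n" + WATERMARK_PREFIX + digest + "\n"


def watermark_matches(compiled_text, requirements):
    """CI check: True if the stored watermark matches the current requirements."""
    for line in reversed(compiled_text.splitlines()):
        if line.startswith(WATERMARK_PREFIX):
            stored = line[len(WATERMARK_PREFIX):].strip()
            return stored == requirements_watermark(requirements)
    return False  # a missing watermark is treated as drift
```

In this sketch, a CI step would call `watermark_matches` with the contents of the compiled file and the current loose requirements, and fail the build when it returns False, which signals that the loose requirements changed without the compiled file being regenerated.
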
>> > > > > >
>> > > > > > I am happy to do much of the work for this, if it can help avoid
>> > > > > > pinning all of the dependencies at the installation level.
>> > > > > >
>> > > > > > --George Leslie-Waksman
>> > > > > >
>> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
>> > > > > > <ma...@gmail.com> wrote:
>> > > > > >>
>> > > > > >> pip-tools can definitely help here to ship a reference [locked]
>> > > > > >> `requirements.txt` that can be used in [all or part of] the CI. It's
>> > > > > >> actually kind of important to get CI to fail when a new [backward
>> > > > > >> incompatible] lib comes out and breaks things while allowing version
>> > > > > >> ranges.
>> > > > > >>
>> > > > > >> I think there may be challenges around pip-tools and projects that run
>> > > > > >> in both python2.7 and python3.6. You sometimes need to have 2
>> > > > > >> requirements.txt lock files.
>> > > > > >>
>> > > > > >> Max
>> > > > > >>
>> > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <
>> > > Jarek.Potiuk@polidea.com
>> > > > >
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> It's a nice one :). However I think when/if we go to pinned
>> > > > > >>> dependencies the way poetry/pip-tools do it, this will suddenly be a
>> > > > > >>> lot less useful. It will be very easy to track dependency changes (they
>> > > > > >>> will always be committed as a change in the .lock file or
>> > > > > >>> requirements.txt) and if someone has a problem while upgrading a
>> > > > > >>> dependency (always consciously, never accidentally) it will simply fail
>> > > > > >>> during the CI build and the change won't get merged/won't break the
>> > > > > >>> builds of others in the first place :).
>> > > > > >>>
>> > > > > >>> J.
>> > > > > >>>
>> > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <
>> > xd.deng.r@gmail.com>
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>>> Hi folks,
>> > > > > >>>>
>> > > > > >>>> On top of this discussion, I was thinking we should have the ability to
>> > > > > >>>> quickly monitor dependency releases as well. Previously, it happened a
>> > > > > >>>> few times that CI kept failing for no apparent reason and it eventually
>> > > > > >>>> turned out to be due to a dependency release. But it took us some time,
>> > > > > >>>> sometimes a few days, to realise the failure was because of a
>> > > > > >>>> dependency release.
>> > > > > >>>>
>> > > > > >>>> To partially address this, I tried to develop a mini tool to help us
>> > > > > >>>> check the latest release of Python packages & the release date-time on
>> > > > > >>>> PyPI. So, by comparing it with our CI failure history, we may be able
>> > > > > >>>> to troubleshoot faster.
>> > > > > >>>>
>> > > > > >>>> Output Sample (ordered by upload time in desc order):
>> > > > > >>>>
>> > > > > >>>> Package Name        Latest Version    Upload Time
>> > > > > >>>> awscli              1.16.28           2018-10-05T23:12:45
>> > > > > >>>> botocore            1.12.18           2018-10-05T23:12:39
>> > > > > >>>> promise             2.2.1             2018-10-04T22:04:18
>> > > > > >>>> Keras               2.2.4             2018-10-03T20:59:39
>> > > > > >>>> bleach              3.0.0             2018-10-03T16:54:27
>> > > > > >>>> Flask-AppBuilder    1.12.0            2018-10-03T09:03:48
>> > > > > >>>> ... ...
>> > > > > >>>>
>> > > > > >>>> It's a minimal tool (not perfect yet but working). I have hosted this
>> > > > > >>>> tool at https://github.com/XD-DENG/pypi-release-query.
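
A rough idea of how such a release check can be done with only the standard library is sketched below. It is not the code of the tool linked above. It assumes the PyPI JSON endpoint `https://pypi.org/pypi/<name>/json` and a payload with an `info` object (`name`, `version`) and a `urls` list whose entries carry an `upload_time` string; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_package_json(name):
    """Download a package's metadata from the PyPI JSON API (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as response:
        return json.load(response)


def latest_release(payload):
    """Return (name, version, upload_time) for the latest release in `payload`."""
    info = payload["info"]
    files = payload.get("urls", [])
    # A release can have several files (sdist, wheels); use the earliest upload.
    upload_time = min((f["upload_time"] for f in files), default=None)
    return info["name"], info["version"], upload_time


def newest_first(payloads):
    """Sort packages by upload time, most recent first; unknown times go last."""
    rows = [latest_release(p) for p in payloads]
    # ISO 8601 timestamps of the same format sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[2] or "", reverse=True)
```

A caller would build the list with something like `newest_first(fetch_package_json(n) for n in package_names)` and compare the most recent upload times with the moment CI started failing.
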
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> XD
>> > > > > >>>>
>> > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
>> > > > > Jarek.Potiuk@polidea.com>
>> > > > > >>>> wrote:
>> > > > > >>>>
>> > > > > >>>>> Hello Erik,
>> > > > > >>>>>
>> > > > > >>>>> I understand your concern. It's a hard one to solve in general (i.e.
>> > > > > >>>>> dependency-hell). It looks like in this case you treat Airflow as a
>> > > > > >>>>> 'library', where for some other people it might be more like an 'end
>> > > > > >>>>> product'. If you look at the "pinning" philosophy - "pin everything" is
>> > > > > >>>>> good for end products, but not good for libraries. In your case
>> > > > > >>>>> Airflow is treated as a bit of both. And it's a perfectly valid case
>> > > > > >>>>> at that (with custom python DAGs being a central concept for Airflow).
>> > > > > >>>>> However, I think it's not as bad as you think when it comes to exact
>> > > > > >>>>> pinning.
>> > > > > >>>>>
>> > > > > >>>>> I believe - a bit counter-intuitively - that tools like
>> > > > > >>>>> pip-tools/poetry with exact pinning result in having your dependencies
>> > > > > >>>>> upgraded more often, rather than less - especially in complex systems
>> > > > > >>>>> where dependency-hell creeps in. If you look at Airflow's setup.py now
>> > > > > >>>>> - it's a bit scary to make any change to it. There is a chance it will
>> > > > > >>>>> blow up in your face if you change it. You never know why there is
>> > > > > >>>>> 0.3 < ver < 1.0 - and whether changing it will cause a chain reaction
>> > > > > >>>>> of conflicts that will ruin your work day.
>> > > > > >>>>>
>> > > > > >>>>> On the contrary - if you change it to exact pinning in a
>> > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
>> > > > > >>>>> (and commented) exclusion/avoidance rules in your .in/.toml file, the
>> > > > > >>>>> whole setup might be much easier to maintain and upgrade. Every time
>> > > > > >>>>> you prepare for a release (or even once in a while for master) one
>> > > > > >>>>> person might consciously attempt to upgrade all dependencies to the
>> > > > > >>>>> latest ones. It should be almost as easy as letting poetry/pip-tools
>> > > > > >>>>> help with figuring out what is the latest set of dependencies that
>> > > > > >>>>> will work without conflicts. It should be rather straightforward (I've
>> > > > > >>>>> done it in the past for fairly complex systems). What those tools
>> > > > > >>>>> enable is doing a single-shot upgrade of all dependencies. After doing
>> > > > > >>>>> it you can make sure that all tests work fine (and fix any problems
>> > > > > >>>>> that result from it). And then you test it thoroughly before you make
>> > > > > >>>>> the final release. You can do it in a separate PR - with automated
>> > > > > >>>>> testing in Travis, which means that you are not disturbing the work of
>> > > > > >>>>> others (compilation/building + unit tests are guaranteed to work
>> > > > > >>>>> before you merge it) while doing it. It's all conscious rather than
>> > > > > >>>>> accidental. A nice side effect of that is that with every release you
>> > > > > >>>>> can actually "catch up" with the latest stable versions of many
>> > > > > >>>>> libraries in one go. It's better than waiting until someone
>> > > > > >>>>> deliberately upgrades to a newer version (while the rest remain
>> > > > > >>>>> terribly out-dated, as is the case for Airflow now).
>> > > > > >>>>>
>> > > > > >>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
>> > > > > >>>>> you to catch up faster in many cases. That is at least my experience
>> > > > > >>>>> so far.
>> > > > > >>>>>
>> > > > > >>>>> Additionally, Airflow is an open system - if you have very specific
>> > > > > >>>>> needs for requirements, you might actually - in the very same way with
>> > > > > >>>>> pip-tools/poetry - upgrade all your dependencies in your local fork of
>> > > > > >>>>> Airflow before someone else does it in master/release. Those tools
>> > > > > >>>>> kind of democratise dependency management. It should be as easy as
>> > > > > >>>>> `pip-compile --upgrade` or `poetry update` and you will get all the
>> > > > > >>>>> "non-conflicting" latest dependencies in your local fork (and poetry
>> > > > > >>>>> especially seems to do all the heavy lifting of figuring out which
>> > > > > >>>>> versions will work). You should be able to test and publish it locally
>> > > > > >>>>> as your private package for local installations. You can even mark a
>> > > > > >>>>> specific dependency for which you want a specific version and let
>> > > > > >>>>> pip-tools/poetry figure out the exact versions of the other
>> > > > > >>>>> requirements. You can even make a PR with such an upgrade eventually
>> > > > > >>>>> to get it into master faster. You can even downgrade in a similar way
>> > > > > >>>>> in case a newer dependency causes problems for you. Guided by the
>> > > > > >>>>> tools, it's much faster than figuring the versions out by yourself.
>> > > > > >>>>>
>> > > > > >>>>> As long as we have simple way of managing it and document
>> how
>> > to
>> > > > > >>>>> upgrade/downgrade dependencies in your own fork, and mention
>> > how
>> > > to
>> > > > > >>>> locally
>> > > > > >>>>> release Airflow as a package, I think your case could be
>> > covered
>> > > > even
>> > > > > >>>>> better than now. What do you think ?
>> > > > > >>>>>
>> > > > > >>>>> J.
>> > > > > >>>>>
>> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
>> > > > > >>>>> <EK...@novozymes.com.invalid> wrote:
>> > > > > >>>>>
>> > > > > >>>>>> For us, exact pinning of versions would be problematic. We
>> > have
>> > > > DAG
>> > > > > >>>> code
>> > > > > >>>>>> that shares direct and indirect dependencies with Airflow,
>> > e.g.
>> > > > > lxml,
>> > > > > >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and
>> ldap3.
>> > > If
>> > > > > our
>> > > > > >>>> DAG
>> > > > > >>>>>> code for some reason needs a newer point release due to a
>> bug
>> > > > that's
>> > > > > >>>>> fixed,
>> > > > > >>>>>> then we can't cleanly build a virtual environment
>> containing
>> > the
>> > > > > >>> fixed
>> > > > > >>>>>> version. For us, it's already a problem that Airflow has
>> quite
>> > > > > strict
>> > > > > >>>>> (and
>> > > > > >>>>>> sometimes old) requirements in setup.py.
>> > > > > >>>>>>
>> > > > > >>>>>> Erik
>> > > > > >>>>>> ________________________________
>> > > > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
>> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
>> > > > > >>>>>> To: dev@airflow.incubator.apache.org
>> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
>> > > > > >>>>>>
>> > > > > >>>>>> I think one solution to release approach is to check as
>> part
>> > of
>> > > > > >>>> automated
>> > > > > >>>>>> Travis build if all requirements are pinned with == (even
>> the
>> > > deep
>> > > > > >>>> ones)
>> > > > > >>>>>> and fail the build in case they are not for ALL versions
>> > > > (including
>> > > > > >>>>>> dev). And of course we should document the approach of
>> > > > > >>>> releases/upgrades
>> > > > > >>>>>> etc. If we do it all the time for development versions
>> (which
>> > > > seems
>> > > > > >>>> quite
>> > > > > >>>>>> doable), then transitively all the releases will also have
>> > > pinned
>> > > > > >>>>> versions
>> > > > > >>>>>> and they will never try to upgrade any of the
>> dependencies. In
>> > > > > poetry
>> > > > > >>>>>> (similarly in pip-tools with .in file) it is done by
>> having a
>> > > > .lock
>> > > > > >>>> file
>> > > > > >>>>>> that specifies exact versions of each package so it can be
>> > > rather
>> > > > > >>> easy
>> > > > > >>>> to
>> > > > > >>>>>> manage (so it's worth trying it out I think  :D  - seems a
>> bit
>> > > > more
>> > > > > >>>>>> friendly than pip-tools).
>> > > > > >>>>>>
>> > > > > >>>>>> There is a drawback - of course - with manually updating
>> the
>> > > > module
>> > > > > >>>> that
>> > > > > >>>>>> you want, but I really see that as an advantage rather than
>> > > > drawback
>> > > > > >>>>>> especially for users. This way you maintain the property
>> that
>> > it
>> > > > > will
>> > > > > >>>>>> always install and work the same way no matter if you
>> > installed
>> > > it
>> > > > > >>>> today
>> > > > > >>>>> or
>> > > > > >>>>>> two months ago. I think the biggest drawback for
>> maintainers
>> > is
>> > > > that
>> > > > > >>>> you
>> > > > > >>>>>> need some kind of monitoring of security vulnerabilities
>> and
>> > > > cannot
>> > > > > >>>> rely
>> > > > > >>>>> on
>> > > > > >>>>>> automated security upgrades. With >= requirements those
>> > security
>> > > > > >>>> updates
>> > > > > >>>>>> might happen automatically without anyone noticing, but to
>> be
>> > > > honest
>> > > > > >>> I
>> > > > > >>>>>> don't think such upgrades are guaranteed even in current
>> setup
>> > > for
>> > > > > >>> all
>> > > > > >>>>>> security issues for all libraries anyway.
>> > > > > >>>>>>
>> > > > > >>>>>> Finding the need to upgrade because of security issues can
>> be
>> > > > quite
>> > > > > >>>>>> automated. Even now I noticed Github started to inform
>> owners
>> > > > about
>> > > > > >>>>>> potential security vulnerabilities in used libraries for
>> their
>> > > > > >>> project.
>> > > > > >>>>>> Those notifications can be sent to devlist and turned into
>> > JIRA
>> > > > > >>> issues
>> > > > > >>>>>> followed by minor security-related releases (with only
>> few
>> > > > library
>> > > > > >>>>>> dependencies upgraded).
>> > > > > >>>>>>
>> > > > > >>>>>> I think it's even easier to automate it if you have pinned
>> > > > > >>>> dependencies -
>> > > > > >>>>>> because it's generally easy to find applicable
>> vulnerabilities
>> > > for
>> > > > > >>>>> specific
>> > > > > >>>>>> versions of libraries by static analysers - when you have
>> >=,
>> > > you
>> > > > > >>> never
>> > > > > >>>>>> know which version will be used until you actually perform
>> the
>> > > > > >>>>>> installation.
>> > > > > >>>>>>
>> > > > > >>>>>> There is one big advantage for maintainers for "pinned"
>> case.
>> > > Your
>> > > > > >>>> users
>> > > > > >>>>>> always have the same dependencies - so when issue is
>> raised,
>> > you
>> > > > can
>> > > > > >>>>>> reproduce it more easily. It's hard to know which version
>> user
>> > > has
>> > > > > >>> (as
>> > > > > >>>>> the
>> > > > > >>>>>> user could install it month ago or yesterday) and even if
>> you
>> > > find
>> > > > > >>> out
>> > > > > >>>> by
>> > > > > >>>>>> asking the user, you might not be able to reproduce the
>> set of
>> > > > > >>>>> requirements
>> > > > > >>>>>> easily (simply because there are already newer versions of
>> the
>> > > > > >>>> libraries
>> > > > > >>>>>> released and they are used automatically). You can ask the
>> > user
>> > > to
>> > > > > >>> run
>> > > > > >>>>> pip
>> > > > > >>>>>> --upgrade but that's dangerous and pretty lame ("check the
>> > > latest
>> > > > > >>>>> version -
>> > > > > >>>>>> maybe it fixes your problem ? ") and sometimes not possible
>> > > (e.g.
>> > > > > >>>> someone
>> > > > > >>>>>> has pre-built docker image with dependencies from few
>> months
>> > ago
>> > > > and
>> > > > > >>>>> cannot
>> > > > > >>>>>> rebuild the image easily).
>> > > > > >>>>>>
>> > > > > >>>>>> J.
>> > > > > >>>>>>
>> > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
>> > > ash@apache.org
>> > > > >
>> > > > > >>>>> wrote:
>> > > > > >>>>>>
>> > > > > >>>>>>> One thing to point out here.
>> > > > > >>>>>>>
>> > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a
>> > clean
>> > > > > >>>>>>> environment it will fail.
>> > > > > >>>>>>>
>> > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but
>> > > > > >>>>>>> flask-appbuilder is >= 1.11.1, so that pulls in 1.12.0, which
>> > > > > >>>>>>> requires flask-login >= 0.3.
>> > > > > >>>>>>>
>> > > > > >>>>>>> So I do think there is maybe something to be said about
>> > pinning
>> > > > for
>> > > > > >>>>>>> releases. The down side to that is that if there are
>> updates
>> > > to a
>> > > > > >>>>> module
>> > > > > >>>>>>> that we want then we have to make a point release to let
>> > people
>> > > > get
>> > > > > >>>> it
>> > > > > >>>>>>>
>> > > > > >>>>>>> Both methods have draw-backs
>> > > > > >>>>>>>
>> > > > > >>>>>>> -ash
>> > > > > >>>>>>>
>> > > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
>> > > > > >>> arthur.wiedmer@gmail.com>
>> > > > > >>>>>>> wrote:
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Hi Jarek,
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
>> > > > advice.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> I just want to double check we are talking about pinning
>> in
>> > > > > >>>>>>>> requirements.txt only.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> This offers the ability to
>> > > > > >>>>>>>> pip install -r requirements.txt
>> > > > > >>>>>>>> pip install --no-deps airflow
>> > > > > >>>>>>>> For a guaranteed install which works.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Several different requirement files can be provided for
>> > > specific
>> > > > > >>>> use
>> > > > > >>>>>>> cases,
>> > > > > >>>>>>>> like a stable dev one for instance for people wanting to
>> > work
>> > > on
>> > > > > >>>>>>> operators
>> > > > > >>>>>>>> and non-core functions.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> However, I think we should proactively test in CI against
>> > > > > >>> unpinned
>> > > > > >>>>>>>> dependencies (though it might be a separate case in the
>> > > matrix)
>> > > > ,
>> > > > > >>>> so
>> > > > > >>>>>> that
>> > > > > >>>>>>>> we get advance warning if possible that things will
>> break.
>> > > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
>> > > > problem
>> > > > > >>>> :)
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
>> > > minimum
>> > > > > >>>>>> required
>> > > > > >>>>>>>> compatibility. The process of pinning in setup.py is
>> > extremely
>> > > > > >>>>>>> detrimental
>> > > > > >>>>>>>> when you have a large number of python libraries
>> installed
>> > > with
>> > > > > >>>>>> different
>> > > > > >>>>>>>> pinned versions.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Best,
>> > > > > >>>>>>>> Arthur
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
>> > > > > >>>>>> <ddavydov@twitter.com.invalid
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> wrote:
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>> Relevant discussion about this:
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
>> > > > > >>>>>> Jarek.Potiuk@polidea.com>
>> > > > > >>>>>>>>> wrote:
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>> TL;DR; A change is coming in the way how
>> > > > > >>>> dependencies/requirements
>> > > > > >>>>>> are
>> > > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed
>> rather
>> > > than
>> > > > > >>>>>> flexible
>> > > > > >>>>>>>>> (==
>> > > > > >>>>>>>>>> rather than >=).
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> This is follow up after Slack discussion we had with
>> Ash
>> > and
>> > > > > >>>> Kaxil
>> > > > > >>>>> -
>> > > > > >>>>>>>>>> summarising what we propose we'll do.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> *Problem:*
>> > > > > >>>>>>>>>> During last few weeks we experienced quite a few
>> downtimes
>> > > of
>> > > > > >>>>>> TravisCI
>> > > > > >>>>>>>>>> builds (for all PRs/branches including master) as some
>> of
>> > > the
>> > > > > >>>>>>> transitive
>> > > > > >>>>>>>>>> dependencies were automatically upgraded. This because
>> in
>> > a
>> > > > > >>>> number
>> > > > > >>>>> of
>> > > > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> Whenever there is a new release of such dependency, it
>> > might
>> > > > > >>>> cause
>> > > > > >>>>>>> chain
>> > > > > >>>>>>>>>> reaction with upgrade of transitive dependencies which
>> > might
>> > > > > >>> get
>> > > > > >>>>> into
>> > > > > >>>>>>>>>> conflict.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login
>> transitive
>> > > > > >>>>> dependency
>> > > > > >>>>>>> with
>> > > > > >>>>>>>>>> click. They started to conflict once AppBuilder has
>> > released
>> > > > > >>>>> version
>> > > > > >>>>>>>>>> 1.12.0.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> *Diagnosis:*
>> > > > > >>>>>>>>>> Transitive dependencies with "flexible" versions
>> (where >=
>> > > is
>> > > > > >>>> used
>> > > > > >>>>>>>>> instead
>> > > > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will
>> sooner
>> > or
>> > > > > >>> later
>> > > > > >>>>> hit
>> > > > > >>>>>>>>> other
>> > > > > >>>>>>>>>> cases where not fixed dependencies cause similar
>> problems
>> > > with
>> > > > > >>>>> other
>> > > > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This
>> > > causes
>> > > > > >>>>>> problems
>> > > > > >>>>>>>>> for
>> > > > > >>>>>>>>>> both - released versions (cause they stop to work!) and
>> > for
>> > > > > >>>>>> development
>> > > > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent
>> > > people
>> > > > > >>>> from
>> > > > > >>>>>>>>>> installing development environment from the scratch.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> *Solution:*
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>>  - Following the old-but-good post
>> > > > > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the
>> > > > > >>>>>>>>>>  pinned dependencies to specific versions (so basically all
>> > > > > >>>>>>>>>>  dependencies are "fixed").
>> > > > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies
>> > > > > >>>>>>>>>>  with pip-tools (https://github.com/jazzband/pip-tools). We might
>> > > > > >>>>>>>>>>  also take a look at pipenv:
>> > > > > >>>>>>>>>>  https://pipenv.readthedocs.io/en/latest/
>> > > > > >>>>>>>>>>  - People who would like to upgrade some dependencies
>> for
>> > > > > >>> their
>> > > > > >>>>> PRs
>> > > > > >>>>>>>>> will
>> > > > > >>>>>>>>>>  still be able to do it - but such upgrades will be in
>> > their
>> > > > > >>> PR
>> > > > > >>>>> thus
>> > > > > >>>>>>>>> they
>> > > > > >>>>>>>>>>  will go through TravisCI tests and they will also
>> have to
>> > > be
>> > > > > >>>>>>> specified
>> > > > > >>>>>>>>>> with
>> > > > > >>>>>>>>>>  pinned fixed versions (==). This should be part of
>> review
>> > > > > >>>> process
>> > > > > >>>>>> to
>> > > > > >>>>>>>>>> make
>> > > > > >>>>>>>>>>  sure new/changed requirements are pinned.
>> > > > > >>>>>>>>>>  - In release process there will be a point where an
>> > upgrade
>> > > > > >>>> will
>> > > > > >>>>> be
>> > > > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so
>> that
>> > we
>> > > > > >>> are
>> > > > > >>>>> not
>> > > > > >>>>>>>>>> stuck
>> > > > > >>>>>>>>>>  with older releases. This will be in controlled PR
>> > > > > >>> environment
>> > > > > >>>>>> where
>> > > > > >>>>>>>>>> there
>> > > > > >>>>>>>>>>  will be time to fix all dependencies without impacting
>> > > others
>> > > > > >>>> and
>> > > > > >>>>>>>>> likely
>> > > > > >>>>>>>>>>  enough time to "vet" such changes (this can be done
>> for
>> > > > > >>>>> alpha/beta
>> > > > > >>>>>>>>>> releases
>> > > > > >>>>>>>>>>  for example).
>> > > > > >>>>>>>>>>  - As a side effect dependencies specification will
>> become
>> > > far
>> > > > > >>>>>> simpler
>> > > > > >>>>>>>>>>  and straightforward.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> Happy to hear community comments to the proposal. I am
>> > happy
>> > > > to
>> > > > > >>>>> take
>> > > > > >>>>>> a
>> > > > > >>>>>>>>> lead
>> > > > > >>>>>>>>>> on that, open JIRA issue and implement if this is
>> > something
>> > > > > >>>>> community
>> > > > > >>>>>>> is
>> > > > > >>>>>>>>>> happy with.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> J.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> --
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
>> > > > > >>>>>>>>>> Mobile: +48 660 796 129
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> --
>> > > > > >>>>>>
>> > > > > >>>>>> *Jarek Potiuk, Principal Software Engineer*
>> > > > > >>>>>> Mobile: +48 660 796 129
>> > > > > >>>>>>
>> > > > > >>>>>
>> > > > > >>>>>
>> > > > > >>>>> --
>> > > > > >>>>>
>> > > > > >>>>> *Jarek Potiuk, Principal Software Engineer*
>> > > > > >>>>> Mobile: +48 660 796 129
>> > > > > >>>>>
>> > > > > >>>>
>> > > > > >>>
>> > > > > >>>
>> > > > > >>> --
>> > > > > >>>
>> > > > > >>> *Jarek Potiuk, Principal Software Engineer*
>> > > > > >>> Mobile: +48 660 796 129
>> > > > > >>>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > *Jarek Potiuk, Principal Software Engineer*
>> > Mobile: +48 660 796 129
>> >
>>
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Sorry for the late reply - I was travelling and was at Cloud Next in London
last week (BTW, there were talks about Composer/Airflow there).

I see the point. It's indeed very difficult to solve when we want both:
stability of releases and the flexibility to use a released version and write
code against it. I think some trade-offs need to be made, as we won't
solve it all with a one-size-fits-all approach. Answering your question,
George - the value of pinning for release purposes is that it addresses the
"stability" need.

   - Due to my background I come from the "stability" side (which is more
   user-focused) - i.e. the main problem that I want to solve is to make sure
   that someone who wants to install Airflow afresh and start using it as a
   beginner can always run 'pip install airflow' and it will get
   installed. For me this is the point where many users may simply be put off
   if it refuses to install out of the box. A few months ago I actually
   evaluated Airflow to run an ML pipeline for the startup I was at at the
   time. If back then it had refused to install out of the box, my evaluation
   result would have been 'did not pass the basic criteria'. Luckily that did
   not happen and we did a more elaborate evaluation - we did not use Airflow
   eventually, but for other reasons. For us the "it just works!" criterion
   was super important, because we did not have time to dive deep into details
   and find out why things did not work - we had a lot of "core/ML/robotics"
   things to worry about and any hurdles with unstable tools would have been a
   major distraction. We really wanted to write several DAGs and get them
   executed in a stable, repeatable way, and to know that when we install it
   on a production machine in two months, it continues to work without any
   extra work.
   - Then there are a lot of concerns from the "flexibility" side (which is
   more about advanced users/developers). It becomes important when you want
   to actively develop your DAGs (you start using more than just the built-in
   operators and start developing a lot more code in DAGs, or use
   PythonOperator more and more). Then of course it is important to have the
   "flexible" approach. I argue that in these cases the "active" developers
   might be more inclined to tweak their environment, as they are more
   advanced, might be more experienced with the dependencies, and would be
   able to downgrade/upgrade dependencies as needed in their virtualenvs.
   Those people should be quite OK with spending a bit more time to get their
   environment tweaked to their needs.

I was wondering whether there is a way to satisfy both. And I have a wild idea:

   - we have two sets of requirements (easily upgradeable "stable" ones in
   requirements.txt/poetry and flexible-version ones in setup.py or similar)
   - as proposed earlier in this thread
   - we release two flavours of pip-installable Airflow: 1.10.1 with
   stable/pinned dependencies and 1.10.1-devel (we can pick another flavour
   name) with flexible dependencies. It's quite common to have devel releases
   in the Linux world - they serve a slightly different purpose (like
   including headers for C/C++ programs) and are usually an extra package on
   top of the basic one, but the basic idea is similar - if you are a user,
   you install 1.10.1; if you are an active developer, you install
   1.10.1-devel
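
To make the two-flavour idea a bit more concrete, here is a minimal sketch
(not an actual implementation) of how the pinned list could be derived from
the flexible one. Everything in it is an assumption made for illustration: the
helper names, the FLEXIBLE and LOCKED data and the versions are hypothetical,
and in practice the locked versions would come from a file generated by
pip-tools or poetry rather than a hand-written dict.

```python
import re

# Matches the distribution name at the start of a requirement string.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_name(spec):
    """Return the lower-cased package name of a spec like 'lxml>=4.0'."""
    match = _NAME.match(spec.strip())
    if match is None:
        raise ValueError("cannot parse requirement: %r" % spec)
    return match.group(0).lower()


def pinned_flavour(flexible_specs, locked_versions):
    """Turn flexible specs into exact '==' pins using locked versions."""
    pinned = []
    for spec in flexible_specs:
        name = requirement_name(spec)
        if name not in locked_versions:
            raise KeyError("no locked version for %s" % name)
        pinned.append("%s==%s" % (name, locked_versions[name]))
    return pinned


# Hypothetical inputs: flexible ranges (what a "-devel" flavour would
# declare) and locked versions (what a lock file would record).
FLEXIBLE = ["flask-appbuilder>=1.11.1", "flask-login>=0.2.1"]
LOCKED = {"flask-appbuilder": "1.11.1", "flask-login": "0.2.1"}

# Requirement list for the pinned ("stable") flavour.
STABLE = pinned_flavour(FLEXIBLE, LOCKED)
```

The flexible list stays the single place where ranges are edited, and the
pinned list is always generated from it, so the two should not drift apart.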

What do you think?

A bit off-topic: a friend of mine pointed me to this excellent talk by the Elm
creator: "The Hard Parts of Open Source" by Evan Czaplicki
<https://www.youtube.com/watch?v=o_4EX4dPppA>, and it made me think
differently about the discussion we are having :D

J.

On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman <wa...@gmail.com>
wrote:

> It's not upgrading dependencies that I'm worried about, it's downgrading.
> With upgrade conflicts, we can treat the dependency upgrades as a necessary
> aspect of the Airflow upgrade.
>
> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
> LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are placed
> in the annoying situation of either: a) managing our deployments so that we
> install Airflow first, and then upgrade LibraryA and ignore pip's warning
> about incompatible versions, b) keeping the insecure version of LibraryA,
> c) waiting for another Airflow release and accepting all other changes, d)
> maintaining our own fork of Airflow and diverging from mainline.
>
> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
> whatsoever. If we're worried about API changes in the future, there's
> always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
>
> As has been pointed out, PythonOperator tasks run in the same venv as
> Airflow, so it is necessary that users be able to control dependencies for
> their code.
>
> To be clear, it's not always a security risk but this is not a hypothetical
> issue. We ran into a code incompatibility with psutil that mattered to us
> but had no impact on Airflow (see:
> https://github.com/apache/incubator-airflow/pull/3585) and are currently
> seeing SQLAlchemy held back without any clear need (
> https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
>
> Pinning dependencies for releases will force us (and I expect others) to
> either: ignore/workaround the pinning, or not use Airflow releases. Both of
> those options exactly defeat the point.
>
> If people are on board with pinning / locking all dependencies for CI
> purposes, and we can constrain requirements to ranges for necessary
> compatibility, what is the value of pinning all dependencies for release
> purposes?
>
> --George
>
> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > I am still not convinced that pinning is bad. I re-read again the whole
> > mail thread and the thread from 2016
> > <
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > >
> > to
> > read all the arguments, but I stand by pinning.
> >
> > I am - of course - not sure about graduation argument. I would just
> imagine
> > it might be the case. However, I really think that the situation we are in now
> > is quite volatile. The latest 1.10.0 cannot be clean-installed via pip
> > without manually tweaking and forcing lower version of flask-appbuilder.
> > Even if you use the constraints file it's pretty cumbersome because you'd
> > have to somehow know that you need to do exactly that (not at all obvious
> > from the error you get). Also it might at any time get worse as other
> > packages get newer versions released. The thing here is that maintainers
> of
> > flask-appbuilder did nothing wrong, they simply released new version with
> > click dependency version increased (probably for a good reason) and it's
> > airflow's cross-dependency graph which makes it incompatible.
> >
> > I am afraid that if we don't change it, it's all but guaranteed that
> every
> > single release at some point of time will "deteriorate" and refuse to
> > clean-install. If we want to solve this problem (maybe we don't and we
> > accept it as it is?), I think the only way to solve it is to hard-pin all
> > the requirements at the very least for releases.
> >
> > Of course we might choose pinning only for releases (and CI builds) and
> > have the compromise that Matt mentioned. I have the worry however (also
> > mentioned in the previous thread) that it will be hard to maintain.
> > Effectively you will have to maintain both in parallel. And the case with
> > constraints is a nice workaround for someone who actually need specific
> > (even newer) version of specific package in their environment.
> >
> > Maybe we should simply give it a try and do Proof-Of-Concept/experiment
> as
> > also Fokko mentioned?
> >
> > We could have a PR with pinning enabled, and maybe ask the people who
> voice
> > concerns about environment give it a try with those pinned versions and
> see
> > if that makes it difficult for them to either upgrade dependencies and
> fork
> > apache-airflow or use constraints file of pip?
> >
> > J.
> >
> >
> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:
> >
> > > Erik, the Airflow task execution code itself of course must run
> somewhere
> > > with Airflow installed, but if the task is making a database query or a
> > web
> > > request or running something in Docker there's separation between the
> > > environments and maybe you don't care about Python dependencies at all
> > > except to get Airflow running. When running Python operators that's not
> > the
> > > case (as you already deal with).
> > >
> > > - Matt
> > >
> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
> > > <EK...@novozymes.com.invalid> wrote:
> > >
> > > > This is maybe a stupid question, but is it even possible to run tasks
> > in
> > > > an environment where Airflow is not installed?
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Erik
> > > >
> > > > ________________________________
> > > > From: Matt Davis <ji...@gmail.com>
> > > > Sent: Monday, October 8, 2018 10:13:34 PM
> > > > To: dev@airflow.incubator.apache.org
> > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > >
> > > > It sounds like we can get the best of both worlds with the original
> > > > proposals to have minimal requirements in setup.py and "guaranteed to
> > > work"
> > > > complete requirements in a separate file. That way we have
> flexibility
> > > for
> > > > teams that run airflow and tasks in the same environment and guidance
> > on
> > > a
> > > > working set of requirements. (Disclaimer: I work on the same team as
> > > > George.)
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org>
> > wrote:
> > > >
> > > > > Although I think I come down on the side against pinning, my
> reasons
> > > are
> > > > > different.
> > > > >
> > > > > > For the two (or more) people who have expressed concern about it,
> > > > > > would pip's "Constraint Files" help:
> > > > > >
> > > > > > https://pip.pypa.io/en/stable/user_guide/#constraints-files
> > > > > >
> > > > > > For example, you could add "flask-appbuilder==1.11.1" to this file,
> > > > > > specify it with `pip install -c constraints.txt apache-airflow`, and
> > > > > > then whenever pip attempted to install any version of FAB it would use
> > > > > > the exact version from the constraints file.
> > > > >
> > > > > I don't buy the argument about pinning being a requirement for
> > > graduation
> > > > > from Incubation fwiw - it's an unavoidable artefact of the
> > open-source
> > > > > world we develop in.
> > > > >
> > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flibraries.io%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7C787382d8ea6a465b48f108d62d5a9613%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=QX5hO%2FVPJE9M9A38QgCjx%2BfT4C1tfvr1ySUW%2FpV86Jw%3D&amp;reserved=0
> > > > offers a (free?) service that will monitor apps
> > > > > dependencies for being out of date, might be better than writing
> our
> > > own
> > > > > solution.
> > > > >
> > > > > Pip has for a while now supported a way of saying "this dep is for
> > > py2.7
> > > > > only":
> > > > >
> > > > > > Since version 6.0, pip also supports specifiers containing
> > > environment
> > > > > markers like so:
> > > > > >
> > > > > >    SomeProject ==5.4 ; python_version < '2.7'
> > > > > >    SomeProject; sys_platform == 'win32'
> > > > >
> > > > >
> > > > > Ash
> > > > >
> > > > >
> > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <
> waksman@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > As a member of a team that will also have really big problems if
> > > > > > Airflow pins all requirements (for reasons similar to those
> already
> > > > > > stated), I would like to add a very strong -1 to the idea of
> > pinning
> > > > > > them for all installations.
> > > > > >
> > > > > > > In a number of situations on our end, to avoid similar problems
> with
> > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
> > > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpypi.org%2Fproject%2Fpip-tools%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7C787382d8ea6a465b48f108d62d5a9613%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=1d9m%2Bk4NSuXNtnXFRFtv6pGdAUDvVvkoFe95pTshiIQ%3D&amp;reserved=0
> > > > > >
> > > > > > > I would like to suggest a middle ground of:
> > > > > >
> > > > > > - Have the installation continue to use unpinned (`>=`) with
> > minimum
> > > > > > necessary requirements set
> > > > > > - Include a pip-compiled requirements file
> (`requirements-ci.txt`?)
> > > > > > that is used by CI
> > > > > > - - If we need, there can be one file for each incompatible
> python
> > > > > version
> > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
> > > > > > compiled requirements file
> > > > > > - Add a CI check that the watermark and original match to ensure
> no
> > > > > > drift since last compile
> > > > > >
> > > > > > I am happy to do much of the work for this, if it can help avoid
> > > > > > pinning all of the depends at the installation level.
> > > > > >
> > > > > > --George Leslie-Waksman
> > > > > >
> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > > > > > <ma...@gmail.com> wrote:
> > > > > >>
> > > > > >> pip-tools can definitely help here to ship a reference [locked]
> > > > > >> `requirements.txt` that can be used in [all or part of] the CI.
> > It's
> > > > > >> actually kind of important to get CI to fail when a new
> [backward
> > > > > >> incompatible] lib comes out and break things while allowing
> > version
> > > > > ranges.
> > > > > >>
> > > > > >> I think there may be challenges around pip-tools and projects
> that
> > > run
> > > > > in
> > > > > >> both python2.7 and python3.6. You sometimes need to have 2
> > > > > requirements.txt
> > > > > >> lock files.
> > > > > >>
> > > > > >> Max
> > > > > >>
> > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> It's a nice one :). However I think when/if we go to pinned
> > > > > dependencies
> > > > > >>> the way poetry/pip-tools do it, this will be suddenly lot-less
> > > useful
> > > > > It
> > > > > >>> will be very easy to track dependency changes (they will be
> > always
> > > > > >>> committed as a change in the .lock file or requirements.txt)
> and
> > if
> > > > > someone
> > > > > >>> has a problem while upgrading a dependency (always consciously,
> > > never
> > > > > >>> accidentally) it will simply fail during CI build and the
> change
> > > > won't
> > > > > get
> > > > > >>> merged/won't break the builds of others in the first place :).
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <
> > xd.deng.r@gmail.com>
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Hi folks,
> > > > > >>>>
> > > > > >>>> On top of this discussion, I was thinking we should have the
> > > ability
> > > > > to
> > > > > >>>> quickly monitor dependency release as well. Previously, it
> > > happened
> > > > > for a
> > > > > >>>> few times that CI kept failing for no reason and eventually
> > turned
> > > > > out it
> > > > > >>>> was due to dependency release. But it took us some time,
> > > sometimes a
> > > > > few
> > > > > >>>> days, to realise the failure was because of dependency
> release.
> > > > > >>>>
> > > > > >>>> To partially address this, I tried to develop a mini tool to
> > help
> > > us
> > > > > >>> check
> > > > > >>>> the latest release of Python packages & the release date-time
> on
> > > > PyPi.
> > > > > >>> So,
> > > > > >>>> by comparing it with our CI failure history, we may be able to
> > > > > >>> troubleshoot
> > > > > >>>> faster.
> > > > > >>>>
> > > > > > >>>> Output Sample (ordered by upload time in desc order):
> > > > > > >>>> Package Name        Latest Version   Upload Time
> > > > > > >>>> awscli              1.16.28          2018-10-05T23:12:45
> > > > > > >>>> botocore            1.12.18          2018-10-05T23:12:39
> > > > > > >>>> promise             2.2.1            2018-10-04T22:04:18
> > > > > > >>>> Keras               2.2.4            2018-10-03T20:59:39
> > > > > > >>>> bleach              3.0.0            2018-10-03T16:54:27
> > > > > > >>>> Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
> > > > > > >>>> ... ...
> > > > > >>>>
> > > > > > >>>> It's a minimal tool (not perfect yet but working). I have hosted this tool
> > > > > > >>>> at https://github.com/XD-DENG/pypi-release-query.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> XD
> > > > > >>>>
> > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
> > > > > Jarek.Potiuk@polidea.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Hello Erik,
> > > > > >>>>>
> > > > > >>>>> I understand your concern. It's a hard one to solve in
> general
> > > > (i.e.
> > > > > >>>>> dependency-hell). It looks like in this case you treat
> Airflow
> > as
> > > > > >>>>> 'library', where for some other people it might be more like
> > 'end
> > > > > >>>> product'.
> > > > > >>>>> If you look at the "pinning" philosophy - the "pin
> everything"
> > is
> > > > > good
> > > > > >>>> for
> > > > > >>>>> end products, but not good for libraries. In the case you
> have
> > > > > Airflow
> > > > > >>> is
> > > > > >>>>> treated as a bit of both. And it's perfectly valid case at
> that
> > > > (with
> > > > > >>>>> custom python DAGs being central concept for Airflow).
> > > > > >>>>> However, I think it's not as bad as you think when it comes
> to
> > > > exact
> > > > > >>>>> pinning.
> > > > > >>>>>
> > > > > >>>>> I believe - a bit counter-intuitively - that tools like
> > > > > >>> pip-tools/poetry
> > > > > >>>>> with exact pinning result in having your dependencies
> upgraded
> > > more
> > > > > >>>> often,
> > > > > >>>>> rather than less - especially in complex systems where
> > > > > dependency-hell
> > > > > >>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit
> > > scary
> > > > > to
> > > > > >>>> make
> > > > > >>>>> any change to it. There is a chance it will blow at your face
> > if
> > > > you
> > > > > >>>> change
> > > > > >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you
> > > change
> > > > > it,
> > > > > >>>>> whether it will cause chain reaction of conflicts that will
> > ruin
> > > > your
> > > > > >>>> work
> > > > > >>>>> day.
> > > > > >>>>>
> > > > > > >>>>> On the contrary - if you change it to exact pinning in a
> > > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and
> > > > > > >>>>> commented) exclusion/avoidance rules in your .in/.toml file, the whole setup
> > > > > > >>>>> might be much easier to maintain and upgrade. Every time you prepare for
> > > > > >>>>> release (or even once in a while for master) one person might
> > > > > >>> consciously
> > > > > >>>>> attempt to upgrade all dependencies to latest ones. It should
> > be
> > > > > almost
> > > > > >>>> as
> > > > > >>>>> easy as letting poetry/pip-tools help with figuring out what
> > are
> > > > the
> > > > > >>>> latest
> > > > > >>>>> set of dependencies that will work without conflicts. It
> should
> > > be
> > > > > >>> rather
> > > > > >>>>> straightforward (I've done it in the past for fairly complex
> > > > > systems).
> > > > > >>>> What
> > > > > >>>>> those tools enable is - doing single-shot upgrade of all
> > > > > dependencies.
> > > > > >>>>> After doing it you can make sure that all tests work fine
> (and
> > > fix
> > > > > any
> > > > > >>>>> problems that result from it). And then you test it
> thoroughly
> > > > before
> > > > > >>> you
> > > > > >>>>> make final release. You can do it in separate PR - with
> > automated
> > > > > >>> testing
> > > > > >>>>> in Travis which means that you are not disturbing work of
> > others
> > > > > >>>>> (compilation/building + unit tests are guaranteed to work
> > before
> > > > you
> > > > > >>>> merge
> > > > > >>>>> it) while doing it. It's all conscious rather than
> accidental.
> > > Nice
> > > > > >>> side
> > > > > >>>>> effect of that is that with every release you can actually
> > > > "catch-up"
> > > > > >>>> with
> > > > > >>>>> latest stable versions of many libraries in one go. It's
> better
> > > > than
> > > > > >>>>> waiting until someone deliberately upgrades to newer version
> > (and
> > > > the
> > > > > >>>> rest
> > > > > >>>>> remain terribly out-dated as is the case for Airflow now).
> > > > > >>>>>
> > > > > >>>>> So a bit counterintuitively I think tools like
> pip-tools/poetry
> > > > help
> > > > > >>> you
> > > > > >>>> to
> > > > > >>>>> catch up faster in many cases. That is at least my experience
> > so
> > > > far.
> > > > > >>>>>
> > > > > >>>>> Additionally, Airflow is an open system - if you have very
> > > specific
> > > > > >>> needs
> > > > > >>>>> for requirements, you might actually - in the very same way
> > with
> > > > > >>>>> pip-tools/poetry - upgrade all your dependencies in your
> local
> > > fork
> > > > > of
> > > > > >>>>> Airflow before someone else does it in master/release. Those
> > > tools
> > > > > kind
> > > > > >>>> of
> > > > > >>>>> democratise dependency management. It should be as easy as
> > > > > `pip-compile
> > > > > >>>>> --upgrade` or `poetry update` and you will get all the
> > > > > >>> "non-conflicting"
> > > > > >>>>> latest dependencies in your local fork (and poetry especially
> > > seems
> > > > > to
> > > > > >>> do
> > > > > >>>>> all the heavy lifting of figuring out which versions will
> > work).
> > > > You
> > > > > >>>> should
> > > > > >>>>> be able to test and publish it locally as your private
> package
> > > for
> > > > > >>> local
> > > > > >>>>> installations. You can even mark the specific dependency you
> > want
> > > > to
> > > > > >>> use
> > > > > >>>>> specific version and let pip-tools/poetry figure out exact
> > > versions
> > > > > of
> > > > > >>>>> other requirements. You can even make a PR with such upgrade
> > > > > eventually
> > > > > >>>> to
> > > > > >>>>> get it faster in master. You can even downgrade in case newer
> > > > > >>> dependency
> > > > > >>>>> causes problems for you in similar way. Guided by the tools,
> > it's
> > > > > much
> > > > > >>>>> faster than figuring the versions out by yourself.
> > > > > >>>>>
> > > > > >>>>> As long as we have simple way of managing it and document how
> > to
> > > > > >>>>> upgrade/downgrade dependencies in your own fork, and mention
> > how
> > > to
> > > > > >>>> locally
> > > > > >>>>> release Airflow as a package, I think your case could be
> > covered
> > > > even
> > > > > >>>>> better than now. What do you think ?
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > > > >>>>> <EK...@novozymes.com.invalid> wrote:
> > > > > >>>>>
> > > > > >>>>>> For us, exact pinning of versions would be problematic. We
> > have
> > > > DAG
> > > > > >>>> code
> > > > > >>>>>> that shares direct and indirect dependencies with Airflow,
> > e.g.
> > > > > lxml,
> > > > > >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and
> ldap3.
> > > If
> > > > > our
> > > > > >>>> DAG
> > > > > >>>>>> code for some reason needs a newer point release due to a
> bug
> > > > that's
> > > > > >>>>> fixed,
> > > > > >>>>>> then we can't cleanly build a virtual environment containing
> > the
> > > > > >>> fixed
> > > > > >>>>>> version. For us, it's already a problem that Airflow has
> quite
> > > > > strict
> > > > > >>>>> (and
> > > > > >>>>>> sometimes old) requirements in setup.py.
> > > > > >>>>>>
> > > > > >>>>>> Erik
> > > > > >>>>>> ________________________________
> > > > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > > > > >>>>>> To: dev@airflow.incubator.apache.org
> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > > > > >>>>>>
> > > > > >>>>>> I think one solution to release approach is to check as part
> > of
> > > > > >>>> automated
> > > > > >>>>>> Travis build if all requirements are pinned with == (even
> the
> > > deep
> > > > > >>>> ones)
> > > > > >>>>>> and fail the build in case they are not for ALL versions
> > > > (including
> > > > > >>>>>> dev). And of course we should document the approach of
> > > > > >>>> releases/upgrades
> > > > > >>>>>> etc. If we do it all the time for development versions
> (which
> > > > seems
> > > > > >>>> quite
> > > > > >>>>>> doable), then transitively all the releases will also have
> > > pinned
> > > > > >>>>> versions
> > > > > >>>>>> and they will never try to upgrade any of the dependencies.
> In
> > > > > poetry
> > > > > >>>>>> (similarly in pip-tools with .in file) it is done by having
> a
> > > > .lock
> > > > > >>>> file
> > > > > >>>>>> that specifies exact versions of each package so it can be
> > > rather
> > > > > >>> easy
> > > > > >>>> to
> > > > > >>>>>> manage (so it's worth trying it out I think  :D  - seems a
> bit
> > > > more
> > > > > >>>>>> friendly than pip-tools).
> > > > > >>>>>>
> > > > > >>>>>> There is a drawback - of course - with manually updating the
> > > > module
> > > > > >>>> that
> > > > > >>>>>> you want, but I really see that as an advantage rather than
> > > > drawback
> > > > > >>>>>> especially for users. This way you maintain the property
> that
> > it
> > > > > will
> > > > > >>>>>> always install and work the same way no matter if you
> > installed
> > > it
> > > > > >>>> today
> > > > > >>>>> or
> > > > > >>>>>> two months ago. I think the biggest drawback for maintainers
> > is
> > > > that
> > > > > >>>> you
> > > > > >>>>>> need some kind of monitoring of security vulnerabilities and
> > > > cannot
> > > > > >>>> rely
> > > > > >>>>> on
> > > > > >>>>>> automated security upgrades. With >= requirements those
> > security
> > > > > >>>> updates
> > > > > >>>>>> might happen automatically without anyone noticing, but to
> be
> > > > honest
> > > > > >>> I
> > > > > >>>>>> don't think such upgrades are guaranteed even in current
> setup
> > > for
> > > > > >>> all
> > > > > >>>>>> security issues for all libraries anyway.
> > > > > >>>>>>
> > > > > >>>>>> Finding the need to upgrade because of security issues can
> be
> > > > quite
> > > > > >>>>>> automated. Even now I noticed Github started to inform
> owners
> > > > about
> > > > > >>>>>> potential security vulnerabilities in used libraries for
> their
> > > > > >>> project.
> > > > > > >>>>>> Those notifications can be sent to devlist and turned into JIRA issues
> > > > > > >>>>>> followed by minor security-related releases (with only a few library
> > > > > > >>>>>> dependencies upgraded).
> > > > > >>>>>>
> > > > > >>>>>> I think it's even easier to automate it if you have pinned
> > > > > >>>> dependencies -
> > > > > >>>>>> because it's generally easy to find applicable
> vulnerabilities
> > > for
> > > > > >>>>> specific
> > > > > >>>>>> versions of libraries by static analysers - when you have
> >=,
> > > you
> > > > > >>> never
> > > > > >>>>>> know which version will be used until you actually perform
> the
> > > > > >>>>>> installation.
> > > > > >>>>>>
> > > > > >>>>>> There is one big advantage for maintainers for "pinned"
> case.
> > > Your
> > > > > >>>> users
> > > > > >>>>>> always have the same dependencies - so when issue is raised,
> > you
> > > > can
> > > > > >>>>>> reproduce it more easily. It's hard to know which version
> user
> > > has
> > > > > >>> (as
> > > > > >>>>> the
> > > > > >>>>>> user could install it month ago or yesterday) and even if
> you
> > > find
> > > > > >>> out
> > > > > >>>> by
> > > > > >>>>>> asking the user, you might not be able to reproduce the set
> of
> > > > > >>>>> requirements
> > > > > >>>>>> easily (simply because there are already newer versions of
> the
> > > > > >>>> libraries
> > > > > > >>>>>> released and they are used automatically). You can ask the user to run
> > > > > > >>>>>> `pip install --upgrade` but that's dangerous and pretty lame ("check the
> > > > > > >>>>>> latest version - maybe it fixes your problem?") and sometimes not possible
> > > > > > >>>>>> (e.g. someone has a pre-built docker image with dependencies from a few
> > > > > > >>>>>> months ago and cannot rebuild the image easily).
> > > > > >>>>>>
> > > > > >>>>>> J.
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
> > > ash@apache.org
> > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> One thing to point out here.
> > > > > >>>>>>>
> > > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > > > >>>>>>> environment it will fail.
> > > > > >>>>>>>
> > > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > > > > > >>>>>>> >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > > >>>>>>>
> > > > > >>>>>>> So I do think there is maybe something to be said about
> > pinning
> > > > for
> > > > > >>>>>>> releases. The down side to that is that if there are
> updates
> > > to a
> > > > > >>>>> module
> > > > > >>>>>>> that we want then we have to make a point release to let
> > people
> > > > get
> > > > > >>>> it
> > > > > >>>>>>>
> > > > > >>>>>>> Both methods have draw-backs
> > > > > >>>>>>>
> > > > > >>>>>>> -ash
> > > > > >>>>>>>
> > > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> > > > > >>> arthur.wiedmer@gmail.com>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi Jarek,
> > > > > >>>>>>>>
> > > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
> > > > advice.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I just want to double check we are talking about pinning
> in
> > > > > >>>>>>>> requirements.txt only.
> > > > > >>>>>>>>
> > > > > >>>>>>>> This offers the ability to
> > > > > >>>>>>>> pip install -r requirements.txt
> > > > > >>>>>>>> pip install --no-deps airflow
> > > > > >>>>>>>> For a guaranteed install which works.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Several different requirement files can be provided for
> > > specific
> > > > > >>>> use
> > > > > >>>>>>> cases,
> > > > > >>>>>>>> like a stable dev one for instance for people wanting to
> > work
> > > on
> > > > > >>>>>>> operators
> > > > > >>>>>>>> and non-core functions.
> > > > > >>>>>>>>
> > > > > >>>>>>>> However, I think we should proactively test in CI against
> > > > > >>> unpinned
> > > > > >>>>>>>> dependencies (though it might be a separate case in the
> > > matrix)
> > > > ,
> > > > > >>>> so
> > > > > >>>>>> that
> > > > > >>>>>>>> we get advance warning if possible that things will break.
> > > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
> > > > problem
> > > > > >>>> :)
> > > > > >>>>>>>>
> > > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
> > > minimum
> > > > > >>>>>> required
> > > > > >>>>>>>> compatibility. The process of pinning in setup.py is
> > extremely
> > > > > >>>>>>> detrimental
> > > > > >>>>>>>> when you have a large number of python libraries installed
> > > with
> > > > > >>>>>> different
> > > > > >>>>>>>> pinned versions.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Arthur
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > > >>>>>> <ddavydov@twitter.com.invalid
> > > > > >>>>>>>>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Relevant discussion about this:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&amp;data=01%7C01%7CEKC%40novozymes.com%7C787382d8ea6a465b48f108d62d5a9613%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=9wta3PcUeZjBg%2FmACBH06cNRzbYG4NcAW0XDJKan6cM%3D&amp;reserved=0
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > > >>>>>> Jarek.Potiuk@polidea.com>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> TL;DR; A change is coming in the way how
> > > > > >>>> dependencies/requirements
> > > > > >>>>>> are
> > > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed rather
> > > than
> > > > > >>>>>> flexible
> > > > > >>>>>>>>> (==
> > > > > >>>>>>>>>> rather than >=).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> This is follow up after Slack discussion we had with Ash
> > and
> > > > > >>>> Kaxil
> > > > > >>>>> -
> > > > > >>>>>>>>>> summarising what we propose we'll do.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Problem:*
> > > > > >>>>>>>>>> During last few weeks we experienced quite a few
> downtimes
> > > of
> > > > > >>>>>> TravisCI
> > > > > >>>>>>>>>> builds (for all PRs/branches including master) as some
> of
> > > the
> > > > > >>>>>>> transitive
> > > > > >>>>>>>>>> dependencies were automatically upgraded. This because
> in
> > a
> > > > > >>>> number
> > > > > >>>>> of
> > > > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Whenever there is a new release of such dependency, it
> > might
> > > > > >>>> cause
> > > > > >>>>>>> chain
> > > > > >>>>>>>>>> reaction with upgrade of transitive dependencies which
> > might
> > > > > >>> get
> > > > > >>>>> into
> > > > > >>>>>>>>>> conflict.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login
> transitive
> > > > > >>>>> dependency
> > > > > >>>>>>> with
> > > > > >>>>>>>>>> click. They started to conflict once AppBuilder has
> > released
> > > > > >>>>> version
> > > > > >>>>>>>>>> 1.12.0.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Diagnosis:*
> > > > > >>>>>>>>>> Transitive dependencies with "flexible" versions (where
> >=
> > > is
> > > > > >>>> used
> > > > > >>>>>>>>> instead
> > > > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner
> > or
> > > > > >>> later
> > > > > >>>>> hit
> > > > > >>>>>>>>> other
> > > > > >>>>>>>>>> cases where not fixed dependencies cause similar
> problems
> > > with
> > > > > >>>>> other
> > > > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This
> > > causes
> > > > > >>>>>> problems
> > > > > >>>>>>>>> for
> > > > > >>>>>>>>>> both - released versions (cause they stop to work!) and
> > for
> > > > > >>>>>> development
> > > > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent
> > > people
> > > > > >>>> from
> > > > > >>>>>>>>>> installing development environment from the scratch.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Solution:*
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>  - Following the old-but-good post
> > > > > > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > > > > > >>>>>>>>>>  pinned dependencies to specific versions (so basically all
> > > > > > >>>>>>>>>>  dependencies are "fixed").
> > > > > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies with
> > > > > > >>>>>>>>>>  pip-tools (https://github.com/jazzband/pip-tools). We might also take a
> > > > > > >>>>>>>>>>  look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > > >>>>>>>>>>  - People who would like to upgrade some dependencies
> for
> > > > > >>> their
> > > > > >>>>> PRs
> > > > > >>>>>>>>> will
> > > > > >>>>>>>>>>  still be able to do it - but such upgrades will be in
> > their
> > > > > >>> PR
> > > > > >>>>> thus
> > > > > >>>>>>>>> they
> > > > > >>>>>>>>>>  will go through TravisCI tests and they will also have
> to
> > > be
> > > > > >>>>>>> specified
> > > > > >>>>>>>>>> with
> > > > > >>>>>>>>>>  pinned fixed versions (==). This should be part of
> review
> > > > > >>>> process
> > > > > >>>>>> to
> > > > > >>>>>>>>>> make
> > > > > >>>>>>>>>>  sure new/changed requirements are pinned.
> > > > > >>>>>>>>>>  - In release process there will be a point where an
> > upgrade
> > > > > >>>> will
> > > > > >>>>> be
> > > > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so
> that
> > we
> > > > > >>> are
> > > > > >>>>> not
> > > > > >>>>>>>>>> stuck
> > > > > >>>>>>>>>>  with older releases. This will be in controlled PR
> > > > > >>> environment
> > > > > >>>>>> where
> > > > > >>>>>>>>>> there
> > > > > >>>>>>>>>>  will be time to fix all dependencies without impacting
> > > others
> > > > > >>>> and
> > > > > >>>>>>>>> likely
> > > > > >>>>>>>>>>  enough time to "vet" such changes (this can be done for
> > > > > >>>>> alpha/beta
> > > > > >>>>>>>>>> releases
> > > > > >>>>>>>>>>  for example).
> > > > > >>>>>>>>>>  - As a side effect dependencies specification will
> become
> > > far
> > > > > >>>>>> simpler
> > > > > >>>>>>>>>>  and straightforward.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Happy to hear community comments to the proposal. I am
> > happy
> > > > to
> > > > > >>>>> take
> > > > > >>>>>> a
> > > > > >>>>>>>>> lead
> > > > > >>>>>>>>>> on that, open JIRA issue and implement if this is
> > something
> > > > > >>>>> community
> > > > > >>>>>>> is
> > > > > >>>>>>>>>> happy with.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> J.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> --
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > >>>>>>>>>> Mobile: +48 660 796 129
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>> --
> > > > > >>>>>>
> > > > > >>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > >>>>>> Mobile: +48 660 796 129
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>>
> > > > > >>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > >>>>> Mobile: +48 660 796 129
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>>
> > > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > >>> Mobile: +48 660 796 129
> > > > > >>>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > *Jarek Potiuk, Principal Software Engineer*
> > Mobile: +48 660 796 129
> >
>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by George Leslie-Waksman <wa...@gmail.com>.
It's not upgrading dependencies that I'm worried about, it's downgrading.
With upgrade conflicts, we can treat the dependency upgrades as a necessary
aspect of the Airflow upgrade.

Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are placed
in the annoying situation of either: a) managing our deployments so that we
install Airflow first, and then upgrade LibraryA and ignore pip's warning
about incompatible versions, b) keeping the insecure version of LibraryA,
c) waiting for another Airflow release and accepting all other changes, d)
maintaining our own fork of Airflow and diverging from mainline.

If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
whatsoever. If we're worried about API changes in the future, there's
always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0.
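
To make the difference concrete, here is a minimal sketch in Python of how an
exact pin and a bounded range treat a 1.2.4 fix release. The helper names
(`parse`, `satisfies`) are made up for illustration, and the comparison only
handles plain dotted numeric versions; it is not pip's resolver and not a full
PEP 440 implementation.

```python
import operator

# Hypothetical helpers for illustration only: they handle plain dotted
# numeric versions such as "1.2.3", not the full PEP 440 grammar.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
    ">": operator.gt,
}


def parse(version):
    """Turn "1.2.3" into (1, 2, 3); pad short versions so "1.3" equals "1.3.0"."""
    parts = [int(part) for part in version.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in ("==", ">=", "<=", "<", ">"):
            if clause.startswith(symbol):
                bound = parse(clause[len(symbol):])
                if not _OPS[symbol](parse(version), bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


if __name__ == "__main__":
    for spec in ("==1.2.3", ">=1.2.3", ">=1.2.3,<1.3", ">=1.2.3,<2.0"):
        print(spec, "accepts 1.2.4:", satisfies("1.2.4", spec))
```

Only the exact pin rejects the fixed release; each of the ranges lets it in
while still excluding the next minor or major series.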

As has been pointed out, PythonOperator tasks run in the same venv as
Airflow, so it is necessary that users be able to control dependencies for
their code.

To be clear, it's not always a security risk but this is not a hypothetical
issue. We ran into a code incompatibility with psutil that mattered to us
but had no impact on Airflow (see:
https://github.com/apache/incubator-airflow/pull/3585) and are currently
seeing SQLAlchemy held back without any clear need (
https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).

Pinning dependencies for releases will force us (and I expect others) to
either ignore/work around the pinning or not use Airflow releases. Both of
those options defeat the point of pinning.

If people are on board with pinning / locking all dependencies for CI
purposes, and we can constrain requirements to ranges for necessary
compatibility, what is the value of pinning all dependencies for release
purposes?
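
For the CI side, the watermark check suggested earlier in the thread could
look roughly like the sketch below. Everything named here is an assumption for
illustration: the `# watermark:` comment format, the function names, and the
choice to hash a sorted list of requirement strings. pip-tools itself does not
write such a watermark.

```python
import hashlib

# Assumed comment format for the watermark line; not a pip-tools feature.
WATERMARK_PREFIX = "# watermark: "


def requirements_hash(requirements):
    """Hash the loose requirements (e.g. from setup.py), ignoring order and whitespace."""
    normalised = sorted(req.strip() for req in requirements if req.strip())
    return hashlib.sha256("\n".join(normalised).encode("utf-8")).hexdigest()


def add_watermark(compiled_text, requirements):
    """Append the watermark line to the text of a compiled requirements file."""
    body = compiled_text.rstrip("\n")
    return body + "\n" + WATERMARK_PREFIX + requirements_hash(requirements) + "\n"


def watermark_matches(compiled_text, requirements):
    """Return True if the compiled file was generated from these requirements."""
    expected = WATERMARK_PREFIX + requirements_hash(requirements)
    return any(line.strip() == expected for line in compiled_text.splitlines())


if __name__ == "__main__":
    loose = ["LibraryA>=1.2.3,<2.0", "LibraryB>=0.5"]
    compiled = add_watermark("LibraryA==1.2.4\nLibraryB==0.5.1\n", loose)
    print("in sync:", watermark_matches(compiled, loose))
    # Someone edits setup.py without recompiling: CI should flag the drift.
    print("after drift:", watermark_matches(compiled, loose + ["LibraryC>=1.0"]))
```

A CI job would fail when `watermark_matches` returns False, which keeps the
pinned CI file honest without pinning anything for people installing Airflow.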

--George

On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> I am still not convinced that pinning is bad. I re-read again the whole
> mail thread and the thread from 2016
> <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
> to read all the arguments, but I stand by pinning.
>
> I am - of course - not sure about the graduation argument. I would just imagine
> it might be the case. However, I really think that the situation we are in now
> is quite volatile. The latest 1.10.0 cannot be clean-installed via pip
> without manually tweaking and forcing lower version of flask-appbuilder.
> Even if you use the constraints file it's pretty cumbersome because you'd
> have to somehow know that you need to do exactly that (not at all obvious
> from the error you get). Also it might at any time get worse as other
> packages get newer versions released. The thing here is that maintainers of
> flask-appbuilder did nothing wrong, they simply released new version with
> click dependency version increased (probably for a good reason) and it's
> airflow's cross-dependency graph which makes it incompatible.
>
> I am afraid that if we don't change it, it's all but guaranteed that every
> single release at some point of time will "deteriorate" and refuse to
> clean-install. If we want to solve this problem (maybe we don't and we
> accept it as it is?), I think the only way to solve it is to hard-pin all
> the requirements at the very least for releases.
>
> Of course we might choose pinning only for releases (and CI builds) and
> have the compromise that Matt mentioned. I have the worry however (also
> mentioned in the previous thread) that it will be hard to maintain.
> Effectively you will have to maintain both in parallel. And the case with
> constraints is a nice workaround for someone who actually need specific
> (even newer) version of specific package in their environment.
>
> Maybe we should simply give it a try and do Proof-Of-Concept/experiment as
> also Fokko mentioned?
>
> We could have a PR with pinning enabled, and maybe ask the people who voice
> concerns about environment give it a try with those pinned versions and see
> if that makes it difficult for them to either upgrade dependencies and fork
> apache-airflow or use constraints file of pip?
>
> J.
>
>
> On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:
>
> > Erik, the Airflow task execution code itself of course must run somewhere
> > with Airflow installed, but if the task is making a database query or a
> web
> > request or running something in Docker there's separation between the
> > environments and maybe you don't care about Python dependencies at all
> > except to get Airflow running. When running Python operators that's not
> the
> > case (as you already deal with).
> >
> > - Matt
> >
> > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
> > <EK...@novozymes.com.invalid> wrote:
> >
> > > This is maybe a stupid question, but is it even possible to run tasks in
> > > an environment where Airflow is not installed?
> > >
> > >
> > > Kind regards,
> > >
> > > Erik
> > >
> > > ________________________________
> > > From: Matt Davis <ji...@gmail.com>
> > > Sent: Monday, October 8, 2018 10:13:34 PM
> > > To: dev@airflow.incubator.apache.org
> > > Subject: Re: Pinning dependencies for Apache Airflow
> > >
> > > It sounds like we can get the best of both worlds with the original
> > > proposals to have minimal requirements in setup.py and "guaranteed to work"
> > > complete requirements in a separate file. That way we have flexibility for
> > > teams that run airflow and tasks in the same environment and guidance on a
> > > working set of requirements. (Disclaimer: I work on the same team as
> > > George.)
> > >
> > > Thanks,
> > > Matt
> > >
> > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> > >
> > > > Although I think I come down on the side against pinning, my reasons are
> > > > different.
> > > >
> > > > For the two (or more) people who have expressed concern about it, would
> > > > pip's "Constraint Files" help:
> > > >
> > > > https://pip.pypa.io/en/stable/user_guide/#constraints-files
> > > >
> > > > For example, you could add "flask-appbuilder==1.11.1" in to this file,
> > > > specify it with `pip install -c constraints.txt apache-airflow` and then
> > > > whenever pip attempted to install _any_ version of FAB it would use the
> > > > exact version from the constraints file.
> > > >
> > > > I don't buy the argument about pinning being a requirement for graduation
> > > > from Incubation fwiw - it's an unavoidable artefact of the open-source
> > > > world we develop in.
> > > >
> > > >
> > >
> >
> > > > https://libraries.io/ offers a (free?) service that will monitor apps
> > > > dependencies for being out of date, might be better than writing our own
> > > > solution.
> > > >
> > > > Pip has for a while now supported a way of saying "this dep is for py2.7
> > > > only":
> > > >
> > > > > Since version 6.0, pip also supports specifiers containing environment
> > > > > markers like so:
> > > > >
> > > > >    SomeProject ==5.4 ; python_version < '2.7'
> > > > >    SomeProject; sys_platform == 'win32'
> > > >
> > > >
> > > > Ash
> > > >
> > > >
> > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > As a member of a team that will also have really big problems if
> > > > > > Airflow pins all requirements (for reasons similar to those already
> > > > > > stated), I would like to add a very strong -1 to the idea of pinning
> > > > > > them for all installations.
> > > > >
> > > > > > In a number of situations on our end, to avoid similar problems with
> > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
> > > > > >
> > > > > > https://pypi.org/project/pip-tools/
> > > > > >
> > > > > > I would like to suggest a middle ground of:
> > > > >
> > > > > > - Have the installation continue to use unpinned (`>=`) with minimum
> > > > > > necessary requirements set
> > > > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > > > > > that is used by CI
> > > > > > - - If we need, there can be one file for each incompatible python version
> > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
> > > > > > compiled requirements file
> > > > > > - Add a CI check that the watermark and original match to ensure no
> > > > > > drift since last compile
> > > > >
> > > > > I am happy to do much of the work for this, if it can help avoid
> > > > > pinning all of the depends at the installation level.
> > > > >
> > > > > --George Leslie-Waksman
> > > > >
> > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > > > > <ma...@gmail.com> wrote:
> > > > >>
> > > > > >> pip-tools can definitely help here to ship a reference [locked]
> > > > > >> `requirements.txt` that can be used in [all or part of] the CI. It's
> > > > > >> actually kind of important to get CI to fail when a new [backward
> > > > > >> incompatible] lib comes out and breaks things while allowing version
> > > > > >> ranges.
> > > > > >>
> > > > > >> I think there may be challenges around pip-tools and projects that run in
> > > > > >> both python2.7 and python3.6. You sometimes need to have 2 requirements.txt
> > > > > >> lock files.
> > > > >>
> > > > >> Max
> > > > >>
> > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> It's a nice one :). However I think when/if we go to pinned
> > > > dependencies
> > > > >>> the way poetry/pip-tools do it, this will be suddenly lot-less
> > useful
> > > > It
> > > > >>> will be very easy to track dependency changes (they will be
> always
> > > > >>> committed as a change in the .lock file or requirements.txt) and
> if
> > > > someone
> > > > >>> has a problem while upgrading a dependency (always consciously,
> > never
> > > > >>> accidentally) it will simply fail during CI build and the change
> > > won't
> > > > get
> > > > >>> merged/won't break the builds of others in the first place :).
> > > > >>>
> > > > >>> J.
> > > > >>>
> > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <
> xd.deng.r@gmail.com>
> > > > wrote:
> > > > >>>
> > > > >>>> Hi folks,
> > > > >>>>
> > > > >>>> On top of this discussion, I was thinking we should have the
> > ability
> > > > to
> > > > >>>> quickly monitor dependency release as well. Previously, it
> > happened
> > > > for a
> > > > >>>> few times that CI kept failing for no reason and eventually
> turned
> > > > out it
> > > > >>>> was due to dependency release. But it took us some time,
> > sometimes a
> > > > few
> > > > >>>> days, to realise the failure was because of dependency release.
> > > > >>>>
> > > > >>>> To partially address this, I tried to develop a mini tool to
> help
> > us
> > > > >>> check
> > > > >>>> the latest release of Python packages & the release date-time on
> > > PyPi.
> > > > >>> So,
> > > > >>>> by comparing it with our CI failure history, we may be able to
> > > > >>> troubleshoot
> > > > >>>> faster.
> > > > >>>>
> > > > > >>>> Output Sample (ordered by upload time in desc order):
> > > > > >>>> Package Name        Latest Version   Upload Time
> > > > > >>>> awscli              1.16.28          2018-10-05T23:12:45
> > > > > >>>> botocore            1.12.18          2018-10-05T23:12:39
> > > > > >>>> promise             2.2.1            2018-10-04T22:04:18
> > > > > >>>> Keras               2.2.4            2018-10-03T20:59:39
> > > > > >>>> bleach              3.0.0            2018-10-03T16:54:27
> > > > > >>>> Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
> > > > > >>>> ... ...
> > > > >>>>
> > > > > >>>> It's a minimal tool (not perfect yet but working). I have hosted this tool
> > > > > >>>> at:
> > > > > >>>> https://github.com/XD-DENG/pypi-release-query
> > > > >>>>
> > > > >>>>
> > > > >>>> XD
> > > > >>>>
> > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hello Erik,
> > > > >>>>>
> > > > >>>>> I understand your concern. It's a hard one to solve in general
> > > (i.e.
> > > > >>>>> dependency-hell). It looks like in this case you treat Airflow
> as
> > > > >>>>> 'library', where for some other people it might be more like
> 'end
> > > > >>>> product'.
> > > > >>>>> If you look at the "pinning" philosophy - the "pin everything"
> is
> > > > good
> > > > >>>> for
> > > > >>>>> end products, but not good for libraries. In the case you have
> > > > Airflow
> > > > >>> is
> > > > >>>>> treated as a bit of both. And it's perfectly valid case at that
> > > (with
> > > > >>>>> custom python DAGs being central concept for Airflow).
> > > > >>>>> However, I think it's not as bad as you think when it comes to
> > > exact
> > > > >>>>> pinning.
> > > > >>>>>
> > > > >>>>> I believe - a bit counter-intuitively - that tools like
> > > > >>> pip-tools/poetry
> > > > >>>>> with exact pinning result in having your dependencies upgraded
> > more
> > > > >>>> often,
> > > > >>>>> rather than less - especially in complex systems where
> > > > dependency-hell
> > > > >>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit
> > scary
> > > > to
> > > > >>>> make
> > > > >>>>> any change to it. There is a chance it will blow at your face
> if
> > > you
> > > > >>>> change
> > > > >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you
> > change
> > > > it,
> > > > >>>>> whether it will cause chain reaction of conflicts that will
> ruin
> > > your
> > > > >>>> work
> > > > >>>>> day.
> > > > >>>>>
> > > > > >>>>> On the contrary - if you change it to exact pinning in
> > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and
> > > > > >>>>> commented) exclusion/avoidance rules in your .in/.toml file, the whole setup
> > > > > >>>>> might be much easier to maintain and upgrade. Every time you
> > > prepare
> > > > >>> for
> > > > >>>>> release (or even once in a while for master) one person might
> > > > >>> consciously
> > > > >>>>> attempt to upgrade all dependencies to latest ones. It should
> be
> > > > almost
> > > > >>>> as
> > > > >>>>> easy as letting poetry/pip-tools help with figuring out what
> are
> > > the
> > > > >>>> latest
> > > > >>>>> set of dependencies that will work without conflicts. It should
> > be
> > > > >>> rather
> > > > >>>>> straightforward (I've done it in the past for fairly complex
> > > > systems).
> > > > >>>> What
> > > > >>>>> those tools enable is - doing single-shot upgrade of all
> > > > dependencies.
> > > > >>>>> After doing it you can make sure that all tests work fine (and
> > fix
> > > > any
> > > > >>>>> problems that result from it). And then you test it thoroughly
> > > before
> > > > >>> you
> > > > >>>>> make final release. You can do it in separate PR - with
> automated
> > > > >>> testing
> > > > >>>>> in Travis which means that you are not disturbing work of
> others
> > > > >>>>> (compilation/building + unit tests are guaranteed to work
> before
> > > you
> > > > >>>> merge
> > > > >>>>> it) while doing it. It's all conscious rather than accidental.
> > Nice
> > > > >>> side
> > > > >>>>> effect of that is that with every release you can actually
> > > "catch-up"
> > > > >>>> with
> > > > >>>>> latest stable versions of many libraries in one go. It's better
> > > than
> > > > >>>>> waiting until someone deliberately upgrades to newer version
> (and
> > > the
> > > > >>>> rest
> > > > >>>>> remain terribly out-dated as is the case for Airflow now).
> > > > >>>>>
> > > > >>>>> So a bit counterintuitively I think tools like pip-tools/poetry
> > > help
> > > > >>> you
> > > > >>>> to
> > > > >>>>> catch up faster in many cases. That is at least my experience
> so
> > > far.
> > > > >>>>>
> > > > >>>>> Additionally, Airflow is an open system - if you have very
> > specific
> > > > >>> needs
> > > > >>>>> for requirements, you might actually - in the very same way
> with
> > > > >>>>> pip-tools/poetry - upgrade all your dependencies in your local
> > fork
> > > > of
> > > > >>>>> Airflow before someone else does it in master/release. Those
> > tools
> > > > kind
> > > > >>>> of
> > > > >>>>> democratise dependency management. It should be as easy as
> > > > `pip-compile
> > > > >>>>> --upgrade` or `poetry update` and you will get all the
> > > > >>> "non-conflicting"
> > > > >>>>> latest dependencies in your local fork (and poetry especially
> > seems
> > > > to
> > > > >>> do
> > > > >>>>> all the heavy lifting of figuring out which versions will
> work).
> > > You
> > > > >>>> should
> > > > >>>>> be able to test and publish it locally as your private package
> > for
> > > > >>> local
> > > > >>>>> installations. You can even mark the specific dependency you
> want
> > > to
> > > > >>> use
> > > > >>>>> specific version and let pip-tools/poetry figure out exact
> > versions
> > > > of
> > > > >>>>> other requirements. You can even make a PR with such upgrade
> > > > eventually
> > > > >>>> to
> > > > >>>>> get it faster in master. You can even downgrade in case newer
> > > > >>> dependency
> > > > >>>>> causes problems for you in similar way. Guided by the tools,
> it's
> > > > much
> > > > >>>>> faster than figuring the versions out by yourself.
> > > > >>>>>
> > > > >>>>> As long as we have simple way of managing it and document how
> to
> > > > >>>>> upgrade/downgrade dependencies in your own fork, and mention
> how
> > to
> > > > >>>> locally
> > > > >>>>> release Airflow as a package, I think your case could be
> covered
> > > even
> > > > >>>>> better than now. What do you think ?
> > > > >>>>>
> > > > >>>>> J.
> > > > >>>>>
> > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > > >>>>> <EK...@novozymes.com.invalid> wrote:
> > > > >>>>>
> > > > >>>>>> For us, exact pinning of versions would be problematic. We
> have
> > > DAG
> > > > >>>> code
> > > > >>>>>> that shares direct and indirect dependencies with Airflow,
> e.g.
> > > > lxml,
> > > > >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3.
> > If
> > > > our
> > > > >>>> DAG
> > > > >>>>>> code for some reason needs a newer point release due to a bug
> > > that's
> > > > >>>>> fixed,
> > > > >>>>>> then we can't cleanly build a virtual environment containing
> the
> > > > >>> fixed
> > > > >>>>>> version. For us, it's already a problem that Airflow has quite
> > > > strict
> > > > >>>>> (and
> > > > >>>>>> sometimes old) requirements in setup.py.
> > > > >>>>>>
> > > > >>>>>> Erik
> > > > >>>>>> ________________________________
> > > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > > > >>>>>> To: dev@airflow.incubator.apache.org
> > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > > > >>>>>>
> > > > >>>>>> I think one solution to release approach is to check as part
> of
> > > > >>>> automated
> > > > >>>>>> Travis build if all requirements are pinned with == (even the
> > deep
> > > > >>>> ones)
> > > > >>>>>> and fail the build in case they are not for ALL versions
> > > (including
> > > > >>>>>> dev). And of course we should document the approach of
> > > > >>>> releases/upgrades
> > > > >>>>>> etc. If we do it all the time for development versions (which
> > > seems
> > > > >>>> quite
> > > > >>>>>> doable), then transitively all the releases will also have
> > pinned
> > > > >>>>> versions
> > > > >>>>>> and they will never try to upgrade any of the dependencies. In
> > > > poetry
> > > > >>>>>> (similarly in pip-tools with .in file) it is done by having a
> > > .lock
> > > > >>>> file
> > > > >>>>>> that specifies exact versions of each package so it can be
> > rather
> > > > >>> easy
> > > > >>>> to
> > > > >>>>>> manage (so it's worth trying it out I think  :D  - seems a bit
> > > more
> > > > >>>>>> friendly than pip-tools).
> > > > >>>>>>
> > > > >>>>>> There is a drawback - of course - with manually updating the
> > > module
> > > > >>>> that
> > > > >>>>>> you want, but I really see that as an advantage rather than
> > > drawback
> > > > >>>>>> especially for users. This way you maintain the property that
> it
> > > > will
> > > > >>>>>> always install and work the same way no matter if you
> installed
> > it
> > > > >>>> today
> > > > >>>>> or
> > > > >>>>>> two months ago. I think the biggest drawback for maintainers
> is
> > > that
> > > > >>>> you
> > > > >>>>>> need some kind of monitoring of security vulnerabilities and
> > > cannot
> > > > >>>> rely
> > > > >>>>> on
> > > > >>>>>> automated security upgrades. With >= requirements those
> security
> > > > >>>> updates
> > > > >>>>>> might happen automatically without anyone noticing, but to be
> > > honest
> > > > >>> I
> > > > >>>>>> don't think such upgrades are guaranteed even in current setup
> > for
> > > > >>> all
> > > > >>>>>> security issues for all libraries anyway.
> > > > >>>>>>
> > > > >>>>>> Finding the need to upgrade because of security issues can be
> > > quite
> > > > >>>>>> automated. Even now I noticed Github started to inform owners
> > > about
> > > > >>>>>> potential security vulnerabilities in used libraries for their
> > > > >>> project.
> > > > > >>>>>> Those notifications can be sent to devlist and turned into JIRA issues
> > > > > >>>>>> followed by minor security-related releases (with only few library
> > > > > >>>>>> dependencies upgraded).
> > > > >>>>>>
> > > > >>>>>> I think it's even easier to automate it if you have pinned
> > > > >>>> dependencies -
> > > > >>>>>> because it's generally easy to find applicable vulnerabilities
> > for
> > > > >>>>> specific
> > > > >>>>>> versions of libraries by static analysers - when you have >=,
> > you
> > > > >>> never
> > > > >>>>>> know which version will be used until you actually perform the
> > > > >>>>>> installation.
> > > > >>>>>>
> > > > >>>>>> There is one big advantage for maintainers for "pinned" case.
> > Your
> > > > >>>> users
> > > > >>>>>> always have the same dependencies - so when issue is raised,
> you
> > > can
> > > > >>>>>> reproduce it more easily. It's hard to know which version user
> > has
> > > > >>> (as
> > > > >>>>> the
> > > > >>>>>> user could install it month ago or yesterday) and even if you
> > find
> > > > >>> out
> > > > >>>> by
> > > > >>>>>> asking the user, you might not be able to reproduce the set of
> > > > >>>>> requirements
> > > > >>>>>> easily (simply because there are already newer versions of the
> > > > >>>> libraries
> > > > > >>>>>> released and they are used automatically). You can ask the user to run
> > > > > >>>>>> `pip install --upgrade` but that's dangerous and pretty lame ("check the
> > > > > >>>>>> latest version - maybe it fixes your problem?") and sometimes not possible
> > > > > >>>>>> (e.g. someone has a pre-built docker image with dependencies from a few
> > > > > >>>>>> months ago and cannot rebuild the image easily).
> > > > >>>>>>
> > > > >>>>>> J.
> > > > >>>>>>
> > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
> > ash@apache.org
> > > >
> > > > >>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> One thing to point out here.
> > > > >>>>>>>
> > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a
> clean
> > > > >>>>>>> environment it will fail.
> > > > >>>>>>>
> > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > > > > >>>>>>> >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > >>>>>>>
> > > > >>>>>>> So I do think there is maybe something to be said about
> pinning
> > > for
> > > > >>>>>>> releases. The down side to that is that if there are updates
> > to a
> > > > >>>>> module
> > > > >>>>>>> that we want then we have to make a point release to let
> people
> > > get
> > > > >>>> it
> > > > >>>>>>>
> > > > >>>>>>> Both methods have draw-backs
> > > > >>>>>>>
> > > > >>>>>>> -ash
> > > > >>>>>>>
> > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> > > > >>> arthur.wiedmer@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> Hi Jarek,
> > > > >>>>>>>>
> > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
> > > advice.
> > > > >>>>>>>>
> > > > >>>>>>>> I just want to double check we are talking about pinning in
> > > > >>>>>>>> requirements.txt only.
> > > > >>>>>>>>
> > > > >>>>>>>> This offers the ability to
> > > > >>>>>>>> pip install -r requirements.txt
> > > > >>>>>>>> pip install --no-deps airflow
> > > > >>>>>>>> For a guaranteed install which works.
> > > > >>>>>>>>
> > > > >>>>>>>> Several different requirement files can be provided for
> > specific
> > > > >>>> use
> > > > >>>>>>> cases,
> > > > >>>>>>>> like a stable dev one for instance for people wanting to
> work
> > on
> > > > >>>>>>> operators
> > > > >>>>>>>> and non-core functions.
> > > > >>>>>>>>
> > > > >>>>>>>> However, I think we should proactively test in CI against
> > > > >>> unpinned
> > > > >>>>>>>> dependencies (though it might be a separate case in the
> > matrix)
> > > ,
> > > > >>>> so
> > > > >>>>>> that
> > > > >>>>>>>> we get advance warning if possible that things will break.
> > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
> > > problem
> > > > >>>> :)
> > > > >>>>>>>>
> > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
> > minimum
> > > > >>>>>> required
> > > > >>>>>>>> compatibility. The process of pinning in setup.py is
> extremely
> > > > >>>>>>> detrimental
> > > > >>>>>>>> when you have a large number of python libraries installed
> > with
> > > > >>>>>> different
> > > > >>>>>>>> pinned versions.
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Arthur
> > > > >>>>>>>>
> > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > >>>>>> <ddavydov@twitter.com.invalid
> > > > >>>>>>>>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > > >>>>>>>>> Relevant discussion about this:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > >>>>>> Jarek.Potiuk@polidea.com>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> TL;DR; A change is coming in the way how
> > > > >>>> dependencies/requirements
> > > > >>>>>> are
> > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed rather
> > than
> > > > >>>>>> flexible
> > > > >>>>>>>>> (==
> > > > >>>>>>>>>> rather than >=).
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> This is follow up after Slack discussion we had with Ash
> and
> > > > >>>> Kaxil
> > > > >>>>> -
> > > > >>>>>>>>>> summarising what we propose we'll do.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> *Problem:*
> > > > >>>>>>>>>> During last few weeks we experienced quite a few downtimes
> > of
> > > > >>>>>> TravisCI
> > > > >>>>>>>>>> builds (for all PRs/branches including master) as some of
> > the
> > > > >>>>>>> transitive
> > > > >>>>>>>>>> dependencies were automatically upgraded. This because in
> a
> > > > >>>> number
> > > > >>>>> of
> > > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Whenever there is a new release of such dependency, it
> might
> > > > >>>> cause
> > > > >>>>>>> chain
> > > > >>>>>>>>>> reaction with upgrade of transitive dependencies which
> might
> > > > >>> get
> > > > >>>>> into
> > > > >>>>>>>>>> conflict.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
> > > > >>>>> dependency
> > > > >>>>>>> with
> > > > >>>>>>>>>> click. They started to conflict once AppBuilder has
> released
> > > > >>>>> version
> > > > >>>>>>>>>> 1.12.0.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> *Diagnosis:*
> > > > >>>>>>>>>> Transitive dependencies with "flexible" versions (where >=
> > is
> > > > >>>> used
> > > > >>>>>>>>> instead
> > > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner
> or
> > > > >>> later
> > > > >>>>> hit
> > > > >>>>>>>>> other
> > > > >>>>>>>>>> cases where not fixed dependencies cause similar problems
> > with
> > > > >>>>> other
> > > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This
> > causes
> > > > >>>>>> problems
> > > > >>>>>>>>> for
> > > > >>>>>>>>>> both - released versions (cause they stop to work!) and
> for
> > > > >>>>>> development
> > > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent
> > people
> > > > >>>> from
> > > > >>>>>>>>>> installing development environment from the scratch.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> *Solution:*
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>>  - Following the old-but-good post
> > > > > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > > > > >>>>>>>>>>  pinned dependencies to specific versions (so basically all
> > > > > >>>>>>>>>>  dependencies are "fixed").
> > > > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies with
> > > > > >>>>>>>>>>  pip-tools (https://github.com/jazzband/pip-tools). We might also
> > > > > >>>>>>>>>>  take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > >>>>>>>>>>  - People who would like to upgrade some dependencies for
> > > > >>> their
> > > > >>>>> PRs
> > > > >>>>>>>>> will
> > > > >>>>>>>>>>  still be able to do it - but such upgrades will be in
> their
> > > > >>> PR
> > > > >>>>> thus
> > > > >>>>>>>>> they
> > > > >>>>>>>>>>  will go through TravisCI tests and they will also have to
> > be
> > > > >>>>>>> specified
> > > > >>>>>>>>>> with
> > > > >>>>>>>>>>  pinned fixed versions (==). This should be part of review
> > > > >>>> process
> > > > >>>>>> to
> > > > >>>>>>>>>> make
> > > > >>>>>>>>>>  sure new/changed requirements are pinned.
> > > > >>>>>>>>>>  - In release process there will be a point where an
> upgrade
> > > > >>>> will
> > > > >>>>> be
> > > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so that
> we
> > > > >>> are
> > > > >>>>> not
> > > > >>>>>>>>>> stuck
> > > > >>>>>>>>>>  with older releases. This will be in controlled PR
> > > > >>> environment
> > > > >>>>>> where
> > > > >>>>>>>>>> there
> > > > >>>>>>>>>>  will be time to fix all dependencies without impacting
> > others
> > > > >>>> and
> > > > >>>>>>>>> likely
> > > > >>>>>>>>>>  enough time to "vet" such changes (this can be done for
> > > > >>>>> alpha/beta
> > > > >>>>>>>>>> releases
> > > > >>>>>>>>>>  for example).
> > > > >>>>>>>>>>  - As a side effect dependencies specification will become
> > far
> > > > >>>>>> simpler
> > > > >>>>>>>>>>  and straightforward.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Happy to hear community comments to the proposal. I am
> happy
> > > to
> > > > >>>>> take
> > > > >>>>>> a
> > > > >>>>>>>>> lead
> > > > >>>>>>>>>> on that, open JIRA issue and implement if this is
> something
> > > > >>>>> community
> > > > >>>>>>> is
> > > > >>>>>>>>>> happy with.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> J.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>>>>>>>>> Mobile: +48 660 796 129
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>>
> > > > >>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>>>>> Mobile: +48 660 796 129
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>>
> > > > >>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>>>> Mobile: +48 660 796 129
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>> Mobile: +48 660 796 129
> > > > >>>
> > > >
> > > >
> > >
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>

Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
I am still not convinced that pinning is bad. I re-read the whole mail
thread and the thread from 2016
<https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
to go over all the arguments again, but I stand by pinning.

I am - of course - not sure about the graduation argument. I would just
imagine it might be the case. However, I really think that the situation we
are in now is quite volatile. The latest 1.10.0 cannot be clean-installed via
pip without manually tweaking and forcing a lower version of flask-appbuilder.
Even if you use the constraints file it's pretty cumbersome, because you'd
have to somehow know that you need to do exactly that (not at all obvious
from the error you get). Also, it might get worse at any time as other
packages get newer versions released. The thing here is that the maintainers
of flask-appbuilder did nothing wrong: they simply released a new version with
the click dependency version increased (probably for a good reason), and it's
Airflow's cross-dependency graph which makes it incompatible.
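
To make the failure mode concrete, here is a minimal Python sketch. It is a
toy check, not pip's real resolver. It uses the numbers Ash reported earlier
in the thread (flask-login pinned to 0.2.1, flask-appbuilder allowed at
>= 1.11.1, and flask-appbuilder 1.12.0 asking for flask-login >= 0.3). The
helper names are made up for illustration, and it assumes that
flask-appbuilder 1.11.1 itself is happy with flask-login 0.2.1.

```python
# Illustrative only: a toy version check, not pip's real resolver.
# It handles plain dotted numeric versions and the two operators used below.

def parse(version):
    """Turn '0.2.1' into (0, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, op, bound):
    """Check one specifier such as ('>=', '0.3') against a version string."""
    v, b = parse(version), parse(bound)
    if op == "==":
        return v == b
    if op == ">=":
        return v >= b
    raise ValueError("unsupported operator: " + op)


def find_conflicts(candidate, requirement_sets):
    """Return (package, op, bound) for every specifier the candidate violates."""
    problems = []
    for requirements in requirement_sets:
        for package, (op, bound) in requirements.items():
            if not satisfies(candidate[package], op, bound):
                problems.append((package, op, bound))
    return problems


# Requirements as reported in this thread for apache-airflow 1.10.0.
airflow_wants = {
    "flask-login": ("==", "0.2.1"),
    "flask-appbuilder": (">=", "1.11.1"),
}
# What flask-appbuilder 1.12.0 is reported to require.
fab_1_12_0_wants = {"flask-login": (">=", "0.3")}

# A clean install today picks the newest flask-appbuilder allowed by '>='.
fresh_install = {"flask-login": "0.2.1", "flask-appbuilder": "1.12.0"}
# With flask-appbuilder held at 1.11.1 the new requirement never applies.
pinned_install = {"flask-login": "0.2.1", "flask-appbuilder": "1.11.1"}

print(find_conflicts(fresh_install, [airflow_wants, fab_1_12_0_wants]))
print(find_conflicts(pinned_install, [airflow_wants]))
```

The first call reports the flask-login clash and the second reports nothing,
which is the whole difference between the flexible and the pinned install.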

I am afraid that if we don't change it, it's all but guaranteed that every
single release will at some point in time "deteriorate" and refuse to
clean-install. If we want to solve this problem (maybe we don't and we
accept it as it is?), I think the only way to solve it is to hard-pin all
the requirements, at the very least for releases.
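
A check that "everything is pinned" could be automated in CI. Below is a
rough Python sketch of such a check, assuming a plain requirements-style list
of lines. The function name and the regular expression are my own
simplifications (it ignores URL requirements, for instance), not anything
pip or pip-tools provides.

```python
import re

# A name, optional extras such as [s3], then exactly one '==' specifier.
# Wildcards like '==1.*' are treated as not pinned.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s,;*]+$"
)


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0]           # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if not line or line.startswith("-"):  # blank line or option like -r / -c
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


example = [
    "flask-login==0.2.1",
    "flask-appbuilder>=1.11.1",
    "# a comment",
    "",
    "click==6.7 ; python_version < '3'",
    "requests",
]
print(unpinned_requirements(example))
```

A CI job could fail the build whenever the returned list is not empty.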

Of course we might choose pinning only for releases (and CI builds) and
have the compromise that Matt mentioned. I worry, however (as also
mentioned in the previous thread), that it will be hard to maintain.
Effectively you will have to maintain both in parallel. And the constraints
file is a nice workaround for someone who actually needs a specific
(even newer) version of a specific package in their environment.
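
As a rough illustration of the constraints workaround, the sketch below
renders a constraints file and builds the matching pip command line (the
`pip install -c constraints.txt apache-airflow` form quoted from Ash further
down). The function names are hypothetical, and the script only prints the
command; it does not run pip.

```python
def constraints_text(pins):
    """Render {package: version} as the contents of a pip constraints file."""
    return "".join(
        f"{name}=={version}\n" for name, version in sorted(pins.items())
    )


def pip_install_args(package, constraints_path):
    """Build the argument list for `pip install -c <file> <package>`."""
    return ["pip", "install", "-c", constraints_path, package]


# The flask-appbuilder pin is the one mentioned in this thread.
text = constraints_text({"flask-appbuilder": "1.11.1"})
print(text, end="")
print(" ".join(pip_install_args("apache-airflow", "constraints.txt")))
```

A user who needs a different version of one package would change only that
entry in their own constraints file.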

Maybe we should simply give it a try and do a Proof-Of-Concept/experiment, as
Fokko also mentioned?

We could have a PR with pinning enabled, and maybe ask the people who voiced
concerns about their environment to give it a try with those pinned versions
and see if that makes it difficult for them to either upgrade dependencies and
fork apache-airflow or use pip's constraints file?

J.


On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <ji...@gmail.com> wrote:

> Erik, the Airflow task execution code itself of course must run somewhere
> with Airflow installed, but if the task is making a database query or a web
> request or running something in Docker there's separation between the
> environments and maybe you don't care about Python dependencies at all
> except to get Airflow running. When running Python operators that's not the
> case (as you already deal with).
>
> - Matt
>
> On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
> <EK...@novozymes.com.invalid> wrote:
>
> > This is maybe a stupid question, but is it even possible to run tasks in
> > an environment where Airflow is not installed?
> >
> >
> > Kind regards,
> >
> > Erik
> >
> > ________________________________
> > From: Matt Davis <ji...@gmail.com>
> > Sent: Monday, October 8, 2018 10:13:34 PM
> > To: dev@airflow.incubator.apache.org
> > Subject: Re: Pinning dependencies for Apache Airflow
> >
> > It sounds like we can get the best of both worlds with the original
> > proposals to have minimal requirements in setup.py and "guaranteed to
> work"
> > complete requirements in a separate file. That way we have flexibility
> for
> > teams that run airflow and tasks in the same environment and guidance on
> a
> > working set of requirements. (Disclaimer: I work on the same team as
> > George.)
> >
> > Thanks,
> > Matt
> >
> > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> >
> > > Although I think I come down on the side against pinning, my reasons
> are
> > > different.
> > >
> > > For the two (or more) people who have expressed concern about it would
> > > pip's "Constraint Files" help:
> > >
> > >
> >
> https://pip.pypa.io/en/stable/user_guide/#constraints-files
> > >
> > > For example, you could add "flask-appbuilder==1.11.1" in to this file,
> > > specify it with `pip install -c constraints.txt apache-airflow` and
> then
> > > whenever pip attempted to install any version of FAB it would use the
> > > exact version from the constraints file.
> > >
> > > I don't buy the argument about pinning being a requirement for
> graduation
> > > from Incubation fwiw - it's an unavoidable artefact of the open-source
> > > world we develop in.
> > >
> > >
> >
> https://libraries.io/
> > offers a (free?) service that will monitor apps
> > > dependencies for being out of date, might be better than writing our
> own
> > > solution.
> > >
> > > Pip has for a while now supported a way of saying "this dep is for
> py2.7
> > > only":
> > >
> > > > Since version 6.0, pip also supports specifiers containing
> environment
> > > markers like so:
> > > >
> > > >    SomeProject ==5.4 ; python_version < '2.7'
> > > >    SomeProject; sys_platform == 'win32'
> > >
> > >
> > > Ash
> > >
> > >
> > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com>
> > > wrote:
> > > >
> > > > As a member of a team that will also have really big problems if
> > > > Airflow pins all requirements (for reasons similar to those already
> > > > stated), I would like to add a very strong -1 to the idea of pinning
> > > > them for all installations.
> > > >
> > > > In a number of situations on our end, to avoid similar problems with
> > > > CI, we use `pip-compile` from pip-tools (also mentioned):
> > > >
> >
> https://pypi.org/project/pip-tools/
> > > >
> > > > I would like to suggest, a middle ground of:
> > > >
> > > > - Have the installation continue to use unpinned (`>=`) with minimum
> > > > necessary requirements set
> > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > > > that is used by CI
> > > > - - If we need, there can be one file for each incompatible python
> > > version
> > > > - Append a watermark (hash of `setup.py` requirements?) to the
> > > > compiled requirements file
> > > > - Add a CI check that the watermark and original match to ensure no
> > > > drift since last compile
> > > >
> > > > I am happy to do much of the work for this, if it can help avoid
> > > > pinning all of the depends at the installation level.
> > > >
> > > > --George Leslie-Waksman
> > > >
> > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > > > <ma...@gmail.com> wrote:
> > > >>
> > > >> pip-tools can definitely help here to ship a reference [locked]
> > > >> `requirements.txt` that can be used in [all or part of] the CI. It's
> > > >> actually kind of important to get CI to fail when a new [backward
> > > >> incompatible] lib comes out and break things while allowing version
> > > ranges.
> > > >>
> > > >> I think there may be challenges around pip-tools and projects that
> run
> > > in
> > > >> both python2.7 and python3.6. You sometimes need to have 2
> > > requirements.txt
> > > >> lock files.
> > > >>
> > > >> Max
> > > >>
> > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> > >
> > > >> wrote:
> > > >>
> > > >>> It's a nice one :). However I think when/if we go to pinned
> > > dependencies
> > > >>> the way poetry/pip-tools do it, this will be suddenly lot-less
> useful
> > > It
> > > >>> will be very easy to track dependency changes (they will be always
> > > >>> committed as a change in the .lock file or requirements.txt) and if
> > > someone
> > > >>> has a problem while upgrading a dependency (always consciously,
> never
> > > >>> accidentally) it will simply fail during CI build and the change
> > won't
> > > get
> > > >>> merged/won't break the builds of others in the first place :).
> > > >>>
> > > >>> J.
> > > >>>
> > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com>
> > > wrote:
> > > >>>
> > > >>>> Hi folks,
> > > >>>>
> > > >>>> On top of this discussion, I was thinking we should have the
> ability
> > > to
> > > >>>> quickly monitor dependency release as well. Previously, it
> happened
> > > for a
> > > >>>> few times that CI kept failing for no reason and eventually turned
> > > out it
> > > >>>> was due to dependency release. But it took us some time,
> sometimes a
> > > few
> > > >>>> days, to realise the failure was because of dependency release.
> > > >>>>
> > > >>>> To partially address this, I tried to develop a mini tool to help
> us
> > > >>> check
> > > >>>> the latest release of Python packages & the release date-time on
> > PyPi.
> > > >>> So,
> > > >>>> by comparing it with our CI failure history, we may be able to
> > > >>> troubleshoot
> > > >>>> faster.
> > > >>>>
> > > >>>> Output Sample (ordered by upload time in desc order):
> > > >>>>                               Latest Version          Upload Time
> > > >>>> Package Name
> > > >>>> awscli                    1.16.28
> > > >>> 2018-10-05T23:12:45
> > > >>>> botocore                1.12.18
> > > 2018-10-05T23:12:39
> > > >>>> promise                   2.2.1
> > > >>> 2018-10-04T22:04:18
> > > >>>> Keras                     2.2.4
> > > >>> 2018-10-03T20:59:39
> > > >>>> bleach                    3.0.0
> > > >>> 2018-10-03T16:54:27
> > > >>>> Flask-AppBuilder         1.12.0                2018-10-03T09:03:48
> > > >>>> ... ...
> > > >>>>
> > > >>>> It's a minimal tool (not perfect yet but working). I have hosted
> > this
> > > >>> tool
> > > >>>> at
> >
> https://github.com/XD-DENG/pypi-release-query
> > .
> > > >>>>
> > > >>>>
> > > >>>> XD
> > > >>>>
> > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hello Erik,
> > > >>>>>
> > > >>>>> I understand your concern. It's a hard one to solve in general
> > (i.e.
> > > >>>>> dependency-hell). It looks like in this case you treat Airflow as
> > > >>>>> 'library', where for some other people it might be more like 'end
> > > >>>> product'.
> > > >>>>> If you look at the "pinning" philosophy - the "pin everything" is
> > > good
> > > >>>> for
> > > >>>>> end products, but not good for libraries. In the case you have
> > > Airflow
> > > >>> is
> > > >>>>> treated as a bit of both. And it's perfectly valid case at that
> > (with
> > > >>>>> custom python DAGs being central concept for Airflow).
> > > >>>>> However, I think it's not as bad as you think when it comes to
> > exact
> > > >>>>> pinning.
> > > >>>>>
> > > >>>>> I believe - a bit counter-intuitively - that tools like
> > > >>> pip-tools/poetry
> > > >>>>> with exact pinning result in having your dependencies upgraded
> more
> > > >>>> often,
> > > >>>>> rather than less - especially in complex systems where
> > > dependency-hell
> > > >>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit
> scary
> > > to
> > > >>>> make
> > > >>>>> any change to it. There is a chance it will blow at your face if
> > you
> > > >>>> change
> > > >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you
> change
> > > it,
> > > >>>>> whether it will cause chain reaction of conflicts that will ruin
> > your
> > > >>>> work
> > > >>>>> day.
> > > >>>>>
> > > >>>>> On the contrary - if you change it to exact pinning in
> > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much
> > simpler
> > > >>> (and
> > > >>>>> commented) exclusion/avoidance rules in your .in/.tml file, the
> > whole
> > > >>>> setup
> > > >>>>> might be much easier to maintain and upgrade. Every time you
> > prepare
> > > >>> for
> > > >>>>> release (or even once in a while for master) one person might
> > > >>> consciously
> > > >>>>> attempt to upgrade all dependencies to latest ones. It should be
> > > almost
> > > >>>> as
> > > >>>>> easy as letting poetry/pip-tools help with figuring out what are
> > the
> > > >>>> latest
> > > >>>>> set of dependencies that will work without conflicts. It should
> be
> > > >>> rather
> > > >>>>> straightforward (I've done it in the past for fairly complex
> > > systems).
> > > >>>> What
> > > >>>>> those tools enable is - doing single-shot upgrade of all
> > > dependencies.
> > > >>>>> After doing it you can make sure that all tests work fine (and
> fix
> > > any
> > > >>>>> problems that result from it). And then you test it thoroughly
> > before
> > > >>> you
> > > >>>>> make final release. You can do it in separate PR - with automated
> > > >>> testing
> > > >>>>> in Travis which means that you are not disturbing work of others
> > > >>>>> (compilation/building + unit tests are guaranteed to work before
> > you
> > > >>>> merge
> > > >>>>> it) while doing it. It's all conscious rather than accidental.
> Nice
> > > >>> side
> > > >>>>> effect of that is that with every release you can actually
> > "catch-up"
> > > >>>> with
> > > >>>>> latest stable versions of many libraries in one go. It's better
> > than
> > > >>>>> waiting until someone deliberately upgrades to newer version (and
> > the
> > > >>>> rest
> > > >>>>> remain terribly out-dated as is the case for Airflow now).
> > > >>>>>
> > > >>>>> So a bit counterintuitively I think tools like pip-tools/poetry
> > help
> > > >>> you
> > > >>>> to
> > > >>>>> catch up faster in many cases. That is at least my experience so
> > far.
> > > >>>>>
> > > >>>>> Additionally, Airflow is an open system - if you have very
> specific
> > > >>> needs
> > > >>>>> for requirements, you might actually - in the very same way with
> > > >>>>> pip-tools/poetry - upgrade all your dependencies in your local
> fork
> > > of
> > > >>>>> Airflow before someone else does it in master/release. Those
> tools
> > > kind
> > > >>>> of
> > > >>>>> democratise dependency management. It should be as easy as
> > > `pip-compile
> > > >>>>> --upgrade` or `poetry update` and you will get all the
> > > >>> "non-conflicting"
> > > >>>>> latest dependencies in your local fork (and poetry especially
> seems
> > > to
> > > >>> do
> > > >>>>> all the heavy lifting of figuring out which versions will work).
> > You
> > > >>>> should
> > > >>>>> be able to test and publish it locally as your private package
> for
> > > >>> local
> > > >>>>> installations. You can even mark the specific dependency you want
> > to
> > > >>> use
> > > >>>>> specific version and let pip-tools/poetry figure out exact
> versions
> > > of
> > > >>>>> other requirements. You can even make a PR with such upgrade
> > > eventually
> > > >>>> to
> > > >>>>> get it faster in master. You can even downgrade in case newer
> > > >>> dependency
> > > >>>>> causes problems for you in similar way. Guided by the tools, it's
> > > much
> > > >>>>> faster than figuring the versions out by yourself.
> > > >>>>>
> > > >>>>> As long as we have simple way of managing it and document how to
> > > >>>>> upgrade/downgrade dependencies in your own fork, and mention how
> to
> > > >>>> locally
> > > >>>>> release Airflow as a package, I think your case could be covered
> > even
> > > >>>>> better than now. What do you think ?
> > > >>>>>
> > > >>>>> J.
> > > >>>>>
> > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > >>>>> <EK...@novozymes.com.invalid> wrote:
> > > >>>>>
> > > >>>>>> For us, exact pinning of versions would be problematic. We have
> > DAG
> > > >>>> code
> > > >>>>>> that shares direct and indirect dependencies with Airflow, e.g.
> > > lxml,
> > > >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3.
> If
> > > our
> > > >>>> DAG
> > > >>>>>> code for some reason needs a newer point release due to a bug
> > that's
> > > >>>>> fixed,
> > > >>>>>> then we can't cleanly build a virtual environment containing the
> > > >>> fixed
> > > >>>>>> version. For us, it's already a problem that Airflow has quite
> > > strict
> > > >>>>> (and
> > > >>>>>> sometimes old) requirements in setup.py.
> > > >>>>>>
> > > >>>>>> Erik
> > > >>>>>> ________________________________
> > > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > > >>>>>> To: dev@airflow.incubator.apache.org
> > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > > >>>>>>
> > > >>>>>> I think one solution to release approach is to check as part of
> > > >>>> automated
> > > >>>>>> Travis build if all requirements are pinned with == (even the
> deep
> > > >>>> ones)
> > > >>>>>> and fail the build in case they are not for ALL versions
> > (including
> > > >>>>>> dev). And of course we should document the approach of
> > > >>>> releases/upgrades
> > > >>>>>> etc. If we do it all the time for development versions (which
> > seems
> > > >>>> quite
> > > >>>>>> doable), then transitively all the releases will also have
> pinned
> > > >>>>> versions
> > > >>>>>> and they will never try to upgrade any of the dependencies. In
> > > poetry
> > > >>>>>> (similarly in pip-tools with .in file) it is done by having a
> > .lock
> > > >>>> file
> > > >>>>>> that specifies exact versions of each package so it can be
> rather
> > > >>> easy
> > > >>>> to
> > > >>>>>> manage (so it's worth trying it out I think  :D  - seems a bit
> > more
> > > >>>>>> friendly than pip-tools).
> > > >>>>>>
> > > >>>>>> There is a drawback - of course - with manually updating the
> > module
> > > >>>> that
> > > >>>>>> you want, but I really see that as an advantage rather than
> > drawback
> > > >>>>>> especially for users. This way you maintain the property that it
> > > will
> > > >>>>>> always install and work the same way no matter if you installed
> it
> > > >>>> today
> > > >>>>> or
> > > >>>>>> two months ago. I think the biggest drawback for maintainers is
> > that
> > > >>>> you
> > > >>>>>> need some kind of monitoring of security vulnerabilities and
> > cannot
> > > >>>> rely
> > > >>>>> on
> > > >>>>>> automated security upgrades. With >= requirements those security
> > > >>>> updates
> > > >>>>>> might happen automatically without anyone noticing, but to be
> > honest
> > > >>> I
> > > >>>>>> don't think such upgrades are guaranteed even in current setup
> for
> > > >>> all
> > > >>>>>> security issues for all libraries anyway.
> > > >>>>>>
> > > >>>>>> Finding the need to upgrade because of security issues can be
> > quite
> > > >>>>>> automated. Even now I noticed Github started to inform owners
> > about
> > > >>>>>> potential security vulnerabilities in used libraries for their
> > > >>> project.
> > > >>>>>> Those notifications can be sent to devlist and turned into JIRA
> > > >>> issues
> > > >>>>>> followed by minor security-related releases (with only few
> > library
> > > >>>>>> dependencies upgraded).
> > > >>>>>>
> > > >>>>>> I think it's even easier to automate it if you have pinned
> > > >>>> dependencies -
> > > >>>>>> because it's generally easy to find applicable vulnerabilities
> for
> > > >>>>> specific
> > > >>>>>> versions of libraries by static analysers - when you have >=,
> you
> > > >>> never
> > > >>>>>> know which version will be used until you actually perform the
> > > >>>>>> installation.
> > > >>>>>>
> > > >>>>>> There is one big advantage for maintainers for "pinned" case.
> Your
> > > >>>> users
> > > >>>>>> always have the same dependencies - so when issue is raised, you
> > can
> > > >>>>>> reproduce it more easily. It's hard to know which version user
> has
> > > >>> (as
> > > >>>>> the
> > > >>>>>> user could install it month ago or yesterday) and even if you
> find
> > > >>> out
> > > >>>> by
> > > >>>>>> asking the user, you might not be able to reproduce the set of
> > > >>>>> requirements
> > > >>>>>> easily (simply because there are already newer versions of the
> > > >>>> libraries
> > > >>>>>> released and they are used automatically). You can ask the user
> to
> > > >>> run
> > > >>>>> pip
> > > >>>>>> --upgrade but that's dangerous and pretty lame ("check the
> latest
> > > >>>>> version -
> > > >>>>>> maybe it fixes your problem ? ") and sometimes not possible
> (e.g.
> > > >>>> someone
> > > >>>>>> has pre-built docker image with dependencies from few months ago
> > and
> > > >>>>> cannot
> > > >>>>>> rebuild the image easily).
> > > >>>>>>
> > > >>>>>> J.
> > > >>>>>>
> > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
> ash@apache.org
> > >
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>>> One thing to point out here.
> > > >>>>>>>
> > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > >>>>>>> environment it will fail.
> > > >>>>>>>
> > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > > >>>>>>> >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > >>>>>>>
> > > >>>>>>> So I do think there is maybe something to be said about pinning
> > for
> > > >>>>>>> releases. The down side to that is that if there are updates
> to a
> > > >>>>> module
> > > >>>>>>> that we want then we have to make a point release to let people
> > get
> > > >>>> it
> > > >>>>>>>
> > > >>>>>>> Both methods have draw-backs
> > > >>>>>>>
> > > >>>>>>> -ash
> > > >>>>>>>
> > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> > > >>> arthur.wiedmer@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hi Jarek,
> > > >>>>>>>>
> > > >>>>>>>> I will +1 the discussion Dan is referring to and George's
> > advice.
> > > >>>>>>>>
> > > >>>>>>>> I just want to double check we are talking about pinning in
> > > >>>>>>>> requirements.txt only.
> > > >>>>>>>>
> > > >>>>>>>> This offers the ability to
> > > >>>>>>>> pip install -r requirements.txt
> > > >>>>>>>> pip install --no-deps airflow
> > > >>>>>>>> For a guaranteed install which works.
> > > >>>>>>>>
> > > >>>>>>>> Several different requirement files can be provided for
> specific
> > > >>>> use
> > > >>>>>>> cases,
> > > >>>>>>>> like a stable dev one for instance for people wanting to work
> on
> > > >>>>>>> operators
> > > >>>>>>>> and non-core functions.
> > > >>>>>>>>
> > > >>>>>>>> However, I think we should proactively test in CI against
> > > >>> unpinned
> > > >>>>>>>> dependencies (though it might be a separate case in the
> matrix)
> > ,
> > > >>>> so
> > > >>>>>> that
> > > >>>>>>>> we get advance warning if possible that things will break.
> > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
> > problem
> > > >>>> :)
> > > >>>>>>>>
> > > >>>>>>>> We should unpin as possible in setup.py to only maintain
> minimum
> > > >>>>>> required
> > > >>>>>>>> compatibility. The process of pinning in setup.py is extremely
> > > >>>>>>> detrimental
> > > >>>>>>>> when you have a large number of python libraries installed
> with
> > > >>>>>> different
> > > >>>>>>>> pinned versions.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Arthur
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > >>>>>> <ddavydov@twitter.com.invalid
> > > >>>>>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Relevant discussion about this:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > >
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > >>>>>> Jarek.Potiuk@polidea.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> TL;DR; A change is coming in the way how
> > > >>>> dependencies/requirements
> > > >>>>>> are
> > > >>>>>>>>>> specified for Apache Airflow - they will be fixed rather
> than
> > > >>>>>> flexible
> > > >>>>>>>>> (==
> > > >>>>>>>>>> rather than >=).
> > > >>>>>>>>>>
> > > >>>>>>>>>> This is follow up after Slack discussion we had with Ash and
> > > >>>> Kaxil
> > > >>>>> -
> > > >>>>>>>>>> summarising what we propose we'll do.
> > > >>>>>>>>>>
> > > >>>>>>>>>> *Problem:*
> > > >>>>>>>>>> During last few weeks we experienced quite a few downtimes
> of
> > > >>>>>> TravisCI
> > > >>>>>>>>>> builds (for all PRs/branches including master) as some of
> the
> > > >>>>>>> transitive
> > > >>>>>>>>>> dependencies were automatically upgraded. This because in a
> > > >>>> number
> > > >>>>> of
> > > >>>>>>>>>> dependencies we have  >= rather than == dependencies.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Whenever there is a new release of such dependency, it might
> > > >>>> cause
> > > >>>>>>> chain
> > > >>>>>>>>>> reaction with upgrade of transitive dependencies which might
> > > >>> get
> > > >>>>> into
> > > >>>>>>>>>> conflict.
> > > >>>>>>>>>>
> > > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
> > > >>>>> dependency
> > > >>>>>>> with
> > > >>>>>>>>>> click. They started to conflict once AppBuilder has released
> > > >>>>> version
> > > >>>>>>>>>> 1.12.0.
> > > >>>>>>>>>>
> > > >>>>>>>>>> *Diagnosis:*
> > > >>>>>>>>>> Transitive dependencies with "flexible" versions (where >=
> is
> > > >>>> used
> > > >>>>>>>>> instead
> > > >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner or
> > > >>> later
> > > >>>>> hit
> > > >>>>>>>>> other
> > > >>>>>>>>>> cases where not fixed dependencies cause similar problems
> with
> > > >>>>> other
> > > >>>>>>>>>> transitive dependencies. We need to fix-pin them. This
> causes
> > > >>>>>> problems
> > > >>>>>>>>> for
> > > >>>>>>>>>> both - released versions (cause they stop to work!) and for
> > > >>>>>> development
> > > >>>>>>>>>> (cause they break master builds in TravisCI and prevent
> people
> > > >>>> from
> > > >>>>>>>>>> installing development environment from the scratch.
> > > >>>>>>>>>>
> > > >>>>>>>>>> *Solution:*
> > > >>>>>>>>>>
> > > >>>>>>>>>>  - Following the old-but-good post
> > > >>>>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > >
> >
> https://nvie.com/posts/pin-your-packages/
> > > >>>>>> we are going to fix the
> > > >>>>>>>>>> pinned
> > > >>>>>>>>>>  dependencies to specific versions (so basically all
> > > >>>> dependencies
> > > >>>>>> are
> > > >>>>>>>>>>  "fixed").
> > > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade
> > > >>>> dependencies
> > > >>>>>> with
> > > >>>>>>>>>>  pip-tools (
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > >
> >
> https://github.com/jazzband/pip-tools
> > > >>>>> ).
> > > >>>>>> We might also
> > > >>>>>>>>> take a
> > > >>>>>>>>>>  look at pipenv:
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > >
> >
> https://pipenv.readthedocs.io/en/latest/
> > > >>>>>>>>>>  - People who would like to upgrade some dependencies for
> > > >>> their
> > > >>>>> PRs
> > > >>>>>>>>> will
> > > >>>>>>>>>>  still be able to do it - but such upgrades will be in their
> > > >>> PR
> > > >>>>> thus
> > > >>>>>>>>> they
> > > >>>>>>>>>>  will go through TravisCI tests and they will also have to
> be
> > > >>>>>>> specified
> > > >>>>>>>>>> with
> > > >>>>>>>>>>  pinned fixed versions (==). This should be part of review
> > > >>>> process
> > > >>>>>> to
> > > >>>>>>>>>> make
> > > >>>>>>>>>>  sure new/changed requirements are pinned.
> > > >>>>>>>>>>  - In release process there will be a point where an upgrade
> > > >>>> will
> > > >>>>> be
> > > >>>>>>>>>>  attempted for all requirements (using pip-tools) so that we
> > > >>> are
> > > >>>>> not
> > > >>>>>>>>>> stuck
> > > >>>>>>>>>>  with older releases. This will be in controlled PR
> > > >>> environment
> > > >>>>>> where
> > > >>>>>>>>>> there
> > > >>>>>>>>>>  will be time to fix all dependencies without impacting
> others
> > > >>>> and
> > > >>>>>>>>> likely
> > > >>>>>>>>>>  enough time to "vet" such changes (this can be done for
> > > >>>>> alpha/beta
> > > >>>>>>>>>> releases
> > > >>>>>>>>>>  for example).
> > > >>>>>>>>>>  - As a side effect dependencies specification will become
> far
> > > >>>>>> simpler
> > > >>>>>>>>>>  and straightforward.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Happy to hear community comments to the proposal. I am happy
> > to
> > > >>>>> take
> > > >>>>>> a
> > > >>>>>>>>> lead
> > > >>>>>>>>>> on that, open JIRA issue and implement if this is something
> > > >>>>> community
> > > >>>>>>> is
> > > >>>>>>>>>> happy with.
> > > >>>>>>>>>>
> > > >>>>>>>>>> J.
> > > >>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>>
> > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > >>>>>>>>>> Mobile: +48 660 796 129
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>>
> > > >>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > >>>>>> Mobile: +48 660 796 129
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>>
> > > >>>>> *Jarek Potiuk, Principal Software Engineer*
> > > >>>>> Mobile: +48 660 796 129
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>>
> > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > >>> Mobile: +48 660 796 129
> > > >>>
> > >
> > >
> >
>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Matt Davis <ji...@gmail.com>.
Erik, the Airflow task execution code itself of course must run somewhere
with Airflow installed, but if the task is making a database query or a web
request or running something in Docker there's separation between the
environments and maybe you don't care about Python dependencies at all
except to get Airflow running. When running Python operators that's not the
case (as you already deal with).

- Matt

On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
<EK...@novozymes.com.invalid> wrote:

> This is maybe a stupid question, but is it even possible to run tasks in
> an environment where Airflow is not installed?
>
>
> Kind regards,
>
> Erik
>
> ________________________________
> From: Matt Davis <ji...@gmail.com>
> Sent: Monday, October 8, 2018 10:13:34 PM
> To: dev@airflow.incubator.apache.org
> Subject: Re: Pinning dependencies for Apache Airflow
>
> It sounds like we can get the best of both worlds with the original
> proposals to have minimal requirements in setup.py and "guaranteed to work"
> complete requirements in a separate file. That way we have flexibility for
> teams that run airflow and tasks in the same environment and guidance on a
> working set of requirements. (Disclaimer: I work on the same team as
> George.)
>
> Thanks,
> Matt
>
> On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> > Although I think I come down on the side against pinning, my reasons are
> > different.
> >
> > For the two (or more) people who have expressed concern about it would
> > pip's "Constraint Files" help:
> >
> >
> https://pip.pypa.io/en/stable/user_guide/#constraints-files
> >
> > For example, you could add "flask-appbuilder==1.11.1" in to this file,
> > specify it with `pip install -c constraints.txt apache-airflow` and then
> > whenever pip attempted to install any version of FAB it would use the
> > exact version from the constraints file.
> >
> > I don't buy the argument about pinning being a requirement for graduation
> > from Incubation fwiw - it's an unavoidable artefact of the open-source
> > world we develop in.
> >
> >
> https://libraries.io/
> offers a (free?) service that will monitor apps
> > dependencies for being out of date, might be better than writing our own
> > solution.
> >
> > Pip has for a while now supported a way of saying "this dep is for py2.7
> > only":
> >
> > > Since version 6.0, pip also supports specifiers containing environment
> > > markers like so:
> > >
> > >    SomeProject ==5.4 ; python_version < '2.7'
> > >    SomeProject; sys_platform == 'win32'
> >
> >
> > Ash
> >
> >
> > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com> wrote:
> > >
> > > As a member of a team that will also have really big problems if
> > > Airflow pins all requirements (for reasons similar to those already
> > > stated), I would like to add a very strong -1 to the idea of pinning
> > > them for all installations.
> > >
> > > In a number of situations on our end, to avoid similar problems with
> > > CI, we use `pip-compile` from pip-tools (also mentioned):
> > > https://pypi.org/project/pip-tools/
> > >
> > > I would like to suggest a middle ground of:
> > >
> > > - Have the installation continue to use unpinned (`>=`) with minimum
> > > necessary requirements set
> > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > > that is used by CI
> > > - - If we need, there can be one file for each incompatible python version
> > > - Append a watermark (hash of `setup.py` requirements?) to the
> > > compiled requirements file
> > > - Add a CI check that the watermark and original match to ensure no
> > > drift since last compile
> > >
> > > I am happy to do much of the work for this, if it can help avoid
> > > pinning all of the depends at the installation level.
> > >
> > > --George Leslie-Waksman
> > >
> > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > > <ma...@gmail.com> wrote:
> > >>
> > >> pip-tools can definitely help here to ship a reference [locked]
> > >> `requirements.txt` that can be used in [all or part of] the CI. It's
> > >> actually kind of important to get CI to fail when a new [backward
> > >> incompatible] lib comes out and breaks things while allowing version
> > >> ranges.
> > >>
> > >> I think there may be challenges around pip-tools and projects that run
> > >> in both python2.7 and python3.6. You sometimes need to have 2
> > >> requirements.txt lock files.
> > >>
> > >> Max
> > >>
> > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >> wrote:
> > >>
> > >>> It's a nice one :). However I think when/if we go to pinned dependencies
> > >>> the way poetry/pip-tools do it, this will suddenly be a lot less useful.
> > >>> It will be very easy to track dependency changes (they will always be
> > >>> committed as a change in the .lock file or requirements.txt) and if
> > >>> someone has a problem while upgrading a dependency (always consciously,
> > >>> never accidentally) it will simply fail during CI build and the change
> > >>> won't get merged/won't break the builds of others in the first place :).
> > >>>
> > >>> J.
> > >>>
> > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:
> > >>>
> > >>>> Hi folks,
> > >>>>
> > >>>> On top of this discussion, I was thinking we should have the ability to
> > >>>> quickly monitor dependency releases as well. Previously, it happened a
> > >>>> few times that CI kept failing for no apparent reason and it eventually
> > >>>> turned out to be due to a dependency release. But it took us some time,
> > >>>> sometimes a few days, to realise the failure was caused by a dependency
> > >>>> release.
> > >>>>
> > >>>> To partially address this, I tried to develop a mini tool to help us
> > >>>> check the latest release of Python packages & the release date-time on
> > >>>> PyPI. So, by comparing it with our CI failure history, we may be able
> > >>>> to troubleshoot faster.
> > >>>>
> > >>>> Output Sample (ordered by upload time in desc order):
> > >>>>
> > >>>> Package Name       Latest Version   Upload Time
> > >>>> awscli             1.16.28          2018-10-05T23:12:45
> > >>>> botocore           1.12.18          2018-10-05T23:12:39
> > >>>> promise            2.2.1            2018-10-04T22:04:18
> > >>>> Keras              2.2.4            2018-10-03T20:59:39
> > >>>> bleach             3.0.0            2018-10-03T16:54:27
> > >>>> Flask-AppBuilder   1.12.0           2018-10-03T09:03:48
> > >>>> ... ...
> > >>>>
> > >>>> It's a minimal tool (not perfect yet but working). I have hosted this
> > >>>> tool at https://github.com/XD-DENG/pypi-release-query.
> > >>>>
> > >>>>
> > >>>> XD
> > >>>>
> > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hello Erik,
> > >>>>>
> > >>>>> I understand your concern. It's a hard one to solve in general (i.e.
> > >>>>> dependency-hell). It looks like in this case you treat Airflow as a
> > >>>>> 'library', where for some other people it might be more like an 'end
> > >>>>> product'. If you look at the "pinning" philosophy - "pin everything"
> > >>>>> is good for end products, but not good for libraries. In your case
> > >>>>> Airflow is treated as a bit of both. And it's a perfectly valid case
> > >>>>> at that (with custom python DAGs being a central concept for Airflow).
> > >>>>> However, I think it's not as bad as you think when it comes to exact
> > >>>>> pinning.
> > >>>>>
> > >>>>> I believe - a bit counter-intuitively - that tools like
> > >>>>> pip-tools/poetry with exact pinning result in having your dependencies
> > >>>>> upgraded more often, rather than less - especially in complex systems
> > >>>>> where dependency-hell creeps in. If you look at Airflow's setup.py now
> > >>>>> - it's a bit scary to make any change to it. There is a chance it will
> > >>>>> blow up in your face if you change it. You never know why there is
> > >>>>> 0.3 < ver < 1.0 - and if you change it, whether it will cause a chain
> > >>>>> reaction of conflicts that will ruin your work day.
> > >>>>>
> > >>>>> On the contrary - if you change it to exact pinning in a
> > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> > >>>>> (and commented) exclusion/avoidance rules in your .in/.toml file, the
> > >>>>> whole setup might be much easier to maintain and upgrade. Every time
> > >>>>> you prepare for a release (or even once in a while for master) one
> > >>>>> person might consciously attempt to upgrade all dependencies to the
> > >>>>> latest ones. It should be almost as easy as letting poetry/pip-tools
> > >>>>> help with figuring out what is the latest set of dependencies that
> > >>>>> will work without conflicts. It should be rather straightforward (I've
> > >>>>> done it in the past for fairly complex systems). What those tools
> > >>>>> enable is doing a single-shot upgrade of all dependencies. After doing
> > >>>>> it you can make sure that all tests work fine (and fix any problems
> > >>>>> that result from it). And then you test it thoroughly before you make
> > >>>>> the final release. You can do it in a separate PR - with automated
> > >>>>> testing in Travis which means that you are not disturbing the work of
> > >>>>> others (compilation/building + unit tests are guaranteed to work
> > >>>>> before you merge it) while doing it. It's all conscious rather than
> > >>>>> accidental. A nice side effect of that is that with every release you
> > >>>>> can actually "catch-up" with the latest stable versions of many
> > >>>>> libraries in one go. It's better than waiting until someone
> > >>>>> deliberately upgrades to a newer version (and the rest remain terribly
> > >>>>> out-dated as is the case for Airflow now).
> > >>>>>
> > >>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
> > >>>>> you to catch up faster in many cases. That is at least my experience
> > >>>>> so far.
> > >>>>>
> > >>>>> Additionally, Airflow is an open system - if you have very specific
> > >>>>> needs for requirements, you might actually - in the very same way with
> > >>>>> pip-tools/poetry - upgrade all your dependencies in your local fork of
> > >>>>> Airflow before someone else does it in master/release. Those tools
> > >>>>> kind of democratise dependency management. It should be as easy as
> > >>>>> `pip-compile --upgrade` or `poetry update` and you will get all the
> > >>>>> "non-conflicting" latest dependencies in your local fork (and poetry
> > >>>>> especially seems to do all the heavy lifting of figuring out which
> > >>>>> versions will work). You should be able to test and publish it locally
> > >>>>> as your private package for local installations. You can even mark the
> > >>>>> specific dependency you want to use at a specific version and let
> > >>>>> pip-tools/poetry figure out exact versions of other requirements. You
> > >>>>> can even make a PR with such an upgrade eventually to get it faster in
> > >>>>> master. You can even downgrade in case a newer dependency causes
> > >>>>> problems for you in a similar way. Guided by the tools, it's much
> > >>>>> faster than figuring the versions out by yourself.
> > >>>>>
> > >>>>> As long as we have a simple way of managing it and document how to
> > >>>>> upgrade/downgrade dependencies in your own fork, and mention how to
> > >>>>> locally release Airflow as a package, I think your case could be
> > >>>>> covered even better than now. What do you think?
> > >>>>>
> > >>>>> J.
> > >>>>>
> > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > >>>>> <EK...@novozymes.com.invalid> wrote:
> > >>>>>
> > >>>>>> For us, exact pinning of versions would be problematic. We have DAG
> > >>>>>> code that shares direct and indirect dependencies with Airflow, e.g.
> > >>>>>> lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3.
> > >>>>>> If our DAG code for some reason needs a newer point release due to a
> > >>>>>> bug that's fixed, then we can't cleanly build a virtual environment
> > >>>>>> containing the fixed version. For us, it's already a problem that
> > >>>>>> Airflow has quite strict (and sometimes old) requirements in setup.py.
> > >>>>>>
> > >>>>>> Erik
> > >>>>>> ________________________________
> > >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > >>>>>> To: dev@airflow.incubator.apache.org
> > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > >>>>>>
> > >>>>>> I think one solution to the release approach is to check as part of
> > >>>>>> the automated Travis build if all requirements are pinned with ==
> > >>>>>> (even the deep ones) and fail the build in case they are not, for ALL
> > >>>>>> versions (including dev). And of course we should document the
> > >>>>>> approach of releases/upgrades etc. If we do it all the time for
> > >>>>>> development versions (which seems quite doable), then transitively
> > >>>>>> all the releases will also have pinned versions and they will never
> > >>>>>> try to upgrade any of the dependencies. In poetry (similarly in
> > >>>>>> pip-tools with .in file) it is done by having a .lock file that
> > >>>>>> specifies exact versions of each package so it can be rather easy to
> > >>>>>> manage (so it's worth trying it out I think  :D  - seems a bit more
> > >>>>>> friendly than pip-tools).
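
A minimal sketch of the "fail the build if anything is not pinned with ==" check described above. It is an illustration only, not something from the thread: the function name `unpinned_requirements` and the regular expression are assumptions, the sketch only understands plain `name==version` lines (optionally with extras and an environment marker), and it would treat a wildcard such as `==1.*` as pinned.

```python
import re

# Hypothetical pattern: "name==version", optional [extras], optional "; marker".
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;=<>!~]+\s*(;.*)?$"
)


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with ==."""
    problems = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = "flask-login==0.2.1\nflask-appbuilder>=1.11.1\n"
    for line in unpinned_requirements(sample):
        print("not pinned:", line)
```

A CI job could call such a function on the compiled requirements file and exit non-zero when the returned list is not empty.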
> > >>>>>>
> > >>>>>> There is a drawback - of course - with manually updating the module
> > >>>>>> that you want, but I really see that as an advantage rather than a
> > >>>>>> drawback, especially for users. This way you maintain the property
> > >>>>>> that it will always install and work the same way no matter if you
> > >>>>>> installed it today or two months ago. I think the biggest drawback
> > >>>>>> for maintainers is that you need some kind of monitoring of security
> > >>>>>> vulnerabilities and cannot rely on automated security upgrades. With
> > >>>>>> flexible (>=) requirements those security updates might happen
> > >>>>>> automatically without anyone noticing, but to be honest I don't think
> > >>>>>> such upgrades are guaranteed even in the current setup for all
> > >>>>>> security issues for all libraries anyway.
> > >>>>>>
> > >>>>>> Finding the need to upgrade because of security issues can be quite
> > >>>>>> automated. Even now I noticed Github started to inform owners about
> > >>>>>> potential security vulnerabilities in used libraries for their
> > >>>>>> project. Those notifications can be sent to devlist and turned into
> > >>>>>> JIRA issues followed by minor security-related releases (with only a
> > >>>>>> few library dependencies upgraded).
> > >>>>>>
> > >>>>>> I think it's even easier to automate it if you have pinned
> > >>>>>> dependencies - because it's generally easy to find applicable
> > >>>>>> vulnerabilities for specific versions of libraries by static
> > >>>>>> analysers - when you have >=, you never know which version will be
> > >>>>>> used until you actually perform the installation.
> > >>>>>>
> > >>>>>> There is one big advantage for maintainers for the "pinned" case.
> > >>>>>> Your users always have the same dependencies - so when an issue is
> > >>>>>> raised, you can reproduce it more easily. It's hard to know which
> > >>>>>> version the user has (as the user could have installed it a month ago
> > >>>>>> or yesterday) and even if you find out by asking the user, you might
> > >>>>>> not be able to reproduce the set of requirements easily (simply
> > >>>>>> because there are already newer versions of the libraries released
> > >>>>>> and they are used automatically). You can ask the user to run
> > >>>>>> `pip install --upgrade` but that's dangerous and pretty lame ("check
> > >>>>>> the latest version - maybe it fixes your problem?") and sometimes not
> > >>>>>> possible (e.g. someone has a pre-built docker image with dependencies
> > >>>>>> from a few months ago and cannot rebuild the image easily).
> > >>>>>>
> > >>>>>> J.
> > >>>>>>
> > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <ash@apache.org>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> One thing to point out here.
> > >>>>>>>
> > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> > >>>>>>> environment it will fail.
> > >>>>>>>
> > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder
> > >>>>>>> is >= 1.11.1, so that pulls in 1.12.0 which requires
> > >>>>>>> flask-login >= 0.3.
> > >>>>>>>
> > >>>>>>> So I do think there is maybe something to be said about pinning for
> > >>>>>>> releases. The down side to that is that if there are updates to a
> > >>>>>>> module that we want then we have to make a point release to let
> > >>>>>>> people get it.
> > >>>>>>>
> > >>>>>>> Both methods have drawbacks.
> > >>>>>>>
> > >>>>>>> -ash
> > >>>>>>>
> > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <arthur.wiedmer@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi Jarek,
> > >>>>>>>>
> > >>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
> > >>>>>>>>
> > >>>>>>>> I just want to double check we are talking about pinning in
> > >>>>>>>> requirements.txt only.
> > >>>>>>>>
> > >>>>>>>> This offers the ability to
> > >>>>>>>> pip install -r requirements.txt
> > >>>>>>>> pip install --no-deps airflow
> > >>>>>>>> For a guaranteed install which works.
> > >>>>>>>>
> > >>>>>>>> Several different requirement files can be provided for specific
> > >>>>>>>> use cases, like a stable dev one for instance for people wanting
> > >>>>>>>> to work on operators and non-core functions.
> > >>>>>>>>
> > >>>>>>>> However, I think we should proactively test in CI against unpinned
> > >>>>>>>> dependencies (though it might be a separate case in the matrix),
> > >>>>>>>> so that we get advance warning if possible that things will break.
> > >>>>>>>> CI downtime is not a bad thing here, it actually caught a problem
> > >>>>>>>> :)
> > >>>>>>>>
> > >>>>>>>> We should unpin as much as possible in setup.py to only maintain
> > >>>>>>>> minimum required compatibility. The process of pinning in setup.py
> > >>>>>>>> is extremely detrimental when you have a large number of python
> > >>>>>>>> libraries installed with different pinned versions.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Arthur
> > >>>>>>>>
> > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > >>>>>>>> <ddavydov@twitter.com.invalid> wrote:
> > >>>>>>>>
> > >>>>>>>>> Relevant discussion about this:
> > >>>>>>>>>
> > >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk
> > >>>>>>>>> <Jarek.Potiuk@polidea.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> TL;DR; A change is coming in the way how dependencies/requirements
> > >>>>>>>>>> are specified for Apache Airflow - they will be fixed rather than
> > >>>>>>>>>> flexible (== rather than >=).
> > >>>>>>>>>>
> > >>>>>>>>>> This is follow up after Slack discussion we had with Ash and Kaxil
> > >>>>>>>>>> - summarising what we propose we'll do.
> > >>>>>>>>>>
> > >>>>>>>>>> *Problem:*
> > >>>>>>>>>> During last few weeks we experienced quite a few downtimes of
> > >>>>>>>>>> TravisCI builds (for all PRs/branches including master) as some of
> > >>>>>>>>>> the transitive dependencies were automatically upgraded. This
> > >>>>>>>>>> because in a number of dependencies we have  >= rather than ==
> > >>>>>>>>>> dependencies.
> > >>>>>>>>>>
> > >>>>>>>>>> Whenever there is a new release of such dependency, it might cause
> > >>>>>>>>>> chain reaction with upgrade of transitive dependencies which might
> > >>>>>>>>>> get into conflict.
> > >>>>>>>>>>
> > >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
> > >>>>>>>>>> dependency with click. They started to conflict once AppBuilder
> > >>>>>>>>>> has released version 1.12.0.
> > >>>>>>>>>>
> > >>>>>>>>>> *Diagnosis:*
> > >>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is used
> > >>>>>>>>>> instead of ==) is a reason for "dependency hell". We will sooner
> > >>>>>>>>>> or later hit other cases where not fixed dependencies cause
> > >>>>>>>>>> similar problems with other transitive dependencies. We need to
> > >>>>>>>>>> fix-pin them. This causes problems for both - released versions
> > >>>>>>>>>> (cause they stop to work!) and for development (cause they break
> > >>>>>>>>>> master builds in TravisCI and prevent people from installing
> > >>>>>>>>>> development environment from the scratch.
> > >>>>>>>>>>
> > >>>>>>>>>> *Solution:*
> > >>>>>>>>>>
> > >>>>>>>>>>  - Following the old-but-good post
> > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > >>>>>>>>>>  pinned dependencies to specific versions (so basically all
> > >>>>>>>>>>  dependencies are "fixed").
> > >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies
> > >>>>>>>>>>  with pip-tools (https://github.com/jazzband/pip-tools). We might
> > >>>>>>>>>>  also take a look at pipenv:
> > >>>>>>>>>>  https://pipenv.readthedocs.io/en/latest/
> > >>>>>>>>>>  - People who would like to upgrade some dependencies for their
> > >>>>>>>>>>  PRs will still be able to do it - but such upgrades will be in
> > >>>>>>>>>>  their PR thus they will go through TravisCI tests and they will
> > >>>>>>>>>>  also have to be specified with pinned fixed versions (==). This
> > >>>>>>>>>>  should be part of review process to make sure new/changed
> > >>>>>>>>>>  requirements are pinned.
> > >>>>>>>>>>  - In release process there will be a point where an upgrade will
> > >>>>>>>>>>  be attempted for all requirements (using pip-tools) so that we
> > >>>>>>>>>>  are not stuck with older releases. This will be in controlled PR
> > >>>>>>>>>>  environment where there will be time to fix all dependencies
> > >>>>>>>>>>  without impacting others and likely enough time to "vet" such
> > >>>>>>>>>>  changes (this can be done for alpha/beta releases for example).
> > >>>>>>>>>>  - As a side effect dependencies specification will become far
> > >>>>>>>>>>  simpler and straightforward.
> > >>>>>>>>>>
> > >>>>>>>>>> Happy to hear community comments to the proposal. I am happy to
> > >>>>>>>>>> take a lead on that, open JIRA issue and implement if this is
> > >>>>>>>>>> something community is happy with.
> > >>>>>>>>>>
> > >>>>>>>>>> J.
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>>
> > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> > >>>>>>>>>> Mobile: +48 660 796 129
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>>
> > >>>>>> *Jarek Potiuk, Principal Software Engineer*
> > >>>>>> Mobile: +48 660 796 129
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> *Jarek Potiuk, Principal Software Engineer*
> > >>>>> Mobile: +48 660 796 129
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> *Jarek Potiuk, Principal Software Engineer*
> > >>> Mobile: +48 660 796 129
> > >>>
> >
> >
>

Re: Pinning dependencies for Apache Airflow

Posted by "EKC (Erik Cederstrand)" <EK...@novozymes.com.INVALID>.
This is maybe a stupid question, but is it even possible to run tasks in an environment where Airflow is not installed?


Kind regards,

Erik

________________________________
From: Matt Davis <ji...@gmail.com>
Sent: Monday, October 8, 2018 10:13:34 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Pinning dependencies for Apache Airflow

It sounds like we can get the best of both worlds with the original
proposals to have minimal requirements in setup.py and "guaranteed to work"
complete requirements in a separate file. That way we have flexibility for
teams that run airflow and tasks in the same environment and guidance on a
working set of requirements. (Disclaimer: I work on the same team as
George.)
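
One way to keep the two files described above consistent is to check that every version in the "guaranteed to work" file still satisfies the minimum declared in setup.py. The sketch below is an illustration under stated assumptions, not part of the proposal: the names `version_key`, `split_requirement` and `check_pins` are hypothetical, only plain `name>=X` and `name==Y` lines are handled, and versions are assumed to be purely numeric and dotted.

```python
def version_key(version, width=6):
    """Turn '1.11.1' into a zero-padded tuple so that '1.0' equals '1.0.0'."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def split_requirement(line, operator):
    """Split 'name>=1.0' (or 'name==1.0') into a (name, version) pair."""
    name, sep, version = line.partition(operator)
    if not sep:
        raise ValueError(f"expected {operator!r} in {line!r}")
    return name.strip().lower(), version.strip()


def check_pins(minimums, pins):
    """Report pins that are missing or fall below the declared minimum.

    minimums: lines such as 'click>=6.0' (the flexible setup.py side).
    pins:     lines such as 'click==7.0' (the complete, pinned file).
    """
    pinned = dict(split_requirement(p, "==") for p in pins)
    problems = []
    for line in minimums:
        name, minimum = split_requirement(line, ">=")
        if name not in pinned:
            problems.append(f"{name} is not pinned")
        elif version_key(pinned[name]) < version_key(minimum):
            problems.append(
                f"{name}=={pinned[name]} is below the minimum {minimum}"
            )
    return problems
```

An empty list from `check_pins` would mean the pinned file is a valid instance of the flexible requirements.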

Thanks,
Matt

On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Although I think I come down on the side against pinning, my reasons are
> different.
>
> For the two (or more) people who have expressed concern about it, would
> pip's "Constraint Files" help?
>
> https://pip.pypa.io/en/stable/user_guide/#constraints-files
>
> For example, you could add "flask-appbuilder==1.11.1" to this file,
> specify it with `pip install -c constraints.txt apache-airflow` and then
> whenever pip attempted to install _any_ version of FAB it would use the
> exact version from the constraints file.
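
The behaviour described above can be modelled in a few lines. This is a toy model for illustration, not pip's implementation: `parse_constraints` and `apply_constraints` are hypothetical names, and only plain `name==version` lines are understood. The point it shows is that a constraint only fixes the version of a package that is going to be installed anyway; it never adds a package to the install set.

```python
def normalize(name):
    """Compare package names case-insensitively, treating '_' like '-'."""
    return name.strip().lower().replace("_", "-")


def parse_constraints(text):
    """Parse 'name==version' lines into a {normalized name: version} dict."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"only '==' constraints are handled: {line!r}")
        pins[normalize(name)] = version.strip()
    return pins


def apply_constraints(to_install, pins):
    """Map each package being installed to its pinned version, or None.

    Packages that appear only in the constraints are not added.
    """
    return {name: pins.get(normalize(name)) for name in to_install}
```

For example, with `flask-appbuilder==1.11.1` and `lxml==4.2.5` as constraints and an install set of `Flask-AppBuilder` and `click`, only `Flask-AppBuilder` gets a fixed version, `click` stays unconstrained, and `lxml` is not installed at all.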
>
> I don't buy the argument about pinning being a requirement for graduation
> from Incubation fwiw - it's an unavoidable artefact of the open-source
> world we develop in.
>
> https://libraries.io/ offers a (free?) service that will monitor an app's
> dependencies for being out of date, which might be better than writing our
> own solution.
>
> Pip has for a while now supported a way of saying "this dep is for py2.7
> only":
>
> > Since version 6.0, pip also supports specifiers containing environment
> > markers like so:
> >
> >    SomeProject ==5.4 ; python_version < '2.7'
> >    SomeProject; sys_platform == 'win32'
>
>
> Ash
>
>
> > On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com> wrote:
> >
> > As a member of a team that will also have really big problems if
> > Airflow pins all requirements (for reasons similar to those already
> > stated), I would like to add a very strong -1 to the idea of pinning
> > them for all installations.
> >
> > In a number of situations on our end, to avoid similar problems with
> > CI, we use `pip-compile` from pip-tools (also mentioned):
> > https://pypi.org/project/pip-tools/
> >
> > I would like to suggest a middle ground of:
> >
> > - Have the installation continue to use unpinned (`>=`) with minimum
> > necessary requirements set
> > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > that is used by CI
> > - - If we need, there can be one file for each incompatible python version
> > - Append a watermark (hash of `setup.py` requirements?) to the
> > compiled requirements file
> > - Add a CI check that the watermark and original match to ensure no
> > drift since last compile
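
The watermark idea in the list above could look roughly like this. It is a sketch under assumptions, not George's implementation: the comment prefix `# setup-requirements-sha256:` and the function names are made up for illustration, and the hash is taken over the sorted, lower-cased requirement strings so that reordering them in setup.py does not count as drift.

```python
import hashlib

# Hypothetical marker line appended to the compiled requirements file.
WATERMARK_PREFIX = "# setup-requirements-sha256: "


def requirements_hash(requirements):
    """Hash the unpinned requirements independently of their order."""
    normalized = sorted(r.strip().lower() for r in requirements if r.strip())
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def add_watermark(compiled_text, requirements):
    """Return the compiled file content with a fresh watermark line at the end."""
    body = "\n".join(
        line
        for line in compiled_text.splitlines()
        if not line.startswith(WATERMARK_PREFIX)
    )
    mark = WATERMARK_PREFIX + requirements_hash(requirements)
    return body.rstrip("\n") + "\n" + mark + "\n"


def watermark_matches(compiled_text, requirements):
    """CI check: True if the compiled file was built from these requirements."""
    expected = WATERMARK_PREFIX + requirements_hash(requirements)
    return any(line.strip() == expected for line in compiled_text.splitlines())
```

The CI step would read the requirement list from setup.py, call `watermark_matches` on the compiled file, and fail the build when it returns False, which signals that someone changed setup.py without re-running `pip-compile`.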
> >
> > I am happy to do much of the work for this, if it can help avoid
> > pinning all of the depends at the installation level.
> >
> > --George Leslie-Waksman
> >
> > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > <ma...@gmail.com> wrote:
> >>
> >> pip-tools can definitely help here to ship a reference [locked]
> >> `requirements.txt` that can be used in [all or part of] the CI. It's
> >> actually kind of important to get CI to fail when a new [backward
> >> incompatible] lib comes out and breaks things while allowing version
> >> ranges.
> >>
> >> I think there may be challenges around pip-tools and projects that run
> >> in both python2.7 and python3.6. You sometimes need to have 2
> >> requirements.txt lock files.
> >>
> >> Max
> >>
> >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >>> It's a nice one :). However I think when/if we go to pinned dependencies
> >>> the way poetry/pip-tools do it, this will suddenly be a lot less useful.
> >>> It will be very easy to track dependency changes (they will always be
> >>> committed as a change in the .lock file or requirements.txt) and if
> >>> someone has a problem while upgrading a dependency (always consciously,
> >>> never accidentally) it will simply fail during CI build and the change
> >>> won't get merged/won't break the builds of others in the first place :).
> >>>
> >>> J.
> >>>
> >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> On top of this discussion, I was thinking we should have the ability to
> >>>> quickly monitor dependency releases as well. Previously, it happened a
> >>>> few times that CI kept failing for no apparent reason and it eventually
> >>>> turned out to be due to a dependency release. But it took us some time,
> >>>> sometimes a few days, to realise the failure was caused by a dependency
> >>>> release.
> >>>>
> >>>> To partially address this, I tried to develop a mini tool to help us
> >>>> check the latest release of Python packages & the release date-time on
> >>>> PyPI. So, by comparing it with our CI failure history, we may be able
> >>>> to troubleshoot faster.
> >>>>
> >>>> Output Sample (ordered by upload time in desc order):
> >>>>
> >>>> Package Name       Latest Version   Upload Time
> >>>> awscli             1.16.28          2018-10-05T23:12:45
> >>>> botocore           1.12.18          2018-10-05T23:12:39
> >>>> promise            2.2.1            2018-10-04T22:04:18
> >>>> Keras              2.2.4            2018-10-03T20:59:39
> >>>> bleach             3.0.0            2018-10-03T16:54:27
> >>>> Flask-AppBuilder   1.12.0           2018-10-03T09:03:48
> >>>> ... ...
> >>>>
> >>>> It's a minimal tool (not perfect yet but working). I have hosted this
> >>>> tool at https://github.com/XD-DENG/pypi-release-query.
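
For readers who want to see how such a lookup can work, here is a minimal sketch. It is not the code of the tool linked above. It assumes the PyPI JSON API at `https://pypi.org/pypi/<package>/json` and the `info.version` / `releases[version][].upload_time` fields of its response; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def latest_release(payload):
    """Extract (latest version, earliest upload time) from a PyPI JSON payload."""
    version = payload["info"]["version"]
    files = payload.get("releases", {}).get(version, [])
    upload_times = [f["upload_time"] for f in files if "upload_time" in f]
    # ISO-8601 timestamps sort correctly as plain strings.
    return version, (min(upload_times) if upload_times else None)


def fetch_latest(package):
    """Query PyPI for one package (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return latest_release(json.load(response))


def newest_first(releases):
    """Sort {package: (version, upload_time)} by upload time, newest first."""
    return sorted(
        releases.items(),
        key=lambda item: item[1][1] or "",
        reverse=True,
    )
```

Calling `fetch_latest` for each dependency and passing the results through `newest_first` gives a listing in the same order as the sample output above, which can then be compared against the time CI started failing.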
> >>>>
> >>>>
> >>>> XD
> >>>>
> >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> Hello Erik,
> >>>>>
> >>>>> I understand your concern. It's a hard one to solve in general (i.e.
> >>>>> dependency-hell). It looks like in this case you treat Airflow as a
> >>>>> 'library', where for some other people it might be more like an 'end
> >>>>> product'. If you look at the "pinning" philosophy - "pin everything"
> >>>>> is good for end products, but not good for libraries. In your case
> >>>>> Airflow is treated as a bit of both. And it's a perfectly valid case
> >>>>> at that (with custom python DAGs being a central concept for Airflow).
> >>>>> However, I think it's not as bad as you think when it comes to exact
> >>>>> pinning.
> >>>>>
> >>>>> I believe - a bit counter-intuitively - that tools like
> >>> pip-tools/poetry
> >>>>> with exact pinning result in having your dependencies upgraded more
> >>>> often,
> >>>>> rather than less - especially in complex systems where
> dependency-hell
> >>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit scary
> to
> >>>> make
> >>>>> any change to it. There is a chance it will blow at your face if you
> >>>> change
> >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you change
> it,
> >>>>> whether it will cause chain reaction of conflicts that will ruin your
> >>>> work
> >>>>> day.
> >>>>>
> >>>>> On the contrary - if you change it to exact pinning in
> >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> >>> (and
> >>>>> commented) exclusion/avoidance rules in your .in/.toml file, the whole
> >>>> setup
> >>>>> might be much easier to maintain and upgrade. Every time you prepare
> >>> for
> >>>>> release (or even once in a while for master) one person might
> >>> consciously
> >>>>> attempt to upgrade all dependencies to latest ones. It should be
> almost
> >>>> as
> >>>>> easy as letting poetry/pip-tools help with figuring out what are the
> >>>> latest
> >>>>> set of dependencies that will work without conflicts. It should be
> >>> rather
> >>>>> straightforward (I've done it in the past for fairly complex
> systems).
> >>>> What
> >>>>> those tools enable is - doing single-shot upgrade of all
> dependencies.
> >>>>> After doing it you can make sure that all tests work fine (and fix
> any
> >>>>> problems that result from it). And then you test it thoroughly before
> >>> you
> >>>>> make final release. You can do it in separate PR - with automated
> >>> testing
> >>>>> in Travis which means that you are not disturbing work of others
> >>>>> (compilation/building + unit tests are guaranteed to work before you
> >>>> merge
> >>>>> it) while doing it. It's all conscious rather than accidental. Nice
> >>> side
> >>>>> effect of that is that with every release you can actually "catch-up"
> >>>> with
> >>>>> latest stable versions of many libraries in one go. It's better than
> >>>>> waiting until someone deliberately upgrades to newer version (and the
> >>>> rest
> >>>>> remain terribly out-dated as is the case for Airflow now).
> >>>>>
> >>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
> >>> you
> >>>> to
> >>>>> catch up faster in many cases. That is at least my experience so far.
> >>>>>
> >>>>> Additionally, Airflow is an open system - if you have very specific
> >>> needs
> >>>>> for requirements, you might actually - in the very same way with
> >>>>> pip-tools/poetry - upgrade all your dependencies in your local fork
> of
> >>>>> Airflow before someone else does it in master/release. Those tools
> kind
> >>>> of
> >>>>> democratise dependency management. It should be as easy as
> `pip-compile
> >>>>> --upgrade` or `poetry update` and you will get all the
> >>> "non-conflicting"
> >>>>> latest dependencies in your local fork (and poetry especially seems
> to
> >>> do
> >>>>> all the heavy lifting of figuring out which versions will work). You
> >>>> should
> >>>>> be able to test and publish it locally as your private package for
> >>> local
> >>>>> installations. You can even mark the specific dependency you want to
> >>> use
> >>>>> specific version and let pip-tools/poetry figure out exact versions
> of
> >>>>> other requirements. You can even make a PR with such upgrade
> eventually
> >>>> to
> >>>>> get it faster in master. You can even downgrade in case newer
> >>> dependency
> >>>>> causes problems for you in similar way. Guided by the tools, it's
> much
> >>>>> faster than figuring the versions out by yourself.
> >>>>>
> >>>>> As long as we have simple way of managing it and document how to
> >>>>> upgrade/downgrade dependencies in your own fork, and mention how to
> >>>> locally
> >>>>> release Airflow as a package, I think your case could be covered even
> >>>>> better than now. What do you think ?
> >>>>>
> >>>>> J.
> >>>>>
> >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> >>>>> <EK...@novozymes.com.invalid> wrote:
> >>>>>
> >>>>>> For us, exact pinning of versions would be problematic. We have DAG
> >>>> code
> >>>>>> that shares direct and indirect dependencies with Airflow, e.g.
> lxml,
> >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If
> our
> >>>> DAG
> >>>>>> code for some reason needs a newer point release due to a bug that's
> >>>>> fixed,
> >>>>>> then we can't cleanly build a virtual environment containing the
> >>> fixed
> >>>>>> version. For us, it's already a problem that Airflow has quite
> strict
> >>>>> (and
> >>>>>> sometimes old) requirements in setup.py.
> >>>>>>
> >>>>>> Erik
> >>>>>> ________________________________
> >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> >>>>>> To: dev@airflow.incubator.apache.org
> >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> >>>>>>
> >>>>>> I think one solution to release approach is to check as part of
> >>>> automated
> >>>>>> Travis build if all requirements are pinned with == (even the deep
> >>>> ones)
> >>>>>> and fail the build in case they are not for ALL versions (including
> >>>>>> dev). And of course we should document the approach of
> >>>> releases/upgrades
> >>>>>> etc. If we do it all the time for development versions (which seems
> >>>> quite
> >>>>>> doable), then transitively all the releases will also have pinned
> >>>>> versions
> >>>>>> and they will never try to upgrade any of the dependencies. In
> poetry
> >>>>>> (similarly in pip-tools with .in file) it is done by having a .lock
> >>>> file
> >>>>>> that specifies exact versions of each package so it can be rather
> >>> easy
> >>>> to
> >>>>>> manage (so it's worth trying it out I think  :D  - seems a bit more
> >>>>>> friendly than pip-tools).
> >>>>>>
> >>>>>> There is a drawback - of course - with manually updating the module
> >>>> that
> >>>>>> you want, but I really see that as an advantage rather than drawback
> >>>>>> especially for users. This way you maintain the property that it
> will
> >>>>>> always install and work the same way no matter if you installed it
> >>>> today
> >>>>> or
> >>>>>> two months ago. I think the biggest drawback for maintainers is that
> >>>> you
> >>>>>> need some kind of monitoring of security vulnerabilities and cannot
> >>>> rely
> >>>>> on
> >>>>>> automated security upgrades. With >= requirements those security
> >>>> updates
> >>>>>> might happen automatically without anyone noticing, but to be honest
> >>> I
> >>>>>> don't think such upgrades are guaranteed even in current setup for
> >>> all
> >>>>>> security issues for all libraries anyway.
> >>>>>>
> >>>>>> Finding the need to upgrade because of security issues can be quite
> >>>>>> automated. Even now I noticed Github started to inform owners about
> >>>>>> potential security vulnerabilities in used libraries for their
> >>> project.
> >>>>>> Those notifications can be sent to devlist and turned into JIRA
> >>> issues
> >>>>>> followed by minor security-related releases (with only few library
> >>>>>> dependencies upgraded).
> >>>>>>
> >>>>>> I think it's even easier to automate it if you have pinned
> >>>> dependencies -
> >>>>>> because it's generally easy to find applicable vulnerabilities for
> >>>>> specific
> >>>>>> versions of libraries by static analysers - when you have >=, you
> >>> never
> >>>>>> know which version will be used until you actually perform the
> >>>>>> installation.
> >>>>>>
> >>>>>> There is one big advantage for maintainers for "pinned" case. Your
> >>>> users
> >>>>>> always have the same dependencies - so when issue is raised, you can
> >>>>>> reproduce it more easily. It's hard to know which version user has
> >>> (as
> >>>>> the
> >>>>>> user could install it month ago or yesterday) and even if you find
> >>> out
> >>>> by
> >>>>>> asking the user, you might not be able to reproduce the set of
> >>>>> requirements
> >>>>>> easily (simply because there are already newer versions of the
> >>>> libraries
> >>>>>> released and they are used automatically). You can ask the user to
> >>> run
> >>>>> pip
> >>>>>> --upgrade but that's dangerous and pretty lame ("check the latest
> >>>>> version -
> >>>>>> maybe it fixes your problem ? ") and sometimes not possible (e.g.
> >>>> someone
> >>>>>> has pre-built docker image with dependencies from few months ago and
> >>>>> cannot
> >>>>>> rebuild the image easily).
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>> One thing to point out here.
> >>>>>>>
> >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> >>>>>>> environment it will fail.
> >>>>>>>
> >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> >>>>>>> >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> >>>>>>>
> >>>>>>> So I do think there is maybe something to be said about pinning for
> >>>>>>> releases. The down side to that is that if there are updates to a
> >>>>> module
> >>>>>>> that we want then we have to make a point release to let people get
> >>>> it
> >>>>>>>
> >>>>>>> Both methods have draw-backs
> >>>>>>>
> >>>>>>> -ash
> >>>>>>>
> >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> >>> arthur.wiedmer@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Jarek,
> >>>>>>>>
> >>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
> >>>>>>>>
> >>>>>>>> I just want to double check we are talking about pinning in
> >>>>>>>> requirements.txt only.
> >>>>>>>>
> >>>>>>>> This offers the ability to
> >>>>>>>> pip install -r requirements.txt
> >>>>>>>> pip install --no-deps airflow
> >>>>>>>> For a guaranteed install which works.
> >>>>>>>>
> >>>>>>>> Several different requirement files can be provided for specific
> >>>> use
> >>>>>>> cases,
> >>>>>>>> like a stable dev one for instance for people wanting to work on
> >>>>>>> operators
> >>>>>>>> and non-core functions.
> >>>>>>>>
> >>>>>>>> However, I think we should proactively test in CI against
> >>> unpinned
> >>>>>>>> dependencies (though it might be a separate case in the matrix) ,
> >>>> so
> >>>>>> that
> >>>>>>>> we get advance warning if possible that things will break.
> >>>>>>>> CI downtime is not a bad thing here, it actually caught a problem
> >>>> :)
> >>>>>>>>
> >>>>>>>> We should unpin as much as possible in setup.py to only maintain minimum
> >>>>>> required
> >>>>>>>> compatibility. The process of pinning in setup.py is extremely
> >>>>>>> detrimental
> >>>>>>>> when you have a large number of python libraries installed with
> >>>>>> different
> >>>>>>>> pinned versions.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Arthur
> >>>>>>>>
> >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> >>>>>> <ddavydov@twitter.com.invalid
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Relevant discussion about this:
> >>>>>>>>>
> >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> >>>>>> Jarek.Potiuk@polidea.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> TL;DR; A change is coming in the way how
> >>>> dependencies/requirements
> >>>>>> are
> >>>>>>>>>> specified for Apache Airflow - they will be fixed rather than
> >>>>>> flexible
> >>>>>>>>> (==
> >>>>>>>>>> rather than >=).
> >>>>>>>>>>
> >>>>>>>>>> This is follow up after Slack discussion we had with Ash and
> >>>> Kaxil
> >>>>> -
> >>>>>>>>>> summarising what we propose we'll do.
> >>>>>>>>>>
> >>>>>>>>>> *Problem:*
> >>>>>>>>>> During last few weeks we experienced quite a few downtimes of
> >>>>>> TravisCI
> >>>>>>>>>> builds (for all PRs/branches including master) as some of the
> >>>>>>> transitive
> >>>>>>>>>> dependencies were automatically upgraded. This because in a
> >>>> number
> >>>>> of
> >>>>>>>>>> dependencies we have  >= rather than == dependencies.
> >>>>>>>>>>
> >>>>>>>>>> Whenever there is a new release of such dependency, it might
> >>>> cause
> >>>>>>> chain
> >>>>>>>>>> reaction with upgrade of transitive dependencies which might
> >>> get
> >>>>> into
> >>>>>>>>>> conflict.
> >>>>>>>>>>
> >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
> >>>>> dependency
> >>>>>>> with
> >>>>>>>>>> click. They started to conflict once AppBuilder has released
> >>>>> version
> >>>>>>>>>> 1.12.0.
> >>>>>>>>>>
> >>>>>>>>>> *Diagnosis:*
> >>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is
> >>>> used
> >>>>>>>>> instead
> >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner or
> >>> later
> >>>>> hit
> >>>>>>>>> other
> >>>>>>>>>> cases where not fixed dependencies cause similar problems with
> >>>>> other
> >>>>>>>>>> transitive dependencies. We need to fix-pin them. This causes
> >>>>>> problems
> >>>>>>>>> for
> >>>>>>>>>> both - released versions (cause they stop to work!) and for
> >>>>>> development
> >>>>>>>>>> (cause they break master builds in TravisCI and prevent people
> >>>> from
> >>>>>>>>>> installing development environment from the scratch.
> >>>>>>>>>>
> >>>>>>>>>> *Solution:*
> >>>>>>>>>>
> >>>>>>>>>>  - Following the old-but-good post
> >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to fix the
> >>>>>>>>>>  pinned dependencies to specific versions (so basically all
> >>>>>>>>>>  dependencies are "fixed").
> >>>>>>>>>>  - We will introduce mechanism to be able to upgrade dependencies
> >>>>>>>>>>  with pip-tools (https://github.com/jazzband/pip-tools). We might
> >>>>>>>>>>  also take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
> >>>>>>>>>>  - People who would like to upgrade some dependencies for
> >>> their
> >>>>> PRs
> >>>>>>>>> will
> >>>>>>>>>>  still be able to do it - but such upgrades will be in their
> >>> PR
> >>>>> thus
> >>>>>>>>> they
> >>>>>>>>>>  will go through TravisCI tests and they will also have to be
> >>>>>>> specified
> >>>>>>>>>> with
> >>>>>>>>>>  pinned fixed versions (==). This should be part of review
> >>>> process
> >>>>>> to
> >>>>>>>>>> make
> >>>>>>>>>>  sure new/changed requirements are pinned.
> >>>>>>>>>>  - In release process there will be a point where an upgrade
> >>>> will
> >>>>> be
> >>>>>>>>>>  attempted for all requirements (using pip-tools) so that we
> >>> are
> >>>>> not
> >>>>>>>>>> stuck
> >>>>>>>>>>  with older releases. This will be in controlled PR
> >>> environment
> >>>>>> where
> >>>>>>>>>> there
> >>>>>>>>>>  will be time to fix all dependencies without impacting others
> >>>> and
> >>>>>>>>> likely
> >>>>>>>>>>  enough time to "vet" such changes (this can be done for
> >>>>> alpha/beta
> >>>>>>>>>> releases
> >>>>>>>>>>  for example).
> >>>>>>>>>>  - As a side effect dependencies specification will become far
> >>>>>> simpler
> >>>>>>>>>>  and straightforward.
> >>>>>>>>>>
> >>>>>>>>>> Happy to hear community comments to the proposal. I am happy to
> >>>>> take
> >>>>>> a
> >>>>>>>>> lead
> >>>>>>>>>> on that, open JIRA issue and implement if this is something
> >>>>> community
> >>>>>>> is
> >>>>>>>>>> happy with.
> >>>>>>>>>>
> >>>>>>>>>> J.
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>>
> >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>>>>>>> Mobile: +48 660 796 129 <+48%20660%20796%20129>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>>> Mobile: +48 660 796 129 <+48%20660%20796%20129>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>> Mobile: +48 660 796 129 <+48%20660%20796%20129>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> *Jarek Potiuk, Principal Software Engineer*
> >>> Mobile: +48 660 796 129 <+48%20660%20796%20129>
> >>>
>
>

Re: Pinning dependencies for Apache Airflow

Posted by Matt Davis <ji...@gmail.com>.
It sounds like we can get the best of both worlds with the original
proposals to have minimal requirements in setup.py and "guaranteed to work"
complete requirements in a separate file. That way we have flexibility for
teams that run airflow and tasks in the same environment and guidance on a
working set of requirements. (Disclaimer: I work on the same team as
George.)

Thanks,
Matt
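
A minimal sketch of how the two files could be kept consistent, assuming
the minimal requirements use only `>=` and the "guaranteed to work" file
uses only `==`. The function names and the example data are hypothetical
and are not taken from Airflow's setup.py. Versions are compared as plain
numeric tuples, which is a simplification; a real check would more likely
rely on `packaging.version`.

```python
import re


def normalize(name):
    """Normalise a project name (case and '-', '_', '.' runs), as PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_version(text):
    """Turn '1.11.1' into (1, 11, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def parse_requirements(lines, operator):
    """Map normalised package name -> version for lines like 'name<op>version'."""
    parsed = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, version = line.split(operator, 1)
        parsed[normalize(name.strip())] = parse_version(version.strip())
    return parsed


def find_violations(minimal_lines, pinned_lines):
    """Return packages whose pin is missing or below the declared minimum."""
    minimal = parse_requirements(minimal_lines, ">=")
    pinned = parse_requirements(pinned_lines, "==")
    problems = []
    for name, minimum in sorted(minimal.items()):
        if name not in pinned:
            problems.append(name + " is not pinned")
        elif pinned[name] < minimum:
            problems.append(name + " is pinned below its minimum")
    return problems


# Hypothetical example data, not Airflow's real requirements.
minimal_requirements = ["Flask-AppBuilder>=1.11.1", "click>=6.0"]
pinned_requirements = ["flask-appbuilder==1.11.1", "click==5.1"]
violations = find_violations(minimal_requirements, pinned_requirements)
print(violations)
```

Run in CI, a non-empty result would mean the pinned file no longer
satisfies the ranges declared for installation.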

On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Although I think I come down on the side against pinning, my reasons are
> different.
>
> For the two (or more) people who have expressed concern about it, would
> pip's "Constraints Files" help?
>
> https://pip.pypa.io/en/stable/user_guide/#constraints-files
>
> For example, you could add "flask-appbuilder==1.11.1" in to this file,
> specify it with `pip install -c constraints.txt apache-airflow` and then
> whenever pip attempted to install any version of FAB it would use the
> exact version from the constraints file.
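
To illustrate the idea only (this is not pip's implementation), here is a
small sketch of what a constraints file does conceptually: it restricts
the version of a package if that package gets installed, but it does not
cause anything to be installed by itself. The helper names and the
`unused-package` entry are hypothetical.

```python
import re


def normalize(name):
    """Normalise a project name (case and '-', '_', '.' runs), as PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def load_constraints(lines):
    """Map normalised package name -> exact version from 'name==version' lines."""
    constraints = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only understands exact pins
        name, version = line.split("==", 1)
        constraints[normalize(name.strip())] = version.strip()
    return constraints


def resolve(requested, constraints):
    """For each requested package, report the constrained version or None."""
    return {name: constraints.get(normalize(name)) for name in requested}


constraints = load_constraints([
    "flask-appbuilder==1.11.1",
    "unused-package==9.9.9  # hypothetical: constrained but never requested",
])
selection = resolve(["Flask-AppBuilder", "click"], constraints)
print(selection)
```

Only packages that are actually requested show up in the result; the
constrained but unrequested entry is ignored, and `click` is left
unconstrained.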
>
> I don't buy the argument about pinning being a requirement for graduation
> from Incubation fwiw - it's an unavoidable artefact of the open-source
> world we develop in.
>
> https://libraries.io/ offers a (free?) service that will monitor an app's
> dependencies for being out of date, which might be better than writing our
> own solution.
>
> Pip has for a while now supported a way of saying "this dep is for py2.7
> only":
>
> > Since version 6.0, pip also supports specifiers containing environment
> markers like so:
> >
> >    SomeProject ==5.4 ; python_version < '2.7'
> >    SomeProject; sys_platform == 'win32'
>
>
> Ash
>
>
> > On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com>
> wrote:
> >
> > As a member of a team that will also have really big problems if
> > Airflow pins all requirements (for reasons similar to those already
> > stated), I would like to add a very strong -1 to the idea of pinning
> > them for all installations.
> >
> > In a number of situations on our end, to avoid similar problems with
> > CI, we use `pip-compile` from pip-tools (also mentioned):
> > https://pypi.org/project/pip-tools/
> >
> > I would like to suggest a middle ground of:
> >
> > - Have the installation continue to use unpinned (`>=`) with minimum
> > necessary requirements set
> > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > that is used by CI
> >   - If we need, there can be one file for each incompatible python version
> > - Append a watermark (hash of `setup.py` requirements?) to the
> > compiled requirements file
> > - Add a CI check that the watermark and original match to ensure no
> > drift since last compile
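
A rough sketch of the watermark idea from the list above, under stated
assumptions: the watermark is a SHA-256 hash of the sorted requirement
strings, stored as a trailing comment line in the compiled file. The
`# watermark: ` marker, the function names and the example requirements
are all hypothetical; pip-tools itself does not define this format.

```python
import hashlib

WATERMARK_PREFIX = "# watermark: "  # hypothetical marker format


def compute_watermark(requirements):
    """Hash the requirement strings order-independently with SHA-256."""
    canonical = "\n".join(sorted(r.strip() for r in requirements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp(compiled_text, requirements):
    """Append the watermark line to the text of a compiled requirements file."""
    watermark = compute_watermark(requirements)
    return compiled_text.rstrip("\n") + "\n" + WATERMARK_PREFIX + watermark + "\n"


def is_current(compiled_text, requirements):
    """True when the stored watermark matches the current requirements."""
    for line in compiled_text.splitlines():
        if line.startswith(WATERMARK_PREFIX):
            return line[len(WATERMARK_PREFIX):] == compute_watermark(requirements)
    return False  # no watermark at all counts as drift


# Hypothetical stand-ins for the requirements declared in setup.py.
setup_requirements = ["flask-appbuilder>=1.11.1", "click>=6.0"]
compiled = stamp("click==6.7\nflask-appbuilder==1.11.1\n", setup_requirements)

unchanged = is_current(compiled, setup_requirements)
drifted = is_current(compiled, setup_requirements + ["requests>=2.0"])
print(unchanged, drifted)
```

The CI check would then fail the build whenever `is_current` returns
False, which signals that setup.py changed without a recompile.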
> >
> > I am happy to do much of the work for this, if it can help avoid
> > pinning all of the dependencies at the installation level.
> >
> > --George Leslie-Waksman
> >
> > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > <ma...@gmail.com> wrote:
> >>
> >> pip-tools can definitely help here to ship a reference [locked]
> >> `requirements.txt` that can be used in [all or part of] the CI. It's
> >> actually kind of important to get CI to fail when a new [backward
> >> incompatible] lib comes out and break things while allowing version
> ranges.
> >>
> >> I think there may be challenges around pip-tools and projects that run
> in
> >> both python2.7 and python3.6. You sometimes need to have 2
> requirements.txt
> >> lock files.
> >>
> >> Max
> >>
> >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >>> It's a nice one :). However I think when/if we go to pinned dependencies
> >>> the way poetry/pip-tools do it, this will suddenly be a lot less useful.
> >>> It will be very easy to track dependency changes (they will be always
> >>> committed as a change in the .lock file or requirements.txt) and if
> someone
> >>> has a problem while upgrading a dependency (always consciously, never
> >>> accidentally) it will simply fail during CI build and the change won't
> get
> >>> merged/won't break the builds of others in the first place :).
> >>>
> >>> J.
> >>>
> >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com>
> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> On top of this discussion, I was thinking we should have the ability
> to
> >>>> quickly monitor dependency release as well. Previously, it happened
> for a
> >>>> few times that CI kept failing for no reason and eventually turned
> out it
> >>>> was due to dependency release. But it took us some time, sometimes a
> few
> >>>> days, to realise the failure was because of dependency release.
> >>>>
> >>>> To partially address this, I tried to develop a mini tool to help us
> >>> check
> >>>> the latest release of Python packages & the release date-time on PyPi.
> >>> So,
> >>>> by comparing it with our CI failure history, we may be able to
> >>> troubleshoot
> >>>> faster.
> >>>>
> >>>> Output Sample (ordered by upload time in desc order):
> >>>> Package Name        Latest Version   Upload Time
> >>>> awscli              1.16.28          2018-10-05T23:12:45
> >>>> botocore            1.12.18          2018-10-05T23:12:39
> >>>> promise             2.2.1            2018-10-04T22:04:18
> >>>> Keras               2.2.4            2018-10-03T20:59:39
> >>>> bleach              3.0.0            2018-10-03T16:54:27
> >>>> Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
> >>>> ... ...
> >>>>
> >>>> It's a minimal tool (not perfect yet but working). I have hosted this
> >>> tool
> >>>> at https://github.com/XD-DENG/pypi-release-query.
> >>>>
> >>>>
> >>>> XD
> >>>>
> >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> Hello Erik,
> >>>>>
> >>>>> I understand your concern. It's a hard one to solve in general (i.e.
> >>>>> dependency-hell). It looks like in this case you treat Airflow as
> >>>>> 'library', where for some other people it might be more like 'end
> >>>> product'.
> >>>>> If you look at the "pinning" philosophy - the "pin everything" is
> good
> >>>> for
> >>>>> end products, but not good for libraries. In the case you have
> Airflow
> >>> is
> >>>>> treated as a bit of both. And it's perfectly valid case at that (with
> >>>>> custom python DAGs being central concept for Airflow).
> >>>>> However, I think it's not as bad as you think when it comes to exact
> >>>>> pinning.
> >>>>>
> >>>>> I believe - a bit counter-intuitively - that tools like
> >>> pip-tools/poetry
> >>>>> with exact pinning result in having your dependencies upgraded more
> >>>> often,
> >>>>> rather than less - especially in complex systems where
> dependency-hell
> >>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit scary
> to
> >>>> make
> >>>>> any change to it. There is a chance it will blow at your face if you
> >>>> change
> >>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you change
> it,
> >>>>> whether it will cause chain reaction of conflicts that will ruin your
> >>>> work
> >>>>> day.
> >>>>>
> >>>>> On the contrary - if you change it to exact pinning in
> >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> >>> (and
> >>>>> commented) exclusion/avoidance rules in your .in/.toml file, the whole
> >>>> setup
> >>>>> might be much easier to maintain and upgrade. Every time you prepare
> >>> for
> >>>>> release (or even once in a while for master) one person might
> >>> consciously
> >>>>> attempt to upgrade all dependencies to latest ones. It should be
> almost
> >>>> as
> >>>>> easy as letting poetry/pip-tools help with figuring out what are the
> >>>> latest
> >>>>> set of dependencies that will work without conflicts. It should be
> >>> rather
> >>>>> straightforward (I've done it in the past for fairly complex
> systems).
> >>>> What
> >>>>> those tools enable is - doing single-shot upgrade of all
> dependencies.
> >>>>> After doing it you can make sure that all tests work fine (and fix
> any
> >>>>> problems that result from it). And then you test it thoroughly before
> >>> you
> >>>>> make final release. You can do it in separate PR - with automated
> >>> testing
> >>>>> in Travis which means that you are not disturbing work of others
> >>>>> (compilation/building + unit tests are guaranteed to work before you
> >>>> merge
> >>>>> it) while doing it. It's all conscious rather than accidental. Nice
> >>> side
> >>>>> effect of that is that with every release you can actually "catch-up"
> >>>> with
> >>>>> latest stable versions of many libraries in one go. It's better than
> >>>>> waiting until someone deliberately upgrades to newer version (and the
> >>>> rest
> >>>>> remain terribly out-dated as is the case for Airflow now).
> >>>>>
> >>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
> >>> you
> >>>> to
> >>>>> catch up faster in many cases. That is at least my experience so far.
> >>>>>
> >>>>> Additionally, Airflow is an open system - if you have very specific
> >>> needs
> >>>>> for requirements, you might actually - in the very same way with
> >>>>> pip-tools/poetry - upgrade all your dependencies in your local fork
> of
> >>>>> Airflow before someone else does it in master/release. Those tools
> kind
> >>>> of
> >>>>> democratise dependency management. It should be as easy as
> `pip-compile
> >>>>> --upgrade` or `poetry update` and you will get all the
> >>> "non-conflicting"
> >>>>> latest dependencies in your local fork (and poetry especially seems
> to
> >>> do
> >>>>> all the heavy lifting of figuring out which versions will work). You
> >>>> should
> >>>>> be able to test and publish it locally as your private package for
> >>> local
> >>>>> installations. You can even mark the specific dependency you want to
> >>> use
> >>>>> specific version and let pip-tools/poetry figure out exact versions
> of
> >>>>> other requirements. You can even make a PR with such upgrade
> eventually
> >>>> to
> >>>>> get it faster in master. You can even downgrade in case newer
> >>> dependency
> >>>>> causes problems for you in similar way. Guided by the tools, it's
> much
> >>>>> faster than figuring the versions out by yourself.
> >>>>>
> >>>>> As long as we have simple way of managing it and document how to
> >>>>> upgrade/downgrade dependencies in your own fork, and mention how to
> >>>> locally
> >>>>> release Airflow as a package, I think your case could be covered even
> >>>>> better than now. What do you think ?
> >>>>>
> >>>>> J.
> >>>>>
> >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> >>>>> <EK...@novozymes.com.invalid> wrote:
> >>>>>
> >>>>>> For us, exact pinning of versions would be problematic. We have DAG
> >>>> code
> >>>>>> that shares direct and indirect dependencies with Airflow, e.g.
> lxml,
> >>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If
> our
> >>>> DAG
> >>>>>> code for some reason needs a newer point release due to a bug that's
> >>>>> fixed,
> >>>>>> then we can't cleanly build a virtual environment containing the
> >>> fixed
> >>>>>> version. For us, it's already a problem that Airflow has quite
> strict
> >>>>> (and
> >>>>>> sometimes old) requirements in setup.py.
> >>>>>>
> >>>>>> Erik
> >>>>>> ________________________________
> >>>>>> From: Jarek Potiuk <Ja...@polidea.com>
> >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> >>>>>> To: dev@airflow.incubator.apache.org
> >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> >>>>>>
> >>>>>> I think one solution to release approach is to check as part of
> >>>> automated
> >>>>>> Travis build if all requirements are pinned with == (even the deep
> >>>> ones)
> >>>>>> and fail the build in case they are not for ALL versions (including
> >>>>>> dev). And of course we should document the approach of
> >>>> releases/upgrades
> >>>>>> etc. If we do it all the time for development versions (which seems
> >>>> quite
> >>>>>> doable), then transitively all the releases will also have pinned
> >>>>> versions
> >>>>>> and they will never try to upgrade any of the dependencies. In
> poetry
> >>>>>> (similarly in pip-tools with .in file) it is done by having a .lock
> >>>> file
> >>>>>> that specifies exact versions of each package so it can be rather
> >>> easy
> >>>> to
> >>>>>> manage (so it's worth trying it out I think  :D  - seems a bit more
> >>>>>> friendly than pip-tools).
> >>>>>>
> >>>>>> There is a drawback - of course - with manually updating the module
> >>>> that
> >>>>>> you want, but I really see that as an advantage rather than drawback
> >>>>>> especially for users. This way you maintain the property that it
> will
> >>>>>> always install and work the same way no matter if you installed it
> >>>> today
> >>>>> or
> >>>>>> two months ago. I think the biggest drawback for maintainers is that
> >>>> you
> >>>>>> need some kind of monitoring of security vulnerabilities and cannot
> >>>> rely
> >>>>> on
> >>>>>> automated security upgrades. With >= requirements those security
> >>>> updates
> >>>>>> might happen automatically without anyone noticing, but to be honest
> >>> I
> >>>>>> don't think such upgrades are guaranteed even in current setup for
> >>> all
> >>>>>> security issues for all libraries anyway.
> >>>>>>
> >>>>>> Finding the need to upgrade because of security issues can be quite
> >>>>>> automated. Even now I noticed Github started to inform owners about
> >>>>>> potential security vulnerabilities in used libraries for their
> >>> project.
> >>>>>> Those notifications can be sent to devlist and turned into JIRA
> >>> issues
> >>>>>> followed by minor security-related releases (with only few library
> >>>>>> dependencies upgraded).
> >>>>>>
> >>>>>> I think it's even easier to automate it if you have pinned
> >>>> dependencies -
> >>>>>> because it's generally easy to find applicable vulnerabilities for
> >>>>> specific
> >>>>>> versions of libraries by static analysers - when you have >=, you
> >>> never
> >>>>>> know which version will be used until you actually perform the
> >>>>>> installation.
> >>>>>>
> >>>>>> There is one big advantage for maintainers for "pinned" case. Your
> >>>> users
> >>>>>> always have the same dependencies - so when issue is raised, you can
> >>>>>> reproduce it more easily. It's hard to know which version user has
> >>> (as
> >>>>> the
> >>>>>> user could install it month ago or yesterday) and even if you find
> >>> out
> >>>> by
> >>>>>> asking the user, you might not be able to reproduce the set of
> >>>>> requirements
> >>>>>> easily (simply because there are already newer versions of the
> >>>> libraries
> >>>>>> released and they are used automatically). You can ask the user to
> >>> run
> >>>>> pip
> >>>>>> --upgrade but that's dangerous and pretty lame ("check the latest
> >>>>> version -
> >>>>>> maybe it fixes your problem ? ") and sometimes not possible (e.g.
> >>>> someone
> >>>>>> has pre-built docker image with dependencies from few months ago and
> >>>>> cannot
> >>>>>> rebuild the image easily).
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>> One thing to point out here.
> >>>>>>>
> >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> >>>>>>> environment it will fail.
> >>>>>>>
> >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder
> >>>>>>> is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> >>>>>>>
> >>>>>>> So I do think there is maybe something to be said about pinning for
> >>>>>>> releases. The down side to that is that if there are updates to a
> >>>>> module
> >>>>>>> that we want then we have to make a point release to let people get
> >>>> it
> >>>>>>>
> >>>>>>> Both methods have draw-backs
> >>>>>>>
> >>>>>>> -ash
> >>>>>>>
> >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> >>> arthur.wiedmer@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Jarek,
> >>>>>>>>
> >>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
> >>>>>>>>
> >>>>>>>> I just want to double check we are talking about pinning in
> >>>>>>>> requirements.txt only.
> >>>>>>>>
> >>>>>>>> This offers the ability to
> >>>>>>>> pip install -r requirements.txt
> >>>>>>>> pip install --no-deps airflow
> >>>>>>>> For a guaranteed install which works.
> >>>>>>>>
> >>>>>>>> Several different requirement files can be provided for specific
> >>>> use
> >>>>>>> cases,
> >>>>>>>> like a stable dev one for instance for people wanting to work on
> >>>>>>> operators
> >>>>>>>> and non-core functions.
> >>>>>>>>
> >>>>>>>> However, I think we should proactively test in CI against
> >>> unpinned
> >>>>>>>> dependencies (though it might be a separate case in the matrix) ,
> >>>> so
> >>>>>> that
> >>>>>>>> we get advance warning if possible that things will break.
> >>>>>>>> CI downtime is not a bad thing here, it actually caught a problem
> >>>> :)
> >>>>>>>>
> >>>>>>>> We should unpin as possible in setup.py to only maintain minimum
> >>>>>> required
> >>>>>>>> compatibility. The process of pinning in setup.py is extremely
> >>>>>>> detrimental
> >>>>>>>> when you have a large number of python libraries installed with
> >>>>>> different
> >>>>>>>> pinned versions.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Arthur
> >>>>>>>>
> >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> >>>>>> <ddavydov@twitter.com.invalid
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Relevant discussion about this:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> >>>>>> Jarek.Potiuk@polidea.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> TL;DR; A change is coming in the way how
> >>>> dependencies/requirements
> >>>>>> are
> >>>>>>>>>> specified for Apache Airflow - they will be fixed rather than
> >>>>>> flexible
> >>>>>>>>> (==
> >>>>>>>>>> rather than >=).
> >>>>>>>>>>
> >>>>>>>>>> This is follow up after Slack discussion we had with Ash and
> >>>> Kaxil
> >>>>> -
> >>>>>>>>>> summarising what we propose we'll do.
> >>>>>>>>>>
> >>>>>>>>>> *Problem:*
> >>>>>>>>>> During last few weeks we experienced quite a few downtimes of
> >>>>>> TravisCI
> >>>>>>>>>> builds (for all PRs/branches including master) as some of the
> >>>>>>> transitive
> >>>>>>>>>> dependencies were automatically upgraded. This because in a
> >>>> number
> >>>>> of
> >>>>>>>>>> dependencies we have  >= rather than == dependencies.
> >>>>>>>>>>
> >>>>>>>>>> Whenever there is a new release of such dependency, it might
> >>>> cause
> >>>>>>> chain
> >>>>>>>>>> reaction with upgrade of transitive dependencies which might
> >>> get
> >>>>> into
> >>>>>>>>>> conflict.
> >>>>>>>>>>
> >>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
> >>>>> dependency
> >>>>>>> with
> >>>>>>>>>> click. They started to conflict once AppBuilder has released
> >>>>> version
> >>>>>>>>>> 1.12.0.
> >>>>>>>>>>
> >>>>>>>>>> *Diagnosis:*
> >>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is
> >>>> used
> >>>>>>>>> instead
> >>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner or
> >>> later
> >>>>> hit
> >>>>>>>>> other
> >>>>>>>>>> cases where not fixed dependencies cause similar problems with
> >>>>> other
> >>>>>>>>>> transitive dependencies. We need to fix-pin them. This causes
> >>>>>> problems
> >>>>>>>>> for
> >>>>>>>>>> both - released versions (cause they stop to work!) and for
> >>>>>> development
> >>>>>>>>>> (cause they break master builds in TravisCI and prevent people
> >>>> from
> >>>>>>>>>> installing development environment from the scratch.
> >>>>>>>>>>
> >>>>>>>>>> *Solution:*
> >>>>>>>>>>
> >>>>>>>>>>  - Following the old-but-good post
> >>>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://nvie.com/posts/pin-your-packages/
> >>>>>> we are going to fix the
> >>>>>>>>>> pinned
> >>>>>>>>>>  dependencies to specific versions (so basically all
> >>>> dependencies
> >>>>>> are
> >>>>>>>>>>  "fixed").
> >>>>>>>>>>  - We will introduce mechanism to be able to upgrade
> >>>> dependencies
> >>>>>> with
> >>>>>>>>>>  pip-tools (
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://github.com/jazzband/pip-tools
> >>>>> ).
> >>>>>> We might also
> >>>>>>>>> take a
> >>>>>>>>>>  look at pipenv:
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://pipenv.readthedocs.io/en/latest/
> >>>>>>>>>>  - People who would like to upgrade some dependencies for
> >>> their
> >>>>> PRs
> >>>>>>>>> will
> >>>>>>>>>>  still be able to do it - but such upgrades will be in their
> >>> PR
> >>>>> thus
> >>>>>>>>> they
> >>>>>>>>>>  will go through TravisCI tests and they will also have to be
> >>>>>>> specified
> >>>>>>>>>> with
> >>>>>>>>>>  pinned fixed versions (==). This should be part of review
> >>>> process
> >>>>>> to
> >>>>>>>>>> make
> >>>>>>>>>>  sure new/changed requirements are pinned.
> >>>>>>>>>>  - In release process there will be a point where an upgrade
> >>>> will
> >>>>> be
> >>>>>>>>>>  attempted for all requirements (using pip-tools) so that we
> >>> are
> >>>>> not
> >>>>>>>>>> stuck
> >>>>>>>>>>  with older releases. This will be in controlled PR
> >>> environment
> >>>>>> where
> >>>>>>>>>> there
> >>>>>>>>>>  will be time to fix all dependencies without impacting others
> >>>> and
> >>>>>>>>> likely
> >>>>>>>>>>  enough time to "vet" such changes (this can be done for
> >>>>> alpha/beta
> >>>>>>>>>> releases
> >>>>>>>>>>  for example).
> >>>>>>>>>>  - As a side effect dependencies specification will become far
> >>>>>> simpler
> >>>>>>>>>>  and straightforward.
> >>>>>>>>>>
> >>>>>>>>>> Happy to hear community comments to the proposal. I am happy to
> >>>>> take
> >>>>>> a
> >>>>>>>>> lead
> >>>>>>>>>> on that, open JIRA issue and implement if this is something
> >>>>> community
> >>>>>>> is
> >>>>>>>>>> happy with.
> >>>>>>>>>>
> >>>>>>>>>> J.
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>>
> >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>>>>>>> Mobile: +48 660 796 129
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>>> Mobile: +48 660 796 129
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> *Jarek Potiuk, Principal Software Engineer*
> >>>>> Mobile: +48 660 796 129
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> *Jarek Potiuk, Principal Software Engineer*
> >>> Mobile: +48 660 796 129
> >>>
>
>

Re: Pinning dependencies for Apache Airflow

Posted by Ash Berlin-Taylor <as...@apache.org>.
Although I think I come down on the side against pinning, my reasons are different.

For the two (or more) people who have expressed concern about it, would pip's "Constraints Files" help?

https://pip.pypa.io/en/stable/user_guide/#constraints-files

For example, you could add "flask-appbuilder==1.11.1" to this file, specify it with `pip install -c constraints.txt apache-airflow`, and then whenever pip attempted to install any version of FAB it would use the exact version from the constraints file.
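
As a rough illustration of the idea (a sketch only, not how pip itself works internally), the Python below reads constraints-style text and builds the matching `pip install -c ...` command line. The helper names `parse_constraints` and `pip_install_command`, the file name `constraints.txt` and the `Some_Project` entry are made up for this example; only plain `name==version` lines are handled.

```python
import re

# Matches simple "name==version" lines only; real constraints files allow more.
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;#]+)")


def parse_constraints(text):
    """Return {normalized-name: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = _PIN.match(line)
        if match:
            # Treat '-', '_' and '.' as equivalent and ignore case.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def pip_install_command(package, constraints_path="constraints.txt"):
    """Build (but do not run) the pip command that applies the constraints."""
    return ["pip", "install", "-c", constraints_path, package]


example = """
# versions known to work together (second entry is hypothetical)
flask-appbuilder==1.11.1
Some_Project==5.4
"""
pins = parse_constraints(example)
command = pip_install_command("apache-airflow")
```

The point of the constraints file is that it only limits which version gets picked if a package is installed at all; it does not add anything to the list of packages to install.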

I don't buy the argument about pinning being a requirement for graduation from Incubation fwiw - it's an unavoidable artefact of the open-source world we develop in.

https://libraries.io/ offers a (free?) service that will monitor an app's dependencies for being out of date, which might be better than writing our own solution.

Pip has for a while now supported a way of saying "this dep is for py2.7 only":

> Since version 6.0, pip also supports specifiers containing environment markers like so:
> 
>    SomeProject ==5.4 ; python_version < '2.7'
>    SomeProject; sys_platform == 'win32'


Ash


> On 8 Oct 2018, at 07:58, George Leslie-Waksman <wa...@gmail.com> wrote:
> 
> As a member of a team that will also have really big problems if
> Airflow pins all requirements (for reasons similar to those already
> stated), I would like to add a very strong -1 to the idea of pinning
> them for all installations.
> 
> In a number of situations on our end, to avoid similar problems with
> CI, we use `pip-compile` from pip-tools (also mentioned):
> https://pypi.org/project/pip-tools/
> 
> I would like to suggest, a middle ground of:
> 
> - Have the installation continue to use unpinned (`>=`) with minimum
> necessary requirements set
> - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> that is used by CI
> - - If we need, there can be one file for each incompatible python version
> - Append a watermark (hash of `setup.py` requirements?) to the
> compiled requirements file
> - Add a CI check that the watermark and original match to ensure no
> drift since last compile
> 
> I am happy to do much of the work for this, if it can help avoid
> pinning all of the depends at the installation level.
> 
> --George Leslie-Waksman
> 
> On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> <ma...@gmail.com> wrote:
>> 
>> pip-tools can definitely help here to ship a reference [locked]
>> `requirements.txt` that can be used in [all or part of] the CI. It's
>> actually kind of important to get CI to fail when a new [backward
>> incompatible] lib comes out and break things while allowing version ranges.
>> 
>> I think there may be challenges around pip-tools and projects that run in
>> both python2.7 and python3.6. You sometimes need to have 2 requirements.txt
>> lock files.
>> 
>> Max
>> 
>> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>> 
>>> It's a nice one :). However I think when/if we go to pinned dependencies
>>> the way poetry/pip-tools do it, this will be suddenly lot-less useful. It
>>> will be very easy to track dependency changes (they will be always
>>> committed as a change in the .lock file or requirements.txt) and if someone
>>> has a problem while upgrading a dependency (always consciously, never
>>> accidentally) it will simply fail during CI build and the change won't get
>>> merged/won't break the builds of others in the first place :).
>>> 
>>> J.
>>> 
>>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:
>>> 
>>>> Hi folks,
>>>> 
>>>> On top of this discussion, I was thinking we should have the ability to
>>>> quickly monitor dependency release as well. Previously, it happened for a
>>>> few times that CI kept failing for no reason and eventually turned out it
>>>> was due to dependency release. But it took us some time, sometimes a few
>>>> days, to realise the failure was because of dependency release.
>>>> 
>>>> To partially address this, I tried to develop a mini tool to help us
>>> check
>>>> the latest release of Python packages & the release date-time on PyPi.
>>> So,
>>>> by comparing it with our CI failure history, we may be able to
>>> troubleshoot
>>>> faster.
>>>> 
>>>> Output Sample (ordered by upload time in desc order):
>>>>
>>>> Package Name        Latest Version    Upload Time
>>>> awscli              1.16.28           2018-10-05T23:12:45
>>>> botocore            1.12.18           2018-10-05T23:12:39
>>>> promise             2.2.1             2018-10-04T22:04:18
>>>> Keras               2.2.4             2018-10-03T20:59:39
>>>> bleach              3.0.0             2018-10-03T16:54:27
>>>> Flask-AppBuilder    1.12.0            2018-10-03T09:03:48
>>>> ... ...
>>>> 
>>>> It's a minimal tool (not perfect yet but working). I have hosted this
>>> tool
>>>> at https://github.com/XD-DENG/pypi-release-query.
>>>> 
>>>> 
>>>> XD
>>>> 
>>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>> 
>>>>> Hello Erik,
>>>>> 
>>>>> I understand your concern. It's a hard one to solve in general (i.e.
>>>>> dependency-hell). It looks like in this case you treat Airflow as
>>>>> 'library', where for some other people it might be more like 'end
>>>> product'.
>>>>> If you look at the "pinning" philosophy - the "pin everything" is good
>>>> for
>>>>> end products, but not good for libraries. In the case you have Airflow
>>> is
>>>>> treated as a bit of both. And it's perfectly valid case at that (with
>>>>> custom python DAGs being central concept for Airflow).
>>>>> However, I think it's not as bad as you think when it comes to exact
>>>>> pinning.
>>>>> 
>>>>> I believe - a bit counter-intuitively - that tools like
>>> pip-tools/poetry
>>>>> with exact pinning result in having your dependencies upgraded more
>>>> often,
>>>>> rather than less - especially in complex systems where dependency-hell
>>>>> creeps-in. If you look at Airflow's setup.py now - It's a bit scary to
>>>> make
>>>>> any change to it. There is a chance it will blow at your face if you
>>>> change
>>>>> it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
>>>>> whether it will cause chain reaction of conflicts that will ruin your
>>>> work
>>>>> day.
>>>>> 
>>>>> On the contrary - if you change it to exact pinning in
>>>>> .lock/requirements.txt file (poetry/pip-tools) and have much simpler
>>> (and
>>>>> commented) exclusion/avoidance rules in your .in/.toml file, the whole
>>>> setup
>>>>> might be much easier to maintain and upgrade. Every time you prepare
>>> for
>>>>> release (or even once in a while for master) one person might
>>> consciously
>>>>> attempt to upgrade all dependencies to latest ones. It should be almost
>>>> as
>>>>> easy as letting poetry/pip-tools help with figuring out what are the
>>>> latest
>>>>> set of dependencies that will work without conflicts. It should be
>>> rather
>>>>> straightforward (I've done it in the past for fairly complex systems).
>>>> What
>>>>> those tools enable is - doing single-shot upgrade of all dependencies.
>>>>> After doing it you can make sure that all tests work fine (and fix any
>>>>> problems that result from it). And then you test it thoroughly before
>>> you
>>>>> make final release. You can do it in separate PR - with automated
>>> testing
>>>>> in Travis which means that you are not disturbing work of others
>>>>> (compilation/building + unit tests are guaranteed to work before you
>>>> merge
>>>>> it) while doing it. It's all conscious rather than accidental. Nice
>>> side
>>>>> effect of that is that with every release you can actually "catch-up"
>>>> with
>>>>> latest stable versions of many libraries in one go. It's better than
>>>>> waiting until someone deliberately upgrades to newer version (and the
>>>> rest
>>>>> remain terribly out-dated as is the case for Airflow now).
>>>>> 
>>>>> So a bit counterintuitively I think tools like pip-tools/poetry help
>>> you
>>>> to
>>>>> catch up faster in many cases. That is at least my experience so far.
>>>>> 
>>>>> Additionally, Airflow is an open system - if you have very specific
>>> needs
>>>>> for requirements, you might actually - in the very same way with
>>>>> pip-tools/poetry - upgrade all your dependencies in your local fork of
>>>>> Airflow before someone else does it in master/release. Those tools kind
>>>> of
>>>>> democratise dependency management. It should be as easy as `pip-compile
>>>>> --upgrade` or `poetry update` and you will get all the
>>> "non-conflicting"
>>>>> latest dependencies in your local fork (and poetry especially seems to
>>> do
>>>>> all the heavy lifting of figuring out which versions will work). You
>>>> should
>>>>> be able to test and publish it locally as your private package for
>>> local
>>>>> installations. You can even mark the specific dependency you want to
>>> use
>>>>> specific version and let pip-tools/poetry figure out exact versions of
>>>>> other requirements. You can even make a PR with such upgrade eventually
>>>> to
>>>>> get it faster in master. You can even downgrade in case newer
>>> dependency
>>>>> causes problems for you in similar way. Guided by the tools, it's much
>>>>> faster than figuring the versions out by yourself.
>>>>> 
>>>>> As long as we have simple way of managing it and document how to
>>>>> upgrade/downgrade dependencies in your own fork, and mention how to
>>>> locally
>>>>> release Airflow as a package, I think your case could be covered even
>>>>> better than now. What do you think ?
>>>>> 
>>>>> J.
>>>>> 
>>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
>>>>> <EK...@novozymes.com.invalid> wrote:
>>>>> 
>>>>>> For us, exact pinning of versions would be problematic. We have DAG
>>>> code
>>>>>> that shares direct and indirect dependencies with Airflow, e.g. lxml,
>>>>>> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our
>>>> DAG
>>>>>> code for some reason needs a newer point release due to a bug that's
>>>>> fixed,
>>>>>> then we can't cleanly build a virtual environment containing the
>>> fixed
>>>>>> version. For us, it's already a problem that Airflow has quite strict
>>>>> (and
>>>>>> sometimes old) requirements in setup.py.
>>>>>> 
>>>>>> Erik
>>>>>> ________________________________
>>>>>> From: Jarek Potiuk <Ja...@polidea.com>
>>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
>>>>>> To: dev@airflow.incubator.apache.org
>>>>>> Subject: Re: Pinning dependencies for Apache Airflow
>>>>>> 
>>>>>> I think one solution to release approach is to check as part of
>>>> automated
>>>>>> Travis build if all requirements are pinned with == (even the deep
>>>> ones)
>>>>>> and fail the build in case they are not for ALL versions (including
>>>>>> dev). And of course we should document the approach of
>>>> releases/upgrades
>>>>>> etc. If we do it all the time for development versions (which seems
>>>> quite
>>>>>> doable), then transitively all the releases will also have pinned
>>>>> versions
>>>>>> and they will never try to upgrade any of the dependencies. In poetry
>>>>>> (similarly in pip-tools with .in file) it is done by having a .lock
>>>> file
>>>>>> that specifies exact versions of each package so it can be rather
>>> easy
>>>> to
>>>>>> manage (so it's worth trying it out I think  :D  - seems a bit more
>>>>>> friendly than pip-tools).
>>>>>> 
>>>>>> There is a drawback - of course - with manually updating the module
>>>> that
>>>>>> you want, but I really see that as an advantage rather than drawback
>>>>>> especially for users. This way you maintain the property that it will
>>>>>> always install and work the same way no matter if you installed it
>>>> today
>>>>> or
>>>>>> two months ago. I think the biggest drawback for maintainers is that
>>>> you
>>>>>> need some kind of monitoring of security vulnerabilities and cannot
>>>> rely
>>>>> on
>>>>>> automated security upgrades. With >= requirements those security
>>>> updates
>>>>>> might happen automatically without anyone noticing, but to be honest
>>> I
>>>>>> don't think such upgrades are guaranteed even in current setup for
>>> all
>>>>>> security issues for all libraries anyway.
>>>>>> 
>>>>>> Finding the need to upgrade because of security issues can be quite
>>>>>> automated. Even now I noticed Github started to inform owners about
>>>>>> potential security vulnerabilities in used libraries for their
>>> project.
>>>>>> Those notifications can be sent to devlist and turned into JIRA
>>> issues
>>>>>> followed by minor security-related releases (with only few library
>>>>>> dependencies upgraded).
>>>>>> 
>>>>>> I think it's even easier to automate it if you have pinned
>>>> dependencies -
>>>>>> because it's generally easy to find applicable vulnerabilities for
>>>>> specific
>>>>>> versions of libraries by static analysers - when you have >=, you
>>> never
>>>>>> know which version will be used until you actually perform the
>>>>>> installation.
>>>>>> 
>>>>>> There is one big advantage for maintainers for "pinned" case. Your
>>>> users
>>>>>> always have the same dependencies - so when issue is raised, you can
>>>>>> reproduce it more easily. It's hard to know which version user has
>>> (as
>>>>> the
>>>>>> user could install it month ago or yesterday) and even if you find
>>> out
>>>> by
>>>>>> asking the user, you might not be able to reproduce the set of
>>>>> requirements
>>>>>> easily (simply because there are already newer versions of the
>>>> libraries
>>>>>> released and they are used automatically). You can ask the user to
>>> run
>>>>> pip
>>>>>> --upgrade but that's dangerous and pretty lame ("check the latest
>>>>> version -
>>>>>> maybe it fixes your problem ? ") and sometimes not possible (e.g.
>>>> someone
>>>>>> has pre-built docker image with dependencies from few months ago and
>>>>> cannot
>>>>>> rebuild the image easily).
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
>>>>> wrote:
>>>>>> 
>>>>>>> One thing to point out here.
>>>>>>> 
>>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
>>>>>>> environment it will fail.
>>>>>>> 
>>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder
>>>>>>> is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>>>>>>> 
>>>>>>> So I do think there is maybe something to be said about pinning for
>>>>>>> releases. The down side to that is that if there are updates to a
>>>>> module
>>>>>>> that we want then we have to make a point release to let people get
>>>> it
>>>>>>> 
>>>>>>> Both methods have draw-backs
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <
>>> arthur.wiedmer@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi Jarek,
>>>>>>>> 
>>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
>>>>>>>> 
>>>>>>>> I just want to double check we are talking about pinning in
>>>>>>>> requirements.txt only.
>>>>>>>> 
>>>>>>>> This offers the ability to
>>>>>>>> pip install -r requirements.txt
>>>>>>>> pip install --no-deps airflow
>>>>>>>> For a guaranteed install which works.
>>>>>>>> 
>>>>>>>> Several different requirement files can be provided for specific
>>>> use
>>>>>>> cases,
>>>>>>>> like a stable dev one for instance for people wanting to work on
>>>>>>> operators
>>>>>>>> and non-core functions.
>>>>>>>> 
>>>>>>>> However, I think we should proactively test in CI against
>>> unpinned
>>>>>>>> dependencies (though it might be a separate case in the matrix) ,
>>>> so
>>>>>> that
>>>>>>>> we get advance warning if possible that things will break.
>>>>>>>> CI downtime is not a bad thing here, it actually caught a problem
>>>> :)
>>>>>>>> 
>>>>>>>> We should unpin as possible in setup.py to only maintain minimum
>>>>>> required
>>>>>>>> compatibility. The process of pinning in setup.py is extremely
>>>>>>> detrimental
>>>>>>>> when you have a large number of python libraries installed with
>>>>>> different
>>>>>>>> pinned versions.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Arthur
>>>>>>>> 
>>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
>>>>>> <ddavydov@twitter.com.invalid
>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Relevant discussion about this:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>>>>>>>>> 
>>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
>>>>>> Jarek.Potiuk@polidea.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> TL;DR; A change is coming in the way how
>>>> dependencies/requirements
>>>>>> are
>>>>>>>>>> specified for Apache Airflow - they will be fixed rather than
>>>>>> flexible
>>>>>>>>> (==
>>>>>>>>>> rather than >=).
>>>>>>>>>> 
>>>>>>>>>> This is follow up after Slack discussion we had with Ash and
>>>> Kaxil
>>>>> -
>>>>>>>>>> summarising what we propose we'll do.
>>>>>>>>>> 
>>>>>>>>>> *Problem:*
>>>>>>>>>> During last few weeks we experienced quite a few downtimes of
>>>>>> TravisCI
>>>>>>>>>> builds (for all PRs/branches including master) as some of the
>>>>>>> transitive
>>>>>>>>>> dependencies were automatically upgraded. This because in a
>>>> number
>>>>> of
>>>>>>>>>> dependencies we have  >= rather than == dependencies.
>>>>>>>>>> 
>>>>>>>>>> Whenever there is a new release of such dependency, it might
>>>> cause
>>>>>>> chain
>>>>>>>>>> reaction with upgrade of transitive dependencies which might
>>> get
>>>>> into
>>>>>>>>>> conflict.
>>>>>>>>>> 
>>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
>>>>> dependency
>>>>>>> with
>>>>>>>>>> click. They started to conflict once AppBuilder has released
>>>>> version
>>>>>>>>>> 1.12.0.
>>>>>>>>>> 
>>>>>>>>>> *Diagnosis:*
>>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is
>>>> used
>>>>>>>>> instead
>>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner or
>>> later
>>>>> hit
>>>>>>>>> other
>>>>>>>>>> cases where not fixed dependencies cause similar problems with
>>>>> other
>>>>>>>>>> transitive dependencies. We need to fix-pin them. This causes
>>>>>> problems
>>>>>>>>> for
>>>>>>>>>> both - released versions (cause they stop to work!) and for
>>>>>> development
>>>>>>>>>> (cause they break master builds in TravisCI and prevent people
>>>> from
>>>>>>>>>> installing development environment from the scratch.
>>>>>>>>>> 
>>>>>>>>>> *Solution:*
>>>>>>>>>> 
>>>>>>>>>>  - Following the old-but-good post
>>>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> https://nvie.com/posts/pin-your-packages/
>>>>>> we are going to fix the
>>>>>>>>>> pinned
>>>>>>>>>>  dependencies to specific versions (so basically all
>>>> dependencies
>>>>>> are
>>>>>>>>>>  "fixed").
>>>>>>>>>>  - We will introduce mechanism to be able to upgrade
>>>> dependencies
>>>>>> with
>>>>>>>>>>  pip-tools (
>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/jazzband/pip-tools
>>>>> ).
>>>>>> We might also
>>>>>>>>> take a
>>>>>>>>>>  look at pipenv:
>>>>>> 
>>>>> 
>>>> 
>>> https://pipenv.readthedocs.io/en/latest/
>>>>>>>>>>  - People who would like to upgrade some dependencies for
>>> their
>>>>> PRs
>>>>>>>>> will
>>>>>>>>>>  still be able to do it - but such upgrades will be in their
>>> PR
>>>>> thus
>>>>>>>>> they
>>>>>>>>>>  will go through TravisCI tests and they will also have to be
>>>>>>> specified
>>>>>>>>>> with
>>>>>>>>>>  pinned fixed versions (==). This should be part of review
>>>> process
>>>>>> to
>>>>>>>>>> make
>>>>>>>>>>  sure new/changed requirements are pinned.
>>>>>>>>>>  - In release process there will be a point where an upgrade
>>>> will
>>>>> be
>>>>>>>>>>  attempted for all requirements (using pip-tools) so that we
>>> are
>>>>> not
>>>>>>>>>> stuck
>>>>>>>>>>  with older releases. This will be in controlled PR
>>> environment
>>>>>> where
>>>>>>>>>> there
>>>>>>>>>>  will be time to fix all dependencies without impacting others
>>>> and
>>>>>>>>> likely
>>>>>>>>>>  enough time to "vet" such changes (this can be done for
>>>>> alpha/beta
>>>>>>>>>> releases
>>>>>>>>>>  for example).
>>>>>>>>>>  - As a side effect dependencies specification will become far
>>>>>> simpler
>>>>>>>>>>  and straightforward.
>>>>>>>>>> 
>>>>>>>>>> Happy to hear community comments to the proposal. I am happy to
>>>>> take
>>>>>> a
>>>>>>>>> lead
>>>>>>>>>> on that, open JIRA issue and implement if this is something
>>>>> community
>>>>>>> is
>>>>>>>>>> happy with.
>>>>>>>>>> 
>>>>>>>>>> J.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> 
>>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
>>>>>>>>>> Mobile: +48 660 796 129
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> *Jarek Potiuk, Principal Software Engineer*
>>>>>> Mobile: +48 660 796 129
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> *Jarek Potiuk, Principal Software Engineer*
>>>>> Mobile: +48 660 796 129
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> *Jarek Potiuk, Principal Software Engineer*
>>> Mobile: +48 660 796 129
>>> 


Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
I hope others will also chime in. It's great to see different points of
view :).

Just one comment - something that Ash mentioned already before - the
problem is not at all limited to CI.

A much bigger issue is that the currently released airflow 1.10.0 package
has dependencies that make a clean install of the package impossible. It's
a very bad thing that actions of others (releasing a newer version of
flask-appbuilder) make your 'set in stone' package fail to install. I am
not sure about all the rules of incubation in ASF, but I am pretty sure
this is disqualifying behaviour for graduation. We need to work out a
solution to prevent it in the future.
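
To make the failure concrete, here is a minimal Python sketch of the
conflict Ash described earlier in this thread: Airflow 1.10.0 pins
flask-login to 0.2.1, while flask-appbuilder 1.12.0 (allowed by the
>= 1.11.1 range) requires flask-login >= 0.3. The helper names
`parse_version` and `satisfies` are made up for this illustration, and the
comparison is a deliberately simplified stand-in for what pip really does
(plain numeric versions only).

```python
import operator

# Comparison operators supported by this simplified sketch.
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Check a version string against a single specifier such as '>=0.3'."""
    # Try two-character operators first so '>=' is not mistaken for '>'.
    for symbol in sorted(OPS, key=len, reverse=True):
        if specifier.startswith(symbol):
            wanted = parse_version(specifier[len(symbol):])
            return OPS[symbol](parse_version(version), wanted)
    raise ValueError("unsupported specifier: " + specifier)


# Constraints as reported in this thread for apache-airflow 1.10.0.
airflow_pin = "0.2.1"            # flask-login==0.2.1 in Airflow
fab_range = ">=1.11.1"           # flask-appbuilder range in Airflow
fab_latest = "1.12.0"            # newest flask-appbuilder release
fab_needs_flask_login = ">=0.3"  # what that release requires

picked_up = satisfies(fab_latest, fab_range)
compatible = satisfies(airflow_pin, fab_needs_flask_login)
print("flask-appbuilder 1.12.0 selected:", picked_up)
print("flask-login 0.2.1 acceptable to it:", compatible)
```

The first check passes and the second fails, which is exactly the
situation where a previously released package stops installing without any
change on our side.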

Now that I think about it - there is an even bigger problem that might
prevent graduation: releases with unpinned versions are exposed to any
hack of open source infrastructure. Imagine a scenario: someone takes over
the flask-appbuilder repo/credentials, adds malicious code and releases a
new version - thus infecting all new airflow installations. Attacks of this
kind have already happened in the wild to others (see for one the Gentoo
repository hack earlier this year) and are extremely dangerous. PyPI with
its 'immutability' of released packages in a specific version + pinning is
the only way to prevent such a hack (unless PyPI itself is hacked).

So at the very least I believe pinning must be done for CI and releases.
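
As a rough illustration of how a CI job could enforce this, below is a
minimal Python sketch that lists requirement lines which are not pinned
with ==. The function name `unpinned_requirements`, the regular expression
and the sample lines are assumptions made for this example only; it is not
an existing Airflow or pip-tools check.

```python
import re

# A pinned line looks like "name==1.2.3", optionally with extras and a
# trailing comment or environment marker; anything else counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^=\s;#]+")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    offenders = []
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments and pip options such as "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not PINNED.match(line):
            offenders.append(line)
    return offenders


# Illustrative input; a real check would read the compiled requirements file.
sample = [
    "# compiled requirements (illustrative)",
    "flask-login==0.2.1",
    "flask-appbuilder>=1.11.1",
    "click==6.7",
]
print(unpinned_requirements(sample))
```

A build step could fail whenever the returned list is not empty, so that a
flexible requirement never slips into a release unnoticed.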

But I am still quite convinced that you don't really need >= for
development if you can instead - at any point in time - run
`pip-compile --upgrade` or `poetry update` and get all the latest versions
that are cross-compatible, or even enforce your own version for your
development needs and let the tool figure out the dependencies for you. I
think it's much easier for both maintenance and users and has very few
drawbacks - if any. And you can still very easily change it to >= if you
need to - in your local fork, provided that you don't merge it back to the
main repo.

J.

Principal Software Engineer
Phone: +48660796129

On Mon, 8 Oct 2018, 07:59 George Leslie-Waksman, <wa...@gmail.com> wrote:

> As a member of a team that will also have really big problems if
> Airflow pins all requirements (for reasons similar to those already
> stated), I would like to add a very strong -1 to the idea of pinning
> them for all installations.
>
> In a number of situation on our end, to avoid similar problems with
> CI, we use `pip-compile` from pip-tools (also mentioned):
> https://pypi.org/project/pip-tools/
>
> I would like to suggest, a middle ground of:
>
> - Have the installation continue to use unpinned (`>=`) with minimum
> necessary requirements set
> - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> that is used by CI
> - - If we need, there can be one file for each incompatible python version
> - Append a watermark (hash of `setup.py` requirements?) to the
> compiled requirements file
> - Add a CI check that the watermark and original match to ensure no
> drift since last compile
>
> I am happy to do much of the work for this, if it can help avoid
> pinning all of the depends at the installation level.
>
> --George Leslie-Waksman
>
> On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> <ma...@gmail.com> wrote:
> >
> > pip-tools can definitely help here to ship a reference [locked]
> > `requirements.txt` that can be used in [all or part of] the CI. It's
> > actually kind of important to get CI to fail when a new [backward
> > incompatible] lib comes out and break things while allowing version
> ranges.
> >
> > I think there may be challenges around pip-tools and projects that run in
> > both python2.7 and python3.6. You sometimes need to have 2
> requirements.txt
> > lock files.
> >
> > Max
> >
> > On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > It's a nice one :). However I think when/if we go to pinned
> dependencies
> > > the way poetry/pip-tools do it, this will be suddenly lot-less useful
> It
> > > will be very easy to track dependency changes (they will be always
> > > committed as a change in the .lock file or requirements.txt) and if
> someone
> > > has a problem while upgrading a dependency (always consciously, never
> > > accidentally) it will simply fail during CI build and the change won't
> get
> > > merged/won't break the builds of others in the first place :).
> > >
> > > J.
> > >
> > > On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com>
> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > On top of this discussion, I was thinking we should have the ability
> to
> > > > quickly monitor dependency release as well. Previously, it happened
> for a
> > > > few times that CI kept failing for no reason and eventually turned
> out it
> > > > was due to dependency release. But it took us some time, sometimes a
> few
> > > > days, to realise the failure was because of dependency release.
> > > >
> > > > To partially address this, I tried to develop a mini tool to help us
> > > check
> > > > the latest release of Python packages & the release date-time on
> PyPi.
> > > So,
> > > > by comparing it with our CI failure history, we may be able to
> > > troubleshoot
> > > > faster.
> > > >
> > > > > Output Sample (ordered by upload time in desc order):
> > > > > Package Name        Latest Version    Upload Time
> > > > > awscli              1.16.28           2018-10-05T23:12:45
> > > > > botocore            1.12.18           2018-10-05T23:12:39
> > > > > promise             2.2.1             2018-10-04T22:04:18
> > > > > Keras               2.2.4             2018-10-03T20:59:39
> > > > > bleach              3.0.0             2018-10-03T16:54:27
> > > > > Flask-AppBuilder    1.12.0            2018-10-03T09:03:48
> > > > > ... ...
> > > > It's a minimal tool (not perfect yet but working). I have hosted this
> > > tool
> > > > at https://github.com/XD-DENG/pypi-release-query.
> > > >
> > > >
> > > > XD
> > > >
> > > > On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> > > > wrote:
> > > >
> > > > > Hello Erik,
> > > > >
> > > > > I understand your concern. It's a hard one to solve in general
> (i.e.
> > > > > dependency-hell). It looks like in this case you treat Airflow as
> > > > > 'library', where for some other people it might be more like 'end
> > > > product'.
> > > > > If you look at the "pinning" philosophy - the "pin everything" is
> good
> > > > for
> > > > > end products, but not good for libraries. In the case you have
> Airflow
> > > is
> > > > > treated as a bit of both. And it's perfectly valid case at that
> (with
> > > > > custom python DAGs being central concept for Airflow).
> > > > > However, I think it's not as bad as you think when it comes to
> exact
> > > > > pinning.
> > > > >
> > > > > I believe - a bit counter-intuitively - that tools like
> > > pip-tools/poetry
> > > > > with exact pinning result in having your dependencies upgraded more
> > > > often,
> > > > > rather than less - especially in complex systems where
> dependency-hell
> > > > > creeps-in. If you look at Airflow's setup.py now - It's a bit
> scary to
> > > > make
> > > > > any change to it. There is a chance it will blow at your face if
> you
> > > > change
> > > > > it. You never know why there is 0.3 < ver < 1.0 - and if you
> change it,
> > > > > whether it will cause chain reaction of conflicts that will ruin
> your
> > > > work
> > > > > day.
> > > > >
> > > > > On the contrary - if you change it to exact pinning in
> > > > > .lock/requirements.txt file (poetry/pip-tools) and have much
> simpler
> > > (and
> > > > > commented) exclusion/avoidance rules in your .in/.tml file, the
> whole
> > > > setup
> > > > > might be much easier to maintain and upgrade. Every time you
> prepare
> > > for
> > > > > release (or even once in a while for master) one person might
> > > consciously
> > > > > attempt to upgrade all dependencies to latest ones. It should be
> almost
> > > > as
> > > > > easy as letting poetry/pip-tools help with figuring out what are
> the
> > > > latest
> > > > > set of dependencies that will work without conflicts. It should be
> > > rather
> > > > > straightforward (I've done it in the past for fairly complex
> systems).
> > > > What
> > > > > those tools enable is - doing single-shot upgrade of all
> dependencies.
> > > > > After doing it you can make sure that all tests work fine (and fix
> any
> > > > > problems that result from it). And then you test it thoroughly
> before
> > > you
> > > > > make final release. You can do it in separate PR - with automated
> > > testing
> > > > > in Travis which means that you are not disturbing work of others
> > > > > (compilation/building + unit tests are guaranteed to work before
> you
> > > > merge
> > > > > it) while doing it. It's all conscious rather than accidental. Nice
> > > side
> > > > > effect of that is that with every release you can actually
> "catch-up"
> > > > with
> > > > > latest stable versions of many libraries in one go. It's better
> than
> > > > > waiting until someone deliberately upgrades to newer version (and
> the
> > > > rest
> > > > > remain terribly out-dated as is the case for Airflow now).
> > > > >
> > > > > So a bit counterintuitively I think tools like pip-tools/poetry
> help
> > > you
> > > > to
> > > > > catch up faster in many cases. That is at least my experience so
> far.
> > > > >
> > > > > Additionally, Airflow is an open system - if you have very specific
> > > needs
> > > > > for requirements, you might actually - in the very same way with
> > > > > pip-tools/poetry - upgrade all your dependencies in your local
> fork of
> > > > > Airflow before someone else does it in master/release. Those tools
> kind
> > > > of
> > > > > democratise dependency management. It should be as easy as
> `pip-compile
> > > > > --upgrade` or `poetry update` and you will get all the
> > > "non-conflicting"
> > > > > latest dependencies in your local fork (and poetry especially
> seems to
> > > do
> > > > > all the heavy lifting of figuring out which versions will work).
> You
> > > > should
> > > > > be able to test and publish it locally as your private package for
> > > local
> > > > > installations. You can even mark the specific dependency you want
> to
> > > use
> > > > > specific version and let pip-tools/poetry figure out exact
> versions of
> > > > > other requirements. You can even make a PR with such upgrade
> eventually
> > > > to
> > > > > get it faster in master. You can even downgrade in case newer
> > > dependency
> > > > > causes problems for you in similar way. Guided by the tools, it's
> much
> > > > > faster than figuring the versions out by yourself.
> > > > >
> > > > > As long as we have simple way of managing it and document how to
> > > > > upgrade/downgrade dependencies in your own fork, and mention how to
> > > > locally
> > > > > release Airflow as a package, I think your case could be covered
> even
> > > > > better than now. What do you think ?
> > > > >
> > > > > J.
> > > > >
> > > > > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > > > <EK...@novozymes.com.invalid> wrote:
> > > > >
> > > > > > For us, exact pinning of versions would be problematic. We have
> DAG
> > > > code
> > > > > > that shares direct and indirect dependencies with Airflow, e.g.
> lxml,
> > > > > > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3.
> If our
> > > > DAG
> > > > > > code for some reason needs a newer point release due to a bug
> that's
> > > > > fixed,
> > > > > > then we can't cleanly build a virtual environment containing the
> > > fixed
> > > > > > version. For us, it's already a problem that Airflow has quite
> strict
> > > > > (and
> > > > > > sometimes old) requirements in setup.py.
> > > > > >
> > > > > > Erik
> > > > > > ________________________________
> > > > > > From: Jarek Potiuk <Ja...@polidea.com>
> > > > > > Sent: Friday, October 5, 2018 2:01:15 PM
> > > > > > To: dev@airflow.incubator.apache.org
> > > > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > > > >
> > > > > > I think one solution to release approach is to check as part of
> > > > automated
> > > > > > Travis build if all requirements are pinned with == (even the
> deep
> > > > ones)
> > > > > > and fail the build in case they are not for ALL versions
> (including
> > > > > > dev). And of course we should document the approach of
> > > > releases/upgrades
> > > > > > etc. If we do it all the time for development versions (which
> seems
> > > > quite
> > > > > > doable), then transitively all the releases will also have pinned
> > > > > versions
> > > > > > and they will never try to upgrade any of the dependencies. In
> poetry
> > > > > > (similarly in pip-tools with .in file) it is done by having a
> .lock
> > > > file
> > > > > > that specifies exact versions of each package so it can be rather
> > > easy
> > > > to
> > > > > > manage (so it's worth trying it out I think  :D  - seems a bit
> more
> > > > > > friendly than pip-tools).
> > > > > >
> > > > > > There is a drawback - of course - with manually updating the
> module
> > > > that
> > > > > > you want, but I really see that as an advantage rather than
> drawback
> > > > > > especially for users. This way you maintain the property that it
> will
> > > > > > always install and work the same way no matter if you installed
> it
> > > > today
> > > > > or
> > > > > > two months ago. I think the biggest drawback for maintainers is
> that
> > > > you
> > > > > > need some kind of monitoring of security vulnerabilities and
> cannot
> > > > rely
> > > > > on
> > > > > > automated security upgrades. With >= requirements those security
> > > > updates
> > > > > > might happen automatically without anyone noticing, but to be
> honest
> > > I
> > > > > > don't think such upgrades are guaranteed even in current setup
> for
> > > all
> > > > > > security issues for all libraries anyway.
> > > > > >
> > > > > > Finding the need to upgrade because of security issues can be
> quite
> > > > > > automated. Even now I noticed Github started to inform owners
> about
> > > > > > potential security vulnerabilities in used libraries for their
> > > project.
> > > > > > Those notifications can be sent to devlist and turned into JIRA
> > > issues
> > > > > > > followed by minor security-related releases (with only few
> library
> > > > > > dependencies upgraded).
> > > > > >
> > > > > > I think it's even easier to automate it if you have pinned
> > > > dependencies -
> > > > > > because it's generally easy to find applicable vulnerabilities
> for
> > > > > specific
> > > > > > versions of libraries by static analysers - when you have >=, you
> > > never
> > > > > > know which version will be used until you actually perform the
> > > > > > installation.
> > > > > >
> > > > > > There is one big advantage for maintainers for "pinned" case.
> Your
> > > > users
> > > > > > always have the same dependencies - so when issue is raised, you
> can
> > > > > > reproduce it more easily. It's hard to know which version user
> has
> > > (as
> > > > > the
> > > > > > user could install it month ago or yesterday) and even if you
> find
> > > out
> > > > by
> > > > > > asking the user, you might not be able to reproduce the set of
> > > > > requirements
> > > > > > easily (simply because there are already newer versions of the
> > > > libraries
> > > > > > released and they are used automatically). You can ask the user
> to
> > > run
> > > > > pip
> > > > > > --upgrade but that's dangerous and pretty lame ("check the latest
> > > > > version -
> > > > > > maybe it fixes your problem ? ") and sometimes not possible (e.g.
> > > > someone
> > > > > > has pre-built docker image with dependencies from few months ago
> and
> > > > > cannot
> > > > > > rebuild the image easily).
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <
> ash@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > One thing to point out here.
> > > > > > >
> > > > > > > > Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > > > > environment it will fail.
> > > > > > >
> > > > > > > This is because we pin flask-login to 0.2.1 but
> flask-appbuilder is
> > > > >=
> > > > > > > 1.11.1, so that pulls in 1.12.0 which requires flask-login >=
> 0.3.
> > > > > > >
> > > > > > > So I do think there is maybe something to be said about
> pinning for
> > > > > > > releases. The down side to that is that if there are updates
> to a
> > > > > module
> > > > > > > that we want then we have to make a point release to let
> people get
> > > > it
> > > > > > >
> > > > > > > Both methods have draw-backs
> > > > > > >
> > > > > > > -ash
> > > > > > >
> > > > > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> > > arthur.wiedmer@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Jarek,
> > > > > > > >
> > > > > > > > I will +1 the discussion Dan is referring to and George's
> advice.
> > > > > > > >
> > > > > > > > I just want to double check we are talking about pinning in
> > > > > > > > requirements.txt only.
> > > > > > > >
> > > > > > > > This offers the ability to
> > > > > > > > pip install -r requirements.txt
> > > > > > > > pip install --no-deps airflow
> > > > > > > > For a guaranteed install which works.
> > > > > > > >
> > > > > > > > Several different requirement files can be provided for
> specific
> > > > use
> > > > > > > cases,
> > > > > > > > like a stable dev one for instance for people wanting to
> work on
> > > > > > > operators
> > > > > > > > and non-core functions.
> > > > > > > >
> > > > > > > > However, I think we should proactively test in CI against
> > > unpinned
> > > > > > > > dependencies (though it might be a separate case in the
> matrix) ,
> > > > so
> > > > > > that
> > > > > > > > we get advance warning if possible that things will break.
> > > > > > > > CI downtime is not a bad thing here, it actually caught a
> problem
> > > > :)
> > > > > > > >
> > > > > > > > We should unpin as possible in setup.py to only maintain
> minimum
> > > > > > required
> > > > > > > > compatibility. The process of pinning in setup.py is
> extremely
> > > > > > > detrimental
> > > > > > > > when you have a large number of python libraries installed
> with
> > > > > > different
> > > > > > > > pinned versions.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Arthur
> > > > > > > >
> > > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > > > <ddavydov@twitter.com.invalid
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Relevant discussion about this:
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=MM%2FoNwkPYR8UtBUczXLfZD2lCp7Ig%2BI%2FL2rFszcoJi8%3D&amp;reserved=0
> > > > > > > >>
> > > > > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > > > Jarek.Potiuk@polidea.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> TL;DR; A change is coming in the way how
> > > > dependencies/requirements
> > > > > > are
> > > > > > > >>> specified for Apache Airflow - they will be fixed rather
> than
> > > > > > flexible
> > > > > > > >> (==
> > > > > > > >>> rather than >=).
> > > > > > > >>>
> > > > > > > >>> This is follow up after Slack discussion we had with Ash
> and
> > > > Kaxil
> > > > > -
> > > > > > > >>> summarising what we propose we'll do.
> > > > > > > >>>
> > > > > > > >>> *Problem:*
> > > > > > > >>> During last few weeks we experienced quite a few downtimes
> of
> > > > > > TravisCI
> > > > > > > >>> builds (for all PRs/branches including master) as some of
> the
> > > > > > > transitive
> > > > > > > >>> dependencies were automatically upgraded. This because in a
> > > > number
> > > > > of
> > > > > > > >>> dependencies we have  >= rather than == dependencies.
> > > > > > > >>>
> > > > > > > >>> Whenever there is a new release of such dependency, it
> might
> > > > cause
> > > > > > > chain
> > > > > > > >>> reaction with upgrade of transitive dependencies which
> might
> > > get
> > > > > into
> > > > > > > >>> conflict.
> > > > > > > >>>
> > > > > > > >>> An example was Flask-AppBuilder vs flask-login transitive
> > > > > dependency
> > > > > > > with
> > > > > > > >>> click. They started to conflict once AppBuilder has
> released
> > > > > version
> > > > > > > >>> 1.12.0.
> > > > > > > >>>
> > > > > > > >>> *Diagnosis:*
> > > > > > > >>> Transitive dependencies with "flexible" versions (where >=
> is
> > > > used
> > > > > > > >> instead
> > > > > > > >>> of ==) is a reason for "dependency hell". We will sooner or
> > > later
> > > > > hit
> > > > > > > >> other
> > > > > > > >>> cases where not fixed dependencies cause similar problems
> with
> > > > > other
> > > > > > > >>> transitive dependencies. We need to fix-pin them. This
> causes
> > > > > > problems
> > > > > > > >> for
> > > > > > > >>> both - released versions (cause they stop to work!) and for
> > > > > > development
> > > > > > > >>> (cause they break master builds in TravisCI and prevent
> people
> > > > from
> > > > > > > >>> installing development environment from the scratch.
> > > > > > > >>>
> > > > > > > >>> *Solution:*
> > > > > > > >>>
> > > > > > > > >>>   - Following the old-but-good post
> > > > > > > > >>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > > > > > > > >>>   pinned dependencies to specific versions (so basically all
> > > > > > > > >>>   dependencies are "fixed").
> > > > > > > > >>>   - We will introduce mechanism to be able to upgrade dependencies with
> > > > > > > > >>>   pip-tools (https://github.com/jazzband/pip-tools). We might also take a
> > > > > > > > >>>   look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > > > > >>>   - People who would like to upgrade some dependencies for
> > > their
> > > > > PRs
> > > > > > > >> will
> > > > > > > >>>   still be able to do it - but such upgrades will be in
> their
> > > PR
> > > > > thus
> > > > > > > >> they
> > > > > > > >>>   will go through TravisCI tests and they will also have
> to be
> > > > > > > specified
> > > > > > > >>> with
> > > > > > > >>>   pinned fixed versions (==). This should be part of review
> > > > process
> > > > > > to
> > > > > > > >>> make
> > > > > > > >>>   sure new/changed requirements are pinned.
> > > > > > > >>>   - In release process there will be a point where an
> upgrade
> > > > will
> > > > > be
> > > > > > > >>>   attempted for all requirements (using pip-tools) so that
> we
> > > are
> > > > > not
> > > > > > > >>> stuck
> > > > > > > >>>   with older releases. This will be in controlled PR
> > > environment
> > > > > > where
> > > > > > > >>> there
> > > > > > > >>>   will be time to fix all dependencies without impacting
> others
> > > > and
> > > > > > > >> likely
> > > > > > > >>>   enough time to "vet" such changes (this can be done for
> > > > > alpha/beta
> > > > > > > >>> releases
> > > > > > > >>>   for example).
> > > > > > > >>>   - As a side effect dependencies specification will
> become far
> > > > > > simpler
> > > > > > > >>>   and straightforward.
> > > > > > > >>>
> > > > > > > >>> Happy to hear community comments to the proposal. I am
> happy to
> > > > > take
> > > > > > a
> > > > > > > >> lead
> > > > > > > >>> on that, open JIRA issue and implement if this is something
> > > > > community
> > > > > > > is
> > > > > > > >>> happy with.
> > > > > > > >>>
> > > > > > > >>> J.
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>>
> > > > > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > > >>> Mobile: +48 660 796 129
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > *Jarek Potiuk, Principal Software Engineer*
> > > > > > Mobile: +48 660 796 129
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > *Jarek Potiuk, Principal Software Engineer*
> > > > > Mobile: +48 660 796 129
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > *Jarek Potiuk, Principal Software Engineer*
> > > Mobile: +48 660 796 129
> > >
>

Re: Pinning dependencies for Apache Airflow

Posted by George Leslie-Waksman <wa...@gmail.com>.
As a member of a team that will also have really big problems if
Airflow pins all requirements (for reasons similar to those already
stated), I would like to add a very strong -1 to the idea of pinning
them for all installations.

In a number of situations on our end, to avoid similar problems with
CI, we use `pip-compile` from pip-tools (also mentioned):
https://pypi.org/project/pip-tools/

I would like to suggest a middle ground of:

- Have the installation continue to use unpinned (`>=`) with minimum
necessary requirements set
- Include a pip-compiled requirements file (`requirements-ci.txt`?)
that is used by CI
  - If needed, there can be one file for each incompatible Python version
- Append a watermark (hash of `setup.py` requirements?) to the
compiled requirements file
- Add a CI check that the watermark and original match to ensure no
drift since last compile
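
The watermark idea in the last two bullets could look roughly like the
Python sketch below. Everything in it is an assumption for illustration:
the `# watermark: ` line format, the function names, the use of SHA-256
and the sample requirement strings. A real check would read the
requirements out of `setup.py` and the compiled file from disk.

```python
import hashlib

# Hypothetical marker line appended to the compiled requirements file.
WATERMARK_PREFIX = "# watermark: "


def requirements_hash(requirements):
    """Hash a list of requirement strings, independent of order and spacing."""
    normalised = sorted(req.strip() for req in requirements if req.strip())
    digest = hashlib.sha256("\n".join(normalised).encode("utf-8"))
    return digest.hexdigest()


def add_watermark(compiled_text, requirements):
    """Append the watermark line to the compiled requirements text."""
    body = compiled_text.rstrip("\n") + "\n"
    return body + WATERMARK_PREFIX + requirements_hash(requirements) + "\n"


def watermark_matches(compiled_text, requirements):
    """Return True if the stored watermark equals the current hash."""
    expected = requirements_hash(requirements)
    for line in compiled_text.splitlines():
        if line.startswith(WATERMARK_PREFIX):
            return line[len(WATERMARK_PREFIX):].strip() == expected
    return False  # no watermark found: treat it as drift


# Illustrative data only.
loose = ["flask-appbuilder>=1.11.1", "flask-login>=0.3"]
compiled = add_watermark("flask-appbuilder==1.12.0\nflask-login==0.4.1\n", loose)

print(watermark_matches(compiled, loose))
print(watermark_matches(compiled, loose + ["requests>=2.5.1"]))
```

The second call reports a mismatch because a requirement was added after
the last compile, which is the drift the CI check is meant to catch.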

I am happy to do much of the work for this, if it can help avoid
pinning all of the dependencies at the installation level.

--George Leslie-Waksman

On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
<ma...@gmail.com> wrote:
>
> pip-tools can definitely help here to ship a reference [locked]
> `requirements.txt` that can be used in [all or part of] the CI. It's
> actually kind of important to get CI to fail when a new [backward
> incompatible] lib comes out and break things while allowing version ranges.
>
> I think there may be challenges around pip-tools and projects that run in
> both python2.7 and python3.6. You sometimes need to have 2 requirements.txt
> lock files.
>
> Max
>
> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > It's a nice one :). However I think when/if we go to pinned dependencies
> > the way poetry/pip-tools do it, this will suddenly be a lot less useful. It
> > will be very easy to track dependency changes (they will be always
> > committed as a change in the .lock file or requirements.txt) and if someone
> > has a problem while upgrading a dependency (always consciously, never
> > accidentally) it will simply fail during CI build and the change won't get
> > merged/won't break the builds of others in the first place :).
> >
> > J.
> >
> > On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:
> >
> > > Hi folks,
> > >
> > > On top of this discussion, I was thinking we should have the ability to
> > > quickly monitor dependency release as well. Previously, it happened for a
> > > few times that CI kept failing for no reason and eventually turned out it
> > > was due to dependency release. But it took us some time, sometimes a few
> > > days, to realise the failure was because of dependency release.
> > >
> > > To partially address this, I tried to develop a mini tool to help us
> > check
> > > the latest release of Python packages & the release date-time on PyPi.
> > So,
> > > by comparing it with our CI failure history, we may be able to
> > troubleshoot
> > > faster.
> > >
> > > > Output Sample (ordered by upload time in desc order):
> > > > Package Name        Latest Version    Upload Time
> > > > awscli              1.16.28           2018-10-05T23:12:45
> > > > botocore            1.12.18           2018-10-05T23:12:39
> > > > promise             2.2.1             2018-10-04T22:04:18
> > > > Keras               2.2.4             2018-10-03T20:59:39
> > > > bleach              3.0.0             2018-10-03T16:54:27
> > > > Flask-AppBuilder    1.12.0            2018-10-03T09:03:48
> > > > ... ...
> > >
> > > It's a minimal tool (not perfect yet but working). I have hosted this
> > tool
> > > at https://github.com/XD-DENG/pypi-release-query.
> > >
> > >
> > > XD
> > >
> > > On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Ja...@polidea.com>
> > > wrote:
> > >
> > > > Hello Erik,
> > > >
> > > > I understand your concern. It's a hard one to solve in general (i.e.
> > > > dependency-hell). It looks like in this case you treat Airflow as
> > > > 'library', where for some other people it might be more like 'end
> > > product'.
> > > > If you look at the "pinning" philosophy - the "pin everything" is good
> > > for
> > > > end products, but not good for libraries. In the case you have Airflow
> > is
> > > > treated as a bit of both. And it's perfectly valid case at that (with
> > > > custom python DAGs being central concept for Airflow).
> > > > However, I think it's not as bad as you think when it comes to exact
> > > > pinning.
> > > >
> > > > I believe - a bit counter-intuitively - that tools like
> > pip-tools/poetry
> > > > with exact pinning result in having your dependencies upgraded more
> > > often,
> > > > rather than less - especially in complex systems where dependency-hell
> > > > creeps-in. If you look at Airflow's setup.py now - It's a bit scary to
> > > make
> > > > any change to it. There is a chance it will blow at your face if you
> > > change
> > > > it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
> > > > whether it will cause chain reaction of conflicts that will ruin your
> > > work
> > > > day.
> > > >
> > > > On the contrary - if you change it to exact pinning in
> > > > .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> > (and
> > > > commented) exclusion/avoidance rules in your .in/.toml file, the whole
> > > > setup might be much easier to maintain and upgrade. Every time you
> > > > prepare for a release (or even once in a while for master), one person
> > > > might consciously attempt to upgrade all dependencies to the latest
> > > > ones. It should be almost as easy as letting poetry/pip-tools figure
> > > > out the latest set of dependencies that works without conflicts. It
> > > > should be rather straightforward (I've done it in the past for fairly
> > > > complex systems). What those tools enable is a single-shot upgrade of
> > > > all dependencies. After doing it you can make sure that all tests work
> > > > fine (and fix any problems that result from it), and then test it
> > > > thoroughly before you make the final release. You can do it in a
> > > > separate PR with automated testing in Travis, which means that you are
> > > > not disturbing the work of others while doing it (compilation/building
> > > > and unit tests are guaranteed to work before you merge it). It's all
> > > > conscious rather than accidental. A nice side effect is that with every
> > > > release you can actually "catch up" with the latest stable versions of
> > > > many libraries in one go. It's better than waiting until someone
> > > > deliberately upgrades to a newer version (and the rest remain terribly
> > > > outdated, as is the case for Airflow now).
> > > >
> > > > So, a bit counterintuitively, I think tools like pip-tools/poetry help
> > > > you catch up faster in many cases. That is at least my experience so far.
> > > >
> > > > Additionally, Airflow is an open system - if you have very specific
> > > > needs for requirements, you might actually - in the very same way with
> > > > pip-tools/poetry - upgrade all your dependencies in your local fork of
> > > > Airflow before someone else does it in master/release. Those tools kind
> > > > of democratise dependency management. It should be as easy as
> > > > `pip-compile --upgrade` or `poetry update` and you will get all the
> > > > "non-conflicting" latest dependencies in your local fork (and poetry
> > > > especially seems to do all the heavy lifting of figuring out which
> > > > versions will work). You should be able to test and publish it locally
> > > > as your private package for local installations. You can even mark a
> > > > specific dependency as needing a specific version and let
> > > > pip-tools/poetry figure out exact versions of the other requirements.
> > > > You can even make a PR with such an upgrade eventually to get it into
> > > > master faster. You can even downgrade in a similar way in case a newer
> > > > dependency causes problems for you. Guided by the tools, it's much
> > > > faster than figuring the versions out by yourself.
> > > >
> > > > As long as we have a simple way of managing it, document how to
> > > > upgrade/downgrade dependencies in your own fork, and mention how to
> > > > release Airflow locally as a package, I think your case could be covered
> > > > even better than now. What do you think?
> > > >
> > > > J.
> > > >
> > > > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > > <EK...@novozymes.com.invalid> wrote:
> > > >
> > > > > For us, exact pinning of versions would be problematic. We have DAG
> > > > > code that shares direct and indirect dependencies with Airflow, e.g.
> > > > > lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If
> > > > > our DAG code for some reason needs a newer point release due to a bug
> > > > > that's fixed, then we can't cleanly build a virtual environment
> > > > > containing the fixed version. For us, it's already a problem that
> > > > > Airflow has quite strict (and sometimes old) requirements in setup.py.
> > > > >
> > > > > Erik
> > > > > ________________________________
> > > > > From: Jarek Potiuk <Ja...@polidea.com>
> > > > > Sent: Friday, October 5, 2018 2:01:15 PM
> > > > > To: dev@airflow.incubator.apache.org
> > > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > > >
> > > > > I think one solution for the release approach is to check, as part of
> > > > > the automated Travis build, that all requirements are pinned with ==
> > > > > (even the deep ones) and fail the build if they are not, for ALL
> > > > > versions (including dev). And of course we should document the approach
> > > > > to releases/upgrades etc. If we do it all the time for development
> > > > > versions (which seems quite doable), then transitively all the releases
> > > > > will also have pinned versions and they will never try to upgrade any of
> > > > > the dependencies. In poetry (similarly in pip-tools with an .in file) it
> > > > > is done by having a .lock file that specifies exact versions of each
> > > > > package, so it can be rather easy to manage (so it's worth trying out I
> > > > > think :D - poetry seems a bit more friendly than pip-tools).
> > > > >
> > > > > There is a drawback - of course - in having to manually update the
> > > > > module that you want, but I really see that as an advantage rather than
> > > > > a drawback, especially for users. This way you maintain the property
> > > > > that it will always install and work the same way, no matter if you
> > > > > installed it today or two months ago. I think the biggest drawback for
> > > > > maintainers is that you need some kind of monitoring of security
> > > > > vulnerabilities and cannot rely on automated security upgrades. With >=
> > > > > requirements those security updates might happen automatically without
> > > > > anyone noticing, but to be honest I don't think such upgrades are
> > > > > guaranteed even in the current setup for all security issues in all
> > > > > libraries anyway.
> > > > >
> > > > > Finding the need to upgrade because of security issues can be largely
> > > > > automated. Even now I noticed GitHub started to inform owners about
> > > > > potential security vulnerabilities in libraries used by their project.
> > > > > Those notifications can be sent to the devlist and turned into JIRA
> > > > > issues, followed by minor security-related releases (with only a few
> > > > > library dependencies upgraded).
> > > > >
> > > > > I think it's even easier to automate if you have pinned dependencies,
> > > > > because it's generally easy for static analysers to find applicable
> > > > > vulnerabilities for specific versions of libraries. When you have >=,
> > > > > you never know which version will be used until you actually perform
> > > > > the installation.
> > > > >
> > > > > There is one big advantage for maintainers in the "pinned" case. Your
> > > > > users always have the same dependencies, so when an issue is raised you
> > > > > can reproduce it more easily. Without pinning it's hard to know which
> > > > > versions the user has (they could have installed a month ago or
> > > > > yesterday), and even if you find out by asking, you might not be able to
> > > > > reproduce that set of requirements easily (simply because newer versions
> > > > > of the libraries have been released and are used automatically). You can
> > > > > ask the user to run `pip install --upgrade`, but that's dangerous and
> > > > > pretty lame ("check the latest version - maybe it fixes your problem?")
> > > > > and sometimes not possible (e.g. someone has a pre-built Docker image
> > > > > with dependencies from a few months ago and cannot rebuild the image
> > > > > easily).
> > > > >
> > > > > J.
> > > > >
> > > > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> > > > wrote:
> > > > >
> > > > > > One thing to point out here.
> > > > > >
> > > > > > Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > > > environment it will fail.
> > > > > >
> > > > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > > > > > allowed to be >= 1.11.1, so that pulls in 1.12.0, which requires
> > > > > > flask-login >= 0.3.
> > > > > >
> > > > > > So I do think there is maybe something to be said about pinning for
> > > > > > releases. The downside is that if there are updates to a module that
> > > > > > we want, then we have to make a point release to let people get them.
> > > > > >
> > > > > > Both methods have drawbacks.
> > > > > >
> > > > > > -ash
> > > > > >
> > > > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> > arthur.wiedmer@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Jarek,
> > > > > > >
> > > > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > > > >
> > > > > > > I just want to double check we are talking about pinning in
> > > > > > > requirements.txt only.
> > > > > > >
> > > > > > > This offers the ability to
> > > > > > > pip install -r requirements.txt
> > > > > > > pip install --no-deps airflow
> > > > > > > For a guaranteed install which works.
> > > > > > >
> > > > > > > Several different requirement files can be provided for specific use
> > > > > > > cases, like a stable dev one for instance for people wanting to work
> > > > > > > on operators and non-core functions.
> > > > > > >
> > > > > > > However, I think we should proactively test in CI against unpinned
> > > > > > > dependencies (though it might be a separate case in the matrix), so
> > > > > > > that we get advance warning, if possible, that things will break.
> > > > > > > CI downtime is not a bad thing here, it actually caught a problem :)
> > > > > > >
> > > > > > > We should unpin as much as possible in setup.py to only maintain
> > > > > > > minimum required compatibility. Pinning in setup.py is extremely
> > > > > > > detrimental when you have a large number of Python libraries
> > > > > > > installed with different pinned versions.
> > > > > > >
> > > > > > > Best,
> > > > > > > Arthur
> > > > > > >
> > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > > <ddavydov@twitter.com.invalid
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Relevant discussion about this:
> > > > > > >>
> > > > > > >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > > > >>
> > > > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > > Jarek.Potiuk@polidea.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> TL;DR; A change is coming in the way how dependencies/requirements are
> > > > > > >>> specified for Apache Airflow - they will be fixed rather than flexible (==
> > > > > > >>> rather than >=).
> > > > > > >>>
> > > > > > >>> This is follow up after Slack discussion we had with Ash and Kaxil -
> > > > > > >>> summarising what we propose we'll do.
> > > > > > >>>
> > > > > > >>> *Problem:*
> > > > > > >>> During last few weeks we experienced quite a few downtimes of TravisCI
> > > > > > >>> builds (for all PRs/branches including master) as some of the transitive
> > > > > > >>> dependencies were automatically upgraded. This because in a number of
> > > > > > >>> dependencies we have  >= rather than == dependencies.
> > > > > > >>>
> > > > > > >>> Whenever there is a new release of such dependency, it might cause chain
> > > > > > >>> reaction with upgrade of transitive dependencies which might get into
> > > > > > >>> conflict.
> > > > > > >>>
> > > > > > >>> An example was Flask-AppBuilder vs flask-login transitive dependency with
> > > > > > >>> click. They started to conflict once AppBuilder has released version
> > > > > > >>> 1.12.0.
> > > > > > >>>
> > > > > > >>> *Diagnosis:*
> > > > > > >>> Transitive dependencies with "flexible" versions (where >= is used instead
> > > > > > >>> of ==) is a reason for "dependency hell". We will sooner or later hit other
> > > > > > >>> cases where not fixed dependencies cause similar problems with other
> > > > > > >>> transitive dependencies. We need to fix-pin them. This causes problems for
> > > > > > >>> both - released versions (cause they stop to work!) and for development
> > > > > > >>> (cause they break master builds in TravisCI and prevent people from
> > > > > > >>> installing development environment from the scratch.
> > > > > > >>>
> > > > > > >>> *Solution:*
> > > > > > >>>
> > > > > > >>>   - Following the old-but-good post
> > > > > > >>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the pinned
> > > > > > >>>   dependencies to specific versions (so basically all dependencies are
> > > > > > >>>   "fixed").
> > > > > > >>>   - We will introduce mechanism to be able to upgrade dependencies with
> > > > > > >>>   pip-tools (https://github.com/jazzband/pip-tools). We might also take a
> > > > > > >>>   look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > > > >>>   - People who would like to upgrade some dependencies for their PRs will
> > > > > > >>>   still be able to do it - but such upgrades will be in their PR thus they
> > > > > > >>>   will go through TravisCI tests and they will also have to be specified with
> > > > > > >>>   pinned fixed versions (==). This should be part of review process to make
> > > > > > >>>   sure new/changed requirements are pinned.
> > > > > > >>>   - In release process there will be a point where an upgrade will be
> > > > > > >>>   attempted for all requirements (using pip-tools) so that we are not stuck
> > > > > > >>>   with older releases. This will be in controlled PR environment where there
> > > > > > >>>   will be time to fix all dependencies without impacting others and likely
> > > > > > >>>   enough time to "vet" such changes (this can be done for alpha/beta releases
> > > > > > >>>   for example).
> > > > > > >>>   - As a side effect dependencies specification will become far simpler
> > > > > > >>>   and straightforward.
> > > > > > >>>
> > > > > > >>> Happy to hear community comments to the proposal. I am happy to take a lead
> > > > > > >>> on that, open JIRA issue and implement if this is something community is
> > > > > > >>> happy with.
> > > > > > >>>
> > > > > > >>> J.
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>>
> > > > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > > >>> Mobile: +48 660 796 129
> > > > > > >>>

Re: Pinning dependencies for Apache Airflow

Posted by Maxime Beauchemin <ma...@gmail.com>.
pip-tools can definitely help here to ship a reference [locked]
`requirements.txt` that can be used in [all or part of] the CI. It's
actually kind of important to get CI to fail when a new [backward
incompatible] lib comes out and breaks things while allowing version ranges.
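
As a rough illustration of keeping such a reference file locked, CI could
reject any entry that is not pinned with ==. This is only a sketch under
assumptions: `unpinned_requirements` and the `PINNED` pattern are made-up
names (nothing pip-tools ships), and it expects one requirement per line
without hashes or line continuations.

```python
import re

# "name==version", with optional extras and an optional environment marker.
# Wildcards such as "==1.*" are deliberately not treated as pinned.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=<>!~,;\s*]+(\s*;.*)?$"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to one exact version."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        if not PINNED.match(line):
            bad.append(line)
    return bad
```

A CI step could read requirements.txt, call the function, and fail the build
if the returned list is not empty.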

I think there may be challenges around pip-tools and projects that run in
both python2.7 and python3.6. You sometimes need to have 2 requirements.txt
lock files.
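
For the two-interpreter case, one possible arrangement is to keep one lock
file per major Python version, generating each by running pip-compile under
that interpreter with its `--output-file` option, and let CI pick the
matching file. The file naming and the `lock_file_for` helper below are
hypothetical, just to show the idea.

```python
import sys


def lock_file_for(version_info=None):
    """Return the lock file name for an interpreter version (hypothetical naming)."""
    if version_info is None:
        version_info = sys.version_info
    # One lock file per major Python version, e.g. requirements-py2.txt.
    return "requirements-py{}.txt".format(version_info[0])


if __name__ == "__main__":
    # Print the file that CI would pass to `pip install -r` on this interpreter.
    print(lock_file_for())
```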

Max

On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> It's a nice one :). However, I think when/if we go to pinned dependencies
> the way poetry/pip-tools do it, this will suddenly be a lot less useful. It
> will be very easy to track dependency changes (they will always be
> committed as a change in the .lock file or requirements.txt) and if someone
> has a problem while upgrading a dependency (always consciously, never
> accidentally) it will simply fail during the CI build and the change won't
> get merged/won't break the builds of others in the first place :).
>
> J.
>
> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:
>
> > Hi folks,
> >
> > On top of this discussion, I was thinking we should have the ability to
> > quickly monitor dependency releases as well. Previously, it happened a
> > few times that CI kept failing for no apparent reason and it eventually
> > turned out to be due to a dependency release. But it took us some time,
> > sometimes a few days, to realise that was the cause.
> >
> > To partially address this, I tried to develop a mini tool to help us check
> > the latest release of Python packages and the release date-time on PyPI.
> > So, by comparing it with our CI failure history, we may be able to
> > troubleshoot faster.
> >
> > Output Sample (ordered by upload time in desc order):
> > Package Name        Latest Version   Upload Time
> > awscli              1.16.28          2018-10-05T23:12:45
> > botocore            1.12.18          2018-10-05T23:12:39
> > promise             2.2.1            2018-10-04T22:04:18
> > Keras               2.2.4            2018-10-03T20:59:39
> > bleach              3.0.0            2018-10-03T16:54:27
> > Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
> > ... ...
> >
> > It's a minimal tool (not perfect yet but working). I have hosted this tool
> > at https://github.com/XD-DENG/pypi-release-query.
> >
> >
> > XD
> >
> > On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > Hello Erik,
> > >
> > > I understand your concern. It's a hard one to solve in general (i.e.
> > > dependency-hell). It looks like in this case you treat Airflow as
> > > 'library', where for some other people it might be more like 'end
> > product'.
> > > If you look at the "pinning" philosophy - the "pin everything" is good
> > for
> > > end products, but not good for libraries. In the case you have Airflow
> is
> > > treated as a bit of both. And it's perfectly valid case at that (with
> > > custom python DAGs being central concept for Airflow).
> > > However, I think it's not as bad as you think when it comes to exact
> > > pinning.
> > >
> > > I believe - a bit counter-intuitively - that tools like
> pip-tools/poetry
> > > with exact pinning result in having your dependencies upgraded more
> > often,
> > > rather than less - especially in complex systems where dependency-hell
> > > creeps-in. If you look at Airflow's setup.py now - It's a bit scary to
> > make
> > > any change to it. There is a chance it will blow at your face if you
> > change
> > > it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
> > > whether it will cause chain reaction of conflicts that will ruin your
> > work
> > > day.
> > >
> > > On the contrary - if you change it to exact pinning in
> > > .lock/requirements.txt file (poetry/pip-tools) and have much simpler
> (and
> > > commented) exclusion/avoidance rules in your .in/.tml file, the whole
> > setup
> > > might be much easier to maintain and upgrade. Every time you prepare
> for
> > > release (or even once in a while for master) one person might
> consciously
> > > attempt to upgrade all dependencies to latest ones. It should be almost
> > as
> > > easy as letting poetry/pip-tools help with figuring out what are the
> > latest
> > > set of dependencies that will work without conflicts. It should be
> rather
> > > straightforward (I've done it in the past for fairly complex systems).
> > What
> > > those tools enable is - doing single-shot upgrade of all dependencies.
> > > After doing it you can make sure that all tests work fine (and fix any
> > > problems that result from it). And then you test it thoroughly before
> you
> > > make final release. You can do it in separate PR - with automated
> testing
> > > in Travis which means that you are not disturbing work of others
> > > (compilation/building + unit tests are guaranteed to work before you
> > merge
> > > it) while doing it. It's all conscious rather than accidental. Nice
> side
> > > effect of that is that with every release you can actually "catch-up"
> > with
> > > latest stable versions of many libraries in one go. It's better than
> > > waiting until someone deliberately upgrades to newer version (and the
> > rest
> > > remain terribly out-dated as is the case for Airflow now).
> > >
> > > So a bit counterintuitively I think tools like pip-tools/poetry help
> you
> > to
> > > catch up faster in many cases. That is at least my experience so far.
> > >
> > > Additionally, Airflow is an open system - if you have very specific
> needs
> > > for requirements, you might actually - in the very same way with
> > > pip-tools/poetry - upgrade all your dependencies in your local fork of
> > > Airflow before someone else does it in master/release. Those tools kind
> > of
> > > democratise dependency management. It should be as easy as `pip-compile
> > > --upgrade` or `poetry update` and you will get all the
> "non-conflicting"
> > > latest dependencies in your local fork (and poetry especially seems to
> do
> > > all the heavy lifting of figuring out which versions will work). You
> > should
> > > be able to test and publish it locally as your private package for
> local
> > > installations. You can even mark the specific dependency you want to
> use
> > > specific version and let pip-tools/poetry figure out exact versions of
> > > other requirements. You can even make a PR with such upgrade eventually
> > to
> > > get it faster in master. You can even downgrade in case newer
> dependency
> > > causes problems for you in similar way. Guided by the tools, it's much
> > > faster than figuring the versions out by yourself.
> > >
> > > As long as we have simple way of managing it and document how to
> > > upgrade/downgrade dependencies in your own fork, and mention how to
> > locally
> > > release Airflow as a package, I think your case could be covered even
> > > better than now. What do you think ?
> > >
> > > J.
> > >
> > > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > <EK...@novozymes.com.invalid> wrote:
> > >
> > > > For us, exact pinning of versions would be problematic. We have DAG
> > code
> > > > that shares direct and indirect dependencies with Airflow, e.g. lxml,
> > > > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our
> > DAG
> > > > code for some reason needs a newer point release due to a bug that's
> > > fixed,
> > > > then we can't cleanly build a virtual environment containing the
> fixed
> > > > version. For us, it's already a problem that Airflow has quite strict
> > > (and
> > > > sometimes old) requirements in setup.py.
> > > >
> > > > Erik
> > > > ________________________________
> > > > From: Jarek Potiuk <Ja...@polidea.com>
> > > > Sent: Friday, October 5, 2018 2:01:15 PM
> > > > To: dev@airflow.incubator.apache.org
> > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > >
> > > > I think one solution to release approach is to check as part of
> > automated
> > > > Travis build if all requirements are pinned with == (even the deep
> > ones)
> > > > and fail the build in case they are not for ALL versions (including
> > > > dev). And of course we should document the approach of
> > releases/upgrades
> > > > etc. If we do it all the time for development versions (which seems
> > quite
> > > > doable), then transitively all the releases will also have pinned
> > > versions
> > > > and they will never try to upgrade any of the dependencies. In poetry
> > > > (similarly in pip-tools with .in file) it is done by having a .lock
> > file
> > > > that specifies exact versions of each package so it can be rather
> easy
> > to
> > > > manage (so it's worth trying it out I think  :D  - seems a bit more
> > > > friendly than pip-tools).
> > > >
> > > > There is a drawback - of course - with manually updating the module
> > that
> > > > you want, but I really see that as an advantage rather than drawback
> > > > especially for users. This way you maintain the property that it will
> > > > always install and work the same way no matter if you installed it
> > today
> > > or
> > > > two months ago. I think the biggest drawback for maintainers is that
> > you
> > > > need some kind of monitoring of security vulnerabilities and cannot
> > rely
> > > on
> > > > automated security upgrades. With >= requirements those security
> > updates
> > > > might happen automatically without anyone noticing, but to be honest
> I
> > > > don't think such upgrades are guaranteed even in current setup for
> all
> > > > security issues for all libraries anyway.
> > > >
> > > > Finding the need to upgrade because of security issues can be quite
> > > > automated. Even now I noticed Github started to inform owners about
> > > > potential security vulnerabilities in used libraries for their
> project.
> > > > Those notifications can be sent to devlist and turned into JIRA
> issues
> > > > followed bvy  minor security-related releases (with only few library
> > > > dependencies upgraded).
> > > >
> > > > I think it's even easier to automate it if you have pinned
> > dependencies -
> > > > because it's generally easy to find applicable vulnerabilities for
> > > specific
> > > > versions of libraries by static analysers - when you have >=, you
> never
> > > > know which version will be used until you actually perform the
> > > > installation.
> > > >
> > > > There is one big advantage for maintainers for "pinned" case. Your
> > users
> > > > always have the same dependencies - so when issue is raised, you can
> > > > reproduce it more easily. It's hard to know which version user has
> (as
> > > the
> > > > user could install it month ago or yesterday) and even if you find
> out
> > by
> > > > asking the user, you might not be able to reproduce the set of
> > > requirements
> > > > easily (simply because there are already newer versions of the
> > libraries
> > > > released and they are used automatically). You can ask the user to
> run
> > > pip
> > > > --upgrade but that's dangerous and pretty lame ("check the latest
> > > version -
> > > > maybe it fixes your problem ? ") and sometimes not possible (e.g.
> > someone
> > > > has pre-built docker image with dependencies from few months ago and
> > > cannot
> > > > rebuild the image easily).
> > > >
> > > > J.
> > > >
> > > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> > > wrote:
> > > >
> > > > > One thing to point out here.
> > > > >
> > > > > Right now if you `pip install apache-airflow=1.10.0` in a clean
> > > > > environment it will fail.
> > > > >
> > > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder is
> > >=
> > > > > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > >
> > > > > So I do think there is maybe something to be said about pinning for
> > > > > releases. The down side to that is that if there are updates to a
> > > module
> > > > > that we want then we have to make a point release to let people get
> > it
> > > > >
> > > > > Both methods have draw-backs
> > > > >
> > > > > -ash
> > > > >
> > > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <
> arthur.wiedmer@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi Jarek,
> > > > > >
> > > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > > >
> > > > > > I just want to double check we are talking about pinning in
> > > > > > requirements.txt only.
> > > > > >
> > > > > > This offers the ability to
> > > > > > pip install -r requirements.txt
> > > > > > pip install --no-deps airflow
> > > > > > For a guaranteed install which works.
> > > > > >
> > > > > > Several different requirement files can be provided for specific
> > use
> > > > > cases,
> > > > > > like a stable dev one for instance for people wanting to work on
> > > > > operators
> > > > > > and non-core functions.
> > > > > >
> > > > > > However, I think we should proactively test in CI against
> unpinned
> > > > > > dependencies (though it might be a separate case in the matrix) ,
> > so
> > > > that
> > > > > > we get advance warning if possible that things will break.
> > > > > > CI downtime is not a bad thing here, it actually caught a problem
> > :)
> > > > > >
> > > > > > We should unpin as possible in setup.py to only maintain minimum
> > > > required
> > > > > > compatibility. The process of pinning in setup.py is extremely
> > > > > detrimental
> > > > > > when you have a large number of python libraries installed with
> > > > different
> > > > > > pinned versions.
> > > > > >
> > > > > > Best,
> > > > > > Arthur
> > > > > >
> > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > <ddavydov@twitter.com.invalid
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Relevant discussion about this:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-airflow%2Fpull%2F1809%23issuecomment-257502174&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=MM%2FoNwkPYR8UtBUczXLfZD2lCp7Ig%2BI%2FL2rFszcoJi8%3D&amp;reserved=0
> > > > > >>
> > > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> TL;DR; A change is coming in the way how
> > dependencies/requirements
> > > > are
> > > > > >>> specified for Apache Airflow - they will be fixed rather than
> > > > flexible
> > > > > >> (==
> > > > > >>> rather than >=).
> > > > > >>>
> > > > > >>> This is follow up after Slack discussion we had with Ash and
> > Kaxil
> > > -
> > > > > >>> summarising what we propose we'll do.
> > > > > >>>
> > > > > >>> *Problem:*
> > > > > >>> During last few weeks we experienced quite a few downtimes of
> > > > TravisCI
> > > > > >>> builds (for all PRs/branches including master) as some of the
> > > > > transitive
> > > > > >>> dependencies were automatically upgraded. This because in a
> > number
> > > of
> > > > > >>> dependencies we have  >= rather than == dependencies.
> > > > > >>>
> > > > > >>> Whenever there is a new release of such dependency, it might
> > cause
> > > > > chain
> > > > > >>> reaction with upgrade of transitive dependencies which might
> get
> > > into
> > > > > >>> conflict.
> > > > > >>>
> > > > > >>> An example was Flask-AppBuilder vs flask-login transitive
> > > dependency
> > > > > with
> > > > > >>> click. They started to conflict once AppBuilder has released
> > > version
> > > > > >>> 1.12.0.
> > > > > >>>
> > > > > >>> *Diagnosis:*
> > > > > >>> Transitive dependencies with "flexible" versions (where >= is
> > used
> > > > > >> instead
> > > > > >>> of ==) is a reason for "dependency hell". We will sooner or
> later
> > > hit
> > > > > >> other
> > > > > >>> cases where not fixed dependencies cause similar problems with
> > > other
> > > > > >>> transitive dependencies. We need to fix-pin them. This causes
> > > > problems
> > > > > >> for
> > > > > >>> both - released versions (cause they stop to work!) and for
> > > > development
> > > > > >>> (cause they break master builds in TravisCI and prevent people
> > from
> > > > > >>> installing development environment from the scratch.
> > > > > >>>
> > > > > >>> *Solution:*
> > > > > >>>
> > > > > >>>   - Following the old-but-good post
> > > > > >>>
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnvie.com%2Fposts%2Fpin-your-packages%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=PVE3S4mgki7L%2BcAe104o2cf68wRXolvYXRFmAyiX8gA%3D&amp;reserved=0
> > > > we are going to fix the
> > > > > >>> pinned
> > > > > >>>   dependencies to specific versions (so basically all
> > dependencies
> > > > are
> > > > > >>>   "fixed").
> > > > > >>>   - We will introduce mechanism to be able to upgrade
> > dependencies
> > > > with
> > > > > >>>   pip-tools (
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjazzband%2Fpip-tools&amp;data=01%7C01%7CEKC%40novozymes.com%7Cd31403917b084e3615c208d62aba4c24%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=Kt9CjWrolvpjp7MwIR2nn8EIf9CW9HW02U7GVGyOXMo%3D&amp;reserved=0
> > > ).
> > > > We might also
> > > > > >> take a
> > > > > >>>   look at pipenv:
> > > >
> > >
> >
> https://pipenv.readthedocs.io/en/latest/
> > > > > >>>   - People who would like to upgrade some dependencies for
> their
> > > PRs
> > > > > >> will
> > > > > >>>   still be able to do it - but such upgrades will be in their
> PR
> > > thus
> > > > > >> they
> > > > > >>>   will go through TravisCI tests and they will also have to be
> > > > > specified
> > > > > >>> with
> > > > > >>>   pinned fixed versions (==). This should be part of review
> > process
> > > > to
> > > > > >>> make
> > > > > >>>   sure new/changed requirements are pinned.
> > > > > >>>   - In release process there will be a point where an upgrade
> > will
> > > be
> > > > > >>>   attempted for all requirements (using pip-tools) so that we
> are
> > > not
> > > > > >>> stuck
> > > > > >>>   with older releases. This will be in controlled PR
> environment
> > > > where
> > > > > >>> there
> > > > > >>>   will be time to fix all dependencies without impacting others
> > and
> > > > > >> likely
> > > > > >>>   enough time to "vet" such changes (this can be done for
> > > alpha/beta
> > > > > >>> releases
> > > > > >>>   for example).
> > > > > >>>   - As a side effect dependencies specification will become far
> > > > simpler
> > > > > >>>   and straightforward.
> > > > > >>>
> > > > > >>> Happy to hear community comments to the proposal. I am happy to
> > > take
> > > > a
> > > > > >> lead
> > > > > >>> on that, open JIRA issue and implement if this is something
> > > community
> > > > > is
> > > > > >>> happy with.
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > >>> --
> > > > > >>>
> > > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>> Mobile: +48 660 796 129
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > *Jarek Potiuk, Principal Software Engineer*
> > > > Mobile: +48 660 796 129
> > > >
> > >
> > >
> > > --
> > >
> > > *Jarek Potiuk, Principal Software Engineer*
> > > Mobile: +48 660 796 129
> > >
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>

Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
It's a nice one :). However, I think when/if we go to pinned dependencies
the way poetry/pip-tools do it, this will suddenly be a lot less useful. It
will be very easy to track dependency changes (they will always be
committed as a change in the .lock file or requirements.txt), and if someone
has a problem while upgrading a dependency (always consciously, never
accidentally), it will simply fail during the CI build and the change won't
get merged or break the builds of others in the first place :).
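
To make the "easy to track" point concrete, here is a minimal sketch (not
part of pip-tools or poetry) that compares two pinned requirements files and
lists the pins that changed. The helper names parse_pins and diff_pins are
made up for this illustration, and the parser only understands plain
name==version lines; hashes, extras and environment markers are deliberately
ignored.

```python
def parse_pins(text):
    """Map package name -> pinned version for lines of the form name==version.

    Simplification (assumption for this sketch): anything after '#' is a
    comment, and lines without '==' are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(old_text, new_text):
    """Return sorted (name, old_version, new_version) for every changed pin.

    A version of None means the package is absent on that side.
    """
    old, new = parse_pins(old_text), parse_pins(new_text)
    return [
        (name, old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    ]
```

In practice a plain `git diff` of the committed requirements.txt or .lock
file shows the same information; the sketch only illustrates that, once
everything is pinned, a dependency change is an explicit and reviewable diff.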

J.

On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd...@gmail.com> wrote:

> Hi folks,
>
> On top of this discussion, I was thinking we should also have the ability to
> quickly monitor dependency releases. Previously, it happened a few times
> that CI kept failing for no apparent reason, and it eventually turned out
> to be due to a dependency release. But it took us some time, sometimes a few
> days, to realise that a dependency release was the cause.
>
> To partially address this, I developed a mini tool to help us check the
> latest release of Python packages and its upload date-time on PyPI. By
> comparing that with our CI failure history, we may be able to troubleshoot
> faster.
>
> Output Sample (ordered by upload time in desc order):
>
> Package Name       Latest Version   Upload Time
> awscli             1.16.28          2018-10-05T23:12:45
> botocore           1.12.18          2018-10-05T23:12:39
> promise            2.2.1            2018-10-04T22:04:18
> Keras              2.2.4            2018-10-03T20:59:39
> bleach             3.0.0            2018-10-03T16:54:27
> Flask-AppBuilder   1.12.0           2018-10-03T09:03:48
> ... ...
>
> It's a minimal tool (not perfect yet but working). I have hosted this tool
> at https://github.com/XD-DENG/pypi-release-query.
>
>
> XD
>
> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > Hello Erik,
> >
> > I understand your concern. It's a hard one to solve in general (i.e.
> > dependency-hell). It looks like in this case you treat Airflow as
> > 'library', where for some other people it might be more like 'end
> product'.
> > If you look at the "pinning" philosophy - the "pin everything" is good
> for
> > end products, but not good for libraries. In the case you have Airflow is
> > treated as a bit of both. And it's perfectly valid case at that (with
> > custom python DAGs being central concept for Airflow).
> > However, I think it's not as bad as you think when it comes to exact
> > pinning.
> >
> > I believe - a bit counter-intuitively - that tools like pip-tools/poetry
> > with exact pinning result in having your dependencies upgraded more
> often,
> > rather than less - especially in complex systems where dependency-hell
> > creeps-in. If you look at Airflow's setup.py now - It's a bit scary to
> make
> > any change to it. There is a chance it will blow at your face if you
> change
> > it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
> > whether it will cause chain reaction of conflicts that will ruin your
> work
> > day.
> >
> > On the contrary - if you change it to exact pinning in
> > .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and
> > > commented) exclusion/avoidance rules in your .in/.toml file, the whole
> setup
> > might be much easier to maintain and upgrade. Every time you prepare for
> > release (or even once in a while for master) one person might consciously
> > attempt to upgrade all dependencies to latest ones. It should be almost
> as
> > easy as letting poetry/pip-tools help with figuring out what are the
> latest
> > set of dependencies that will work without conflicts. It should be rather
> > straightforward (I've done it in the past for fairly complex systems).
> What
> > those tools enable is - doing single-shot upgrade of all dependencies.
> > After doing it you can make sure that all tests work fine (and fix any
> > problems that result from it). And then you test it thoroughly before you
> > make final release. You can do it in separate PR - with automated testing
> > in Travis which means that you are not disturbing work of others
> > (compilation/building + unit tests are guaranteed to work before you
> merge
> > it) while doing it. It's all conscious rather than accidental. Nice side
> > effect of that is that with every release you can actually "catch-up"
> with
> > latest stable versions of many libraries in one go. It's better than
> > waiting until someone deliberately upgrades to newer version (and the
> rest
> > remain terribly out-dated as is the case for Airflow now).
> >
> > So a bit counterintuitively I think tools like pip-tools/poetry help you
> to
> > catch up faster in many cases. That is at least my experience so far.
> >
> > Additionally, Airflow is an open system - if you have very specific needs
> > for requirements, you might actually - in the very same way with
> > pip-tools/poetry - upgrade all your dependencies in your local fork of
> > Airflow before someone else does it in master/release. Those tools kind
> of
> > democratise dependency management. It should be as easy as `pip-compile
> > --upgrade` or `poetry update` and you will get all the "non-conflicting"
> > latest dependencies in your local fork (and poetry especially seems to do
> > all the heavy lifting of figuring out which versions will work). You
> should
> > be able to test and publish it locally as your private package for local
> > installations. You can even mark the specific dependency you want to use
> > specific version and let pip-tools/poetry figure out exact versions of
> > other requirements. You can even make a PR with such upgrade eventually
> to
> > get it faster in master. You can even downgrade in case newer dependency
> > causes problems for you in similar way. Guided by the tools, it's much
> > faster than figuring the versions out by yourself.
> >
> > As long as we have simple way of managing it and document how to
> > upgrade/downgrade dependencies in your own fork, and mention how to
> locally
> > release Airflow as a package, I think your case could be covered even
> > better than now. What do you think ?
> >
> > J.
> >
> > On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > <EK...@novozymes.com.invalid> wrote:
> >
> > > For us, exact pinning of versions would be problematic. We have DAG
> > > code that shares direct and indirect dependencies with Airflow, e.g.
> > > lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If
> > > our DAG code for some reason needs a newer point release due to a bug
> > > that's fixed, then we can't cleanly build a virtual environment
> > > containing the fixed version. For us, it's already a problem that
> > > Airflow has quite strict (and sometimes old) requirements in setup.py.
> > >
> > > Erik
> > > ________________________________
> > > From: Jarek Potiuk <Ja...@polidea.com>
> > > Sent: Friday, October 5, 2018 2:01:15 PM
> > > To: dev@airflow.incubator.apache.org
> > > Subject: Re: Pinning dependencies for Apache Airflow
> > >
> > > I think one solution to release approach is to check as part of
> automated
> > > Travis build if all requirements are pinned with == (even the deep
> ones)
> > > and fail the build in case they are not for ALL versions (including
> > > dev). And of course we should document the approach of
> releases/upgrades
> > > etc. If we do it all the time for development versions (which seems
> quite
> > > doable), then transitively all the releases will also have pinned
> > versions
> > > and they will never try to upgrade any of the dependencies. In poetry
> > > (similarly in pip-tools with .in file) it is done by having a .lock
> file
> > > that specifies exact versions of each package so it can be rather easy
> to
> > > manage (so it's worth trying it out I think  :D  - seems a bit more
> > > friendly than pip-tools).
> > >
> > > There is a drawback - of course - with manually updating the module
> that
> > > you want, but I really see that as an advantage rather than drawback
> > > especially for users. This way you maintain the property that it will
> > > always install and work the same way no matter if you installed it
> today
> > or
> > > two months ago. I think the biggest drawback for maintainers is that
> you
> > > need some kind of monitoring of security vulnerabilities and cannot
> rely
> > on
> > > automated security upgrades. With >= requirements those security
> updates
> > > might happen automatically without anyone noticing, but to be honest I
> > > don't think such upgrades are guaranteed even in current setup for all
> > > security issues for all libraries anyway.
> > >
> > > Finding the need to upgrade because of security issues can be quite
> > > automated. Even now I noticed Github started to inform owners about
> > > potential security vulnerabilities in used libraries for their project.
> > > Those notifications can be sent to devlist and turned into JIRA issues
> > > > followed by minor security-related releases (with only a few library
> > > dependencies upgraded).
> > >
> > > I think it's even easier to automate it if you have pinned
> dependencies -
> > > because it's generally easy to find applicable vulnerabilities for
> > specific
> > > versions of libraries by static analysers - when you have >=, you never
> > > know which version will be used until you actually perform the
> > > installation.
> > >
> > > There is one big advantage for maintainers for "pinned" case. Your
> users
> > > always have the same dependencies - so when issue is raised, you can
> > > reproduce it more easily. It's hard to know which version user has (as
> > the
> > > user could install it month ago or yesterday) and even if you find out
> by
> > > asking the user, you might not be able to reproduce the set of
> > requirements
> > > easily (simply because there are already newer versions of the
> libraries
> > > released and they are used automatically). You can ask the user to run
> > pip
> > > --upgrade but that's dangerous and pretty lame ("check the latest
> > version -
> > > maybe it fixes your problem ? ") and sometimes not possible (e.g.
> someone
> > > has pre-built docker image with dependencies from few months ago and
> > cannot
> > > rebuild the image easily).
> > >
> > > J.
> > >
> > > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> > wrote:
> > >
> > > > One thing to point out here.
> > > >
> > > > > Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > > environment it will fail.
> > > > >
> > > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder
> > > > > is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > >
> > > > > So I do think there is maybe something to be said about pinning for
> > > > > releases. The downside to that is that if there are updates to a
> > > > > module that we want then we have to make a point release to let
> > > > > people get it.
> > > > >
> > > > > Both methods have drawbacks.
> > > >
> > > > -ash
> > > >
> > > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Jarek,
> > > > >
> > > > > I will +1 the discussion Dan is referring to and George's advice.
> > > > >
> > > > > I just want to double check we are talking about pinning in
> > > > > requirements.txt only.
> > > > >
> > > > > This offers the ability to
> > > > > pip install -r requirements.txt
> > > > > pip install --no-deps airflow
> > > > > For a guaranteed install which works.
> > > > >
> > > > > Several different requirement files can be provided for specific
> use
> > > > cases,
> > > > > like a stable dev one for instance for people wanting to work on
> > > > operators
> > > > > and non-core functions.
> > > > >
> > > > > However, I think we should proactively test in CI against unpinned
> > > > > dependencies (though it might be a separate case in the matrix) ,
> so
> > > that
> > > > > we get advance warning if possible that things will break.
> > > > > CI downtime is not a bad thing here, it actually caught a problem
> :)
> > > > >
> > > > > We should unpin as possible in setup.py to only maintain minimum
> > > required
> > > > > compatibility. The process of pinning in setup.py is extremely
> > > > detrimental
> > > > > when you have a large number of python libraries installed with
> > > different
> > > > > pinned versions.
> > > > >
> > > > > Best,
> > > > > Arthur
> > > > >
> > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > <ddavydov@twitter.com.invalid
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Relevant discussion about this:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > >>
> > > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com>
> > > > >> wrote:
> > > > >>
> > > > >>> TL;DR; A change is coming in the way how
> dependencies/requirements
> > > are
> > > > >>> specified for Apache Airflow - they will be fixed rather than
> > > flexible
> > > > >> (==
> > > > >>> rather than >=).
> > > > >>>
> > > > >>> This is follow up after Slack discussion we had with Ash and
> Kaxil
> > -
> > > > >>> summarising what we propose we'll do.
> > > > >>>
> > > > >>> *Problem:*
> > > > >>> During last few weeks we experienced quite a few downtimes of
> > > TravisCI
> > > > >>> builds (for all PRs/branches including master) as some of the
> > > > transitive
> > > > >>> dependencies were automatically upgraded. This because in a
> number
> > of
> > > > >>> dependencies we have  >= rather than == dependencies.
> > > > >>>
> > > > >>> Whenever there is a new release of such dependency, it might
> cause
> > > > chain
> > > > >>> reaction with upgrade of transitive dependencies which might get
> > into
> > > > >>> conflict.
> > > > >>>
> > > > >>> An example was Flask-AppBuilder vs flask-login transitive
> > dependency
> > > > with
> > > > >>> click. They started to conflict once AppBuilder has released
> > version
> > > > >>> 1.12.0.
> > > > >>>
> > > > >>> *Diagnosis:*
> > > > >>> Transitive dependencies with "flexible" versions (where >= is
> used
> > > > >> instead
> > > > >>> of ==) is a reason for "dependency hell". We will sooner or later
> > hit
> > > > >> other
> > > > >>> cases where not fixed dependencies cause similar problems with
> > other
> > > > >>> transitive dependencies. We need to fix-pin them. This causes
> > > problems
> > > > >> for
> > > > >>> both - released versions (cause they stop to work!) and for
> > > development
> > > > >>> (cause they break master builds in TravisCI and prevent people
> from
> > > > >>> installing development environment from the scratch.
> > > > >>>
> > > > >>> *Solution:*
> > > > >>>
> > > > >>>   - Following the old-but-good post
> > > > >>>
> > >
> >
> https://nvie.com/posts/pin-your-packages/
> > > we are going to fix the
> > > > >>> pinned
> > > > >>>   dependencies to specific versions (so basically all
> dependencies
> > > are
> > > > >>>   "fixed").
> > > > >>>   - We will introduce mechanism to be able to upgrade
> dependencies
> > > with
> > > > >>>   pip-tools (
> > >
> >
> https://github.com/jazzband/pip-tools
> > ).
> > > We might also
> > > > >> take a
> > > > >>>   look at pipenv:
> > >
> >
> https://pipenv.readthedocs.io/en/latest/
> > > > >>>   - People who would like to upgrade some dependencies for their
> > PRs
> > > > >> will
> > > > >>>   still be able to do it - but such upgrades will be in their PR
> > thus
> > > > >> they
> > > > >>>   will go through TravisCI tests and they will also have to be
> > > > specified
> > > > >>> with
> > > > >>>   pinned fixed versions (==). This should be part of review
> process
> > > to
> > > > >>> make
> > > > >>>   sure new/changed requirements are pinned.
> > > > >>>   - In release process there will be a point where an upgrade
> will
> > be
> > > > >>>   attempted for all requirements (using pip-tools) so that we are
> > not
> > > > >>> stuck
> > > > >>>   with older releases. This will be in controlled PR environment
> > > where
> > > > >>> there
> > > > >>>   will be time to fix all dependencies without impacting others
> and
> > > > >> likely
> > > > >>>   enough time to "vet" such changes (this can be done for
> > alpha/beta
> > > > >>> releases
> > > > >>>   for example).
> > > > >>>   - As a side effect dependencies specification will become far
> > > simpler
> > > > >>>   and straightforward.
> > > > >>>
> > > > >>> Happy to hear community comments to the proposal. I am happy to
> > take
> > > a
> > > > >> lead
> > > > >>> on that, open JIRA issue and implement if this is something
> > community
> > > > is
> > > > >>> happy with.
> > > > >>>
> > > > >>> J.
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > > >>> Mobile: +48 660 796 129
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> > > --
> > >
> > > *Jarek Potiuk, Principal Software Engineer*
> > > Mobile: +48 660 796 129
> > >
> >
> >
> > --
> >
> > *Jarek Potiuk, Principal Software Engineer*
> > Mobile: +48 660 796 129
> >
>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Deng Xiaodong <xd...@gmail.com>.
Hi folks,

On top of this discussion, I was thinking we should also have the ability to
quickly monitor dependency releases. Previously, it happened a few times
that CI kept failing for no apparent reason, and it eventually turned out
to be due to a dependency release. But it took us some time, sometimes a few
days, to realise that a dependency release was the cause.

To partially address this, I developed a mini tool to help us check the
latest release of Python packages and its upload date-time on PyPI. By
comparing that with our CI failure history, we may be able to troubleshoot
faster.

Output Sample (ordered by upload time in desc order):

Package Name       Latest Version   Upload Time
awscli             1.16.28          2018-10-05T23:12:45
botocore           1.12.18          2018-10-05T23:12:39
promise            2.2.1            2018-10-04T22:04:18
Keras              2.2.4            2018-10-03T20:59:39
bleach             3.0.0            2018-10-03T16:54:27
Flask-AppBuilder   1.12.0           2018-10-03T09:03:48
... ...

It's a minimal tool (not perfect yet but working). I have hosted this tool
at https://github.com/XD-DENG/pypi-release-query.
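
For readers who want the gist without opening the repository, below is a
rough sketch of how such a query could work. It is not the code of the tool
above. It assumes PyPI's public JSON endpoint
(https://pypi.org/pypi/<package>/json), where, as far as I know,
info.version holds the latest version and urls lists the files of that
release with an upload_time field. The function names latest_release, fetch
and report are made up for this illustration.

```python
import json
from urllib.request import urlopen

# PyPI's public JSON endpoint for a single package.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def latest_release(payload):
    """Return (version, upload_time) from one PyPI JSON payload.

    The earliest upload_time among the latest release's files is used as
    the release time; it is None when no file carries a timestamp.
    """
    version = payload["info"]["version"]
    times = [f["upload_time"] for f in payload.get("urls", []) if f.get("upload_time")]
    return version, (min(times) if times else None)


def fetch(name):
    """Download the JSON metadata for one package (needs network access)."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=10) as resp:
        return json.load(resp)


def report(payloads):
    """Build (name, version, upload_time) rows, newest upload first.

    `payloads` maps package name -> JSON payload. ISO-8601 timestamps sort
    correctly as plain strings; missing timestamps sort last.
    """
    rows = [(name, *latest_release(p)) for name, p in payloads.items()]
    return sorted(rows, key=lambda row: row[2] or "", reverse=True)
```

Something like `report({n: fetch(n) for n in ["awscli", "botocore"]})` would
then give rows in the same shape as the sample output above; keeping the
parsing separate from the download makes it easy to test without network
access.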


XD

On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Hello Erik,
>
> I understand your concern. It's a hard one to solve in general (i.e.
> dependency-hell). It looks like in this case you treat Airflow as
> 'library', where for some other people it might be more like 'end product'.
> If you look at the "pinning" philosophy - the "pin everything" is good for
> end products, but not good for libraries. In the case you have Airflow is
> treated as a bit of both. And it's perfectly valid case at that (with
> custom python DAGs being central concept for Airflow).
> However, I think it's not as bad as you think when it comes to exact
> pinning.
>
> I believe - a bit counter-intuitively - that tools like pip-tools/poetry
> with exact pinning result in having your dependencies upgraded more often,
> rather than less - especially in complex systems where dependency-hell
> creeps-in. If you look at Airflow's setup.py now - It's a bit scary to make
> any change to it. There is a chance it will blow at your face if you change
> it. You never know why there is 0.3 < ver < 1.0 - and if you change it,
> whether it will cause chain reaction of conflicts that will ruin your work
> day.
>
> On the contrary - if you change it to exact pinning in
> .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and
> commented) exclusion/avoidance rules in your .in/.toml file, the whole setup
> might be much easier to maintain and upgrade. Every time you prepare for
> release (or even once in a while for master) one person might consciously
> attempt to upgrade all dependencies to latest ones. It should be almost as
> easy as letting poetry/pip-tools help with figuring out what are the latest
> set of dependencies that will work without conflicts. It should be rather
> straightforward (I've done it in the past for fairly complex systems). What
> those tools enable is - doing single-shot upgrade of all dependencies.
> After doing it you can make sure that all tests work fine (and fix any
> problems that result from it). And then you test it thoroughly before you
> make final release. You can do it in separate PR - with automated testing
> in Travis which means that you are not disturbing work of others
> (compilation/building + unit tests are guaranteed to work before you merge
> it) while doing it. It's all conscious rather than accidental. Nice side
> effect of that is that with every release you can actually "catch-up" with
> latest stable versions of many libraries in one go. It's better than
> waiting until someone deliberately upgrades to newer version (and the rest
> remain terribly out-dated as is the case for Airflow now).
>
> So a bit counterintuitively I think tools like pip-tools/poetry help you to
> catch up faster in many cases. That is at least my experience so far.
>
> Additionally, Airflow is an open system - if you have very specific needs
> for requirements, you might actually - in the very same way with
> pip-tools/poetry - upgrade all your dependencies in your local fork of
> Airflow before someone else does it in master/release. Those tools kind of
> democratise dependency management. It should be as easy as `pip-compile
> --upgrade` or `poetry update` and you will get all the "non-conflicting"
> latest dependencies in your local fork (and poetry especially seems to do
> all the heavy lifting of figuring out which versions will work). You should
> be able to test and publish it locally as your private package for local
> installations. You can even mark the specific dependency you want to use
> specific version and let pip-tools/poetry figure out exact versions of
> other requirements. You can even make a PR with such upgrade eventually to
> get it faster in master. You can even downgrade in case newer dependency
> causes problems for you in similar way. Guided by the tools, it's much
> faster than figuring the versions out by yourself.
>
> As long as we have simple way of managing it and document how to
> upgrade/downgrade dependencies in your own fork, and mention how to locally
> release Airflow as a package, I think your case could be covered even
> better than now. What do you think ?
>
> J.
>
> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> <EK...@novozymes.com.invalid> wrote:
>
> > For us, exact pinning of versions would be problematic. We have DAG code
> > that shares direct and indirect dependencies with Airflow, e.g. lxml,
> > requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG
> > code for some reason needs a newer point release due to a bug that's
> > fixed, then we can't cleanly build a virtual environment containing the
> > fixed version. For us, it's already a problem that Airflow has quite
> > strict (and sometimes old) requirements in setup.py.
> >
> > Erik
> > ________________________________
> > From: Jarek Potiuk <Ja...@polidea.com>
> > Sent: Friday, October 5, 2018 2:01:15 PM
> > To: dev@airflow.incubator.apache.org
> > Subject: Re: Pinning dependencies for Apache Airflow
> >
> > I think one solution to release approach is to check as part of automated
> > Travis build if all requirements are pinned with == (even the deep ones)
> > and fail the build in case they are not for ALL versions (including
> > dev). And of course we should document the approach of releases/upgrades
> > etc. If we do it all the time for development versions (which seems quite
> > doable), then transitively all the releases will also have pinned
> versions
> > and they will never try to upgrade any of the dependencies. In poetry
> > (similarly in pip-tools with .in file) it is done by having a .lock file
> > that specifies exact versions of each package so it can be rather easy to
> > manage (so it's worth trying it out I think  :D  - seems a bit more
> > friendly than pip-tools).
> >
> > There is a drawback - of course - with manually updating the module that
> > you want, but I really see that as an advantage rather than drawback
> > especially for users. This way you maintain the property that it will
> > always install and work the same way no matter if you installed it today
> or
> > two months ago. I think the biggest drawback for maintainers is that you
> > need some kind of monitoring of security vulnerabilities and cannot rely
> on
> > automated security upgrades. With >= requirements those security updates
> > might happen automatically without anyone noticing, but to be honest I
> > don't think such upgrades are guaranteed even in current setup for all
> > security issues for all libraries anyway.
> >
> > Finding the need to upgrade because of security issues can be quite
> > automated. Even now I noticed Github started to inform owners about
> > potential security vulnerabilities in used libraries for their project.
> > Those notifications can be sent to devlist and turned into JIRA issues
> > > followed by minor security-related releases (with only a few library
> > dependencies upgraded).
> >
> > I think it's even easier to automate it if you have pinned dependencies -
> > because it's generally easy to find applicable vulnerabilities for
> specific
> > versions of libraries by static analysers - when you have >=, you never
> > know which version will be used until you actually perform the
> > installation.
> >
> > There is one big advantage for maintainers for "pinned" case. Your users
> > always have the same dependencies - so when issue is raised, you can
> > reproduce it more easily. It's hard to know which version user has (as
> the
> > user could install it month ago or yesterday) and even if you find out by
> > asking the user, you might not be able to reproduce the set of
> requirements
> > easily (simply because there are already newer versions of the libraries
> > released and they are used automatically). You can ask the user to run
> pip
> > --upgrade but that's dangerous and pretty lame ("check the latest
> version -
> > maybe it fixes your problem ? ") and sometimes not possible (e.g. someone
> > has pre-built docker image with dependencies from few months ago and
> cannot
> > rebuild the image easily).
> >
> > J.
> >
> > On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org>
> wrote:
> >
> > > One thing to point out here.
> > >
> > > > Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > environment it will fail.
> > > >
> > > > This is because we pin flask-login to 0.2.1 but flask-appbuilder
> > > > is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > >
> > > > So I do think there is maybe something to be said about pinning for
> > > > releases. The downside to that is that if there are updates to a
> > > > module that we want then we have to make a point release to let
> > > > people get it.
> > > >
> > > > Both methods have drawbacks.
> > >
> > > -ash
> > >
> > > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com>
> > > wrote:
> > > >
> > > > Hi Jarek,
> > > >
> > > > I will +1 the discussion Dan is referring to and George's advice.
> > > >
> > > > I just want to double check we are talking about pinning in
> > > > requirements.txt only.
> > > >
> > > > This offers the ability to
> > > > pip install -r requirements.txt
> > > > pip install --no-deps airflow
> > > > For a guaranteed install which works.
> > > >
> > > > Several different requirement files can be provided for specific use
> > > cases,
> > > > like a stable dev one for instance for people wanting to work on
> > > operators
> > > > and non-core functions.
> > > >
> > > > However, I think we should proactively test in CI against unpinned
> > > > dependencies (though it might be a separate case in the matrix) , so
> > that
> > > > we get advance warning if possible that things will break.
> > > > CI downtime is not a bad thing here, it actually caught a problem :)
> > > >
> > > > We should unpin as possible in setup.py to only maintain minimum
> > required
> > > > compatibility. The process of pinning in setup.py is extremely
> > > detrimental
> > > > when you have a large number of python libraries installed with
> > different
> > > > pinned versions.
> > > >
> > > > Best,
> > > > Arthur
> > > >
> > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > <ddavydov@twitter.com.invalid
> > > >
> > > > wrote:
> > > >
> > > >> Relevant discussion about this:
> > > >>
> > > >>
> > > >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > >>
> > > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com>
> > > >> wrote:
> > > >>
> > > >>> TL;DR; A change is coming in the way how dependencies/requirements
> > are
> > > >>> specified for Apache Airflow - they will be fixed rather than
> > flexible
> > > >> (==
> > > >>> rather than >=).
> > > >>>
> > > >>> This is follow up after Slack discussion we had with Ash and Kaxil
> -
> > > >>> summarising what we propose we'll do.
> > > >>>
> > > >>> *Problem:*
> > > >>> During last few weeks we experienced quite a few downtimes of
> > TravisCI
> > > >>> builds (for all PRs/branches including master) as some of the
> > > transitive
> > > >>> dependencies were automatically upgraded. This because in a number
> of
> > > >>> dependencies we have  >= rather than == dependencies.
> > > >>>
> > > >>> Whenever there is a new release of such dependency, it might cause
> > > chain
> > > >>> reaction with upgrade of transitive dependencies which might get
> into
> > > >>> conflict.
> > > >>>
> > > >>> An example was Flask-AppBuilder vs flask-login transitive
> dependency
> > > with
> > > >>> click. They started to conflict once AppBuilder has released
> version
> > > >>> 1.12.0.
> > > >>>
> > > >>> *Diagnosis:*
> > > >>> Transitive dependencies with "flexible" versions (where >= is used
> > > >> instead
> > > >>> of ==) is a reason for "dependency hell". We will sooner or later
> hit
> > > >> other
> > > >>> cases where not fixed dependencies cause similar problems with
> other
> > > >>> transitive dependencies. We need to fix-pin them. This causes
> > problems
> > > >> for
> > > >>> both - released versions (cause they stop to work!) and for
> > development
> > > >>> (cause they break master builds in TravisCI and prevent people from
> > > >>> installing development environment from the scratch.
> > > >>>
> > > >>> *Solution:*
> > > >>>
> > > >>>   - Following the old-but-good post
> > > >>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > > >>>   pinned dependencies to specific versions (so basically all
> > > >>>   dependencies are "fixed").
> > > >>>   - We will introduce mechanism to be able to upgrade dependencies
> > > >>>   with pip-tools (https://github.com/jazzband/pip-tools). We might
> > > >>>   also take a look at pipenv:
> > > >>>   https://pipenv.readthedocs.io/en/latest/
> > > >>>   - People who would like to upgrade some dependencies for their
> PRs
> > > >> will
> > > >>>   still be able to do it - but such upgrades will be in their PR
> thus
> > > >> they
> > > >>>   will go through TravisCI tests and they will also have to be
> > > specified
> > > >>> with
> > > >>>   pinned fixed versions (==). This should be part of review process
> > to
> > > >>> make
> > > >>>   sure new/changed requirements are pinned.
> > > >>>   - In release process there will be a point where an upgrade will
> be
> > > >>>   attempted for all requirements (using pip-tools) so that we are
> not
> > > >>> stuck
> > > >>>   with older releases. This will be in controlled PR environment
> > where
> > > >>> there
> > > >>>   will be time to fix all dependencies without impacting others and
> > > >> likely
> > > >>>   enough time to "vet" such changes (this can be done for
> alpha/beta
> > > >>> releases
> > > >>>   for example).
> > > >>>   - As a side effect dependencies specification will become far
> > simpler
> > > >>>   and straightforward.
> > > >>>
> > > >>> Happy to hear community comments to the proposal. I am happy to
> take
> > a
> > > >> lead
> > > >>> on that, open JIRA issue and implement if this is something
> community
> > > is
> > > >>> happy with.
> > > >>>
> > > >>> J.
> > > >>>
> > > >>> --
> > > >>>
> > > >>> *Jarek Potiuk, Principal Software Engineer*
> > > >>> Mobile: +48 660 796 129
> > > >>>
> > > >>
> > >
> > >
> >
> > --
> >
> > *Jarek Potiuk, Principal Software Engineer*
> > Mobile: +48 660 796 129
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>

Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Hello Erik,

I understand your concern. It's a hard one to solve in general (i.e.
dependency hell). It looks like in this case you treat Airflow as a
'library', whereas for some other people it might be more like an 'end
product'. If you look at the "pinning" philosophy, "pin everything" is good
for end products, but not good for libraries. In your case Airflow is
treated as a bit of both. And that's a perfectly valid case (with custom
Python DAGs being a central concept for Airflow).
However, I think it's not as bad as you think when it comes to exact
pinning.

I believe - a bit counter-intuitively - that tools like pip-tools/poetry
with exact pinning result in having your dependencies upgraded more often,
rather than less - especially in complex systems where dependency hell
creeps in. If you look at Airflow's setup.py now, it's a bit scary to make
any change to it. There is a chance it will blow up in your face if you
change it. You never know why there is 0.3 < ver < 1.0, or whether changing
it will cause a chain reaction of conflicts that will ruin your work day.
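
To make the kind of conflict discussed in this thread concrete, here is a
minimal sketch of the flask-login case Ash describes: Airflow pins
flask-login to 0.2.1 while flask-appbuilder 1.12.0 asks for flask-login >=
0.3. The helper names (parse_version, satisfies) are made up for
illustration, and the comparison only handles plain dotted numeric versions;
real resolvers follow the full PEP 440 rules.

```python
import operator

# Comparison operators as they appear in requirement specifiers.
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text, width=4):
    """Turn '0.2.1' into (0, 2, 1, 0); padding makes '1.0' equal to '1.0.0'."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, op, bound):
    """Return True if `version` meets the constraint `op bound`."""
    return OPS[op](parse_version(version), parse_version(bound))


# Versions taken from the thread: Airflow's pin vs. what flask-appbuilder
# 1.12.0 requires.
airflow_pin = "0.2.1"
fab_requirement = (">=", "0.3")

conflict = not satisfies(airflow_pin, *fab_requirement)
print("conflict:", conflict)
```

A single flexible requirement is enough to make a previously working set of
pins unsatisfiable, which is why the range in setup.py is hard to reason
about by hand.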

By contrast, if you change it to exact pinning in a .lock/requirements.txt
file (poetry/pip-tools) and have much simpler (and commented)
exclusion/avoidance rules in your .in/.toml file, the whole setup might be
much easier to maintain and upgrade. Every time you prepare for a release
(or even once in a while for master) one person can consciously attempt to
upgrade all dependencies to the latest ones. It should be almost as easy as
letting poetry/pip-tools figure out the latest set of dependencies that
works without conflicts. It should be rather straightforward (I've done it
in the past for fairly complex systems). What those tools enable is a
single-shot upgrade of all dependencies. After doing it you can make sure
that all tests pass (and fix any problems that result from it). And then
you test it thoroughly before you make the final release. You can do it in
a separate PR, with automated testing in Travis, which means that you are
not disturbing the work of others while doing it (compilation/building +
unit tests are guaranteed to work before you merge it). It's all conscious
rather than accidental. A nice side effect is that with every release you
can actually "catch up" with the latest stable versions of many libraries
in one go. It's better than waiting until someone deliberately upgrades a
single library to a newer version (while the rest remain terribly outdated,
as is the case for Airflow now).

So a bit counterintuitively I think tools like pip-tools/poetry help you to
catch up faster in many cases. That is at least my experience so far.

Additionally, Airflow is an open system - if you have very specific needs
for requirements, you can - in the very same way with pip-tools/poetry -
upgrade all the dependencies in your local fork of Airflow before someone
else does it in master/release. Those tools kind of democratise dependency
management. It should be as easy as `pip-compile --upgrade` or
`poetry update` and you will get all the "non-conflicting" latest
dependencies in your local fork (and poetry especially seems to do all the
heavy lifting of figuring out which versions will work). You should be able
to test it and publish it as your private package for local installations.
You can even pin a specific dependency to the version you want and let
pip-tools/poetry figure out the exact versions of the other requirements.
You can eventually make a PR with such an upgrade to get it into master
faster. You can also downgrade in a similar way in case a newer dependency
causes problems for you. Guided by the tools, it's much faster than
figuring the versions out by yourself.

As long as we have a simple way of managing it, document how to
upgrade/downgrade dependencies in your own fork, and mention how to locally
release Airflow as a package, I think your case could be covered even
better than now. What do you think?

J.

On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
<EK...@novozymes.com.invalid> wrote:

> For us, exact pinning of versions would be problematic. We have DAG code
> that shares direct and indirect dependencies with Airflow, e.g. lxml,
> requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG
> code for some reason needs a newer point release due to a bug that's fixed,
> then we can't cleanly build a virtual environment containing the fixed
> version. For us, it's already a problem that Airflow has quite strict (and
> sometimes old) requirements in setup.py.
>
> Erik
> ________________________________
> From: Jarek Potiuk <Ja...@polidea.com>
> Sent: Friday, October 5, 2018 2:01:15 PM
> To: dev@airflow.incubator.apache.org
> Subject: Re: Pinning dependencies for Apache Airflow
>
> I think one solution to release approach is to check as part of automated
> Travis build if all requirements are pinned with == (even the deep ones)
> and fail the build in case they are not for ALL versions (including
> dev). And of course we should document the approach of releases/upgrades
> etc. If we do it all the time for development versions (which seems quite
> doable), then transitively all the releases will also have pinned versions
> and they will never try to upgrade any of the dependencies. In poetry
> (similarly in pip-tools with .in file) it is done by having a .lock file
> that specifies exact versions of each package so it can be rather easy to
> manage (so it's worth trying it out I think  :D  - seems a bit more
> friendly than pip-tools).
>
> There is a drawback - of course - with manually updating the module that
> you want, but I really see that as an advantage rather than drawback
> especially for users. This way you maintain the property that it will
> always install and work the same way no matter if you installed it today or
> two months ago. I think the biggest drawback for maintainers is that you
> need some kind of monitoring of security vulnerabilities and cannot rely on
> automated security upgrades. With >= requirements those security updates
> might happen automatically without anyone noticing, but to be honest I
> don't think such upgrades are guaranteed even in current setup for all
> security issues for all libraries anyway.
>
> Finding the need to upgrade because of security issues can be quite
> automated. Even now I noticed Github started to inform owners about
> potential security vulnerabilities in used libraries for their project.
> Those notifications can be sent to devlist and turned into JIRA issues
> followed by minor security-related releases (with only few library
> dependencies upgraded).
>
> I think it's even easier to automate it if you have pinned dependencies -
> because it's generally easy to find applicable vulnerabilities for specific
> versions of libraries by static analysers - when you have >=, you never
> know which version will be used until you actually perform the
> installation.
>
> There is one big advantage for maintainers for "pinned" case. Your users
> always have the same dependencies - so when issue is raised, you can
> reproduce it more easily. It's hard to know which version user has (as the
> user could install it month ago or yesterday) and even if you find out by
> asking the user, you might not be able to reproduce the set of requirements
> easily (simply because there are already newer versions of the libraries
> released and they are used automatically). You can ask the user to run pip
> --upgrade but that's dangerous and pretty lame ("check the latest version -
> maybe it fixes your problem ? ") and sometimes not possible (e.g. someone
> has pre-built docker image with dependencies from few months ago and cannot
> rebuild the image easily).
>
> J.
>
> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> > One thing to point out here.
> >
> > Right now if you `pip install apache-airflow==1.10.0` in a clean
> > environment it will fail.
> >
> > This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> > 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> >
> > So I do think there is maybe something to be said about pinning for
> > releases. The down side to that is that if there are updates to a module
> > that we want then we have to make a point release to let people get it
> >
> > Both methods have draw-backs
> >
> > -ash
> >
> > > On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com>
> > wrote:
> > >
> > > Hi Jarek,
> > >
> > > I will +1 the discussion Dan is referring to and George's advice.
> > >
> > > I just want to double check we are talking about pinning in
> > > requirements.txt only.
> > >
> > > This offers the ability to
> > > pip install -r requirements.txt
> > > pip install --no-deps airflow
> > > For a guaranteed install which works.
> > >
> > > Several different requirement files can be provided for specific use
> > cases,
> > > like a stable dev one for instance for people wanting to work on
> > operators
> > > and non-core functions.
> > >
> > > However, I think we should proactively test in CI against unpinned
> > > dependencies (though it might be a separate case in the matrix) , so
> that
> > > we get advance warning if possible that things will break.
> > > CI downtime is not a bad thing here, it actually caught a problem :)
> > >
> > > We should unpin as possible in setup.py to only maintain minimum
> required
> > > compatibility. The process of pinning in setup.py is extremely
> > detrimental
> > > when you have a large number of python libraries installed with
> different
> > > pinned versions.
> > >
> > > Best,
> > > Arthur
> > >
> > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> <ddavydov@twitter.com.invalid
> > >
> > > wrote:
> > >
> > >> Relevant discussion about this:
> > >>
> > >>
> > >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > >>
> > >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> > >> wrote:
> > >>
> > >>> TL;DR; A change is coming in the way how dependencies/requirements
> are
> > >>> specified for Apache Airflow - they will be fixed rather than
> flexible
> > >> (==
> > >>> rather than >=).
> > >>>
> > >>> This is follow up after Slack discussion we had with Ash and Kaxil -
> > >>> summarising what we propose we'll do.
> > >>>
> > >>> *Problem:*
> > >>> During last few weeks we experienced quite a few downtimes of
> TravisCI
> > >>> builds (for all PRs/branches including master) as some of the
> > transitive
> > >>> dependencies were automatically upgraded. This because in a number of
> > >>> dependencies we have  >= rather than == dependencies.
> > >>>
> > >>> Whenever there is a new release of such dependency, it might cause
> > chain
> > >>> reaction with upgrade of transitive dependencies which might get into
> > >>> conflict.
> > >>>
> > >>> An example was Flask-AppBuilder vs flask-login transitive dependency
> > with
> > >>> click. They started to conflict once AppBuilder has released version
> > >>> 1.12.0.
> > >>>
> > >>> *Diagnosis:*
> > >>> Transitive dependencies with "flexible" versions (where >= is used
> > >> instead
> > >>> of ==) is a reason for "dependency hell". We will sooner or later hit
> > >> other
> > >>> cases where not fixed dependencies cause similar problems with other
> > >>> transitive dependencies. We need to fix-pin them. This causes
> problems
> > >> for
> > >>> both - released versions (cause they stop to work!) and for
> development
> > >>> (cause they break master builds in TravisCI and prevent people from
> > >>> installing development environment from the scratch.
> > >>>
> > >>> *Solution:*
> > >>>
> > >>>   - Following the old-but-good post
> > >>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
> > >>>   pinned dependencies to specific versions (so basically all
> > >>>   dependencies are "fixed").
> > >>>   - We will introduce mechanism to be able to upgrade dependencies
> > >>>   with pip-tools (https://github.com/jazzband/pip-tools). We might
> > >>>   also take a look at pipenv:
> > >>>   https://pipenv.readthedocs.io/en/latest/
> > >>>   - People who would like to upgrade some dependencies for their PRs
> > >> will
> > >>>   still be able to do it - but such upgrades will be in their PR thus
> > >> they
> > >>>   will go through TravisCI tests and they will also have to be
> > specified
> > >>> with
> > >>>   pinned fixed versions (==). This should be part of review process
> to
> > >>> make
> > >>>   sure new/changed requirements are pinned.
> > >>>   - In release process there will be a point where an upgrade will be
> > >>>   attempted for all requirements (using pip-tools) so that we are not
> > >>> stuck
> > >>>   with older releases. This will be in controlled PR environment
> where
> > >>> there
> > >>>   will be time to fix all dependencies without impacting others and
> > >> likely
> > >>>   enough time to "vet" such changes (this can be done for alpha/beta
> > >>> releases
> > >>>   for example).
> > >>>   - As a side effect dependencies specification will become far
> simpler
> > >>>   and straightforward.
> > >>>
> > >>> Happy to hear community comments to the proposal. I am happy to take
> a
> > >> lead
> > >>> on that, open JIRA issue and implement if this is something community
> > is
> > >>> happy with.
> > >>>
> > >>> J.
> > >>>
> > >>> --
> > >>>
> > >>> *Jarek Potiuk, Principal Software Engineer*
> > >>> Mobile: +48 660 796 129
> > >>>
> > >>
> >
> >
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by "EKC (Erik Cederstrand)" <EK...@novozymes.com.INVALID>.
For us, exact pinning of versions would be problematic. We have DAG code that shares direct and indirect dependencies with Airflow, e.g. lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG code for some reason needs a newer point release due to a bug that's fixed, then we can't cleanly build a virtual environment containing the fixed version. For us, it's already a problem that Airflow has quite strict (and sometimes old) requirements in setup.py.

Erik
________________________________
From: Jarek Potiuk <Ja...@polidea.com>
Sent: Friday, October 5, 2018 2:01:15 PM
To: dev@airflow.incubator.apache.org
Subject: Re: Pinning dependencies for Apache Airflow

I think one solution to release approach is to check as part of automated
Travis build if all requirements are pinned with == (even the deep ones)
and fail the build in case they are not for ALL versions (including
dev). And of course we should document the approach of releases/upgrades
etc. If we do it all the time for development versions (which seems quite
doable), then transitively all the releases will also have pinned versions
and they will never try to upgrade any of the dependencies. In poetry
(similarly in pip-tools with .in file) it is done by having a .lock file
that specifies exact versions of each package so it can be rather easy to
manage (so it's worth trying it out I think  :D  - seems a bit more
friendly than pip-tools).

There is a drawback - of course - with manually updating the module that
you want, but I really see that as an advantage rather than drawback
especially for users. This way you maintain the property that it will
always install and work the same way no matter if you installed it today or
two months ago. I think the biggest drawback for maintainers is that you
need some kind of monitoring of security vulnerabilities and cannot rely on
automated security upgrades. With >= requirements those security updates
might happen automatically without anyone noticing, but to be honest I
don't think such upgrades are guaranteed even in current setup for all
security issues for all libraries anyway.

Finding the need to upgrade because of security issues can be quite
automated. Even now I noticed Github started to inform owners about
potential security vulnerabilities in used libraries for their project.
Those notifications can be sent to devlist and turned into JIRA issues
followed by minor security-related releases (with only few library
dependencies upgraded).

I think it's even easier to automate it if you have pinned dependencies -
because it's generally easy to find applicable vulnerabilities for specific
versions of libraries by static analysers - when you have >=, you never
know which version will be used until you actually perform the
installation.

There is one big advantage for maintainers for "pinned" case. Your users
always have the same dependencies - so when issue is raised, you can
reproduce it more easily. It's hard to know which version user has (as the
user could install it month ago or yesterday) and even if you find out by
asking the user, you might not be able to reproduce the set of requirements
easily (simply because there are already newer versions of the libraries
released and they are used automatically). You can ask the user to run pip
--upgrade but that's dangerous and pretty lame ("check the latest version -
maybe it fixes your problem ? ") and sometimes not possible (e.g. someone
has pre-built docker image with dependencies from few months ago and cannot
rebuild the image easily).

J.

On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> One thing to point out here.
>
> Right now if you `pip install apache-airflow==1.10.0` in a clean
> environment it will fail.
>
> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>
> So I do think there is maybe something to be said about pinning for
> releases. The down side to that is that if there are updates to a module
> that we want then we have to make a point release to let people get it
>
> Both methods have draw-backs
>
> -ash
>
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com>
> wrote:
> >
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix) , so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing here, it actually caught a problem :)
> >
> > We should unpin as possible in setup.py to only maintain minimum required
> > compatibility. The process of pinning in setup.py is extremely
> detrimental
> > when you have a large number of python libraries installed with different
> > pinned versions.
> >
> > Best,
> > Arthur
> >
> > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@twitter.com.invalid
> >
> > wrote:
> >
> >> Relevant discussion about this:
> >>
> >>
> >> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>
> >> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >>> TL;DR; A change is coming in the way how dependencies/requirements are
> >>> specified for Apache Airflow - they will be fixed rather than flexible
> >> (==
> >>> rather than >=).
> >>>
> >>> This is follow up after Slack discussion we had with Ash and Kaxil -
> >>> summarising what we propose we'll do.
> >>>
> >>> *Problem:*
> >>> During last few weeks we experienced quite a few downtimes of TravisCI
> >>> builds (for all PRs/branches including master) as some of the
> transitive
> >>> dependencies were automatically upgraded. This because in a number of
> >>> dependencies we have  >= rather than == dependencies.
> >>>
> >>> Whenever there is a new release of such dependency, it might cause
> chain
> >>> reaction with upgrade of transitive dependencies which might get into
> >>> conflict.
> >>>
> >>> An example was Flask-AppBuilder vs flask-login transitive dependency
> with
> >>> click. They started to conflict once AppBuilder has released version
> >>> 1.12.0.
> >>>
> >>> *Diagnosis:*
> >>> Transitive dependencies with "flexible" versions (where >= is used
> >> instead
> >>> of ==) is a reason for "dependency hell". We will sooner or later hit
> >> other
> >>> cases where not fixed dependencies cause similar problems with other
> >>> transitive dependencies. We need to fix-pin them. This causes problems
> >> for
> >>> both - released versions (cause they stop to work!) and for development
> >>> (cause they break master builds in TravisCI and prevent people from
> >>> installing development environment from the scratch.
> >>>
> >>> *Solution:*
> >>>
> >>>   - Following the old-but-good post
> >>>   https://nvie.com/posts/pin-your-packages/ we are going to fix the
> >>>   pinned dependencies to specific versions (so basically all
> >>>   dependencies are "fixed").
> >>>   - We will introduce mechanism to be able to upgrade dependencies with
> >>>   pip-tools (https://github.com/jazzband/pip-tools). We might also
> >>>   take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
> >>>   - People who would like to upgrade some dependencies for their PRs
> >> will
> >>>   still be able to do it - but such upgrades will be in their PR thus
> >> they
> >>>   will go through TravisCI tests and they will also have to be
> specified
> >>> with
> >>>   pinned fixed versions (==). This should be part of review process to
> >>> make
> >>>   sure new/changed requirements are pinned.
> >>>   - In release process there will be a point where an upgrade will be
> >>>   attempted for all requirements (using pip-tools) so that we are not
> >>> stuck
> >>>   with older releases. This will be in controlled PR environment where
> >>> there
> >>>   will be time to fix all dependencies without impacting others and
> >> likely
> >>>   enough time to "vet" such changes (this can be done for alpha/beta
> >>> releases
> >>>   for example).
> >>>   - As a side effect dependencies specification will become far simpler
> >>>   and straightforward.
> >>>
> >>> Happy to hear community comments to the proposal. I am happy to take a
> >> lead
> >>> on that, open JIRA issue and implement if this is something community
> is
> >>> happy with.
> >>>
> >>> J.
> >>>
> >>> --
> >>>
> >>> *Jarek Potiuk, Principal Software Engineer*
> >>> Mobile: +48 660 796 129
> >>>
> >>
>
>

--

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
I think one solution for the release approach is to check, as part of the
automated Travis build, that all requirements are pinned with == (even the
deep ones) and fail the build if they are not, for ALL versions (including
dev). And of course we should document the approach to releases/upgrades
etc. If we do it all the time for development versions (which seems quite
doable), then transitively all the releases will also have pinned versions
and they will never try to upgrade any of the dependencies. In poetry
(similarly to pip-tools with its .in file) this is done by having a .lock
file that specifies the exact version of each package, so it can be rather
easy to manage (so I think it's worth trying out :D - it seems a bit more
friendly than pip-tools).
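
A check like this could be quite small. The following is only a sketch of
the idea, not an existing Airflow or Travis script: the function name
unpinned_requirements and the sample lines are made up for illustration
(the two flask entries reuse the versions mentioned in this thread, and
examplepkg is a placeholder). It assumes a plain requirements.txt-style
listing and ignores pip options, URLs and environment markers.

```python
import re

# A line counts as pinned when it looks like "name==version", optionally
# with extras such as "name[extra]==version".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s;]+")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    offenders = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            offenders.append(line)
    return offenders


sample = [
    "# hypothetical requirements listing",
    "flask-login==0.2.1",
    "flask-appbuilder>=1.11.1",
    "",
    "examplepkg==1.0.0  # placeholder package",
]

print(unpinned_requirements(sample))
```

In a CI job the build would fail whenever the returned list is not empty,
for example by exiting with a non-zero status.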

There is a drawback - of course - in having to manually update the module
that you want, but I really see that as an advantage rather than a
drawback, especially for users. This way you maintain the property that it
will always install and work the same way no matter if you installed it
today or two months ago. I think the biggest drawback for maintainers is
that you need some kind of monitoring of security vulnerabilities and
cannot rely on automated security upgrades. With >= requirements those
security updates might happen automatically without anyone noticing, but to
be honest I don't think such upgrades are guaranteed even in the current
setup for all security issues in all libraries anyway.

Finding the need to upgrade because of security issues can be largely
automated. I noticed that GitHub has started to inform project owners about
potential security vulnerabilities in the libraries their projects use.
Those notifications can be sent to the devlist and turned into JIRA issues,
followed by minor security-related releases (with only a few library
dependencies upgraded).

I think it's even easier to automate if you have pinned dependencies,
because static analysers can generally find the vulnerabilities that apply
to specific versions of libraries. When you have >=, you never know which
version will be used until you actually perform the installation.
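
To illustrate why pinned versions make this kind of check mechanical, here
is a rough sketch. The advisory format (package name plus the first fixed
version) and the package names in the usage note are made up for the
example, and versions are assumed to be purely numeric and dotted.

```python
def parse_version(text, width=4):
    """Turn '1.12.0' into (1, 12, 0, 0); assumes numeric dotted versions."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def parse_pins(lines):
    """Map lower-cased package name to pinned version for 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def affected(pins, advisories):
    """Return (package, pinned, fixed_in) for pins older than the fixed version.

    `advisories` is a list of (package, first_fixed_version) pairs; this
    shape is hypothetical and only serves the example.
    """
    hits = []
    for package, fixed_in in advisories:
        pinned = pins.get(package.lower())
        if pinned is not None and parse_version(pinned) < parse_version(fixed_in):
            hits.append((package, pinned, fixed_in))
    return hits
```

With pins parsed from ["examplelib==1.2.0"] and an advisory
("examplelib", "1.2.3"), affected() reports the package. With
examplelib>=1.2.0 there is no single version to compare against until the
installation has actually happened.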

There is one big advantage for maintainers in the "pinned" case. Your users
always have the same dependencies, so when an issue is raised, you can
reproduce it more easily. With >= it's hard to know which versions the user
has (as the user could have installed them a month ago or yesterday), and
even if you find out by asking the user, you might not be able to reproduce
the set of requirements easily (simply because newer versions of the
libraries have already been released and are used automatically). You can
ask the user to run pip install --upgrade, but that's dangerous and pretty
lame ("check the latest version - maybe it fixes your problem?") and
sometimes not possible (e.g. someone has a pre-built Docker image with
dependencies from a few months ago and cannot rebuild the image easily).
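
For the unpinned case, the best a maintainer can do is ask the user for a
snapshot of what is actually installed. A small sketch of producing one from
inside Python, assuming Python 3.8+ where importlib.metadata is part of the
standard library (the function name is illustrative); the output is similar
in spirit to what pip freeze prints.

```python
from importlib import metadata


def installed_pins():
    """Return sorted 'name==version' lines for the current environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Attaching this list to a bug report lets the maintainer rebuild the same
environment, which pinned requirements would have guaranteed up front.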

J.

On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> One thing to point out here.
>
> Right now if you `pip install apache-airflow==1.10.0` in a clean
> environment it will fail.
>
> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >=
> 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
>
> So I do think there is maybe something to be said about pinning for
> releases. The down side to that is that if there are updates to a module
> that we want then we have to make a point release to let people get it
>
> Both methods have draw-backs
>
> -ash
>
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com>
> wrote:
> >
> > Hi Jarek,
> >
> > I will +1 the discussion Dan is referring to and George's advice.
> >
> > I just want to double check we are talking about pinning in
> > requirements.txt only.
> >
> > This offers the ability to
> > pip install -r requirements.txt
> > pip install --no-deps airflow
> > For a guaranteed install which works.
> >
> > Several different requirement files can be provided for specific use
> cases,
> > like a stable dev one for instance for people wanting to work on
> operators
> > and non-core functions.
> >
> > However, I think we should proactively test in CI against unpinned
> > dependencies (though it might be a separate case in the matrix) , so that
> > we get advance warning if possible that things will break.
> > CI downtime is not a bad thing here, it actually caught a problem :)
> >
> > We should unpin as possible in setup.py to only maintain minimum required
> > compatibility. The process of pinning in setup.py is extremely
> detrimental
> > when you have a large number of python libraries installed with different
> > pinned versions.
> >
> > Best,
> > Arthur
> >
> > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@twitter.com.invalid
> >
> > wrote:
> >
> >> Relevant discussion about this:
> >>
> >>
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>
>
>

-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Ash Berlin-Taylor <as...@apache.org>.
One thing to point out here.

Right now if you `pip install apache-airflow==1.10.0` in a clean environment it will fail.

This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
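
The conflict can be spelled out with a small sketch. The version numbers are
the ones from this message; the helper functions are illustrative, handle
only == and >=, and assume purely numeric dotted versions.

```python
def parse_version(text, width=4):
    """Turn '0.3' into (0, 3, 0, 0) so that 0.3 and 0.3.0 compare equal."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, operator, bound):
    """Check a single 'operator bound' constraint against a version."""
    candidate, limit = parse_version(version), parse_version(bound)
    if operator == "==":
        return candidate == limit
    if operator == ">=":
        return candidate >= limit
    raise ValueError("unsupported operator: " + operator)


# Airflow pins flask-login==0.2.1; flask-appbuilder 1.12.0 needs >=0.3.
airflow_pin = "0.2.1"
conflict = not satisfies(airflow_pin, ">=", "0.3")
```

Here conflict is True: no single flask-login version can satisfy both
constraints, so the install fails as soon as the open-ended
flask-appbuilder requirement resolves to 1.12.0.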

So I do think there is maybe something to be said for pinning for releases. The downside is that if there are updates to a module that we want, then we have to make a point release to let people get it.

Both methods have drawbacks.

-ash

> On 4 Oct 2018, at 17:13, Arthur Wiedmer <ar...@gmail.com> wrote:
> 
> Hi Jarek,
> 
> I will +1 the discussion Dan is referring to and George's advice.
> 
> I just want to double check we are talking about pinning in
> requirements.txt only.
> 
> This offers the ability to
> pip install -r requirements.txt
> pip install --no-deps airflow
> For a guaranteed install which works.
> 
> Several different requirement files can be provided for specific use cases,
> like a stable dev one for instance for people wanting to work on operators
> and non-core functions.
> 
> However, I think we should proactively test in CI against unpinned
> dependencies (though it might be a separate case in the matrix) , so that
> we get advance warning if possible that things will break.
> CI downtime is not a bad thing here, it actually caught a problem :)
> 
> We should unpin as possible in setup.py to only maintain minimum required
> compatibility. The process of pinning in setup.py is extremely detrimental
> when you have a large number of python libraries installed with different
> pinned versions.
> 
> Best,
> Arthur
> 
> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <dd...@twitter.com.invalid>
> wrote:
> 
>> Relevant discussion about this:
>> 
>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>> 


Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
I have never tried poetry before, but it looks really good (it also passes
my aesthetic filter for slick webpage design). From a quick look, it meets
a lot of the criteria I have in mind:

   - works on all platforms
   - easily installable with pip
   - uses standard PyPI repositories by default (but you can switch to
   private)
   - .lock file paradigm (similar to other pinning solutions - such as yarn
   and gradle)
   - automated virtualenv creation
   - has support for Python 2.7 and 3.4
   - is pretty active and seems to have a moderate number of contributors,
   neither very big nor small (https://github.com/sdispater/poetry)

The one thing I do not like about pip-tools is that it relies on
requirements.in -> requirements.txt generation, and some people might not
realise that you should not modify requirements.txt by hand (who reads
the comments anyway!). There seems to be no such problem with poetry, but
some IDE support might be lost as well - for example the excellent IntelliJ
support (something I'd like to try).
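
One way to soften the hand-editing risk would be a consistency check between
the two files. The sketch below is only an assumption about how such a check
could look (the function names are illustrative and this is not part of
pip-tools): it reports packages listed in requirements.in that have no entry
in the generated requirements.txt.

```python
import re

NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def normalise(name):
    """Treat '-', '_' and '.' as equivalent and ignore case in names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def names(lines):
    """Collect normalised package names from requirement lines."""
    found = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = NAME.match(line)
        if match:
            found.add(normalise(match.group(0)))
    return found


def missing_from_compiled(in_lines, txt_lines):
    """Names present in requirements.in but absent from requirements.txt."""
    return sorted(names(in_lines) - names(txt_lines))
```

It only catches entries that were dropped or never compiled; a version that
was changed by hand in requirements.txt would still need a fresh
pip-compile run and a diff to be noticed.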

I am tempted to try it and report how it works for Airflow. It's a question
for the community whether they will be happy to accept such a relatively
new tool in the standard toolchain. It's quite a change, and it seems to be
a bit more than just a package manager, with its automated virtualenv
integration (on the other hand, it's kind of nice that by default it forces
you to work in a virtualenv).

J.

On Fri, Oct 5, 2018 at 9:04 AM Björn Pollex
<bj...@soundcloud.com.invalid> wrote:

> Hi all,
>
> Have you considered looking into poetry[1]? I’ve had really good
> experiences with it, we specifically introduced it into our project because
> we were getting version conflicts, and it resolved them just fine. It
> properly supports semantic versioning, so package versions have upper
> bounds. It also has a full dependency resolver, so even when package
> upgrades are available, it will only upgrade if the version constraints
> allow it. It does have some issues though, most notably that it depends on
> package metadata being correct to properly resolve dependencies, and that’s
> not always the case.
>
> Cheers,
>
>         Björn
>
> [1]: https://poetry.eustace.io/
>
> > On 5. Oct 2018, at 03:58, James Meickle <jm...@quantopian.com.INVALID>
> wrote:
> >
> > I suggest not adopting pipenv. It has a nice "first five minutes" demo
> but
> > it's simply not baked enough to depend on as a swap in pip replacement.
> We
> > are in the process of removing it after finding several serious bugs in
> our
> > POC of it.
> >
> > On Thu, Oct 4, 2018, 20:30 Alex Guziel <al...@airbnb.com.invalid>
> > wrote:
> >
> >> FWIW, there's some value in using virtualenv with Docker to isolate
> >> yourself from your system's Python.
> >>
> >> It's worth noting that requirements files can link other requirements
> >> files, so that would make groups easier, but not that pip in one run
> has no
> >> guarantee of transitive dependencies not conflicting or overriding. You
> >> need pip check for that or use --no-deps.
> >>
> >> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fo...@driesprong.frl>
> >> wrote:
> >>
> >>> Hi Jarek,
> >>>
> >>> Thanks for bringing this up. I missed the discussion on Slack since I'm
> >> on
> >>> holiday, but I saw the thread and it was way too interesting, and
> >> therefore
> >>> this email :)
> >>>
> >>> This is actually something that we need to address asap. Like you
> >> mention,
> >>> we saw it earlier that specific transient dependencies are not
> compatible
> >>> and then we end up with a breaking CI, or even worse, a broken release.
> >>> Earlier we had in the setup.py the fixed versions (==) and in a
> separate
> >>> requirements.txt the requirements for the CI. This was also far from
> >>> optimal since we had two versions of the requirements.
> >>>
> >>> I like the idea that you are proposing. Maybe we can do an experiment
> >> with
> >>> it, because of the nature of Airflow (orchestrating different systems),
> >> we
> >>> have a huge list of dependencies. To not install everything, we've
> >> created
> >>> groups. For example specific libraries when you're using the Google
> >> Cloud,
> >>> Elastic, Druid, etc. So I'm curious how it will work with the `
> >>> extras_require` of Airflow
> >>>
> >>> Regarding the pipenv. I don't use any pipenv/virtualenv anymore. For me
> >>> Docker is much easier to work with. I'm also working on a PR to get rid
> >> of
> >>> tox for the testing, and move to a more Docker idiomatic test pipeline.
> >>> Curious what you thoughts are on that.
> >>>
> >>> Cheers, Fokko
> >>>
> >>> Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer <
> >>> arthur.wiedmer@gmail.com
> >>>> :
> >>>
> >>>> Thanks Jakob!
> >>>>
> >>>> I think that this is a huge risk of Slack.
> >>>> I am not against Slack as a support channel, but it is a slippery
> slope
> >>> to
> >>>> have more and more decisions/conversations happening there, contrary
> to
> >>>> what we hope to achieve with the ASF.
> >>>>
> >>>> When we are starting to discuss issues of development, extensions and
> >>>> improvements, it is important for the discussion to happen in the
> >> mailing
> >>>> list.
> >>>>
> >>>> Jarek, I wouldn't worry too much, we are still in the process of
> >> learning
> >>>> as a community. Welcome and thank you for your contribution!
> >>>>
> >>>> Best,
> >>>> Arthur.
> >>>>
> >>>> On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> >>>> wrote:
> >>>>
> >>>>> Thanks for pointing it out Jakob.
> >>>>>
> >>>>> I am still very fresh in the ASF community and learning the ropes and
> >>>>> etiquette and code of conduct. Apologies for my ignorance.
> >>>>> I re-read the conduct and FAQ now again - with more understanding and
> >>>> will
> >>>>> pay more attention to wording in the future. As you mentioned it's
> >> more
> >>>> the
> >>>>> wording than intentions, but since it was in TL;DR; it has stronger
> >>>>> consequences.
> >>>>>
> >>>>> BTW. Thanks for actually following the code of conduct and pointing
> >> it
> >>>> out
> >>>>> in respectful manner. I really appreciate it.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>> Principal Software Engineer
> >>>>> Phone: +48660796129
> >>>>>
> >>>>> On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jg...@gmail.com> wrote:
> >>>>>
> >>>>>>> TL;DR; A change is coming in the way how
> >> dependencies/requirements
> >>>> are
> >>>>>>> specified for Apache Airflow - they will be fixed rather than
> >>>> flexible
> >>>>>> (==
> >>>>>>> rather than >=).
> >>>>>>
> >>>>>>> This is follow up after Slack discussion we had with Ash and
> >> Kaxil
> >>> -
> >>>>>>> summarising what we propose we'll do.
> >>>>>>
> >>>>>> Hey all.  It's great that we're moving this discussion back from
> >>> Slack
> >>>>>> to the mailing list.  But I've gotta point out that the wording
> >> needs
> >>>>>> a small but critical fix up:
> >>>>>>
> >>>>>> "A change *is* coming... they *will* be fixed"
> >>>>>>
> >>>>>> needs to be
> >>>>>>
> >>>>>> "We'd like to propose a change... We would like to make them
> >> fixed."
> >>>>>>
> >>>>>> The first says that this decision has been made and the result of
> >> the
> >>>>>> decision, which was made on Slack, is being reported back to the
> >>>>>> mailing list.  The second is more accurate to the rest of the
> >>>>>> discussion ('what we propose...').  And again, since it's axiomatic
> >>> in
> >>>>>> ASF that if it didn't happen on a list, it didn't happen[1], we
> >> gotta
> >>>>>> make sure there's no confusion about where the community is on the
> >>>>>> decision-making process.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jakob
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects
> >>>>>> ?
> >>>>>
> >>>>> On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel
> >>>>>> <al...@airbnb.com.invalid> wrote:
> >>>>>>>
> >>>>>>> You should run `pip check` to ensure no conflicts. Pip does not
> >> do
> >>>> this
> >>>>>> on
> >>>>>>> its own.
> >>>>>>>
> >>>>>>> On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
> >>>> Jarek.Potiuk@polidea.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Great that this discussion already happened :). Lots of useful
> >>>> things
> >>>>>> in
> >>>>>>>> it. And yes - it means pinning in requirement.txt - this is how
> >>>>>> pip-tools
> >>>>>>>> work.
> >>>>>>>>
> >>>>>>>> J.
> >>>>>>>>
> >>>>>>>> Principal Software Engineer
> >>>>>>>> Phone: +48660796129
> >>>>>>>>
> >>>>>>>> On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
> >>>> arthur.wiedmer@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Jarek,
> >>>>>>>>>
> >>>>>>>>> I will +1 the discussion Dan is referring to and George's
> >>> advice.
> >>>>>>>>>
> >>>>>>>>> I just want to double check we are talking about pinning in
> >>>>>>>>> requirements.txt only.
> >>>>>>>>>
> >>>>>>>>> This offers the ability to
> >>>>>>>>> pip install -r requirements.txt
> >>>>>>>>> pip install --no-deps airflow
> >>>>>>>>> For a guaranteed install which works.
> >>>>>>>>>
> >>>>>>>>> Several different requirement files can be provided for
> >>> specific
> >>>>> use
> >>>>>>>> cases,
> >>>>>>>>> like a stable dev one for instance for people wanting to work
> >>> on
> >>>>>>>> operators
> >>>>>>>>> and non-core functions.
> >>>>>>>>>
> >>>>>>>>> However, I think we should proactively test in CI against
> >>>> unpinned
> >>>>>>>>> dependencies (though it might be a separate case in the
> >>> matrix) ,
> >>>>> so
> >>>>>> that
> >>>>>>>>> we get advance warning if possible that things will break.
> >>>>>>>>> CI downtime is not a bad thing here, it actually caught a
> >>> problem
> >>>>> :)
> >>>>>>>>>
> >>>>>>>>> We should unpin as possible in setup.py to only maintain
> >>> minimum
> >>>>>> required
> >>>>>>>>> compatibility. The process of pinning in setup.py is
> >> extremely
> >>>>>>>> detrimental
> >>>>>>>>> when you have a large number of python libraries installed
> >> with
> >>>>>> different
> >>>>>>>>> pinned versions.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Arthur
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> >>>>>> <ddavydov@twitter.com.invalid
> >>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Relevant discussion about this:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> >>>>>>>>>>
>
>

-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129

Re: Pinning dependencies for Apache Airflow

Posted by Björn Pollex <bj...@soundcloud.com.INVALID>.
Hi all,

Have you considered looking into poetry[1]? I’ve had really good experiences with it. We specifically introduced it into our project because we were getting version conflicts, and it resolved them just fine. It properly supports semantic versioning, so package versions have upper bounds. It also has a full dependency resolver, so even when package upgrades are available, it will only upgrade if the version constraints allow it. It does have some issues though, most notably that it depends on package metadata being correct to properly resolve dependencies, and that’s not always the case.
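
As a rough illustration of what "upper bounds" means here, the sketch below
expands a caret-style constraint such as ^1.12.0 into an explicit range,
following the usual semantic-versioning reading (the next release that may
break compatibility is excluded). It is a simplified model with illustrative
function names, not poetry's own code, and it only accepts full three-part
numeric versions.

```python
def parse_three(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH, got " + version)
    return tuple(int(part) for part in parts)


def caret_range(version):
    """Return (inclusive lower, exclusive upper) for a ^version constraint."""
    major, minor, patch = parse_three(version)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def allowed(candidate, constraint):
    """True when candidate falls inside the caret range of constraint."""
    lower, upper = caret_range(constraint)
    return lower <= parse_three(candidate) < upper
```

Under this model ^1.12.0 accepts 1.13.0 but rejects 2.0.0, whereas a plain
>=1.12.0 would accept both.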

Cheers,

	Björn

[1]: https://poetry.eustace.io/

> On 5. Oct 2018, at 03:58, James Meickle <jm...@quantopian.com.INVALID> wrote:
> 
> I suggest not adopting pipenv. It has a nice "first five minutes" demo but
> it's simply not baked enough to depend on as a swap in pip replacement. We
> are in the process of removing it after finding several serious bugs in our
> POC of it.
> 
> On Thu, Oct 4, 2018, 20:30 Alex Guziel <al...@airbnb.com.invalid>
> wrote:
> 
>> FWIW, there's some value in using virtualenv with Docker to isolate
>> yourself from your system's Python.
>> 
>> It's worth noting that requirements files can link other requirements
>> files, so that would make groups easier, but not that pip in one run has no
>> guarantee of transitive dependencies not conflicting or overriding. You
>> need pip check for that or use --no-deps.
>> 
>> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>> 
>>> Hi Jarek,
>>> 
>>> Thanks for bringing this up. I missed the discussion on Slack since I'm
>> on
>>> holiday, but I saw the thread and it was way too interesting, and
>> therefore
>>> this email :)
>>> 
>>> This is actually something that we need to address asap. Like you
>> mention,
>>> we saw it earlier that specific transient dependencies are not compatible
>>> and then we end up with a breaking CI, or even worse, a broken release.
>>> Earlier we had in the setup.py the fixed versions (==) and in a separate
>>> requirements.txt the requirements for the CI. This was also far from
>>> optimal since we had two versions of the requirements.
>>> 
>>> I like the idea that you are proposing. Maybe we can do an experiment
>> with
>>> it, because of the nature of Airflow (orchestrating different systems),
>> we
>>> have a huge list of dependencies. To not install everything, we've
>> created
>>> groups. For example specific libraries when you're using the Google
>> Cloud,
>>> Elastic, Druid, etc. So I'm curious how it will work with the `
>>> extras_require` of Airflow
>>> 
>>> Regarding the pipenv. I don't use any pipenv/virtualenv anymore. For me
>>> Docker is much easier to work with. I'm also working on a PR to get rid
>> of
>>> tox for the testing, and move to a more Docker idiomatic test pipeline.
>>> Curious what you thoughts are on that.
>>> 
>>> Cheers, Fokko
>>> 
>>> Op do 4 okt. 2018 om 15:39 schreef Arthur Wiedmer <
>>> arthur.wiedmer@gmail.com
>>>> :
>>> 
>>>> Thanks Jakob!
>>>> 
>>>> I think that this is a huge risk of Slack.
>>>> I am not against Slack as a support channel, but it is a slippery slope
>>> to
>>>> have more and more decisions/conversations happening there, contrary to
>>>> what we hope to achieve with the ASF.
>>>> 
>>>> When we are starting to discuss issues of development, extensions and
>>>> improvements, it is important for the discussion to happen in the
>> mailing
>>>> list.
>>>> 
>>>> Jarek, I wouldn't worry too much, we are still in the process of
>> learning
>>>> as a community. Welcome and thank you for your contribution!
>>>> 
>>>> Best,
>>>> Arthur.
>>>> 
>>>> On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>> 
>>>>> Thanks for pointing it out Jakob.
>>>>> 
>>>>> I am still very fresh in the ASF community and learning the ropes and
>>>>> etiquette and code of conduct. Apologies for my ignorance.
>>>>> I re-read the conduct and FAQ now again - with more understanding and
>>>> will
>>>>> pay more attention to wording in the future. As you mentioned it's
>> more
>>>> the
>>>>> wording than intentions, but since it was in TL;DR; it has stronger
>>>>> consequences.
>>>>> 
>>>>> BTW. Thanks for actually following the code of conduct and pointing
>> it
>>>> out
>>>>> in respectful manner. I really appreciate it.
>>>>> 
>>>>> J.
>>>>> 
>>>>> Principal Software Engineer
>>>>> Phone: +48660796129
>>>>> 
>>>>> On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jg...@gmail.com> wrote:
>>>>> 
>>>>>>> TL;DR; A change is coming in the way how
>> dependencies/requirements
>>>> are
>>>>>>> specified for Apache Airflow - they will be fixed rather than
>>>> flexible
>>>>>> (==
>>>>>>> rather than >=).
>>>>>> 
>>>>>>> This is follow up after Slack discussion we had with Ash and
>> Kaxil
>>> -
>>>>>>> summarising what we propose we'll do.
>>>>>> 
>>>>>> Hey all.  It's great that we're moving this discussion back from
>>> Slack
>>>>>> to the mailing list.  But I've gotta point out that the wording
>> needs
>>>>>> a small but critical fix up:
>>>>>> 
>>>>>> "A change *is* coming... they *will* be fixed"
>>>>>> 
>>>>>> needs to be
>>>>>> 
>>>>>> "We'd like to propose a change... We would like to make them
>> fixed."
>>>>>> 
>>>>>> The first says that this decision has been made and the result of
>> the
>>>>>> decision, which was made on Slack, is being reported back to the
>>>>>> mailing list.  The second is more accurate to the rest of the
>>>>>> discussion ('what we propose...').  And again, since it's axiomatic
>>> in
>>>>>> ASF that if it didn't happen on a list, it didn't happen[1], we
>> gotta
>>>>>> make sure there's no confusion about where the community is on the
>>>>>> decision-making process.
>>>>>> 
>>>>>> Thanks,
>>>>>> Jakob
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects
>>>>>> ?
>>>>> 
>>>>> On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel
>>>>>> <al...@airbnb.com.invalid> wrote:
>>>>>>> 
>>>>>>> You should run `pip check` to ensure no conflicts. Pip does not
>> do
>>>> this
>>>>>> on
>>>>>>> its own.
>>>>>>> 
>>>>>>> On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
>>>> Jarek.Potiuk@polidea.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Great that this discussion already happened :). Lots of useful
>>>> things
>>>>>> in
>>>>>>>> it. And yes - it means pinning in requirement.txt - this is how
>>>>>> pip-tools
>>>>>>>> work.
>>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>>> Principal Software Engineer
>>>>>>>> Phone: +48660796129
>>>>>>>> 
>>>>>>>> On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
>>>> arthur.wiedmer@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Jarek,
>>>>>>>>> 
>>>>>>>>> I will +1 the discussion Dan is referring to and George's
>>> advice.
>>>>>>>>> 
>>>>>>>>> I just want to double check we are talking about pinning in
>>>>>>>>> requirements.txt only.
>>>>>>>>> 
>>>>>>>>> This offers the ability to
>>>>>>>>> pip install -r requirements.txt
>>>>>>>>> pip install --no-deps airflow
>>>>>>>>> For a guaranteed install which works.
>>>>>>>>> 
>>>>>>>>> Several different requirement files can be provided for
>>> specific
>>>>> use
>>>>>>>> cases,
>>>>>>>>> like a stable dev one for instance for people wanting to work
>>> on
>>>>>>>> operators
>>>>>>>>> and non-core functions.
>>>>>>>>> 
>>>>>>>>> However, I think we should proactively test in CI against
>>>> unpinned
>>>>>>>>> dependencies (though it might be a separate case in the
>>> matrix) ,
>>>>> so
>>>>>> that
>>>>>>>>> we get advance warning if possible that things will break.
>>>>>>>>> CI downtime is not a bad thing here, it actually caught a
>>> problem
>>>>> :)
>>>>>>>>> 
>>>>>>>>> We should unpin as possible in setup.py to only maintain
>>> minimum
>>>>>> required
>>>>>>>>> compatibility. The process of pinning in setup.py is
>> extremely
>>>>>>>> detrimental
>>>>>>>>> when you have a large number of python libraries installed
>> with
>>>>>> different
>>>>>>>>> pinned versions.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Arthur
>>>>>>>>> 
>>>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
>>>>>> <ddavydov@twitter.com.invalid
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Relevant discussion about this:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
>>>>>>>>>> 
>>>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
>>>>>> Jarek.Potiuk@polidea.com
>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> TL;DR; A change is coming in the way how
>>>>>> dependencies/requirements
>>>>>>>> are
>>>>>>>>>>> specified for Apache Airflow - they will be fixed rather
>>> than
>>>>>>>> flexible
>>>>>>>>>> (==
>>>>>>>>>>> rather than >=).
>>>>>>>>>>> 
>>>>>>>>>>> This is follow up after Slack discussion we had with Ash
>>> and
>>>>>> Kaxil -
>>>>>>>>>>> summarising what we propose we'll do.
>>>>>>>>>>> 
>>>>>>>>>>> *Problem:*
>>>>>>>>>>> During last few weeks we experienced quite a few
>> downtimes
>>> of
>>>>>>>> TravisCI
>>>>>>>>>>> builds (for all PRs/branches including master) as some of
>>> the
>>>>>>>>> transitive
>>>>>>>>>>> dependencies were automatically upgraded. This because
>> in a
>>>>>> number of
>>>>>>>>>>> dependencies we have  >= rather than == dependencies.
>>>>>>>>>>> 
>>>>>>>>>>> Whenever there is a new release of such dependency, it
>>> might
>>>>>> cause
>>>>>>>>> chain
>>>>>>>>>>> reaction with upgrade of transitive dependencies which
>>> might
>>>>> get
>>>>>> into
>>>>>>>>>>> conflict.
>>>>>>>>>>> 
>>>>>>>>>>> An example was Flask-AppBuilder vs flask-login transitive
>>>>>> dependency
>>>>>>>>> with
>>>>>>>>>>> click. They started to conflict once AppBuilder has
>>> released
>>>>>> version
>>>>>>>>>>> 1.12.0.
>>>>>>>>>>> 
>>>>>>>>>>> *Diagnosis:*
>>>>>>>>>>> Transitive dependencies with "flexible" versions (where
>>> =
>>> is
>>>>>> used
>>>>>>>>>> instead
>>>>>>>>>>> of ==) is a reason for "dependency hell". We will sooner
>> or
>>>>>> later hit
>>>>>>>>>> other
>>>>>>>>>>> cases where not fixed dependencies cause similar problems
>>>> with
>>>>>> other
>>>>>>>>>>> transitive dependencies. We need to fix-pin them. This
>>> causes
>>>>>>>> problems
>>>>>>>>>> for
>>>>>>>>>>> both - released versions (cause they stop to work!) and
>> for
>>>>>>>> development
>>>>>>>>>>> (cause they break master builds in TravisCI and prevent
>>>> people
>>>>>> from
>>>>>>>>>>> installing development environment from the scratch.
>>>>>>>>>>> 
>>>>>>>>>>> *Solution:*
>>>>>>>>>>> 
>>>>>>>>>>>   - Following the old-but-good post
>>>>>>>>>>>   https://nvie.com/posts/pin-your-packages/ we are
>> going
>>> to
>>>>>> fix the
>>>>>>>>>>> pinned
>>>>>>>>>>>   dependencies to specific versions (so basically all
>>>>>> dependencies
>>>>>>>> are
>>>>>>>>>>>   "fixed").
>>>>>>>>>>>   - We will introduce mechanism to be able to upgrade
>>>>>> dependencies
>>>>>>>>> with
>>>>>>>>>>>   pip-tools (https://github.com/jazzband/pip-tools). We
>>>> might
>>>>>> also
>>>>>>>>>> take a
>>>>>>>>>>>   look at pipenv:
>>> https://pipenv.readthedocs.io/en/latest/
>>>>>>>>>>>   - People who would like to upgrade some dependencies
>> for
>>>>>> their PRs
>>>>>>>>>> will
>>>>>>>>>>>   still be able to do it - but such upgrades will be in
>>>> their
>>>>> PR
>>>>>>>> thus
>>>>>>>>>> they
>>>>>>>>>>>   will go through TravisCI tests and they will also have
>>> to
>>>> be
>>>>>>>>> specified
>>>>>>>>>>> with
>>>>>>>>>>>   pinned fixed versions (==). This should be part of
>>> review
>>>>>> process
>>>>>>>> to
>>>>>>>>>>> make
>>>>>>>>>>>   sure new/changed requirements are pinned.
>>>>>>>>>>>   - In release process there will be a point where an
>>>> upgrade
>>>>>> will
>>>>>>>> be
>>>>>>>>>>>   attempted for all requirements (using pip-tools) so
>> that
>>>> we
>>>>>> are
>>>>>>>> not
>>>>>>>>>>> stuck
>>>>>>>>>>>   with older releases. This will be in controlled PR
>>>>> environment
>>>>>>>> where
>>>>>>>>>>> there
>>>>>>>>>>>   will be time to fix all dependencies without impacting
>>>>> others
>>>>>> and
>>>>>>>>>> likely
>>>>>>>>>>>   enough time to "vet" such changes (this can be done
>> for
>>>>>> alpha/beta
>>>>>>>>>>> releases
>>>>>>>>>>>   for example).
>>>>>>>>>>>   - As a side effect dependencies specification will
>>> become
>>>>> far
>>>>>>>>> simpler
>>>>>>>>>>>   and straightforward.
>>>>>>>>>>> 
>>>>>>>>>>> Happy to hear community comments to the proposal. I am
>>> happy
>>>> to
>>>>>> take
>>>>>>>> a
>>>>>>>>>> lead
>>>>>>>>>>> on that, open JIRA issue and implement if this is
>> something
>>>>>> community
>>>>>>>>> is
>>>>>>>>>>> happy with.
>>>>>>>>>>> 
>>>>>>>>>>> J.
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> 
>>>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
>>>>>>>>>>> Mobile: +48 660 796 129
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: Pinning dependencies for Apache Airflow

Posted by James Meickle <jm...@quantopian.com.INVALID>.
I suggest not adopting pipenv. It has a nice "first five minutes" demo, but
it's simply not mature enough to depend on as a drop-in replacement for pip.
We are in the process of removing it after finding several serious bugs in
our POC of it.

On Thu, Oct 4, 2018, 20:30 Alex Guziel <al...@airbnb.com.invalid>
wrote:

> FWIW, there's some value in using virtualenv with Docker to isolate
> yourself from your system's Python.
>
> It's worth noting that requirements files can link other requirements
> files, so that would make groups easier, but note that a single pip run
> gives no guarantee that transitive dependencies will not conflict with or
> override each other. You need `pip check` for that, or use --no-deps.
>
> On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> > Hi Jarek,
> >
> > Thanks for bringing this up. I missed the discussion on Slack since I'm
> on
> > holiday, but I saw the thread and it was way too interesting, and
> therefore
> > this email :)
> >
> > This is actually something that we need to address asap. Like you
> mention,
> > we saw it earlier that specific transient dependencies are not compatible
> > and then we end up with a breaking CI, or even worse, a broken release.
> > Earlier we had in the setup.py the fixed versions (==) and in a separate
> > requirements.txt the requirements for the CI. This was also far from
> > optimal since we had two versions of the requirements.
> >
> > I like the idea that you are proposing. Maybe we can do an experiment
> with
> > it, because of the nature of Airflow (orchestrating different systems),
> we
> > have a huge list of dependencies. To not install everything, we've
> created
> > groups. For example specific libraries when you're using the Google
> Cloud,
> > Elastic, Druid, etc. So I'm curious how it will work with the `
> > extras_require` of Airflow
> >
> > Regarding the pipenv. I don't use any pipenv/virtualenv anymore. For me
> > Docker is much easier to work with. I'm also working on a PR to get rid
> of
> > tox for the testing, and move to a more Docker idiomatic test pipeline.
> > Curious what you thoughts are on that.
> >
> > Cheers, Fokko
> >
> > On Thu, 4 Oct 2018 at 15:39, Arthur Wiedmer <arthur.wiedmer@gmail.com>
> > wrote:
> >
> > > Thanks Jakob!
> > >
> > > I think that this is a huge risk of Slack.
> > > I am not against Slack as a support channel, but it is a slippery slope
> > to
> > > have more and more decisions/conversations happening there, contrary to
> > > what we hope to achieve with the ASF.
> > >
> > > When we are starting to discuss issues of development, extensions and
> > > improvements, it is important for the discussion to happen in the
> mailing
> > > list.
> > >
> > > Jarek, I wouldn't worry too much, we are still in the process of
> learning
> > > as a community. Welcome and thank you for your contribution!
> > >
> > > Best,
> > > Arthur.
> > >
> > > On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Ja...@polidea.com>
> > > wrote:
> > >
> > > > Thanks for pointing it out Jakob.
> > > >
> > > > I am still very fresh in the ASF community and learning the ropes and
> > > > etiquette and code of conduct. Apologies for my ignorance.
> > > > I re-read the conduct and FAQ now again - with more understanding and
> > > will
> > > > pay more attention to wording in the future. As you mentioned it's
> more
> > > the
> > > > wording than intentions, but since it was in TL;DR; it has stronger
> > > > consequences.
> > > >
> > > > BTW. Thanks for actually following the code of conduct and pointing
> it
> > > out
> > > > in respectful manner. I really appreciate it.
> > > >
> > > > J.
> > > >
> > > > Principal Software Engineer
> > > > Phone: +48660796129
> > > >
> > > > On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jg...@gmail.com> wrote:
> > > >
> > > > > > TL;DR; A change is coming in the way how
> dependencies/requirements
> > > are
> > > > > > specified for Apache Airflow - they will be fixed rather than
> > > flexible
> > > > > (==
> > > > > > rather than >=).
> > > > >
> > > > > > This is follow up after Slack discussion we had with Ash and
> Kaxil
> > -
> > > > > > summarising what we propose we'll do.
> > > > >
> > > > > Hey all.  It's great that we're moving this discussion back from
> > Slack
> > > > > to the mailing list.  But I've gotta point out that the wording
> needs
> > > > > a small but critical fix up:
> > > > >
> > > > > "A change *is* coming... they *will* be fixed"
> > > > >
> > > > > needs to be
> > > > >
> > > > > "We'd like to propose a change... We would like to make them
> fixed."
> > > > >
> > > > > The first says that this decision has been made and the result of
> the
> > > > > decision, which was made on Slack, is being reported back to the
> > > > > mailing list.  The second is more accurate to the rest of the
> > > > > discussion ('what we propose...').  And again, since it's axiomatic
> > in
> > > > > ASF that if it didn't happen on a list, it didn't happen[1], we
> gotta
> > > > > make sure there's no confusion about where the community is on the
> > > > > decision-making process.
> > > > >
> > > > > Thanks,
> > > > > Jakob
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects
> > > > > ?
> > > >
> > > > On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel
> > > > > <al...@airbnb.com.invalid> wrote:
> > > > > >
> > > > > > You should run `pip check` to ensure no conflicts. Pip does not
> do
> > > this
> > > > > on
> > > > > > its own.
> > > > > >
> > > > > > On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Great that this discussion already happened :). Lots of useful
> > > things
> > > > > in
> > > > > > > it. And yes - it means pinning in requirement.txt - this is how
> > > > > pip-tools
> > > > > > > work.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > Principal Software Engineer
> > > > > > > Phone: +48660796129
> > > > > > >
> > > > > > > On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
> > > arthur.wiedmer@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jarek,
> > > > > > > >
> > > > > > > > I will +1 the discussion Dan is referring to and George's
> > advice.
> > > > > > > >
> > > > > > > > I just want to double check we are talking about pinning in
> > > > > > > > requirements.txt only.
> > > > > > > >
> > > > > > > > This offers the ability to
> > > > > > > > pip install -r requirements.txt
> > > > > > > > pip install --no-deps airflow
> > > > > > > > For a guaranteed install which works.
> > > > > > > >
> > > > > > > > Several different requirement files can be provided for
> > specific
> > > > use
> > > > > > > cases,
> > > > > > > > like a stable dev one for instance for people wanting to work
> > on
> > > > > > > operators
> > > > > > > > and non-core functions.
> > > > > > > >
> > > > > > > > However, I think we should proactively test in CI against
> > > unpinned
> > > > > > > > dependencies (though it might be a separate case in the
> > matrix) ,
> > > > so
> > > > > that
> > > > > > > > we get advance warning if possible that things will break.
> > > > > > > > CI downtime is not a bad thing here, it actually caught a
> > problem
> > > > :)
> > > > > > > >
> > > > > > > > We should unpin as possible in setup.py to only maintain
> > minimum
> > > > > required
> > > > > > > > compatibility. The process of pinning in setup.py is
> extremely
> > > > > > > detrimental
> > > > > > > > when you have a large number of python libraries installed
> with
> > > > > different
> > > > > > > > pinned versions.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Arthur
> > > > > > > >
> > > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > > <ddavydov@twitter.com.invalid
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Relevant discussion about this:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > > > > > >
> > > > > > > > > On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > > Jarek.Potiuk@polidea.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > TL;DR; A change is coming in the way how
> > > > > dependencies/requirements
> > > > > > > are
> > > > > > > > > > specified for Apache Airflow - they will be fixed rather
> > than
> > > > > > > flexible
> > > > > > > > > (==
> > > > > > > > > > rather than >=).
> > > > > > > > > >
> > > > > > > > > > This is follow up after Slack discussion we had with Ash
> > and
> > > > > Kaxil -
> > > > > > > > > > summarising what we propose we'll do.
> > > > > > > > > >
> > > > > > > > > > *Problem:*
> > > > > > > > > > During last few weeks we experienced quite a few
> downtimes
> > of
> > > > > > > TravisCI
> > > > > > > > > > builds (for all PRs/branches including master) as some of
> > the
> > > > > > > > transitive
> > > > > > > > > > dependencies were automatically upgraded. This because
> in a
> > > > > number of
> > > > > > > > > > dependencies we have  >= rather than == dependencies.
> > > > > > > > > >
> > > > > > > > > > Whenever there is a new release of such dependency, it
> > might
> > > > > cause
> > > > > > > > chain
> > > > > > > > > > reaction with upgrade of transitive dependencies which
> > might
> > > > get
> > > > > into
> > > > > > > > > > conflict.
> > > > > > > > > >
> > > > > > > > > > An example was Flask-AppBuilder vs flask-login transitive
> > > > > dependency
> > > > > > > > with
> > > > > > > > > > click. They started to conflict once AppBuilder has
> > released
> > > > > version
> > > > > > > > > > 1.12.0.
> > > > > > > > > >
> > > > > > > > > > *Diagnosis:*
> > > > > > > > > > Transitive dependencies with "flexible" versions (where
> >=
> > is
> > > > > used
> > > > > > > > > instead
> > > > > > > > > > of ==) is a reason for "dependency hell". We will sooner
> or
> > > > > later hit
> > > > > > > > > other
> > > > > > > > > > cases where not fixed dependencies cause similar problems
> > > with
> > > > > other
> > > > > > > > > > transitive dependencies. We need to fix-pin them. This
> > causes
> > > > > > > problems
> > > > > > > > > for
> > > > > > > > > > both - released versions (cause they stop to work!) and
> for
> > > > > > > development
> > > > > > > > > > (cause they break master builds in TravisCI and prevent
> > > people
> > > > > from
> > > > > > > > > > installing development environment from the scratch.
> > > > > > > > > >
> > > > > > > > > > *Solution:*
> > > > > > > > > >
> > > > > > > > > >    - Following the old-but-good post
> > > > > > > > > >    https://nvie.com/posts/pin-your-packages/ we are
> going
> > to
> > > > > fix the
> > > > > > > > > > pinned
> > > > > > > > > >    dependencies to specific versions (so basically all
> > > > > dependencies
> > > > > > > are
> > > > > > > > > >    "fixed").
> > > > > > > > > >    - We will introduce mechanism to be able to upgrade
> > > > > dependencies
> > > > > > > > with
> > > > > > > > > >    pip-tools (https://github.com/jazzband/pip-tools). We
> > > might
> > > > > also
> > > > > > > > > take a
> > > > > > > > > >    look at pipenv:
> > https://pipenv.readthedocs.io/en/latest/
> > > > > > > > > >    - People who would like to upgrade some dependencies
> for
> > > > > their PRs
> > > > > > > > > will
> > > > > > > > > >    still be able to do it - but such upgrades will be in
> > > their
> > > > PR
> > > > > > > thus
> > > > > > > > > they
> > > > > > > > > >    will go through TravisCI tests and they will also have
> > to
> > > be
> > > > > > > > specified
> > > > > > > > > > with
> > > > > > > > > >    pinned fixed versions (==). This should be part of
> > review
> > > > > process
> > > > > > > to
> > > > > > > > > > make
> > > > > > > > > >    sure new/changed requirements are pinned.
> > > > > > > > > >    - In release process there will be a point where an
> > > upgrade
> > > > > will
> > > > > > > be
> > > > > > > > > >    attempted for all requirements (using pip-tools) so
> that
> > > we
> > > > > are
> > > > > > > not
> > > > > > > > > > stuck
> > > > > > > > > >    with older releases. This will be in controlled PR
> > > > environment
> > > > > > > where
> > > > > > > > > > there
> > > > > > > > > >    will be time to fix all dependencies without impacting
> > > > others
> > > > > and
> > > > > > > > > likely
> > > > > > > > > >    enough time to "vet" such changes (this can be done
> for
> > > > > alpha/beta
> > > > > > > > > > releases
> > > > > > > > > >    for example).
> > > > > > > > > >    - As a side effect dependencies specification will
> > become
> > > > far
> > > > > > > > simpler
> > > > > > > > > >    and straightforward.
> > > > > > > > > >
> > > > > > > > > > Happy to hear community comments to the proposal. I am
> > happy
> > > to
> > > > > take
> > > > > > > a
> > > > > > > > > lead
> > > > > > > > > > on that, open JIRA issue and implement if this is
> something
> > > > > community
> > > > > > > > is
> > > > > > > > > > happy with.
> > > > > > > > > >
> > > > > > > > > > J.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > *Jarek Potiuk, Principal Software Engineer*
> > > > > > > > > > Mobile: +48 660 796 129
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Pinning dependencies for Apache Airflow

Posted by Alex Guziel <al...@airbnb.com.INVALID>.
FWIW, there's some value in using virtualenv with Docker to isolate
yourself from your system's Python.

It's worth noting that requirements files can link other requirements
files, so that would make groups easier, but note that a single pip run
gives no guarantee that transitive dependencies will not conflict with or
override each other. You need `pip check` for that, or use --no-deps.
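
To illustrate the point about linked requirements files, here is a minimal
sketch (not part of any proposal in this thread) that follows `-r` includes
and reports packages pinned to more than one version. The helper names
collect_pins and find_conflicts are made up for this example, and the parsing
is deliberately naive: it only understands plain `name==version` lines and
`-r file` / `--requirement file` includes, and it ignores extras and
environment markers. It complements `pip check`, which inspects what is
actually installed.

```python
import re
from pathlib import Path

# Matches plain pins such as "flask-login==0.4.1". Lines with extras
# (e.g. "pkg[extra]==1.0") or other specifiers are skipped in this sketch.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def collect_pins(path, seen=None):
    """Return (name, version, source file) tuples, following -r includes."""
    path = Path(path).resolve()
    seen = set() if seen is None else seen
    if path in seen:  # guard against include cycles
        return []
    seen.add(path)
    pins = []
    for raw in path.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] in ("-r", "--requirement") and len(parts) == 2:
            # Included files are resolved relative to the including file.
            pins.extend(collect_pins(path.parent / parts[1].strip(), seen))
            continue
        match = PIN.match(line)
        if match:
            pins.append((normalize(match.group(1)), match.group(2), path.name))
    return pins


def find_conflicts(pins):
    """Map each package pinned to more than one version to those versions."""
    versions = {}
    for name, version, _source in pins:
        versions.setdefault(name, set()).add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    import sys

    # Usage (file name is up to you): python check_pins.py requirements.txt
    for name, found in find_conflicts(collect_pins(sys.argv[1])).items():
        print(f"{name} is pinned to several versions: {', '.join(found)}")
```

A per-group file (say, one for each extra) could start with `-r` pointing at
a shared base file; running the script on each group file would then flag any
group that pins a package differently from the base.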

On Thu, Oct 4, 2018 at 5:19 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Jarek,
>
> Thanks for bringing this up. I missed the discussion on Slack since I'm on
> holiday, but I saw the thread and it was way too interesting, and therefore
> this email :)
>
> This is actually something that we need to address asap. Like you mention,
> we saw earlier that specific transitive dependencies are not compatible,
> and then we end up with a broken CI, or even worse, a broken release.
> Earlier we had the fixed versions (==) in setup.py and the requirements for
> the CI in a separate requirements.txt. This was also far from optimal since
> we had two versions of the requirements.
>
> I like the idea that you are proposing. Maybe we can do an experiment with
> it, because of the nature of Airflow (orchestrating different systems), we
> have a huge list of dependencies. To not install everything, we've created
> groups. For example specific libraries when you're using the Google Cloud,
> Elastic, Druid, etc. So I'm curious how it will work with the
> `extras_require` of Airflow.
>
> Regarding pipenv: I don't use any pipenv/virtualenv anymore. For me
> Docker is much easier to work with. I'm also working on a PR to get rid of
> tox for the testing, and move to a more Docker-idiomatic test pipeline.
> Curious what your thoughts are on that.
>
> Cheers, Fokko
>
> On Thu, 4 Oct 2018 at 15:39, Arthur Wiedmer <arthur.wiedmer@gmail.com>
> wrote:
>
> > Thanks Jakob!
> >
> > I think that this is a huge risk of Slack.
> > I am not against Slack as a support channel, but it is a slippery slope
> to
> > have more and more decisions/conversations happening there, contrary to
> > what we hope to achieve with the ASF.
> >
> > When we are starting to discuss issues of development, extensions and
> > improvements, it is important for the discussion to happen in the mailing
> > list.
> >
> > Jarek, I wouldn't worry too much, we are still in the process of learning
> > as a community. Welcome and thank you for your contribution!
> >
> > Best,
> > Arthur.
> >
> > On Thu, Oct 4, 2018 at 1:42 PM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > Thanks for pointing it out Jakob.
> > >
> > > I am still very fresh in the ASF community and learning the ropes and
> > > etiquette and code of conduct. Apologies for my ignorance.
> > > I re-read the conduct and FAQ now again - with more understanding and
> > will
> > > pay more attention to wording in the future. As you mentioned it's more
> > the
> > > wording than intentions, but since it was in TL;DR; it has stronger
> > > consequences.
> > >
> > > BTW. Thanks for actually following the code of conduct and pointing it
> > out
> > > in respectful manner. I really appreciate it.
> > >
> > > J.
> > >
> > > Principal Software Engineer
> > > Phone: +48660796129
> > >
> > > On Thu, 4 Oct 2018, 20:41 Jakob Homan, <jg...@gmail.com> wrote:
> > >
> > > > > TL;DR; A change is coming in the way how dependencies/requirements
> > are
> > > > > specified for Apache Airflow - they will be fixed rather than
> > flexible
> > > > (==
> > > > > rather than >=).
> > > >
> > > > > This is follow up after Slack discussion we had with Ash and Kaxil
> -
> > > > > summarising what we propose we'll do.
> > > >
> > > > Hey all.  It's great that we're moving this discussion back from
> Slack
> > > > to the mailing list.  But I've gotta point out that the wording needs
> > > > a small but critical fix up:
> > > >
> > > > "A change *is* coming... they *will* be fixed"
> > > >
> > > > needs to be
> > > >
> > > > "We'd like to propose a change... We would like to make them fixed."
> > > >
> > > > The first says that this decision has been made and the result of the
> > > > decision, which was made on Slack, is being reported back to the
> > > > mailing list.  The second is more accurate to the rest of the
> > > > discussion ('what we propose...').  And again, since it's axiomatic
> in
> > > > ASF that if it didn't happen on a list, it didn't happen[1], we gotta
> > > > make sure there's no confusion about where the community is on the
> > > > decision-making process.
> > > >
> > > > Thanks,
> > > > Jakob
> > > >
> > > > [1]
> > > >
> > >
> >
> https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects
> > > > ?
> > >
> > > On Thu, Oct 4, 2018 at 9:56 AM Alex Guziel
> > > > <al...@airbnb.com.invalid> wrote:
> > > > >
> > > > > You should run `pip check` to ensure no conflicts. Pip does not do
> > this
> > > > on
> > > > > its own.
> > > > >
> > > > > On Thu, Oct 4, 2018 at 9:20 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com>
> > > > > wrote:
> > > > >
> > > > > > Great that this discussion already happened :). Lots of useful
> > things
> > > > in
> > > > > > it. And yes - it means pinning in requirement.txt - this is how
> > > > pip-tools
> > > > > > work.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > Principal Software Engineer
> > > > > > Phone: +48660796129
> > > > > >
> > > > > > On Thu, 4 Oct 2018, 18:14 Arthur Wiedmer, <
> > arthur.wiedmer@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jarek,
> > > > > > >
> > > > > > > I will +1 the discussion Dan is referring to and George's
> advice.
> > > > > > >
> > > > > > > I just want to double check we are talking about pinning in
> > > > > > > requirements.txt only.
> > > > > > >
> > > > > > > This offers the ability to
> > > > > > > pip install -r requirements.txt
> > > > > > > pip install --no-deps airflow
> > > > > > > For a guaranteed install which works.
> > > > > > >
> > > > > > > Several different requirement files can be provided for
> specific
> > > use
> > > > > > cases,
> > > > > > > like a stable dev one for instance for people wanting to work
> on
> > > > > > operators
> > > > > > > and non-core functions.
> > > > > > >
> > > > > > > However, I think we should proactively test in CI against
> > unpinned
> > > > > > > dependencies (though it might be a separate case in the
> matrix) ,
> > > so
> > > > that
> > > > > > > we get advance warning if possible that things will break.
> > > > > > > CI downtime is not a bad thing here, it actually caught a
> problem
> > > :)
> > > > > > >
> > > > > > > We should unpin as possible in setup.py to only maintain
> minimum
> > > > required
> > > > > > > compatibility. The process of pinning in setup.py is extremely
> > > > > > detrimental
> > > > > > > when you have a large number of python libraries installed with
> > > > different
> > > > > > > pinned versions.
> > > > > > >
> > > > > > > Best,
> > > > > > > Arthur
> > > > > > >
> > > > > > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov
> > > > <ddavydov@twitter.com.invalid
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Relevant discussion about this:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > > > > >
> > > > > > > > On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > TL;DR; A change is coming in the way how

Re: Pinning dependencies for Apache Airflow

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Jarek,

Thanks for bringing this up. I missed the discussion on Slack since I'm on
holiday, but I saw the thread and it was too interesting to skip, hence
this email :)

This is actually something that we need to address ASAP. As you mention,
we have seen before that specific transitive dependencies turn out to be
incompatible, and then we end up with a broken CI or, even worse, a broken
release. Earlier we had the fixed versions (==) in setup.py and the
requirements for the CI in a separate requirements.txt. This was also far
from optimal, since we had two versions of the requirements.
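
To make the pinned-versus-flexible distinction concrete, here is a minimal
sketch (not Airflow's actual tooling) of a check that flags any requirement
line that is not pinned with ==. The name `find_unpinned` and the sample
lines are hypothetical, the version numbers are made up, and the pattern
only handles simple `name==version` lines, not URLs or environment markers.

```python
import re

# Matches lines such as "somepkg==1.2.3" or "somepkg[extra]==1.2.3".
# Wildcard pins like "somepkg==1.*" are deliberately treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=<>!~\s*]+$")


def find_unpinned(lines):
    """Return the requirement lines that are not pinned with ==."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Illustrative entries only; the versions are not real Airflow pins.
    sample = [
        "flask-appbuilder>=1.0.0",  # flexible: floats to the newest release
        "flask-login==1.0.0",       # pinned
        "click==1.0.0  # pinned, with a trailing comment",
    ]
    print(find_unpinned(sample))
```

A check along these lines could run in review or CI so that new or changed
requirements cannot slip in with >=.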

I like the idea that you are proposing. Maybe we can do an experiment with
it. Because of the nature of Airflow (orchestrating different systems), we
have a huge list of dependencies. To avoid installing everything, we've
created groups, for example the specific libraries needed when you're using
Google Cloud, Elastic, Druid, etc. So I'm curious how it will work with the
`extras_require` of Airflow.
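
For reference, the grouping goes through the setuptools `extras_require`
mapping. Below is a minimal sketch of how pinned groups and a combined
`all` group could be assembled. The helper `with_all`, the group contents
and the version numbers are all made up for illustration; this is not
Airflow's real setup.py.

```python
# Hypothetical extras; package versions are illustrative, not real pins.
EXTRAS = {
    "druid": ["pydruid==1.0.0"],
    "elasticsearch": ["elasticsearch==1.0.0", "elasticsearch-dsl==1.0.0"],
    "gcp_api": ["google-api-python-client==1.0.0", "pandas-gbq==1.0.0"],
}


def with_all(extras):
    """Return a copy of extras plus an 'all' group listing every requirement once."""
    merged = dict(extras)
    merged["all"] = sorted({req for reqs in extras.values() for req in reqs})
    return merged


if __name__ == "__main__":
    # In a real setup.py this dict would be passed as
    # setuptools.setup(..., extras_require=with_all(EXTRAS)).
    for group, reqs in with_all(EXTRAS).items():
        print(group, reqs)
```

The open question with pinning is what should happen when two groups need
different versions of the same package; the set union above would simply
list both pins, which pip could not satisfy together.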

Regarding pipenv: I don't use pipenv/virtualenv anymore. For me Docker is
much easier to work with. I'm also working on a PR to get rid of tox for
the testing and move to a more Docker-idiomatic test pipeline. Curious what
your thoughts are on that.

Cheers, Fokko


Re: Pinning dependencies for Apache Airflow

Posted by Arthur Wiedmer <ar...@gmail.com>.
Thanks Jakob!

I think that this is a huge risk of Slack.
I am not against Slack as a support channel, but it is a slippery slope to
have more and more decisions/conversations happening there, contrary to
what we hope to achieve with the ASF.

When we start discussing issues of development, extensions and
improvements, it is important for the discussion to happen on the mailing
list.

Jarek, I wouldn't worry too much, we are still in the process of learning
as a community. Welcome and thank you for your contribution!

Best,
Arthur.


Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Thanks for pointing it out, Jakob.

I am still very new to the ASF community and learning the ropes, the
etiquette and the code of conduct. Apologies for my ignorance.
I have now re-read the code of conduct and the FAQ with more understanding,
and I will pay more attention to wording in the future. As you mentioned,
it's more about the wording than the intentions, but since it was in the
TL;DR it has stronger consequences.

BTW, thanks for actually following the code of conduct and pointing it out
in a respectful manner. I really appreciate it.

J.

Principal Software Engineer
Phone: +48660796129


Re: Pinning dependencies for Apache Airflow

Posted by Jakob Homan <jg...@gmail.com>.
> TL;DR; A change is coming in the way how dependencies/requirements are
> specified for Apache Airflow - they will be fixed rather than flexible (==
> rather than >=).

> This is follow up after Slack discussion we had with Ash and Kaxil -
> summarising what we propose we'll do.

Hey all.  It's great that we're moving this discussion back from Slack
to the mailing list.  But I've gotta point out that the wording needs
a small but critical fix up:

"A change *is* coming... they *will* be fixed"

needs to be

"We'd like to propose a change... We would like to make them fixed."

The first says that this decision has been made and the result of the
decision, which was made on Slack, is being reported back to the
mailing list.  The second is more accurate to the rest of the
discussion ('what we propose...').  And again, since it's axiomatic in
ASF that if it didn't happen on a list, it didn't happen[1], we gotta
make sure there's no confusion about where the community is on the
decision-making process.

Thanks,
Jakob

[1] https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects?

Re: Pinning dependencies for Apache Airflow

Posted by Alex Guziel <al...@airbnb.com.INVALID>.
You should run `pip check` to ensure no conflicts. Pip does not do this on
its own.
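
As a hedged illustration of the advice above: `pip check` exits with a non-zero status when installed packages have missing or conflicting requirements, so a CI step can wrap it. The function name `environment_is_consistent` and the injectable `run` parameter are assumptions made for this sketch, not anything from the Airflow code base.

```python
import subprocess
import sys


def environment_is_consistent(run=subprocess.run):
    """Return True when `pip check` finds no broken or conflicting requirements.

    `run` is injectable (hypothetical design choice) so the wrapper can be
    exercised without touching the real environment.
    """
    # Use the current interpreter so the check targets the active environment.
    result = run([sys.executable, "-m", "pip", "check"])
    return result.returncode == 0
```

A build script could call `environment_is_consistent()` right after installation and fail the job when it returns False.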


Re: Pinning dependencies for Apache Airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Great that this discussion already happened :). Lots of useful things in
it. And yes - it means pinning in requirements.txt - this is how pip-tools
works.

J.

Principal Software Engineer
Phone: +48660796129


Re: Pinning dependencies for Apache Airflow

Posted by Arthur Wiedmer <ar...@gmail.com>.
Hi Jarek,

I will +1 the discussion Dan is referring to and George's advice.

I just want to double check we are talking about pinning in
requirements.txt only.

This offers the ability to run

    pip install -r requirements.txt
    pip install --no-deps airflow

for a guaranteed install which works.
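
The guarantee in the two commands above only holds if every entry in requirements.txt is an exact pin. A minimal sketch of a review-time check follows, assuming a simple requirements format (comments, `-r`-style option lines, optional environment markers). The helper name `find_unpinned` is hypothetical, and a real tool would follow the full pip requirements syntax.

```python
import re

# "name==version" or "name[extra]==version" with a fully specified version.
_PINNED = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[A-Za-z0-9][A-Za-z0-9._+!-]*"
)


def find_unpinned(requirements_text):
    """Return the requirement specifiers that are not exact '==' pins."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop comments: a '#' at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw_line).strip()
        if not line or line.startswith("-"):
            continue  # blank line or a pip option such as "-r other.txt"
        # Judge the pin itself, ignoring any environment marker after ';'.
        specifier = line.split(";", 1)[0].strip()
        if not _PINNED.fullmatch(specifier):
            unpinned.append(specifier)
    return unpinned
```

Run against the text of a requirements file, an empty result means every entry is pinned. Ranges such as `click>=6.7`, bare names and wildcard versions such as `flask==1.*` are reported.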

Several different requirements files can be provided for specific use cases,
for instance a stable dev one for people wanting to work on operators and
non-core functions.

However, I think we should proactively test in CI against unpinned
dependencies (though it might be a separate case in the matrix), so that
we get advance warning, where possible, that things will break.
CI downtime is not a bad thing here; it actually caught a problem :)
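
One way to picture the separate matrix case is sketched below. The mode names, the `CI_MATRIX` structure and the idea of letting the unpinned job fail without blocking merges are all assumptions for illustration - the thread does not specify how the matrix would be configured.

```python
def install_steps(mode):
    """Return the pip command lines (as argv lists) for one CI matrix entry."""
    if mode == "pinned":
        return [
            ["pip", "install", "-r", "requirements.txt"],  # exact versions
            ["pip", "install", "--no-deps", "."],          # the project itself
            ["pip", "check"],
        ]
    if mode == "unpinned":
        return [
            ["pip", "install", "."],  # let pip pick versions from setup.py ranges
            ["pip", "check"],
        ]
    raise ValueError("unknown mode: %r" % (mode,))


# Hypothetical matrix: only the pinned job gates merges; the unpinned job
# serves as an early warning.
CI_MATRIX = [
    {"mode": "pinned", "allow_failure": False},
    {"mode": "unpinned", "allow_failure": True},
]


def blocking_failures(results):
    """Given {mode: passed}, list the modes whose failure should block a merge."""
    return [
        job["mode"]
        for job in CI_MATRIX
        if not job["allow_failure"] and not results.get(job["mode"], False)
    ]
```

With this split, a new upstream release that breaks the unpinned job shows up as a warning rather than stopping every PR.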

We should unpin as much as possible in setup.py and only maintain the minimum
required compatibility. Pinning in setup.py is extremely detrimental
when you have a large number of Python libraries installed with different
pinned versions.
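
To make the split concrete: setup.py would carry loose lower bounds while requirements.txt carries exact pins, and the two should stay consistent. The sketch below checks that each pin meets its lower bound. The package versions, the list names and the purely numeric version comparison are assumptions for illustration; real tooling follows the full PEP 440 ordering.

```python
import re


def normalize(name):
    """Normalise a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def version_key(text, width=4):
    """Turn '1.12' into (1, 12, 0, 0). Handles plain dotted numbers only."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def pins_below_minimum(install_requires, pinned_lines):
    """Return (name, pin, minimum) for every pin older than its setup.py floor."""
    pins = {}
    for line in pinned_lines:
        name, _, version = line.partition("==")
        pins[normalize(name)] = version.strip()
    problems = []
    for spec in install_requires:
        name, _, minimum = spec.partition(">=")
        key = normalize(name)
        # Entries without a lower bound (e.g. "six") impose no floor.
        if minimum and key in pins:
            if version_key(pins[key]) < version_key(minimum):
                problems.append((key, pins[key], minimum.strip()))
    return problems


# Hypothetical data: loose floors as they might appear in setup.py ...
INSTALL_REQUIRES = ["Flask-AppBuilder>=1.11.0", "click>=6.0", "six"]
# ... and exact pins as they might appear in requirements.txt.
REQUIREMENTS_TXT = ["flask-appbuilder==1.10.0", "click==6.7", "six==1.11.0"]
```

Here `pins_below_minimum(INSTALL_REQUIRES, REQUIREMENTS_TXT)` reports the Flask-AppBuilder pin, because 1.10.0 is older than the 1.11.0 floor.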

Best,
Arthur


Re: Pinning dependencies for Apache Airflow

Posted by Dan Davydov <dd...@twitter.com.INVALID>.
Relevant discussion about this:
https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
