Posted to users@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/03/05 20:42:11 UTC

[DISCUSSION] Potentially remove/relax Dask Executor support in Airflow

Hello everyone,

This is the second time [1] I am raising the question on the devlist (last
time the Dask team helped and I am going to reach out to them as well).

We have quite a problem with DaskExecutor in Airflow.

Previously, when I raised it, all tests for the Dask Executor had been marked
as "skipped" and I asked whether to remove the Dask Executor altogether. The
Dask team responded and helped to enable the tests; however, since then
there has been no activity in this area. We have this code in our "dask"
extra, and it limits us. For example, we cannot merge the new Looker library
from Google and (what's even more important) we cannot update Airflow to
support Python 3.10 and macOS ARM (due to a cloudpickle limitation that
prevents us from upgrading apache-beam and numpy).

Unfortunately, the Dask Executor is part of the "core" of Airflow, not a
provider, so we cannot really treat it as an "optional" provider.

Because of that, we are using very old versions of cloudpickle and of Dask's
`distributed` library.

    # Dask support is limited, we need Dask team to upgrade support for dask if we were to continue
    # Supporting it in the future
    # TODO: upgrade libraries used or maybe deprecate and drop DASK support
    'cloudpickle>=1.4.1, <1.5.0',
    'dask>=2.9.0, <2021.6.1',  # dask 2021.6.1 does not work with `distributed`
    'distributed>=2.11.1, <2.20',
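
As a rough illustration of how these pins exclude newer releases, here is a
simplified sketch. It is not Airflow or pip code: the helper names `parse`,
`satisfies` and `allowed` are made up for this example, and real resolvers
follow PEP 440 rather than the plain numeric comparison used here.

```python
from typing import Tuple

# The three pins quoted above, as (inclusive lower bound, exclusive upper bound).
DASK_EXTRA_PINS = {
    "cloudpickle": ("1.4.1", "1.5.0"),
    "dask": ("2.9.0", "2021.6.1"),
    "distributed": ("2.11.1", "2.20"),
}


def parse(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version into a tuple, padded to three parts.

    Assumption: only plain numeric versions such as "1.4.1" are handled.
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


def allowed(package: str, version: str) -> bool:
    """Check a candidate version against the pin for the given package."""
    lower, upper = DASK_EXTRA_PINS[package]
    return satisfies(version, lower, upper)


print(allowed("cloudpickle", "1.4.1"))  # True
print(allowed("cloudpickle", "2.0.0"))  # False: above the <1.5.0 cap
print(allowed("distributed", "2.20"))   # False: the upper bound is exclusive
```

Any other package that needs a cloudpickle release at or above 1.5.0 cannot
be installed together with these pins, which is the kind of conflict
described above.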


I tried to fix the tests, but there are many changes in the Dask
`distributed` library, including the removal of parts of the test harness
that are used by some tests.

My proposal (and I also created a PR
https://github.com/apache/airflow/pull/22017 for that):

* remove the version limits on the Dask libraries
* "skip" all the Dask tests until they are fixed
* ask the Dask team to help with fixing them before we release 2.3.0; if
they don't fix them, we will drop support for the Dask executor (or at least
we will not run tests for it and will mark it as "untested")
* in the latter case we might actually bring back the dependencies that
"worked" for the "dask" extra in Airflow 2.3.0; they will not be tested in our
unit tests, but if someone installs the "dask" extra it will work (this will
also mean that some older providers will need to be installed, because
the newer ones will conflict with the "dask" extra)
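
For illustration, skipping a whole test class can be sketched with the
standard library's `unittest.skip`. The class, test and helper names below
are hypothetical, and the actual PR may well use a different mechanism
(Airflow's test suite is run with pytest).

```python
import unittest


# Hypothetical test class, skipped as a whole with an explanatory reason.
@unittest.skip("Dask tests disabled until they work with current `distributed`")
class TestDaskExecutorSketch(unittest.TestCase):
    def test_submit(self):
        # This body never executes while the class-level skip is in place.
        raise AssertionError("should not run")


def run_sketch() -> unittest.TestResult:
    """Run the skipped class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDaskExecutorSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_sketch()
    # Each skipped test is recorded together with the reason given above.
    for test, reason in outcome.skipped:
        print(f"skipped {test.id()}: {reason}")
```

A skipped test is reported separately from a passing or failing one, so the
run stays green while still showing that the Dask tests were not run.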

Another possibility might be to simply remove Dask support altogether or
move it to a new provider.

Let me know what you think. This one pretty much blocks the release of new
providers (we are almost ready to add Looker) but more importantly it
blocks the effort of supporting Python 3.10 and ARM M1.

I hope we can quickly make a tactical decision to merge the PR and work
with the Dask team on the next steps and make the final decision later.

J.

[1] https://lists.apache.org/thread/875fpgb7vfpmtxrmt19jmo8d3p6mgqnh

Re: [DISCUSSION] Potentially remove/relax Dask Executor support in Airflow

Posted by Jarek Potiuk <ja...@potiuk.com>.
I think we should move it to a separate provider at the very least.
Ideally, DaskExecutor should be maintained by the Dask team IMHO, so I would
be in favor of deprecating it now and removing it in 3.0 (and offering the
Dask team the chance to take it over).
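
A "deprecate now, remove later" step is commonly expressed with a
`DeprecationWarning`. The sketch below uses only the standard library's
`warnings` module; the class `DaskExecutorSketch` and the helper function are
hypothetical stand-ins, not Airflow's actual executor code.

```python
import warnings


class DaskExecutorSketch:
    """Hypothetical stand-in for an executor that is being deprecated."""

    def __init__(self) -> None:
        # stacklevel=2 points the warning at the caller, not at this line.
        warnings.warn(
            "DaskExecutorSketch is deprecated and is planned for removal in 3.0.",
            DeprecationWarning,
            stacklevel=2,
        )


def instantiate_and_collect() -> list:
    """Create the executor and return the warnings raised while doing so."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        DaskExecutorSketch()
    return caught


if __name__ == "__main__":
    for item in instantiate_and_collect():
        print(f"{item.category.__name__}: {item.message}")
```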


On Tue, Mar 8, 2022 at 5:42 PM Elad Kalif <el...@apache.org> wrote:

> In the last 2 surveys we asked "What executor type do you use?"
> Dask was included in the "Other" choice and, as expected, few users use it.
> While we cannot really rely on this survey, I think it does give some
> information about usage.
>
> Do we really want to maintain core functionality for such a small number
> of users? What is the value in it?
> Also, can we remove it in a feature release? I'm not 100% sure about that.
>
> On Tue, Mar 8, 2022 at 6:09 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> FYI: thanks to Kanthi, the Dask executor is back (with all tests)
>> https://github.com/apache/airflow/pull/22027
>>
>> On Sat, Mar 5, 2022 at 10:03 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> FYI. I asked the question at Dask's discourse
>>> https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433
>>>
>>> But I personally think we can take our "tactical" approach of
>>> merging the change that disables the Dask tests via
>>> https://github.com/apache/airflow/pull/22017 - it should not hold us
>>> back.
>>>

Re: [DISCUSSION] Potentially remove/relax Dask Executor support in Airflow

Posted by Elad Kalif <el...@apache.org>.
In the last 2 surveys we asked "What executor type do you use?"
Dask was included in the "Other" choice and, as expected, few users use it.
While we cannot really rely on this survey, I think it does give some
information about usage.

Do we really want to maintain core functionality for such a small number of
users? What is the value in it?
Also, can we remove it in a feature release? I'm not 100% sure about that.

On Tue, Mar 8, 2022 at 6:09 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> FYI: thanks to Kanthi, the Dask executor is back (with all tests)
> https://github.com/apache/airflow/pull/22027
>
> On Sat, Mar 5, 2022 at 10:03 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> FYI. I asked the question at Dask's discourse
>> https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433
>>
>> But I personally think we can take our "tactical" approach of
>> merging the change that disables the Dask tests via
>> https://github.com/apache/airflow/pull/22017 - it should not hold us
>> back.
>>

Re: [DISCUSSION] Potentially remove/relax Dask Executor support in Airflow

Posted by Jarek Potiuk <ja...@potiuk.com>.
FYI: thanks to Kanthi, the Dask executor is back (with all tests)
https://github.com/apache/airflow/pull/22027

On Sat, Mar 5, 2022 at 10:03 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> FYI. I asked the question at Dask's discourse
> https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433
>
> But I personally think we can take our "tactical" approach of
> merging the change that disables the Dask tests via
> https://github.com/apache/airflow/pull/22017 - it should not hold us
> back.
>

Re: [DISCUSSION] Potentially remove/relax Dask Executor support in Airflow

Posted by Jarek Potiuk <ja...@potiuk.com>.
FYI. I asked the question at Dask's discourse
https://dask.discourse.group/t/potential-removal-of-dask-executor-support-in-airflow/433

But I personally think we can take our "tactical" approach of
merging the change that disables the Dask tests via
https://github.com/apache/airflow/pull/22017 - it should not hold us back.

