You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/12/31 23:58:15 UTC

[GitHub] [airflow] dazza-codes opened a new pull request #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

dazza-codes opened a new pull request #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984
 
 
   ## WIP status
   
   - this is a first sketch that requires a lot more work to evaluate
   - this is motivated by [AIP-28](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-28%3A+Add+AsyncExecutor+option)
   
   
   - [ ] Description above provides context of the change
   - [ ] Commit message contains [\[AIRFLOW-XXXX\]](https://issues.apache.org/jira/browse/AIRFLOW-XXXX) or `[AIRFLOW-XXXX]` for document-only changes
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [ ] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-611374931
 
 
   @ashb +1 -> for me this looks like 2.1 thing. 
   
   And also I believe rather than doing it on our own - we should rather wait for Celery 5 https://stackoverflow.com/questions/39815771/how-to-combine-celery-with-asyncio -> there likely we could (again) build on top of all the hard-work Celery people put into making us use it in the first place.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-608492870
 
 
   Hey @dazza-codes, I'm all in favour of async in general, but can I ask a pointed question:
   
   What does this give us? If I'm reading the PR right (and I've just glanced at it) this creates an async (thread) inside the scheduler that makes network requests to Dask?
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] dazza-codes commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
dazza-codes commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-596666053
 
 
   I'll come back to this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] dazza-codes edited a comment on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
dazza-codes edited a comment on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-609060255
 
 
   At some point, I hope to circle back to asyncio in Airflow.  For now, I'm working more on the low-level details in another project, https://gitlab.com/dazza-codes/aio-aws, so that I can avoid some complexity of the legacy work in Airflow and any conflicts with the larger community use-cases.  There's a lot of details in the asyncio approach that need to be resolved before I could confidently contribute something to Airflow.
   
   With regard to dask and async executors, dask already has the right backend to support async execution, so this should be low-hanging fruit.  I apologise for neglecting it for a bit.  I did explore some basic elements of an async executor in
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/notes/async_executor.py
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/tests/test_async_executor.py
   
   I'm not entirely convinced that using a thread for the loop and passing coroutines across a thread-safe bridge is the right way to go.  For instance, it's noted in https://docs.python.org/3.6/library/asyncio-dev.html#concurrency-and-multithreading that `Most asyncio objects are not thread safe.`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-611239987
 
 
   We can probably learn a lot form what Django of done -- they've just landed sync+async via https://github.com/django/asgiref/tree/master/asgiref

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-570038947
 
 
   @dazza-codes seeing as u know dask, can u help with https://issues.apache.org/jira/browse/AIRFLOW-4796 ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] dazza-codes edited a comment on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
dazza-codes edited a comment on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-609060255
 
 
   At some point, I hope to circle back to asyncio in Airflow.  For now, I'm working more on the low-level details in another project, https://gitlab.com/dazza-codes/aio-aws, so that I can avoid some complexity of the legacy work in Airflow and any conflicts with the larger community use-cases.  There's a lot of details in the asyncio approach that need to be resolved before I could confidently contribute something to Airflow.
   
   With regard to dask and async executors, dask already has the right backend to support async execution, so this should be low-hanging fruit.  I apologise for neglecting it for a bit.  I did explore some basic elements of an async executor in
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/notes/async_executor.py
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/tests/test_async_executor.py
   - looks like I threw some of that code into this PR
   
   I got stuck on the PR in trying to test the dask async executor.  I started to learn more about testing dask, but need to circle back on that sometime.  Other things are shouting louder for now.
   
   I'm not entirely convinced that using a thread for the loop and passing coroutines across a thread-safe bridge is the right way to go.  For instance, it's noted in https://docs.python.org/3.6/library/asyncio-dev.html#concurrency-and-multithreading that `Most asyncio objects are not thread safe.`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] dazza-codes commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
dazza-codes commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-609060255
 
 
   At some point, I hope to circle back to asyncio in Airflow.  For now, I'm working more on the low-level details in another project, https://gitlab.com/dazza-codes/aio-aws, so that I can avoid some complexity of the legacy work in Airflow and any conflicts with the larger community use-cases.  There's a lot of details in the asyncio approach that need to be resolved before I could confidently contribute something to Airflow.
   
   With regard to dask and async executors, dask already has the right backend to support async execution, so this should be low-hanging fruit.  I apologise for neglecting it for a bit.  I did explore some basic elements of an async executor in
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/notes/async_executor.py
   - https://gitlab.com/dazza-codes/python-notes/-/blob/master/tests/test_async_executor.py
   I'm not entirely convinced that using a thread for the loop and passing coroutines across a thread-safe bridge is the right way to go.  For instance, it's noted in https://docs.python.org/3.6/library/asyncio-dev.html#concurrency-and-multithreading that `Most asyncio objects are not thread safe.`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] stale[bot] commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #6984: [WIP][AIRFLOW-6395] Add an asyncio compatible dask executor [AIP-28]
URL: https://github.com/apache/airflow/pull/6984#issuecomment-596121082
 
 
   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services