You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/02 17:02:15 UTC

[GitHub] [airflow] dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio

dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-570269996
 
 
   I'm not opposed to this PR/POC if it works; are there additional unit tests and functional tests to demonstrate that it works?  I don't know enough about how it performs and how it manages the event loop and how the loop manages subprocesses - it seems like subprocesses cannot yield from (or await) to return control back to the main loop from anywhere within the `command` (i.e. the command itself cannot use asyncio).  I want to understand a bit more about the Airflow task ecosystem and how asyncio/coroutines fit into an existing workflow or require an alternative workflow ecosystem (additional notes are in AIP-28).
   
   In this PR, the primary point of interest and/or confusion is in
   ```
   foo = await asyncio.create_subprocess_exec(
                   *command,
                   stdin=asyncio.subprocess.PIPE,
                   stdout=asyncio.subprocess.PIPE,
                   stderr=asyncio.subprocess.PIPE,
                   loop=self.loop
               )
   ```
   vs. using an awaitable coroutine or task as in the sequence diagram noted in
   https://docs.python.org/3.6/library/asyncio-task.html#example-chain-coroutines
   
   By using a subprocess, the `command` itself has no ability to use asyncio coroutines - is that right?  That's not entirely a bad thing, but if that's correct, then it could mean that this worker could spawn a lot of processes that are all blocking within each process.  I just wonder whether it really differs from using multiprocessing in this regard.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services