Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/29 12:10:52 UTC

[GitHub] [arrow] tianjiqx opened a new issue, #13745: [Python] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

tianjiqx opened a new issue, #13745:
URL: https://github.com/apache/arrow/issues/13745

   I tried Arrow Flight, but when benchmarking `do_exchange` I found that CPU usage stays at around 100% (a single core). Maybe this is caused by the GIL limiting Python multi-threading.
   
   I couldn't find any way to configure the thread pool of `FlightServerBase`, so it seems the only way to improve performance is to use `multiprocessing.Pool` inside the handlers to do the actual work.
   
   Do you have any other suggestions? Thanks!
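A minimal, stdlib-only sketch of the effect described above. It is not Flight-specific, and `cpu_bound`, `timed_map` and the job sizes are illustrative assumptions. The same CPU-bound pure-Python function is mapped over a `multiprocessing.pool.ThreadPool` and a `multiprocessing.Pool`, which share the same `map` API. On a standard (GIL) CPython build the thread pool keeps roughly one core busy, while the process pool can use several cores.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def cpu_bound(n: int) -> int:
    """Pure-Python busy loop: it holds the GIL for its whole run."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(pool_cls, jobs, workers=4):
    """Map cpu_bound over jobs with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_cls(processes=workers) as pool:
        results = pool.map(cpu_bound, jobs)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    thread_results, thread_secs = timed_map(ThreadPool, jobs)  # threads share one GIL
    proc_results, proc_secs = timed_map(Pool, jobs)            # one GIL per process
    assert thread_results == proc_results
    print(f"ThreadPool: {thread_secs:.2f}s   Pool: {proc_secs:.2f}s")
```

Moving work into a `Pool` from inside a Flight handler means inputs and outputs are pickled and copied between processes, so the speedup has to outweigh that transfer cost.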


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] tianjiqx commented on issue #13745: [Python] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

tianjiqx commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1199227877

   I searched for similar questions and learned a few things, but further suggestions from anyone are welcome:
   - [[Python] [Flight] How does concurrency in Arrow Flight work? Where in the source code can I find it?](https://www.mail-archive.com/search?l=user@arrow.apache.org&q=subject:%22Re%5C%3A+%5C%5BPython%5C%5D+%5C%5BFlight%5C%5D+How+does+concurrency+in+Arrow+Flight+work%5C%3F+Where+in+the+source+code+can+I+find+it%5C%3F%22&o=newest)
   - #13300


[GitHub] [arrow] tianjiqx closed issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

tianjiqx closed issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?
URL: https://github.com/apache/arrow/issues/13745


[GitHub] [arrow] tianjiqx commented on issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

tianjiqx commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1199382200

   @lidavidm Thank you for your reply. My program also does some I/O. Considering the cost of copying data between processes, it may be better to start more Flight servers.


[GitHub] [arrow] lidavidm commented on issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

lidavidm commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1200548779

   The option has to be passed through gRPC (the `_init_grpc_port` code effectively does nothing). See [`generic_options`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightClient.html#pyarrow.flight.FlightClient) and the [gRPC documentation](https://grpc.github.io/grpc/python/glossary.html#term-channel_arguments).
   
   I'll see about adding a code snippet to the Arrow Cookbook.
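A hedged sketch of passing a gRPC channel argument through `FlightClient`'s `generic_options`, a list of `(name, value)` tuples. The specific argument, `grpc.use_local_subchannel_pool`, is an assumption and is not named in this thread: gRPC may share one underlying connection among channels created with identical arguments in the same process, and this argument asks each channel to keep its own subchannel instead. `client_generic_options`, `connect_clients` and the location are hypothetical.

```python
from typing import List, Tuple, Union

# gRPC core channel argument; choosing it here is an assumption (see lead-in).
LOCAL_SUBCHANNEL_POOL: Tuple[str, int] = ("grpc.use_local_subchannel_pool", 1)


def client_generic_options(own_connection: bool = True) -> List[Tuple[str, Union[int, str]]]:
    """Build the generic_options list passed to pyarrow.flight.FlightClient."""
    return [LOCAL_SUBCHANNEL_POOL] if own_connection else []


def connect_clients(location: str, count: int):
    """Hypothetical helper: open `count` clients, each with its own gRPC subchannel."""
    # Lazy import so the options helper above works without pyarrow installed.
    import pyarrow.flight as flight

    return [
        flight.FlightClient(location, generic_options=client_generic_options())
        for _ in range(count)
    ]
```

Usage, assuming a server at a hypothetical address: `clients = connect_clients("grpc://localhost:5005", 4)`.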


[GitHub] [arrow] lidavidm commented on issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

lidavidm commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1199240912

   Yes, if you have heavy CPU-bound processing, your only real option is multiprocessing. Depending on what you want to do, you could also set the `SO_REUSEPORT` gRPC option and spawn many processes listening on the same port.
   
   There's no option to configure the thread pool because 1) it's provided by gRPC, and 2) it wouldn't help anyway: you'd still be limited by the GIL.
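A sketch of the "many processes on one port" approach, assuming Linux and assuming gRPC's `grpc.so_reuseport` channel argument is left at its default (enabled), so that each process's `FlightServerBase` can bind the same port without passing the option explicitly. `EchoServer`, `serve_forever`, `main`, the address and the worker count are hypothetical. Each worker has its own interpreter and GIL, so CPU-heavy `do_exchange` work can run in parallel; note that the kernel distributes incoming connections, not individual RPCs, among the processes.

```python
import multiprocessing as mp
import os

LOCATION = "grpc://0.0.0.0:5005"  # hypothetical address shared by every worker
NUM_WORKERS = 4                    # hypothetical; e.g. one per CPU core


def serve_forever(worker_id: int, location: str = LOCATION) -> None:
    """Run one Flight server in this process; every worker binds the same port."""
    # Import inside the worker so gRPC is only initialized in the child process.
    import pyarrow.flight as flight

    class EchoServer(flight.FlightServerBase):
        """Hypothetical minimal server that echoes do_exchange batches back."""

        def do_exchange(self, context, descriptor, reader, writer):
            started = False
            for chunk in reader:
                if chunk.data is None:
                    continue  # metadata-only message; ignored in this sketch
                if not started:
                    writer.begin(chunk.data.schema)
                    started = True
                writer.write_batch(chunk.data)

    server = EchoServer(location)
    print(f"worker {worker_id} (pid {os.getpid()}) on port {server.port}")
    server.serve()  # blocks until the server is shut down


def main(num_workers: int = NUM_WORKERS) -> None:
    """Start num_workers server processes and wait for them."""
    procs = [mp.Process(target=serve_forever, args=(i,)) for i in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Call `main()` from an `if __name__ == "__main__":` guard; shutting the workers down cleanly is left out of the sketch.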


[GitHub] [arrow] tianjiqx commented on issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

tianjiqx commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1200093916

   I probably don't understand how to use SO_REUSEPORT correctly. Sometimes the processes can bind to the same port, but the clients I create always connect to the same server.
   
   ```python
   import contextlib
   import socket as py_socket
   from multiprocessing import Process
   
   
   @contextlib.contextmanager
   def _init_grpc_port(grpc_port):
       """Initialize grpc port for multiprocessing."""
       # Reference code:
       # https://github.com/grpc/grpc/issues/17659
   
       sock = py_socket.socket(py_socket.AF_INET6, py_socket.SOCK_STREAM)
       # Note: the value of SOL_SOCKET is platform-specific
       # (e.g. 1 on Linux, 65535 on macOS/BSD).
   
       sock.setsockopt(py_socket.SOL_SOCKET, py_socket.SO_REUSEADDR, 1)
       if sock.getsockopt(py_socket.SOL_SOCKET, py_socket.SO_REUSEADDR) == 0:
           raise RuntimeError("Failed to set SO_REUSEADDR.")
       sock.setsockopt(py_socket.SOL_SOCKET, py_socket.SO_REUSEPORT, 1)
       if sock.getsockopt(py_socket.SOL_SOCKET, py_socket.SO_REUSEPORT) == 0:
           raise RuntimeError("Failed to set SO_REUSEPORT.")
       sock.bind(('', grpc_port))
       try:
           yield sock.getsockname()[1]
       finally:
           sock.close()
   
   
   def main():
       with _init_grpc_port(5005) as port:
           pass
       # At this point the reserving socket has already been closed.
   
       for i in range(2):
           # `server.start` (not shown) starts one Flight server in the child process.
           p = Process(target=server.start, args=(i, ))
           # If time.sleep(1) is added here, binding always fails with
           # "os_error":"Address already in use","syscall":"bind".
           p.start()
   ```
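A stdlib-only sketch of what the kernel actually balances with `SO_REUSEPORT` (Linux or macOS; the option is unavailable on Windows). `make_listener` and `demo` are hypothetical helpers. Two listening sockets share one port and several separate client connections are opened; on Linux the kernel spreads these *connections* across the listeners, while on macOS/BSD the distribution may differ. Because a gRPC client normally multiplexes all of its RPCs over one long-lived HTTP/2 connection, every call from a given client goes to whichever process accepted that connection, which is consistent with the behaviour described above.

```python
import select
import socket


def make_listener(port: int) -> socket.socket:
    """TCP listener on 127.0.0.1:port with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set SO_REUSEPORT before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


def demo(num_clients: int = 8):
    """Return (port, accepted) where accepted[i] counts connections on listener i."""
    first = make_listener(0)            # port 0: let the OS choose a free port
    port = first.getsockname()[1]
    second = make_listener(port)        # same port, permitted by SO_REUSEPORT
    listeners = [first, second]

    # Each create_connection() call opens a separate TCP connection.
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(num_clients)]

    accepted = [0, 0]
    while sum(accepted) < num_clients:
        ready, _, _ = select.select(listeners, [], [], 5.0)
        if not ready:
            break  # timed out waiting for pending connections
        for sock in ready:
            conn, _ = sock.accept()
            conn.close()
            accepted[listeners.index(sock)] += 1

    for s in clients + listeners:
        s.close()
    return port, accepted


if __name__ == "__main__":
    print(demo())  # the split between the two listeners varies by OS and run
```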
   


[GitHub] [arrow] lidavidm commented on issue #13745: [Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing?

lidavidm commented on issue #13745:
URL: https://github.com/apache/arrow/issues/13745#issuecomment-1199792518

   Yeah, it does make it hard to build a CPU-intensive service in Python.
   
   Maybe the SO_REUSEPORT approach could be an Arrow Cookbook recipe. Is that a good solution here?

