Posted to jira@arrow.apache.org by "Lubo Slivka (Jira)" <ji...@apache.org> on 2022/06/01 11:16:00 UTC

[jira] [Commented] (ARROW-16697) [FlightRPC][Python] Server seems to leak memory during DoPut

    [ https://issues.apache.org/jira/browse/ARROW-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544820#comment-17544820 ] 

Lubo Slivka commented on ARROW-16697:
-------------------------------------

Hello,

I'm doing some more research. Out of curiosity, I have also tried DoGet: the server holds a single table and multiple clients (separate threads in the same process) run repeated DoGet calls, each with its own FlightClient that stays connected. The clients throw away the batches as soon as they read them.

In this scenario, the server memory footprint stays constant.

On the receiving (client) side, memory usage keeps growing (although not as rapidly as on the server during DoPut). What is interesting is that the memory footprint stays nearly the same even after all the clients are closed.
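
For reference, the DoGet loop each client thread runs is roughly the following (a minimal sketch, not the exact test code; the location, ticket name, and thread/iteration counts are assumptions):

    import threading
    import pyarrow.flight as flight

    LOCATION = "grpc://localhost:8815"   # assumption: address of the test server
    NUM_THREADS = 8                      # assumption: number of concurrent clients
    ITERATIONS = 1000                    # assumption: DoGet calls per thread

    def do_get_loop():
        # each thread keeps its own FlightClient connected for the whole run
        client = flight.FlightClient(LOCATION)
        ticket = flight.Ticket(b"sample")  # assumption: ticket the server understands
        for _ in range(ITERATIONS):
            reader = client.do_get(ticket)
            while True:
                try:
                    reader.read_chunk()  # read a batch and immediately throw it away
                except StopIteration:
                    break
        client.close()

    threads = [threading.Thread(target=do_get_loop) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()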

--L

> [FlightRPC][Python] Server seems to leak memory during DoPut
> ------------------------------------------------------------
>
>                 Key: ARROW-16697
>                 URL: https://issues.apache.org/jira/browse/ARROW-16697
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Lubo Slivka
>            Assignee: David Li
>            Priority: Major
>         Attachments: leak_repro_client.py, leak_repro_server.py, sample.csv.gz
>
>
> Hello,
> We are stress testing our Flight RPC server (PyArrow 8.0.0) with write-heavy workloads and are running into what appear to be memory leaks.
> The server is put under pressure by a number of separate clients doing DoPut. What we are seeing is that the server's memory usage only ever goes up until the server finally gets killed by k8s for hitting its memory limit.
> I have spent many hours fishing through our code for memory leaks with no success. Even short-circuiting all our custom DoPut handling logic does not alleviate the situation. This led me to create a reproducer that uses nothing but PyArrow, and even there the server process memory only keeps increasing, similar to what we see on our servers.
> The reproducer is in the attachments, together with the test CSV file (20MB) that I use for my tests. A few notes:
>  * The client code has multiple threads, each emulating a separate Flight Client
>  * There are two variants where I see slightly different memory usage characteristics:
>  ** _do_put_with_client_reuse << one client opened at the start of the thread, then hammering many puts, finally closing the client; leaks appear to happen faster in this variant
>  ** _do_put_with_client_per_request << client opens & connects, does a put, then disconnects; loops like this many times; leaks appear to happen more slowly in this variant if there are fewer concurrent clients; increasing the number of threads 'helps'
>  * The server code handling do_put reads batch-by-batch & does nothing with the chunks (a minimal sketch of both sides follows at the end of this description)
> Also, one interesting (but very likely unrelated) thing that I keep noticing is that _sometimes_ FlightClient takes a long time to close (around 5 seconds). It happens intermittently.
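>
> For context, a minimal sketch of both sides (this only approximates what the attached leak_repro_server.py / leak_repro_client.py do; the address, descriptor path, and iteration counts are assumptions):
>
>     import pyarrow as pa
>     import pyarrow.csv
>     import pyarrow.flight as flight
>
>     LOCATION = "grpc://localhost:8815"  # assumption: address used by the reproducer
>
>     # server side: read each incoming batch and do nothing with it
>     # (the server process would run ReproServer(LOCATION).serve())
>     class ReproServer(flight.FlightServerBase):
>         def do_put(self, context, descriptor, reader, writer):
>             while True:
>                 try:
>                     reader.read_chunk()  # chunk is read and immediately discarded
>                 except StopIteration:
>                     break
>
>     # client side, first variant: one client opened per thread, many puts, then close
>     def _do_put_with_client_reuse(iterations=100):
>         table = pa.csv.read_csv("sample.csv.gz")  # the attached 20MB test file
>         client = flight.FlightClient(LOCATION)
>         for _ in range(iterations):
>             descriptor = flight.FlightDescriptor.for_path("sample")  # assumption: path name
>             stream_writer, _ = client.do_put(descriptor, table.schema)
>             stream_writer.write_table(table)
>             stream_writer.close()
>         client.close()
>
>     # client side, second variant: connect, do one put, disconnect, repeat
>     def _do_put_with_client_per_request(iterations=100):
>         table = pa.csv.read_csv("sample.csv.gz")
>         for _ in range(iterations):
>             client = flight.FlightClient(LOCATION)
>             descriptor = flight.FlightDescriptor.for_path("sample")
>             stream_writer, _ = client.do_put(descriptor, table.schema)
>             stream_writer.write_table(table)
>             stream_writer.close()
>             client.close()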



--
This message was sent by Atlassian Jira
(v8.20.7#820007)