Posted to issues@arrow.apache.org by "Philipp Moritz (JIRA)" <ji...@apache.org> on 2017/08/25 18:44:00 UTC

[jira] [Assigned] (ARROW-1410) Plasma object store occasionally pauses for a long time

     [ https://issues.apache.org/jira/browse/ARROW-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz reassigned ARROW-1410:
-------------------------------------

    Assignee: Robert Nishihara

> Plasma object store occasionally pauses for a long time
> -------------------------------------------------------
>
>                 Key: ARROW-1410
>                 URL: https://issues.apache.org/jira/browse/ARROW-1410
>             Project: Apache Arrow
>          Issue Type: Improvement
>         Environment: Ubuntu 16.04
>            Reporter: Robert Nishihara
>            Assignee: Robert Nishihara
>
> The problem can be reproduced as follows. First start a plasma store with
> {code}
> plasma_store -s /tmp/s1 -m 500000000000
> {code}
> Then continuously put in objects using a script like the following.
> {code}
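> import pyarrow.plasma as plasma
> import numpy as np
> # Connect to the store socket started above.
> client = plasma.connect('/tmp/s1', '', 0)
> for i in range(20000):
>     print(i)
>     # Each object gets a random 20-byte ID.
>     object_id = plasma.ObjectID(np.random.bytes(20))
>     # Create an object of random size (up to ~100 MB), then seal it.
>     client.create(object_id, np.random.randint(0, 100000000))
>     client.seal(object_id)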
> {code}
> As the loop counters are printed, you will see long pauses. The cause is that we mmap pages with the MAP_POPULATE flag, which makes the kernel pre-fault the pages of the mapping before mmap returns. Although this can improve the performance of subsequent object creations, it isn't worth the long pauses. We may want to find a way to populate the pages in the background.
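
The "populate the pages in the background" idea could look roughly like the sketch below. It is only an illustration in Python, not Plasma code: the helper name populate_in_background is hypothetical, the mapping is an anonymous one created without MAP_POPULATE, and the sketch assumes the region is not handed to a client until the worker has finished (otherwise the writes would race with object data). In CPython the worker also competes for the GIL, so a real fix would more likely live in the C++ store.

```python
import mmap
import threading

PAGE_SIZE = mmap.PAGESIZE


def populate_in_background(region, result):
    """Touch one byte per page of `region` from a daemon thread.

    Hypothetical helper for illustration only. Writing to each page
    forces the kernel to back it with memory, which is the work that
    MAP_POPULATE would otherwise do up front inside mmap.
    """
    def worker():
        touched = 0
        for offset in range(0, len(region), PAGE_SIZE):
            region[offset] = 0  # the write faults the page in
            touched += 1
        result.append(touched)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Anonymous mapping created lazily (no MAP_POPULATE); returns immediately.
num_pages = 64
region = mmap.mmap(-1, num_pages * PAGE_SIZE)

result = []
thread = populate_in_background(region, result)
# The caller is free to do other work here while pages are faulted in.
thread.join()
pages_touched = result[0]
```

The point of the design is that the call creating the mapping returns right away, and the cost of faulting pages in is paid off the request path rather than during an object creation.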


