Posted to issues@arrow.apache.org by "Robert Nishihara (JIRA)" <ji...@apache.org> on 2017/11/15 04:53:00 UTC

[jira] [Commented] (ARROW-1792) [Plasma C++] continuous write tensor failed

    [ https://issues.apache.org/jira/browse/ARROW-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252940#comment-16252940 ] 

Robert Nishihara commented on ARROW-1792:
-----------------------------------------

One natural way to express this would be the following: create only one Plasma client and use the higher-level client APIs. For example:

First start the store with

{code}
plasma_store -m 800000000 -s /tmp/plasma
{code}

Then continuously put objects with

{code}
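import numpy as np
import pyarrow.plasma as plasma

# Create a single client and reuse it for every write.
client = plasma.connect("/tmp/plasma", "", 0)

def write_object(num_bytes):
    x = np.ones(num_bytes, dtype=np.uint8)
    # put() picks a random object ID when none is passed,
    # so there is no need to generate one by hand.
    return client.put(x)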

for i in range(10):
    print(i)
    write_object(500000000)
{code}

This works for me (at least after https://github.com/apache/arrow/pull/1317; I haven't tried it on master yet).
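
As a rough sanity check on the sizes involved, here is a small plain-Python calculation (no Plasma required). It only assumes that the value passed to -m is the store capacity in bytes. The helper name objects_that_fit is hypothetical, and the numbers ignore any serialization or allocator overhead.

```python
def objects_that_fit(store_bytes, object_bytes):
    """Upper bound on how many objects of this size can be resident at once."""
    return store_bytes // object_bytes

# Sizes used in this comment: plasma_store -m 800000000, 500000000-byte arrays.
print(objects_that_fit(800000000, 500000000))

# Sizes from the original report: plasma_store -m 8000000000 and a
# float32 tensor of shape (1000, 1000, 5 * 256).
tensor_bytes = 1000 * 1000 * (5 * 256) * 4  # 4 bytes per float32
print(tensor_bytes)
print(objects_that_fit(8000000000, tensor_bytes))
```

In both setups only one object fits at a time, so every write after the first can only succeed if the store is able to evict the previous object.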

Would something like this work for you?
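
For intuition, here is a toy model of a fixed-capacity store. It is not Plasma's implementation: ToyStore and write_many are hypothetical names, and the model simply assumes that the store may evict only objects that no client is still holding. Under that assumption, one possible reading of the PlasmaStoreFull error in the report is that the previous 5 GB buffer is still referenced when the next create is issued.

```python
class ToyStore:
    """Toy fixed-capacity store; NOT Plasma's implementation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = {}  # name -> [size, in_use]; dicts keep insertion order

    def used(self):
        return sum(size for size, _ in self.objects.values())

    def create(self, name, size):
        # Evict the oldest unreferenced objects until the new one fits.
        for key in list(self.objects):
            if self.used() + size <= self.capacity:
                break
            if not self.objects[key][1]:
                del self.objects[key]
        if self.used() + size > self.capacity:
            raise MemoryError("object does not fit in the store")
        self.objects[name] = [size, True]

    def release(self, name):
        # Mark the object as no longer held, which makes it evictable.
        self.objects[name][1] = False


def write_many(store, count, size, release):
    """Create `count` objects, optionally releasing each one right away."""
    for i in range(count):
        store.create(i, size)
        if release:
            store.release(i)
    return count


# Sizes from the report: an 8000000000-byte store and 5120000000-byte tensors.
print(write_many(ToyStore(8000000000), 10, 5120000000, release=True))
try:
    write_many(ToyStore(8000000000), 10, 5120000000, release=False)
except MemoryError as exc:
    print("failed:", exc)
```

In this model the loop completes when each object is released before the next create, and fails on the second create when nothing is released.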

> [Plasma C++] continuous write tensor failed
> -------------------------------------------
>
>                 Key: ARROW-1792
>                 URL: https://issues.apache.org/jira/browse/ARROW-1792
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Plasma (C++)
>         Environment: ubuntu 14.04 gcc 4.8.4
>            Reporter: Lu Qi 
>   Original Estimate: 288h
>  Remaining Estimate: 288h
>
> Start Plasma using "plasma_store -m 8000000000 -s /tmp/plasma",
> then write tensors in Python using:
> {code:python}
> for i in range(10):
>         client = plasma.connect("/tmp/plasma", "", 0)
>         x = np.random.rand(1000,1000,5*256).astype("float32")    # write 5 GB
>         object_id = pa.plasma.ObjectID(random_object_id())
>         tensor = pa.Tensor.from_numpy(x)
>         data_size = pa.get_tensor_size(tensor)
>         buf = client.create(object_id, data_size)
>         stream = pa.FixedSizeBufferWriter(buf)
>         stream.set_memcopy_threads(6)
>         pa.write_tensor(tensor, stream)
>         client.seal(object_id)
> #        client.release(object_id)
> #        client.disconnect()
>         print(i)
> {code}
> This fails with the following error:
> pyarrow.lib.PlasmaStoreFull: object does not fit in the plasma store
> If I add "client.release(object_id)", the error is:
> /arrow/cpp/src/plasma/client.cc296 Check failed: object_entry != objects_in_use_.end()
> Also, sometimes the error is:
>   buf = client.create(object_id, data_size)
>   File "pyarrow/plasma.pyx", line 301, in pyarrow.plasma.PlasmaClient.create (/arrow/python/build/temp.linux-x86_64-2.7/plasma.cxx:4382)
>   File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7888)
> pyarrow.lib.ArrowIOError: Broken pipe
> After adding "client.disconnect()" it seems to work, but the code below, which reuses a single client, still fails:
> {code:python}
> client = plasma.connect("/tmp/plasma", "", 0)
> for i in range(10):
>         x = np.random.rand(1000,1000,5*256).astype("float32")    # write 5 GB
>         object_id = pa.plasma.ObjectID(random_object_id())
>         tensor = pa.Tensor.from_numpy(x)
>         data_size = pa.get_tensor_size(tensor)
>         buf = client.create(object_id, data_size)
>         stream = pa.FixedSizeBufferWriter(buf)
>         stream.set_memcopy_threads(6)
>         pa.write_tensor(tensor, stream)
>         client.seal(object_id)
> #        client.release(object_id)
> #        client.disconnect()
>         print(i)
> {code}
> P.S. I have also filed a separate issue about the memory eviction policy (ARROW-1795).


