Posted to issues@arrow.apache.org by "Robert Nishihara (JIRA)" <ji...@apache.org> on 2017/11/15 04:53:00 UTC
[jira] [Commented] (ARROW-1792) [Plasma C++] continuous write tensor failed
[ https://issues.apache.org/jira/browse/ARROW-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252940#comment-16252940 ]
Robert Nishihara commented on ARROW-1792:
-----------------------------------------
One natural way to express this would be the following: create only one Plasma client and use the higher-level client APIs. For example:
First, start the store with:
{code}
plasma_store -m 800000000 -s /tmp/plasma
{code}
Then continuously put objects with
{code}
import numpy as np
import pyarrow.plasma as plasma

# Connect once and reuse the same client for every object.
client = plasma.connect("/tmp/plasma", "", 0)

def write_object(num_bytes):
    x = np.ones(num_bytes, dtype=np.uint8)
    # put() writes and seals the object under a randomly generated
    # ObjectID (which it returns), so no ID has to be created here.
    client.put(x)

for i in range(10):
    print(i)
    write_object(500000000)
{code}
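The store in this example is deliberately smaller than two objects, so the loop can only succeed if the store evicts each previous object once the client is done with it. A quick worked check of that arithmetic, as a sketch that assumes Plasma's per-object metadata, serialization headers and allocator overhead can be ignored (the names below are illustrative only):

{code:python}
# Back-of-the-envelope capacity check for the loop above.
# Assumption: per-object metadata, serialization headers and allocator
# fragmentation are ignored.
STORE_CAPACITY = 800000000  # bytes, from "plasma_store -m 800000000"
OBJECT_SIZE = 500000000     # bytes, from write_object(500000000)
NUM_PUTS = 10               # iterations of the loop above


def max_resident(capacity, object_size):
    """Largest number of equally sized objects that can coexist in the store."""
    return capacity // object_size


resident = max_resident(STORE_CAPACITY, OBJECT_SIZE)
# Every put beyond what fits must be preceded by evicting an older object.
evictions_needed = max(0, NUM_PUTS - resident)
print(resident, evictions_needed)  # 1 9
{code}

So at most one object is resident at a time, and each of the remaining nine puts depends on eviction working.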
This works for me, at least with https://github.com/apache/arrow/pull/1317 applied (I haven't tried it on master yet).
Would something like this work for you?
> [Plasma C++] continuous write tensor failed
> -------------------------------------------
>
> Key: ARROW-1792
> URL: https://issues.apache.org/jira/browse/ARROW-1792
> Project: Apache Arrow
> Issue Type: Bug
> Components: Plasma (C++)
> Environment: ubuntu 14.04 gcc 4.8.4
> Reporter: Lu Qi
> Original Estimate: 288h
> Remaining Estimate: 288h
>
> Start Plasma with "plasma_store -m 8000000000 -s /tmp/plasma"
> and write tensors from Python using:
> {code:python}
> for i in range(10):
>     client = plasma.connect("/tmp/plasma", "", 0)
>     x = np.random.rand(1000,1000,5*256).astype("float32") # write 5 GB
>     object_id = pa.plasma.ObjectID(random_object_id())
>     tensor = pa.Tensor.from_numpy(x)
>     data_size = pa.get_tensor_size(tensor)
>     buf = client.create(object_id, data_size)
>     stream = pa.FixedSizeBufferWriter(buf)
>     stream.set_memcopy_threads(6)
>     pa.write_tensor(tensor, stream)
>     client.seal(object_id)
>     # client.release(object_id)
>     # client.disconnect()
>     print(i)
> {code}
> The error is:
> pyarrow.lib.PlasmaStoreFull: object does not fit in the plasma store
> If I add "client.release(object_id)", the error is:
> /arrow/cpp/src/plasma/client.cc296 Check failed: object_entry != objects_in_use_.end()
> Sometimes the error is instead:
> buf = client.create(object_id, data_size)
> File "pyarrow/plasma.pyx", line 301, in pyarrow.plasma.PlasmaClient.create (/arrow/python/build/temp.linux-x86_64-2.7/plasma.cxx:4382)
> File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7888)
> pyarrow.lib.ArrowIOError: Broken pipe
> After adding "client.disconnect()" it seems to work, but the code below fails:
> {code:python}
> client = plasma.connect("/tmp/plasma", "", 0)
> for i in range(10):
>     x = np.random.rand(1000,1000,5*256).astype("float32") # write 5 GB
>     object_id = pa.plasma.ObjectID(random_object_id())
>     tensor = pa.Tensor.from_numpy(x)
>     data_size = pa.get_tensor_size(tensor)
>     buf = client.create(object_id, data_size)
>     stream = pa.FixedSizeBufferWriter(buf)
>     stream.set_memcopy_threads(6)
>     pa.write_tensor(tensor, stream)
>     client.seal(object_id)
>     # client.release(object_id)
>     # client.disconnect()
>     print(i)
> {code}
> Also: I have filed another issue about the memory eviction policy [ARROW-1795].
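A plausible reading of the first failure: a (1000, 1000, 1280) float32 tensor is 5,120,000,000 bytes, so two of them cannot coexist in an 8,000,000,000-byte store, and a store that only evicts objects no client is still using cannot make room while the previous object is still referenced (for example through a buffer that has not been released yet). The toy model below is a hypothetical sketch of that reasoning, not Plasma's eviction code; StoreFull, ToyStore and run are invented names, eviction is simplified to oldest-first, and sealing, metadata and the header written by write_tensor are ignored.

{code:python}
# Toy model (hypothetical, NOT Plasma's code): a capacity-limited store that
# evicts the oldest objects first, but only those no client still uses.
from collections import OrderedDict


class StoreFull(Exception):
    """Stands in for pyarrow.lib.PlasmaStoreFull in this toy model."""


class ToyStore(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        # object_id -> [size, ref_count]; insertion order is the eviction order
        self.objects = OrderedDict()

    def create(self, object_id, size):
        free = self.capacity - self.used
        if size > free:
            self._evict(size - free)
        self.objects[object_id] = [size, 1]  # the creating client holds a reference
        self.used += size

    def release(self, object_id):
        self.objects[object_id][1] -= 1

    def _evict(self, needed):
        freed = 0
        for object_id in list(self.objects):
            size, refs = self.objects[object_id]
            if refs == 0:  # objects still in use are never evicted
                del self.objects[object_id]
                self.used -= size
                freed += size
                if freed >= needed:
                    return
        raise StoreFull("object does not fit in the store")


def run(capacity, size, iterations, release_each):
    """Return how many creates succeed before the store reports it is full."""
    store = ToyStore(capacity)
    created = 0
    for i in range(iterations):
        try:
            store.create(i, size)
        except StoreFull:
            break
        if release_each:
            store.release(i)
        created += 1
    return created


TENSOR_BYTES = 1000 * 1000 * 5 * 256 * 4  # float32 tensor of shape (1000, 1000, 1280)
STORE_BYTES = 8000000000                  # "plasma_store -m 8000000000"
print(run(STORE_BYTES, TENSOR_BYTES, 10, release_each=False))  # 1
print(run(STORE_BYTES, TENSOR_BYTES, 10, release_each=True))   # 10
{code}

In this model, keeping the reference count on the store side is what makes eviction safe: an object that a client may still be reading is never reclaimed underneath it, which is also why an unreleased object blocks the next create.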
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)