
Posted to dev@arrow.apache.org by ALBERTO Bocchinfuso <al...@hotmail.it> on 2018/02/05 18:09:05 UTC

[Python] Retrieving a RecordBatch from plasma inside a function

Good morning,

I am experiencing problems with the RecordBatches stored in plasma in a particular situation.

If I return a RecordBatch as result of a python function, I am able to read just the metadata, while I get an error when reading the columns.

For example, consider the following code:
import pyarrow as pa
import pyarrow.plasma as plasma

def retrieve1():
    # Connect to the plasma store and look up the object by its 20-byte ID.
    client = plasma.connect('test', "", 0)

    key = "keynumber1keynumber1"
    pid = plasma.ObjectID(bytearray(key, 'UTF-8'))

    # Read the first RecordBatch from the stored stream and return it.
    [buff] = client.get_buffers([pid])
    batch = pa.RecordBatchStreamReader(buff).read_next_batch()
    return batch

batch = retrieve1()
print(batch)
print(batch.schema)
print(batch[0])

This is a simple Python script in which a function retrieves the RecordBatch from the plasma store and returns it to the caller. Running it, I get:
<pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
FIELD1: int32
metadata
--------
{}
<pyarrow.lib.Int32Array object at 0x7fd0ebfc0f98>
[
  1,
  12,
  23,
  3,
  21,
  34
]
<pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
FIELD1: int32
metadata
--------
{}
Segmentation fault (core dumped)


If I retrieve and use the data in the same scope (as I do inside the function retrieve1(); it also works when I put everything in the main program), everything runs without problems.
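
One possible reading of this symptom is sketched below. It is an assumption, not something established in this thread: the returned data may refer to memory owned by an object that lives only inside the function, such as the client. The sketch uses only the Python standard library; Client, View, retrieve_unsafe and retrieve_safe are hypothetical names and are not pyarrow or plasma APIs.

```python
import weakref


class Client:
    """Hypothetical stand-in for a connection that owns some memory."""

    def __init__(self):
        self._memory = bytearray([1, 12, 23, 3, 21, 34])

    def get_view(self):
        return View(self)


class View:
    """Hypothetical data handle that does NOT keep its client alive."""

    def __init__(self, client):
        # A weak reference models a handle that borrows memory without owning it.
        self._client = weakref.ref(client)

    def read(self):
        client = self._client()
        if client is None:
            # In native code this situation could be a crash instead of an exception.
            raise RuntimeError("the owning client no longer exists")
        return list(client._memory)


def retrieve_unsafe():
    client = Client()
    # The only strong reference to the client is dropped when the function returns.
    return client.get_view()


def retrieve_safe():
    client = Client()
    # Returning the client as well keeps the owner alive for the caller.
    return client, client.get_view()
```

With CPython's reference counting, reading the view from retrieve_unsafe() raises RuntimeError, while the view from retrieve_safe() stays readable for as long as the caller holds the returned client. If the assumption applies, keeping the client in a longer-lived scope would be a workaround worth trying.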

The problem also seems to be specific to the case in which I retrieve the RecordBatch from the plasma store, since the following (simpler) code:
def create():
    test1 = [1, 12, 23, 3, 21, 34]
    test1 = pa.array(test1, pa.int32())

    batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
    print(batch)
    print(batch.schema)
    print(batch[0])
    return batch

batch1 = create()
print(batch1)
print(batch1.schema)
print(batch1[0])

Prints:

<pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
FIELD1: int32
<pyarrow.lib.Int32Array object at 0x7f5f691fd9a8>
[
  1,
  12,
  23,
  3,
  21,
  34
]
<pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
FIELD1: int32
<pyarrow.lib.Int32Array object at 0x7f5f7e29f318>
[
  1,
  12,
  23,
  3,
  21,
  34
]

Which is what I expect.

Is this issue known or am I doing something wrong when retrieving the RecordBatch from plasma?

I would also like to point out that this problem was as easy to run into as it was hard to reproduce. There may therefore be other situations in which the same problem arises that I have not experienced, since I mostly deal with plasma and have only used Python so far; the description I gave is not intended to be complete.

Thank you,
Alberto

Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by Philipp Moritz <pc...@gmail.com>.

I created a JIRA issue here: https://issues.apache.org/jira/browse/ARROW-2195

On Wed, Feb 21, 2018 at 8:11 AM, Wes McKinney <we...@gmail.com> wrote:

> Can we create a JIRA to track this issue?

Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by Wes McKinney <we...@gmail.com>.

Can we create a JIRA to track this issue?

On Wed, Feb 21, 2018 at 5:04 AM, ALBERTO Bocchinfuso
<al...@hotmail.it> wrote:
> Hi,
>
> Have you had any news on this issue?
> Do you plan to solve it for the next releases of Arrow, or is there any way to avoid the problem?
>
> Thanks in advance,
> Alberto

Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by ALBERTO Bocchinfuso <al...@hotmail.it>.

Hi,

Is there any news on this issue?
Do you plan to fix it in one of the next Arrow releases, or is there a way to work around the problem?

Thanks in advance,
Alberto
From: Philipp Moritz<ma...@gmail.com>
Sent: Friday, 9 February 2018 00:30
To: dev@arrow.apache.org<ma...@arrow.apache.org>
Subject: Re: [Python] Retrieving a RecordBatch from plasma inside a function

Thanks! I can indeed reproduce this problem. I'm a bit busy right now and
plan to look into it on the weekend.


Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by Philipp Moritz <pc...@gmail.com>.

Thanks! I can indeed reproduce this problem. I'm a bit busy right now and
plan to look into it on the weekend.

Here is the preliminary backtrace for everybody interested:

CESS (code=1, address=0x111138158)
    frame #0: 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py:
->  0x10e6457fc <+28>: movslq (%rdx,%rcx,4), %rdi
    0x10e645800 <+32>: callq  0x10e698170               ; symbol stub for: PyInt_FromLong
    0x10e645805 <+37>: testq  %rax, %rax
    0x10e645808 <+40>: je     0x10e64580c               ; <+44>

(lldb) bt
* thread #1: tid = 0xf1378e, 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x111138158)
  * frame #0: 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
    frame #1: 0x000000010e5ccd35 lib.so`__Pyx_PyObject_CallNoArg(_object*) + 133
    frame #2: 0x000000010e613b25 lib.so`__pyx_pw_7pyarrow_3lib_10ArrayValue_3__repr__(_object*) + 933
    frame #3: 0x000000010c2f83bc libpython2.7.dylib`PyObject_Repr + 60
    frame #4: 0x000000010c35f651 libpython2.7.dylib`PyEval_EvalFrameEx + 22305
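
For readers less used to disassembly: in AT&T syntax, the faulting instruction movslq (%rdx,%rcx,4), %rdi loads the 32-bit value at address rdx + rcx * 4 and sign-extends it to 64 bits, which is consistent with reading one int32 element from an array buffer. The sketch below mirrors only that address arithmetic in plain Python; load_int32 and the sample buffer are hypothetical illustrations, not pyarrow code.

```python
import struct

# Six little-endian int32 values, matching the FIELD1 column in this thread.
values = [1, 12, 23, 3, 21, 34]
buffer = struct.pack("<6i", *values)


def load_int32(base, index):
    """Read the 4-byte signed integer at offset index * 4 from base."""
    return struct.unpack_from("<i", base, index * 4)[0]


# Reading every element back reproduces the original list.
decoded = [load_int32(buffer, i) for i in range(len(values))]
```

In Python an offset outside the buffer raises struct.error; in native code the same load from an address that is no longer mapped produces a fault such as the EXC_BAD_ACCESS reported above.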

On Tue, Feb 6, 2018 at 1:24 AM, ALBERTO Bocchinfuso <
alberto_boc_94@hotmail.it> wrote:

> Hi,
>
> I’m using python 3.5.2 and pyarrow 0.8.0
>
> As the key I use a 20-byte string, of course. I build it differently
> from the canonical way because I am no longer using Python 2.7 but Python 3,
> and this seemed to me to be the right way to create a 20-byte string.
> The full code is:
>
> import pyarrow as pa
> import pyarrow.plasma as plasma
>
> def retrieve1():
>     client = plasma.connect('test', "", 0)
>
>     key = "keynumber1keynumber1"
>     pid = plasma.ObjectID(bytearray(key, 'UTF-8'))
>
>     [buff] = client.get_buffers([pid])
>     batch = pa.RecordBatchStreamReader(buff).read_next_batch()
>
>     print(batch)
>     print(batch.schema)
>     print(batch[0])
>
>     return batch
>
> client = plasma.connect('test', "", 0)
>
> test1 = [1, 12, 23, 3, 21, 34]
> test1 = pa.array(test1, pa.int32())
>
> batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
>
> key = "keynumber1keynumber1"
> pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> sink = pa.MockOutputStream()
> stream_writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> stream_writer.write_batch(batch)
> stream_writer.close()
>
> bff = client.create(pid, sink.size())
>
> stream = pa.FixedSizeBufferWriter(bff)
> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> writer.write_batch(batch)
> client.seal(pid)
>
> batch = retrieve1()
> print(batch)
> print(batch.schema)
> print(batch[0])
>
> I hope this helps,
> thank you
>
> From: Philipp Moritz<ma...@gmail.com>
> Sent: Tuesday, 6 February 2018 00:00
> To: dev@arrow.apache.org<ma...@arrow.apache.org>
> Subject: Re: [Python] Retrieving a RecordBatch from plasma inside a function
>
> Hey Alberto,
>
> Thanks for your message! I'm trying to reproduce it.
>
> Can you attach the code you use to write the batch into the store?
>
> Also can you say which version of Python and Arrow you are using? On my
> installation, I get
>
> ```
> In [5]: plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-5-fbec5bb33c33> in <module>()
> ----> 1 plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))
>
> plasma.pyx in pyarrow.plasma.ObjectID.__cinit__()
>
> ValueError: Object ID must by 20 bytes, is keynumber1keynumber1
> ```
>
> (the canonical way to do this would be plasma.ObjectID(b"keynumber1keynumber1"))
>
> Best,
> Philipp.
>
> On Mon, Feb 5, 2018 at 10:09 AM, ALBERTO Bocchinfuso <
> alberto_boc_94@hotmail.it> wrote:
>
> > Good morning,
> >
> > I am experiencing problems with the RecordBatches stored in plasma in a
> > particular situation.
> >
> > If I return a RecordBatch as result of a python function, I am able to
> > read just the metadata, while I get an error when reading the columns.
> >
> > For example, the following code
> > def retrieve1():
> >         client = plasma.connect('test', "", 0)
> >
> >         key = "keynumber1keynumber1"
> >         pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> >
> >         [buff] = client .get_buffers([pid])
> >         batch = pa.RecordBatchStreamReader(buff).read_next_batch()
> >         return batch
> >
> > batch = retrieve1()
> > print(batch)
> > print(batch.schema)
> > print(batch[0])
> >
> > Represents a simple python code in which a function is in charge of
> > retrieving the RecordBatch from the plasma store, and then returns it to
> > the caller. Running the previous example I get:
> > <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
> > FIELD1: int32
> > metadata
> > --------
> > {}
> > <pyarrow.lib.Int32Array object at 0x7fd0ebfc0f98>
> > [
> >   1,
> >   12,
> >   23,
> >   3,
> >   21,
> >   34
> > ]
> > <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
> > FIELD1: int32
> > metadata
> > --------
> > {}
> > Errore di segmentazione (core dump creato)
> >
> >
> > If I retrieve and use the data in the same part of the code (as I do in
> > the function retrieve1(), but it also works when I put everything in the
> > main program.) everything runs without problems.
> >
> > Also the problem seems to be related to the particular case in which I
> > retrieve the RecordBatch from the plasma store, since the following
> > (simpler) code:
> > def create():
> >         test1 = [1, 12, 23, 3, 21, 34]
> >         test1 = pa.array(test1, pa.int32())
> >
> >         batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
> >         print(batch)
> >         print(batch.schema)
> >         print(batch[0])
> >         return batch
> >
> > batch1 = create()
> > print(batch1)
> > print(batch1.schema)
> > print(batch1[0])
> >
> > Prints:
> >
> > <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
> > FIELD1: int32
> > <pyarrow.lib.Int32Array object at 0x7f5f691fd9a8>
> > [
> >   1,
> >   12,
> >   23,
> >   3,
> >   21,
> >   34
> > ]
> > <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
> > FIELD1: int32
> > <pyarrow.lib.Int32Array object at 0x7f5f7e29f318>
> > [
> >   1,
> >   12,
> >   23,
> >   3,
> >   21,
> >   34
> > ]
> >
> > Which is what I expect.
> >
> > Is this issue known or am I doing something wrong when retrieving the
> > RecordBatch from plasma?
> >
> > Also I would like to pinpoint the fact that this problem was as easy to
> > find as hard to re-create. For this reason, there can be other situations
> > in which the same problem arises that I did not experienced, since I
> mostly
> > deal with plasma and I’ve been using only python so long: the
> description I
> > gave is not intended to be complete.
> >
> > Thank you,
> > Alberto
> >
>
>

Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by ALBERTO Bocchinfuso <al...@hotmail.it>.

Hi,

I'm using Python 3.5.2 and pyarrow 0.8.0.

As the key I use a 20-byte string, of course. I build it differently from the canonical way because I am no longer on Python 2.7 but on Python 3, and this seemed to me the right way to create a 20-byte string.
The full code is:

import pyarrow as pa
import pyarrow.plasma as plasma

def retrieve1():
    # Open a new connection to the store inside the function.
    client = plasma.connect('test', "", 0)

    key = "keynumber1keynumber1"
    pid = plasma.ObjectID(bytearray(key, 'UTF-8'))

    # Fetch the sealed object and decode the first batch from it.
    [buff] = client.get_buffers([pid])
    batch = pa.RecordBatchStreamReader(buff).read_next_batch()

    # Reading the batch here, inside the function, works.
    print(batch)
    print(batch.schema)
    print(batch[0])

    return batch

client = plasma.connect('test', "", 0)

test1 = [1, 12, 23, 3, 21, 34]
test1 = pa.array(test1, pa.int32())

batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])

key = "keynumber1keynumber1"
pid = plasma.ObjectID(bytearray(key, 'UTF-8'))

# First pass: write to a mock stream only to measure the serialized size.
sink = pa.MockOutputStream()
stream_writer = pa.RecordBatchStreamWriter(sink, batch.schema)
stream_writer.write_batch(batch)
stream_writer.close()

# Allocate an object of exactly that size in the store.
bff = client.create(pid, sink.size())

# Second pass: write the batch into the store's buffer, then seal it.
stream = pa.FixedSizeBufferWriter(bff)
writer = pa.RecordBatchStreamWriter(stream, batch.schema)
writer.write_batch(batch)
client.seal(pid)

# Reading the returned batch here is where the crash occurs.
batch = retrieve1()
print(batch)
print(batch.schema)
print(batch[0])

I hope this helps.
Thank you.


Re: [Python] Retrieving a RecordBatch from plasma inside a function

Posted by Philipp Moritz <pc...@gmail.com>.

Hey Alberto,

Thanks for your message! I'm trying to reproduce it.

Can you attach the code you use to write the batch into the store?

Also can you say which version of Python and Arrow you are using? On my
installation, I get

```
In [5]: plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-fbec5bb33c33> in <module>()
----> 1 plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))

plasma.pyx in pyarrow.plasma.ObjectID.__cinit__()

ValueError: Object ID must by 20 bytes, is keynumber1keynumber1
```

(The canonical way to do this would be plasma.ObjectID(b"keynumber1keynumber1").)
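
To illustrate the difference between the two spellings of the key, here is a small plain-Python sketch with no pyarrow involved. The helper name to_id_bytes and the constant ID_LENGTH are hypothetical; the 20-byte length comes from the error message above. It normalizes a str or bytearray key to immutable bytes, which is the type the b"..." literal produces. Which input types a given pyarrow version accepts is not something this sketch claims.

```python
ID_LENGTH = 20  # length named in the ValueError above


def to_id_bytes(key):
    """Return the key as immutable bytes, checking that it is 20 bytes long."""
    if isinstance(key, str):
        raw = key.encode("utf-8")
    else:
        raw = bytes(key)  # accepts bytes and bytearray
    if len(raw) != ID_LENGTH:
        raise ValueError("expected %d bytes, got %d" % (ID_LENGTH, len(raw)))
    return raw


literal = b"keynumber1keynumber1"
from_text = to_id_bytes("keynumber1keynumber1")
from_bytearray = to_id_bytes(bytearray("keynumber1keynumber1", "UTF-8"))

# All three hold the same 20 bytes; only the original container types differ.
print(literal == from_text == from_bytearray)
print(type(from_bytearray).__name__)
```

The result could then be passed as plasma.ObjectID(to_id_bytes(key)), matching the canonical bytes form.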

Best,
Philipp.
