You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@arrow.apache.org by Xander Dunn <xa...@xander.ai> on 2021/04/19 07:27:45 UTC
[C++] Decoding Plasma Objects as RecordBatches
In Python I'm encoding RecordBatches like this:
```python
client = plasma.connect("/tmp/plasma")
object_id = plasma.ObjectID(bytes(my_id, "ascii"))
mock_sink = pa.MockOutputStream()
stream_writer = pa.RecordBatchStreamWriter(mock_sink, pybatch.schema)
stream_writer.write_batch(pybatch)
stream_writer.close()
data_size = mock_sink.size()
buf = client.create(object_id, data_size)
stream = pa.FixedSizeBufferWriter(buf)
stream_writer = pa.RecordBatchStreamWriter(stream, pybatch.schema)
stream_writer.write_batch(pybatch)
stream_writer.close()
client.seal(object_id)
```
This works, code from here: https://arrow.apache.org/docs/python/plasma.html
I am now having difficulty figuring out the right calls on the C++ side to decode these RecordBatch messages. I am successfully getting the Plasma objects as arrow::Buffer's, but I haven't managed to decode it into a RecordBatch:
```c++
auto object_id = plasma::ObjectID::from_binary(my_id);
plasma::ObjectBuffer object_buffer;
arrow::Status status = client.Get(&object_id, 1, -1, &object_buffer);
std::shared_ptr<Buffer> data = object_buffer.data;
fmt::print("{} Got data with size {}\n", current_id, data->size());
// Everything above works and prints the object size in bytes I'm expecting
//auto buf_reader = arrow::io::BufferReader(buffer);
//auto reader = arrow::ipc::RecordBatchStreamReader::Open(&buf_reader);
// auto batch = reader.ReadNextBatch();
```
I'm stuck on creating the BufferReader. I believe it's declared in arrow/io/memory.h, so I include that with `#include <arrow/io/memory.h>` and get these compile errors in clang:
```
pymydata/pymydata/PreprocessData.cc:312:18 ( http://pymydata/pymydata/PreprocessData.cc:312:18 ) : error: call to implicitly-deleted copy constructor of 'arrow::io::BufferReader'
auto buf_reader = arrow::io::BufferReader(buffer);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/xander/anaconda3/envs/my_model/include/arrow/io/memory.h:146:7: note: copy constructor of 'BufferReader' is implicitly deleted because base class 'internal::RandomAccessFileConcurrencyWrapper<BufferReader>' has a deleted copy constructor
: public internal::RandomAccessFileConcurrencyWrapper<BufferReader> {
^
/home/xander/anaconda3/envs/my_model/include/arrow/io/concurrency.h:162:57: note: copy constructor of 'RandomAccessFileConcurrencyWrapper<arrow::io::BufferReader>' is implicitly deleted because base class 'arrow::io::RandomAccessFile' has a deleted copy constructor
class ARROW_EXPORT RandomAccessFileConcurrencyWrapper : public RandomAccessFile {
^
/home/xander/anaconda3/envs/my_model/include/arrow/io/interfaces.h:187:7: note: copy constructor of 'RandomAccessFile' is implicitly deleted because base class 'arrow::io::InputStream' has a deleted copy constructor
public InputStream,
^
```
GCC produces similar errors.
Arrow 3.0.0. C++11. Am I calling the BufferReader init correctly? Do I have the right #include? Any pointers on decoding RecordBatches on the C++ side will be helpful. Hopefully I'm missing something simple.
Thanks,
Xander
Re: [C++] Decoding Plasma Objects as RecordBatches
Posted by Xander Dunn <xa...@xander.ai>.
Oh boy, that did it. Thank you!
I originally tried that after seeing it in the arrow code base, but I got different errors. That may have been before I figured out the correct #include. Now I need to understand the difference in syntax. Why does this init syntax work whereas ` auto buf_reader = arrow::io::BufferReader(buffer);` doesn't? I think this is a C++ question rather than an arrow question.
Thanks,
Xander
On Mon, Apr 19, 2021 at 1:17 AM, Antoine Pitrou < antoine@python.org > wrote:
>
>
>
> On Mon , 19 Apr 2021 07:27:45 +0000
> "Xander Dunn" < xander@ xander. ai ( xander@xander.ai ) > wrote:
>
>
>>
>>
>> I'm stuck on creating the BufferReader. I believe it's declared in
>> arrow/io/memory.h, so I include that with `#include <arrow/io/memory.h>`
>> and get these compile errors in clang:
>>
>>
>>
>> ```
>>
>>
>>
>> pymydata/ pymydata/ PreprocessData. cc:312:18 (
>> http://pymydata/pymydata/PreprocessData.cc:312:18 ) ( http:/ / pymydata/ pymydata/
>> PreprocessData. cc:312:18 (
>> http://pymydata/pymydata/PreprocessData.cc:312:18 ) ) : error: call to
>> implicitly-deleted copy constructor of 'arrow::io::BufferReader'
>>
>>
>>
>> auto buf_reader = arrow::io::BufferReader(buffer);
>>
>>
>
>
>
> How about
>
>
>
> arrow::io::BufferReader buf_reader(buffer);
>
>
>
> ?
>
>
>
Re: [C++] Decoding Plasma Objects as RecordBatches
Posted by Antoine Pitrou <an...@python.org>.
On Mon, 19 Apr 2021 07:27:45 +0000
"Xander Dunn" <xa...@xander.ai> wrote:
>
> I'm stuck on creating the BufferReader. I believe it's declared in arrow/io/memory.h, so I include that with `#include <arrow/io/memory.h>` and get these compile errors in clang:
>
> ```
>
> pymydata/pymydata/PreprocessData.cc:312:18 ( http://pymydata/pymydata/PreprocessData.cc:312:18 ) : error: call to implicitly-deleted copy constructor of 'arrow::io::BufferReader'
>
> auto buf_reader = arrow::io::BufferReader(buffer);
How about
arrow::io::BufferReader buf_reader(buffer);
?