Posted to issues@arrow.apache.org by "Steven Fackler (JIRA)" <ji...@apache.org> on 2019/07/22 21:17:00 UTC

[jira] [Updated] (ARROW-6006) [C++] Error reading an empty IPC stream with a dictionary-encoded column

     [ https://issues.apache.org/jira/browse/ARROW-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Fackler updated ARROW-6006:
----------------------------------
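    Description: 

The following program writes an IPC stream that contains only a schema with one dictionary-encoded column and no record batches, then tries to read it back: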
{code:java}
#include <memory>

#include <arrow/api.h>
#include <arrow/ipc/api.h>
#include <arrow/io/api.h>

// Abort with the Arrow error message if a call fails.
void check(arrow::Status status) {
    if (!status.ok()) {
        status.Abort();
    }
}

int main() {
    // Schema with a single dictionary-encoded string column.
    auto type = arrow::dictionary(arrow::int8(), arrow::utf8());
    auto f0 = arrow::field("f0", type);
    auto schema = arrow::schema({f0});

    // Write a stream that contains the schema but no record batches.
    std::shared_ptr<arrow::io::BufferOutputStream> os;
    check(arrow::io::BufferOutputStream::Create(0, arrow::default_memory_pool(), &os));

    std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
    check(arrow::ipc::RecordBatchStreamWriter::Open(os.get(), schema, &writer));
    check(writer->Close());

    std::shared_ptr<arrow::Buffer> buffer;
    check(os->Finish(&buffer));
    arrow::io::BufferReader is(buffer);

    // Read the stream back. The expected result is a null batch (end of stream).
    std::shared_ptr<arrow::ipc::RecordBatchReader> reader;
    check(arrow::ipc::RecordBatchStreamReader::Open(&is, &reader));

    std::shared_ptr<arrow::RecordBatch> batch;
    check(reader->ReadNext(&batch));
}
{code}
Running this program aborts with:
{noformat}
-- Arrow Fatal Error --
Invalid: Expected message in stream, was null or length 0
{noformat}
It seems like this was caused by [https://github.com/apache/arrow/commit/e68ca7f9aed876a1afcad81a417afb87c94ee951], which moved the dictionary values from the DataType to the array itself.

I initially thought I could work around this by writing a zero-length table, but that does not seem to work either.
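
One plausible reading of the failure, sketched as a toy model rather than Arrow's actual code: if the reader expects one dictionary message per dictionary-encoded field before the first record batch, a stream that ends right after the schema trips the "expected message" check. Everything below (ToyStream, MessageKind, ReadResult, ReadFirst, the allow_empty flag) is hypothetical and exists only for illustration; whether the real reader works this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only. None of these types are Arrow classes.
enum class MessageKind { DictionaryBatch, RecordBatch };

struct ToyStream {
    std::size_t num_dictionary_fields;  // taken from the schema
    std::vector<MessageKind> messages;  // everything written after the schema
    std::size_t cursor;

    ToyStream(std::size_t n, std::vector<MessageKind> msgs)
        : num_dictionary_fields(n), messages(std::move(msgs)), cursor(0) {}

    // Stores the next message in *out and returns true,
    // or returns false at end of stream.
    bool Next(MessageKind* out) {
        assert(cursor <= messages.size());
        if (cursor == messages.size()) return false;
        *out = messages[cursor++];
        return true;
    }
};

struct ReadResult {
    bool ok;
    bool end_of_stream;
    std::string error;
};

const char kMissingMessage[] = "Expected message in stream, was null or length 0";

// Reads up to the first record batch. With allow_empty == false, every
// dictionary field must be backed by a dictionary message. With
// allow_empty == true, end of stream before the first dictionary message
// is treated as a valid empty stream.
ReadResult ReadFirst(ToyStream& stream, bool allow_empty) {
    MessageKind kind = MessageKind::RecordBatch;
    for (std::size_t i = 0; i < stream.num_dictionary_fields; ++i) {
        if (!stream.Next(&kind)) {
            if (allow_empty && i == 0) return {true, true, ""};
            return {false, false, kMissingMessage};
        }
        if (kind != MessageKind::DictionaryBatch) {
            return {false, false, "Expected a dictionary batch"};
        }
    }
    if (!stream.Next(&kind)) return {true, true, ""};
    if (kind != MessageKind::RecordBatch) {
        return {false, false, "Expected a record batch"};
    }
    return {true, false, ""};
}
```

In this model, ReadFirst on a ToyStream with one dictionary field and no messages fails when allow_empty is false and reports a clean end of stream when it is true. That is the behaviour I would expect from the real reader for an empty stream.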

 


> [C++] Error reading an empty IPC stream with a dictionary-encoded column
> ------------------------------------------------------------------------
>
>                 Key: ARROW-6006
>                 URL: https://issues.apache.org/jira/browse/ARROW-6006
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Steven Fackler
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)