Posted to issues@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2020/05/09 20:47:00 UTC
[jira] [Created] (ARROW-8749) [C++] IpcFormatWriter writes dictionary batches with wrong ID
David Li created ARROW-8749:
-------------------------------
Summary: [C++] IpcFormatWriter writes dictionary batches with wrong ID
Key: ARROW-8749
URL: https://issues.apache.org/jira/browse/ARROW-8749
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.17.0
Reporter: David Li
IpcFormatWriter assigns dictionary IDs once when it writes the schema message. Then, when it writes dictionary batches, it assigns dictionary IDs a second time, because it re-collects the dictionaries from the given batch. So, for example, with 5 dictionaries the first dictionary ends up with ID 0 in the schema but is written in a dictionary batch with ID 5.
Concretely, the following test fails with "'_error_or_value11.status()' failed with Key error: No record of dictionary type with id 9":
{code:cpp}
TEST_F(TestMetadata, DoPutDictionaries) {
  ASSERT_OK_AND_ASSIGN(auto sink, arrow::io::BufferOutputStream::Create());
  std::shared_ptr<Schema> schema = ExampleDictSchema();
  BatchVector expected_batches;
  ASSERT_OK(ExampleDictBatches(&expected_batches));

  ASSERT_OK_AND_ASSIGN(auto writer, arrow::ipc::NewStreamWriter(sink.get(), schema));
  for (auto& batch : expected_batches) {
    ASSERT_OK(writer->WriteRecordBatch(*batch));
  }
  ASSERT_OK_AND_ASSIGN(auto buf, sink->Finish());

  arrow::io::BufferReader source(buf);
  ASSERT_OK_AND_ASSIGN(auto reader, arrow::ipc::RecordBatchStreamReader::Open(&source));
  AssertSchemaEqual(schema, reader->schema());
  for (auto& batch : expected_batches) {
    ASSERT_OK_AND_ASSIGN(auto actual, reader->Next());
    AssertBatchesEqual(*actual, *batch);
  }
}{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)