Posted to issues@arrow.apache.org by "Antoine Pitrou (JIRA)" <ji...@apache.org> on 2019/04/09 10:10:00 UTC

[jira] [Commented] (ARROW-5145) RecordBatchFileWriter fails to write string dictionary

    [ https://issues.apache.org/jira/browse/ARROW-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813217#comment-16813217 ] 

Antoine Pitrou commented on ARROW-5145:
---------------------------------------

The following line is wrong:
{code:cpp}
auto sch = schema({arrow::field("TEST", dictionary(utf8(), dict_array))});
{code}
because the first argument to {{dictionary()}} is the index type of the dictionary type (normally an integer type), and here it is set to "utf8".

Instead, you can reuse the type that {{col}} already carries:
{code:cpp}
auto sch = schema({arrow::field("TEST", col->type())});
{code}
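
To illustrate why the index type is normally an integer type, here is a minimal standalone sketch of dictionary encoding. It does not use Arrow at all: {{DictColumn}}, {{Encode}} and {{Decode}} are hypothetical names made up for this example, and the choice of 32-bit indices is an assumption. The string values live once in the dictionary, and each row only stores an integer position into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column.
// This is NOT Arrow's API; it only mirrors the idea of "indices + dictionary".
struct DictColumn {
    std::vector<std::string> dictionary;  // distinct values (the string part)
    std::vector<std::int32_t> indices;    // one integer index per row
};

// Encode a column of strings: each distinct value is stored once, in
// first-seen order, and each row records its position in the dictionary.
DictColumn Encode(const std::vector<std::string>& rows) {
    DictColumn col;
    std::unordered_map<std::string, std::int32_t> seen;
    for (const auto& value : rows) {
        auto it = seen.find(value);
        if (it == seen.end()) {
            const auto next = static_cast<std::int32_t>(col.dictionary.size());
            it = seen.emplace(value, next).first;
            col.dictionary.push_back(value);
        }
        col.indices.push_back(it->second);
    }
    return col;
}

// Decode by looking each integer index up in the dictionary.
std::vector<std::string> Decode(const DictColumn& col) {
    std::vector<std::string> rows;
    rows.reserve(col.indices.size());
    for (std::int32_t index : col.indices) {
        assert(index >= 0 && "dictionary indices must be non-negative");
        // at() throws std::out_of_range if the index exceeds the dictionary.
        rows.push_back(col.dictionary.at(static_cast<std::size_t>(index)));
    }
    return rows;
}
```

In this sketch, encoding the rows "a", "b", "a" stores "a" and "b" once each and keeps three integer indices. The indices are what gets looked up, so declaring them as a string type has no sensible meaning.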


> RecordBatchFileWriter fails to write string dictionary
> ------------------------------------------------------
>
>                 Key: ARROW-5145
>                 URL: https://issues.apache.org/jira/browse/ARROW-5145
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.13.0
>         Environment: CentOS 7
> devtoolset-8 Software Collection
> g++ (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3)
>            Reporter: Ankur deDev
>            Priority: Minor
>         Attachments: repro.cpp
>
>
> I am a new user of the C++ library trying to output data that contains dictionary columns using {{RecordBatchFileWriter}}. The attached code segfaults: it is able to write a {{Feather}} file, but fails when I use the {{RecordBatchFileWriter}}.
> I am not sure whether I am using the library correctly, but the produced Feather file loads properly in Julia and has the expected data. I installed Arrow following the [instructions|https://arrow.apache.org/install/] for CentOS. 
> Here is the stack trace of the executable compiled with the command {{g++ -g -larrow repro.cpp}}:
>  
> {{Program terminated with signal SIGSEGV, Segmentation fault.}}
> {{#0 0x0000000000000000 in ?? ()}}
> {{Missing separate debuginfos, use: debuginfo-install arrow-libs-0.13.0-1.el7.x86_64 boost-filesystem-1.53.0-27.el7.x86_64 boost-regex-1.53.0-27.el7.x86_64 boost-system-1.53.0-27.el7.x86_64 double-conversion-2.0.1-3.el7.x86_64 gflags-2.1.1-6.el7.x86_64 glibc-2.17-260.el7_6.3.x86_64 glog-0.3.3-8.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libicu-50.1.2-17.el7.x86_64 libstdc++-4.8.5-36.el7.x86_64 libzstd-1.3.8-1.el7.x86_64 lz4-1.7.5-2.el7.x86_64 snappy-1.1.0-3.el7.x86_64 zlib-1.2.7-18.el7.x86_64}}
> {{(gdb) bt}}
> {{#0 0x0000000000000000 in ?? ()}}
> {{#1 0x00007f43fc8870d1 in arrow::ipc::internal::FieldToFlatbufferVisitor::GetResult(arrow::Field const&, flatbuffers::Offset<org::apache::arrow::flatbuf::Field>*) ()}}
> {{ from /lib64/libarrow.so.13}}
> {{#2 0x00007f43fc880374 in arrow::ipc::internal::FieldToFlatbuffer(flatbuffers::FlatBufferBuilder&, arrow::Field const&, arrow::ipc::DictionaryMemo*, flatbuffers::Offset<org::apache::arrow::flatbuf::Field>*) () from /lib64/libarrow.so.13}}
> {{#3 0x00007f43fc880759 in arrow::ipc::internal::SchemaToFlatbuffer(flatbuffers::FlatBufferBuilder&, arrow::Schema const&, arrow::ipc::DictionaryMemo*, flatbuffers::Offset<org::apache::arrow::flatbuf::Schema>*) [clone .constprop.548] () from /lib64/libarrow.so.13}}
> {{#4 0x00007f43fc880f7f in arrow::ipc::internal::WriteSchemaMessage(arrow::Schema const&, arrow::ipc::DictionaryMemo*, std::shared_ptr<arrow::Buffer>*) () from /lib64/libarrow.so.13}}
> {{#5 0x00007f43fc8986eb in arrow::ipc::RecordBatchStreamWriter::RecordBatchStreamWriterImpl::Start() () from /lib64/libarrow.so.13}}
> {{#6 0x00007f43fc898936 in arrow::ipc::RecordBatchFileWriter::RecordBatchFileWriterImpl::Start() () from /lib64/libarrow.so.13}}
> {{#7 0x00007f43fc891cfc in arrow::ipc::RecordBatchFileWriter::WriteRecordBatch(arrow::RecordBatch const&, bool) () from /lib64/libarrow.so.13}}
> {{#8 0x00000000004022ec in job () at repro.cpp:63}}
> {{#9 0x00000000004026e7 in main (argc=1, argv=0x7ffef5892268) at repro.cpp:77}}
>  
> Thanks for your help. 


