Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/08/22 17:58:00 UTC
[jira] [Commented] (ORC-959) C++ reader crash in resolving nested List columns for SearchArgument
[ https://issues.apache.org/jira/browse/ORC-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402856#comment-17402856 ]
Dongjoon Hyun commented on ORC-959:
-----------------------------------
Thank you for reporting, [~stigahuang].
> C++ reader crash in resolving nested List columns for SearchArgument
> --------------------------------------------------------------------
>
> Key: ORC-959
> URL: https://issues.apache.org/jira/browse/ORC-959
> Project: ORC
> Issue Type: Bug
> Components: C++
> Reporter: Quanlong Huang
> Priority: Major
> Attachments: complextypestbl.orc
>
>
> SearchArgument currently only provides interfaces that use column names. Only columns that are struct fields can be resolved correctly; resolving other columns (e.g. those inside a LIST or MAP) causes a crash.
> The following code reproduces the issue: {code:cpp}
> #include <orc/OrcFile.hh>
> using namespace std;
> using namespace orc;
> int main() {
>   // Open the attached file with default reader options.
>   ORC_UNIQUE_PTR<InputStream> inStream = readLocalFile("complextypestbl.orc");
>   ReaderOptions options;
>   ORC_UNIQUE_PTR<Reader> reader = createReader(move(inStream), options);
>   RowReaderOptions rowReaderOptions;
>   // Build the predicate f <= "bbb"; "f" is nested inside LIST columns.
>   ORC_UNIQUE_PTR<SearchArgumentBuilder> sarg = SearchArgumentFactory::newBuilder();
>   sarg->lessThanEquals("f", PredicateDataType::STRING, Literal("bbb", 3));
>   ORC_UNIQUE_PTR<SearchArgument> final_sarg = sarg->build();
>   rowReaderOptions.searchArgument(move(final_sarg));
>   // The crash happens here, while the column name is being resolved.
>   ORC_UNIQUE_PTR<RowReader> rowReader = reader->createRowReader(rowReaderOptions);
>   ORC_UNIQUE_PTR<ColumnVectorBatch> batch = rowReader->createRowBatch(1024);
>   return 0;
> }
> {code}
> complextypestbl.orc is an ORC file of an ACID table with the following schema:
> {code}
> id bigint
> int_array array<int>
> int_array_array array<array<int>>
> int_map map<string, int>
> int_map_array array<map<string, int>>
> nested_struct struct<a: int, b: array<int>, c: struct<d: array<array<struct<e: int, f: string>>>>, g: map<string, struct<h: struct<i: array<double>>>>>
> {code}
> The above C++ code pushes down a predicate on the "f" column. The GDB stack trace for the crash:
> {code}
> Program received signal SIGSEGV, Segmentation fault.
> orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
> 28 if (type.getFieldName(i) == colName) {
> (gdb) bt
> #0 orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
> #1 0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
> #2 0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
> #3 0x000000000045a67f in orc::SargsApplier::SargsApplier (this=0x200b9f0, type=..., searchArgument=<optimized out>, rowIndexStride=<optimized out>, writerVersion=<optimized out>)
> at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:56
> #4 0x00000000004253f8 in orc::RowReaderImpl::RowReaderImpl (this=0x2009760, _contents=..., opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:244
> #5 0x00000000004257ad in orc::ReaderImpl::createRowReader (this=<optimized out>, opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:765
> #6 0x000000000040b688 in main ()
> (gdb) l
> 23
> 24 // find column id from column name
> 25 uint64_t SargsApplier::findColumn(const Type& type,
> 26 const std::string& colName) {
> 27 for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
> 28 if (type.getFieldName(i) == colName) {
> 29 return type.getSubtype(i)->getColumnId();
> 30 } else {
> 31 uint64_t ret = findColumn(*type.getSubtype(i), colName);
> 32 if (ret != INVALID_COLUMN_ID) {
> (gdb) p type.getKind()
> $16 = orc::LIST
> {code}
> Only STRUCT types have valid field names, so calling getFieldName() on the LIST type shown above crashes.
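
For illustration only, here is a minimal sketch of one way the lookup could be guarded so that field names are compared only on STRUCT nodes while the search still descends into LIST and MAP children. It does not use the ORC library: MockType, Kind, makeNode, makeSample and the INVALID_COLUMN_ID constant defined here are hypothetical stand-ins, and this is not necessarily how the issue was fixed upstream.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::Type, reduced to what the sketch needs.
enum class Kind { INT, STRING, LIST, MAP, STRUCT };

struct MockType {
  Kind kind;
  uint64_t columnId;
  std::vector<std::string> fieldNames;  // populated only for STRUCT nodes
  std::vector<std::unique_ptr<MockType>> subtypes;
};

// Assumed sentinel for "not found"; defined locally for this sketch.
const uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Depth-first search for a field name. Names are compared only when the
// current node is a STRUCT; LIST and MAP nodes are just traversed.
uint64_t findColumn(const MockType& type, const std::string& colName) {
  for (std::size_t i = 0; i != type.subtypes.size(); ++i) {
    if (type.kind == Kind::STRUCT && type.fieldNames[i] == colName) {
      return type.subtypes[i]->columnId;
    }
    uint64_t ret = findColumn(*type.subtypes[i], colName);
    if (ret != INVALID_COLUMN_ID) {
      return ret;
    }
  }
  return INVALID_COLUMN_ID;
}

// Helper that allocates a node with the given kind and column id.
std::unique_ptr<MockType> makeNode(Kind kind, uint64_t id) {
  std::unique_ptr<MockType> node(new MockType());
  node->kind = kind;
  node->columnId = id;
  return node;
}

// Builds struct<d: array<struct<e: int, f: string>>> with ids 0..4.
std::unique_ptr<MockType> makeSample() {
  std::unique_ptr<MockType> elem = makeNode(Kind::STRUCT, 2);
  elem->fieldNames = {"e", "f"};
  elem->subtypes.push_back(makeNode(Kind::INT, 3));
  elem->subtypes.push_back(makeNode(Kind::STRING, 4));

  std::unique_ptr<MockType> list = makeNode(Kind::LIST, 1);
  list->subtypes.push_back(std::move(elem));

  std::unique_ptr<MockType> root = makeNode(Kind::STRUCT, 0);
  root->fieldNames = {"d"};
  root->subtypes.push_back(std::move(list));
  return root;
}
```

With the sample tree, findColumn(*makeSample(), "f") walks through the LIST node without touching its (empty) field names and resolves "f" inside the element struct. Note that a bare name lookup like this returns the first match in depth-first order, so it cannot distinguish two nested fields that share a name.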
--
This message was sent by Atlassian Jira
(v8.3.4#803005)