Posted to issues@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/23 00:44:00 UTC

[jira] [Assigned] (ORC-959) C++ reader crash in resolving nested List columns for SearchArgument

     [ https://issues.apache.org/jira/browse/ORC-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned ORC-959:
----------------------------------

    Assignee: Quanlong Huang

> C++ reader crash in resolving nested List columns for SearchArgument
> --------------------------------------------------------------------
>
>                 Key: ORC-959
>                 URL: https://issues.apache.org/jira/browse/ORC-959
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: complextypestbl.orc
>
>
> SearchArgument currently only provides interfaces that take column names, and only columns that are struct fields are resolved correctly. Resolving any other column (e.g. one nested inside a LIST or MAP) crashes the reader.
> The following code reproduces the issue: {code:cpp}
> #include <orc/OrcFile.hh>
> using namespace std;
> using namespace orc;
> int main() {
>   ORC_UNIQUE_PTR<InputStream> inStream = readLocalFile("complextypestbl.orc");
>   ReaderOptions options;
>   ORC_UNIQUE_PTR<Reader> reader = createReader(move(inStream), options);
>   RowReaderOptions rowReaderOptions;
>   ORC_UNIQUE_PTR<SearchArgumentBuilder> sarg = SearchArgumentFactory::newBuilder();
>   sarg->lessThanEquals("f", PredicateDataType::STRING, Literal("bbb", 3));
>   ORC_UNIQUE_PTR<SearchArgument> final_sarg = sarg->build();
>   rowReaderOptions.searchArgument(move(final_sarg));
>   ORC_UNIQUE_PTR<RowReader> rowReader = reader->createRowReader(rowReaderOptions);
>   ORC_UNIQUE_PTR<ColumnVectorBatch> batch = rowReader->createRowBatch(1024);
>   return 0;
> }
> {code}
> complextypestbl.orc is an ORC file of an ACID table with the following schema:
> {code}
> id bigint
> int_array array<int>
> int_array_array array<array<int>>
> int_map map<string, int> 
> int_map_array array<map<string, int>>
> nested_struct struct<a: int, b: array<int>, c: struct<d: array<array<struct<e: int, f: string>>>>, g: map<string, struct<h: struct<i: array<double>>>>>
> {code}
> The above C++ code pushes down a predicate on the "f" column. The GDB stack trace for the crash:
> {code}
> Program received signal SIGSEGV, Segmentation fault.
> orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
> 28	      if (type.getFieldName(i) == colName) {
> (gdb) bt
> #0  orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
> #1  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
> #2  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
> #3  0x000000000045a67f in orc::SargsApplier::SargsApplier (this=0x200b9f0, type=..., searchArgument=<optimized out>, rowIndexStride=<optimized out>, writerVersion=<optimized out>)
>     at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:56
> #4  0x00000000004253f8 in orc::RowReaderImpl::RowReaderImpl (this=0x2009760, _contents=..., opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:244
> #5  0x00000000004257ad in orc::ReaderImpl::createRowReader (this=<optimized out>, opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:765
> #6  0x000000000040b688 in main ()
> (gdb) l
> 23	
> 24	  // find column id from column name
> 25	  uint64_t SargsApplier::findColumn(const Type& type,
> 26	                                    const std::string& colName) {
> 27	    for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
> 28	      if (type.getFieldName(i) == colName) {
> 29	        return type.getSubtype(i)->getColumnId();
> 30	      } else {
> 31	        uint64_t ret = findColumn(*type.getSubtype(i), colName);
> 32	        if (ret != INVALID_COLUMN_ID) {
> (gdb) p type.getKind()
> $16 = orc::LIST
> {code}
> Only STRUCT types have valid field names, so calling getFieldName() on the LIST type at line 28 crashes.
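
To make the failure mode concrete, here is a minimal sketch of one possible guard: consult field names only when the current type is a STRUCT, while still recursing into the children of every type. It runs on a hypothetical MockType stand-in, not the real orc::Type, and the helpers makeType, addField and makeExampleSchema are illustrative assumptions. This is not necessarily the fix adopted for ORC-959.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::Type; NOT the real ORC class.
enum class Kind { INT, STRING, LIST, STRUCT };

struct MockType {
  Kind kind;
  uint64_t columnId;
  std::vector<std::string> fieldNames;  // populated only for STRUCT
  std::vector<std::unique_ptr<MockType>> subtypes;
};

const uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Depth-first search for a column id by field name.
// Field names are compared only when the parent is a STRUCT.
uint64_t findColumn(const MockType& type, const std::string& colName) {
  if (type.kind == Kind::STRUCT) {
    assert(type.fieldNames.size() == type.subtypes.size());
  }
  for (std::size_t i = 0; i != type.subtypes.size(); ++i) {
    if (type.kind == Kind::STRUCT && type.fieldNames[i] == colName) {
      return type.subtypes[i]->columnId;
    }
    // Recurse regardless of kind so fields nested under LIST are reachable.
    uint64_t ret = findColumn(*type.subtypes[i], colName);
    if (ret != INVALID_COLUMN_ID) {
      return ret;
    }
  }
  return INVALID_COLUMN_ID;
}

// Illustrative helpers for building a small schema by hand.
std::unique_ptr<MockType> makeType(Kind kind, uint64_t id) {
  std::unique_ptr<MockType> t(new MockType());
  t->kind = kind;
  t->columnId = id;
  return t;
}

void addField(MockType& parent, const std::string& name,
              std::unique_ptr<MockType> child) {
  parent.fieldNames.push_back(name);
  parent.subtypes.push_back(std::move(child));
}

// Builds struct<a: int, d: array<struct<e: int, f: string>>>
// with column ids assigned in pre-order (root = 0).
std::unique_ptr<MockType> makeExampleSchema() {
  std::unique_ptr<MockType> elem = makeType(Kind::STRUCT, 3);
  addField(*elem, "e", makeType(Kind::INT, 4));
  addField(*elem, "f", makeType(Kind::STRING, 5));

  std::unique_ptr<MockType> list = makeType(Kind::LIST, 2);
  list->subtypes.push_back(std::move(elem));  // LIST child has no field name

  std::unique_ptr<MockType> root = makeType(Kind::STRUCT, 0);
  addField(*root, "a", makeType(Kind::INT, 1));
  addField(*root, "d", std::move(list));
  return root;
}
```

With this schema, looking up "f" passes through the LIST without touching its (nonexistent) field names and reaches the nested struct field. Note that a name-only lookup like this still returns the first match in depth-first order, so it cannot distinguish two nested fields that share a name.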



--
This message was sent by Atlassian Jira
(v8.3.4#803005)