Posted to dev@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/22 08:54:00 UTC

[jira] [Created] (ORC-959) C++ reader crash in resolving nested List columns for SearchArgument

Quanlong Huang created ORC-959:
----------------------------------

             Summary: C++ reader crash in resolving nested List columns for SearchArgument
                 Key: ORC-959
                 URL: https://issues.apache.org/jira/browse/ORC-959
             Project: ORC
          Issue Type: Bug
          Components: C++
            Reporter: Quanlong Huang
         Attachments: complextypestbl.orc

SearchArgument currently only provides interfaces that take column names. Only columns that are struct fields are resolved correctly; resolving any other column (e.g. one nested inside a LIST or MAP) crashes the reader.

The following code reproduces the issue: {code:cpp}
#include <orc/OrcFile.hh>
using namespace std;
using namespace orc;

int main() {
  ORC_UNIQUE_PTR<InputStream> inStream = readLocalFile("complextypestbl.orc");
  ReaderOptions options;
  ORC_UNIQUE_PTR<Reader> reader = createReader(move(inStream), options);

  RowReaderOptions rowReaderOptions;

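  // Build a SearchArgument for the predicate f <= "bbb". In the attached file,
  // "f" is nested inside array<array<struct<...>>>, not a top-level struct field.
  ORC_UNIQUE_PTR<SearchArgumentBuilder> sarg = SearchArgumentFactory::newBuilder();
  sarg->lessThanEquals("f", PredicateDataType::STRING, Literal("bbb", 3));
  ORC_UNIQUE_PTR<SearchArgument> final_sarg = sarg->build();
  rowReaderOptions.searchArgument(move(final_sarg));

  // The crash occurs here, while the column name is resolved to a column id.
  ORC_UNIQUE_PTR<RowReader> rowReader = reader->createRowReader(rowReaderOptions);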
  ORC_UNIQUE_PTR<ColumnVectorBatch> batch = rowReader->createRowBatch(1024);
  return 0;
}
{code}

complextypestbl.orc is an ORC file of an ACID table with the following schema:
{code}
id bigint
int_array array<int>
int_array_array array<array<int>>
int_map map<string, int> 
int_map_array array<map<string, int>>
nested_struct struct<a: int, b: array<int>, c: struct<d: array<array<struct<e: int, f: string>>>>, g: map<string, struct<h: struct<i: array<double>>>>>
{code}

The C++ code above pushes down a predicate on the "f" column. GDB stack trace for the crash:
{code}
Program received signal SIGSEGV, Segmentation fault.
orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
28	      if (type.getFieldName(i) == colName) {
(gdb) bt
#0  orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
#1  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
#2  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
#3  0x000000000045a67f in orc::SargsApplier::SargsApplier (this=0x200b9f0, type=..., searchArgument=<optimized out>, rowIndexStride=<optimized out>, writerVersion=<optimized out>)
    at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:56
#4  0x00000000004253f8 in orc::RowReaderImpl::RowReaderImpl (this=0x2009760, _contents=..., opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:244
#5  0x00000000004257ad in orc::ReaderImpl::createRowReader (this=<optimized out>, opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:765
#6  0x000000000040b688 in main ()
(gdb) l
23	
24	  // find column id from column name
25	  uint64_t SargsApplier::findColumn(const Type& type,
26	                                    const std::string& colName) {
27	    for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
28	      if (type.getFieldName(i) == colName) {
29	        return type.getSubtype(i)->getColumnId();
30	      } else {
31	        uint64_t ret = findColumn(*type.getSubtype(i), colName);
32	        if (ret != INVALID_COLUMN_ID) {
(gdb) p type.getKind()
$16 = orc::LIST
{code}

Only STRUCT types have valid field names, so the getFieldName() call at line 28 crashes once the recursion reaches the LIST type.
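
One possible way to avoid the crash is to compare field names only when the current type is a STRUCT, and to keep recursing through other kinds. The sketch below illustrates that idea on a standalone mock; it is not the ORC code and not necessarily the fix that will be applied. MockType, Kind, makeType and buildSampleSchema are hypothetical stand-ins for the real orc::Type API, and the value chosen for INVALID_COLUMN_ID is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for orc::TypeKind / orc::Type, reduced to what the
// name lookup needs. These are NOT the real ORC classes.
enum class Kind { INT, STRING, LIST, MAP, STRUCT };

struct MockType {
  Kind kind;
  uint64_t columnId;
  std::vector<std::string> fieldNames;  // populated only for STRUCT
  std::vector<std::unique_ptr<MockType>> subtypes;
};

// Assumed sentinel value; the real constant in ORC may differ.
const uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Find a column id by field name. Field names are consulted only for STRUCT
// types; LIST, MAP and other kinds are simply traversed.
uint64_t findColumn(const MockType& type, const std::string& colName) {
  assert(type.kind != Kind::STRUCT ||
         type.fieldNames.size() == type.subtypes.size());
  for (size_t i = 0; i != type.subtypes.size(); ++i) {
    if (type.kind == Kind::STRUCT && type.fieldNames[i] == colName) {
      return type.subtypes[i]->columnId;
    }
    uint64_t ret = findColumn(*type.subtypes[i], colName);
    if (ret != INVALID_COLUMN_ID) {
      return ret;
    }
  }
  return INVALID_COLUMN_ID;
}

std::unique_ptr<MockType> makeType(Kind kind, uint64_t id) {
  std::unique_ptr<MockType> t(new MockType());
  t->kind = kind;
  t->columnId = id;
  return t;
}

// Builds struct<a:int, d:array<struct<e:int, f:string>>> with column ids
// assigned in pre-order: root=0, a=1, d=2, element struct=3, e=4, f=5.
std::unique_ptr<MockType> buildSampleSchema() {
  std::unique_ptr<MockType> inner = makeType(Kind::STRUCT, 3);
  inner->fieldNames = {"e", "f"};
  inner->subtypes.push_back(makeType(Kind::INT, 4));
  inner->subtypes.push_back(makeType(Kind::STRING, 5));

  std::unique_ptr<MockType> list = makeType(Kind::LIST, 2);
  list->subtypes.push_back(std::move(inner));

  std::unique_ptr<MockType> root = makeType(Kind::STRUCT, 0);
  root->fieldNames = {"a", "d"};
  root->subtypes.push_back(makeType(Kind::INT, 1));
  root->subtypes.push_back(std::move(list));
  return root;
}
```

With this mock, findColumn(*buildSampleSchema(), "f") walks through the LIST without touching field names and returns the id of the nested "f" field. The lookup still matches on a bare field name, so two fields with the same name at different nesting levels would remain ambiguous.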


