Posted to dev@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/22 09:16:00 UTC

[jira] [Created] (ORC-960) Create SearchArgument using column ids

Quanlong Huang created ORC-960:
----------------------------------

             Summary: Create SearchArgument using column ids
                 Key: ORC-960
                 URL: https://issues.apache.org/jira/browse/ORC-960
             Project: ORC
          Issue Type: New Feature
          Components: C++
            Reporter: Quanlong Huang


Currently, SearchArguments are created using column names, e.g. in orc/sargs/SearchArgument.hh
{code:cpp}
virtual SearchArgumentBuilder& lessThan(const std::string& column,
                                        PredicateDataType type,
                                        Literal literal) = 0;{code}
The name string is the leaf field name, which is not necessarily unique when the schema contains nested types, e.g.
{code:sql}
id int
s1 struct<id:int,name:string>
s2 struct<id:int,name:string>
{code}
There are 3 leaf columns named 'id'. The current code that resolves a column name can only find the first match, so the other two can never be referenced:
{code:cpp}
  // find column id from column name
  uint64_t SargsApplier::findColumn(const Type& type,
                                    const std::string& colName) {
    for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
      if (type.getFieldName(i) == colName) {
        return type.getSubtype(i)->getColumnId();
      } else {
        uint64_t ret = findColumn(*type.getSubtype(i), colName);
        if (ret != INVALID_COLUMN_ID) {
          return ret;
        }
      }
    }
    return INVALID_COLUMN_ID;
  }
{code}
[https://github.com/apache/orc/blob/2dcbd6281e2fbeeaf0ffe46aa3b78cd3df96ed62/c%2B%2B/src/sargs/SargsApplier.cc#L25]
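
To make the first-match behaviour concrete, here is a minimal, self-contained sketch that runs the same search over the example schema. ToyType, assignIds, makeLeaf, makeIdNameStruct and makeExampleSchema are hypothetical stand-ins written for this illustration, not ORC classes, and the sketch assumes column ids are assigned in pre-order with the root struct as column 0.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::Type: a column id plus named children.
struct ToyType {
  uint64_t columnId = 0;
  std::vector<std::pair<std::string, std::unique_ptr<ToyType>>> fields;
};

constexpr uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Same depth-first, first-match search as the findColumn quoted above.
uint64_t findColumn(const ToyType& type, const std::string& colName) {
  for (const auto& field : type.fields) {
    if (field.first == colName) {
      return field.second->columnId;
    }
    uint64_t ret = findColumn(*field.second, colName);
    if (ret != INVALID_COLUMN_ID) {
      return ret;
    }
  }
  return INVALID_COLUMN_ID;
}

// Assumption: ids are assigned in pre-order. Returns the next unused id.
uint64_t assignIds(ToyType& type, uint64_t nextId) {
  type.columnId = nextId++;
  for (auto& field : type.fields) {
    nextId = assignIds(*field.second, nextId);
  }
  return nextId;
}

std::unique_ptr<ToyType> makeLeaf() {
  return std::unique_ptr<ToyType>(new ToyType());
}

// Builds struct<id:int,name:string> (leaf kinds are irrelevant here).
std::unique_ptr<ToyType> makeIdNameStruct() {
  std::unique_ptr<ToyType> s(new ToyType());
  s->fields.emplace_back("id", makeLeaf());
  s->fields.emplace_back("name", makeLeaf());
  return s;
}

// Builds the example schema: id, s1 struct<id,name>, s2 struct<id,name>.
std::unique_ptr<ToyType> makeExampleSchema() {
  std::unique_ptr<ToyType> root(new ToyType());
  root->fields.emplace_back("id", makeLeaf());
  root->fields.emplace_back("s1", makeIdNameStruct());
  root->fields.emplace_back("s2", makeIdNameStruct());
  assignIds(*root, 0);
  return root;
}
```

Under these assumptions the top-level id is column 1, s1.id is column 3 and s2.id is column 6. findColumn(*makeExampleSchema(), "id") stops at the top-level field and returns 1, and no name string makes it return 3 or 6.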

Since what we actually need is the column id, let's provide interfaces that take column ids directly.
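
As a rough illustration of what such an interface could look like, the sketch below adds a column-id overload next to the name-based one. Everything here is hypothetical and simplified for the example (ToySearchArgumentBuilder, ToyLeaf, ToyLiteral, ToyPredicateDataType); the actual signatures and the way leaves are stored are left to the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the ORC sargs types.
enum class ToyPredicateDataType { LONG, STRING };

struct ToyLiteral {
  int64_t value;
};

// A predicate leaf remembers whether it was given a name or an id.
struct ToyLeaf {
  bool hasColumnName;
  std::string columnName;  // meaningful only if hasColumnName is true
  uint64_t columnId;       // meaningful only if hasColumnName is false
  ToyPredicateDataType type;
  ToyLiteral literal;
};

class ToySearchArgumentBuilder {
 public:
  // Existing style: the column is given by name and resolved later.
  ToySearchArgumentBuilder& lessThan(const std::string& column,
                                     ToyPredicateDataType type,
                                     ToyLiteral literal) {
    leaves_.push_back(ToyLeaf{true, column, 0, type, literal});
    return *this;
  }

  // Proposed style (sketch): the caller passes the column id directly,
  // so no name lookup is needed and nested duplicates are unambiguous.
  ToySearchArgumentBuilder& lessThan(uint64_t columnId,
                                     ToyPredicateDataType type,
                                     ToyLiteral literal) {
    leaves_.push_back(ToyLeaf{false, std::string(), columnId, type, literal});
    return *this;
  }

  const std::vector<ToyLeaf>& leaves() const { return leaves_; }

 private:
  std::vector<ToyLeaf> leaves_;
};
```

With this shape, a predicate on s2.id from the example schema could be written as builder.lessThan(uint64_t{6}, ToyPredicateDataType::LONG, ToyLiteral{10}), assuming s2.id has column id 6, while existing name-based callers keep working unchanged.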


