Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/08/22 17:59:00 UTC

[jira] [Commented] (ORC-960) Create SearchArgument using column ids

    [ https://issues.apache.org/jira/browse/ORC-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402857#comment-17402857 ] 

Dongjoon Hyun commented on ORC-960:
-----------------------------------

Thank you for reporting, [~stigahuang].
cc [~gangwu]

> Create SearchArgument using column ids
> --------------------------------------
>
>                 Key: ORC-960
>                 URL: https://issues.apache.org/jira/browse/ORC-960
>             Project: ORC
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Quanlong Huang
>            Priority: Major
>
> Currently, SearchArguments are created using column names, e.g. in orc/sargs/SearchArgument.hh
> {code:cpp}
> virtual SearchArgumentBuilder& lessThan(const std::string& column,
>                                         PredicateDataType type,
>                                         Literal literal) = 0;{code}
> The name string is the leaf field name, which can be duplicated when there are nested types, e.g.
> {code:sql}
> id int
> s1 struct<id:int,name:string>
> s2 struct<id:int,name:string>
> {code}
> There are 3 leaf columns named 'id'. The current code for resolving a column name can only find the first match:
> {code:cpp}
>   // find column id from column name
>   uint64_t SargsApplier::findColumn(const Type& type,
>                                     const std::string& colName) {
>     for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
>       if (type.getFieldName(i) == colName) {
>         return type.getSubtype(i)->getColumnId();
>       } else {
>         uint64_t ret = findColumn(*type.getSubtype(i), colName);
>         if (ret != INVALID_COLUMN_ID) {
>           return ret;
>         }
>       }
>     }
>     return INVALID_COLUMN_ID;
>   }
> {code}
> [https://github.com/apache/orc/blob/2dcbd6281e2fbeeaf0ffe46aa3b78cd3df96ed62/c%2B%2B/src/sargs/SargsApplier.cc#L25]
> Since what we need is actually the column id, let's provide interfaces for column ids directly.
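
To make the ambiguity concrete, here is a minimal, self-contained sketch. It does not use the ORC library: ToyType, findColumnByName, assignIds and buildExampleSchema are hypothetical stand-ins written for illustration only. The lookup copies the depth-first, first-match logic of the findColumn quoted above, and the ids are assigned in pre-order starting from the root, which is assumed here to mirror how ORC numbers columns.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy schema node; this is NOT orc::Type.
struct ToyType {
  uint64_t columnId = 0;
  // (field name, child type) pairs; empty for leaf columns.
  std::vector<std::pair<std::string, std::unique_ptr<ToyType>>> fields;
};

const uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Same depth-first, first-match strategy as the quoted findColumn.
uint64_t findColumnByName(const ToyType& type, const std::string& colName) {
  for (const auto& field : type.fields) {
    if (field.first == colName) {
      return field.second->columnId;
    }
    uint64_t ret = findColumnByName(*field.second, colName);
    if (ret != INVALID_COLUMN_ID) {
      return ret;
    }
  }
  return INVALID_COLUMN_ID;
}

// Assign column ids in pre-order, root first (assumed numbering scheme).
void assignIds(ToyType& type, uint64_t& next) {
  type.columnId = next++;
  for (auto& field : type.fields) {
    assignIds(*field.second, next);
  }
}

std::unique_ptr<ToyType> makeLeaf() {
  return std::make_unique<ToyType>();
}

// struct<id:int,name:string>
std::unique_ptr<ToyType> makeIdNameStruct() {
  auto s = std::make_unique<ToyType>();
  s->fields.emplace_back("id", makeLeaf());
  s->fields.emplace_back("name", makeLeaf());
  return s;
}

// Builds the schema from the description: id, s1, s2.
std::unique_ptr<ToyType> buildExampleSchema() {
  auto root = std::make_unique<ToyType>();
  root->fields.emplace_back("id", makeLeaf());
  root->fields.emplace_back("s1", makeIdNameStruct());
  root->fields.emplace_back("s2", makeIdNameStruct());
  uint64_t next = 0;
  assignIds(*root, next);
  return root;
}
```

Under these assumptions the top-level id gets column id 1, s1.id gets 3 and s2.id gets 6, yet findColumnByName(*root, "id") always returns 1. A predicate intended for s1.id or s2.id cannot be expressed by leaf name alone, whereas passing the column id (3 or 6) identifies the column unambiguously.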



--
This message was sent by Atlassian Jira
(v8.3.4#803005)