Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/11/25 20:28:00 UTC

[jira] [Commented] (ARROW-14805) [C++] Support Count function without projections in ORC to avoid loading all columns

    [ https://issues.apache.org/jira/browse/ARROW-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449306#comment-17449306 ] 

Joris Van den Bossche commented on ARROW-14805:
-----------------------------------------------

Yes, it would indeed be good to implement the fast path for this.

> [C++] Support Count function without projections in ORC to avoid loading all columns
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-14805
>                 URL: https://issues.apache.org/jira/browse/ARROW-14805
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: xiangxiang Shen
>            Priority: Major
>              Labels: ORC, c++, dataset
>
> For ORC support in dataset, executing a count query without projections, such as "select count(*) from table", loads all columns. This is due to the behavior of the ORC library code here: [https://github.com/apache/orc/blob/22828f79a526069d9629719c9476b7addad91ae6/c%2B%2B/src/Reader.cc#L120-L144].
>  
> The Arrow side can improve this, as is already done for Parquet in dataset.
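
A minimal sketch of the fast-path idea described above, assuming a toy in-memory model rather than the real ORC or Arrow API. The `Stripe` class, its `loads` counter, and both `count_rows_*` functions are hypothetical names introduced only for illustration. The one fact relied on is that ORC records a row count per stripe in its metadata, so a count can in principle be answered without decoding any column data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stripe:
    """Hypothetical stand-in for a stripe: a row count kept in metadata,
    plus column data that is expensive to decode."""
    num_rows: int
    columns: Dict[str, List[int]]
    loads: int = 0  # number of times column data was decoded

    def read_columns(self) -> Dict[str, List[int]]:
        # Simulates the costly path: decoding every column of the stripe.
        self.loads += 1
        return self.columns


def count_rows_slow(stripes: List[Stripe]) -> int:
    """Count rows by decoding all columns, as happens when no projection is given."""
    total = 0
    for stripe in stripes:
        data = stripe.read_columns()
        # Every column has the same length, so any one of them gives the count.
        total += len(next(iter(data.values()))) if data else 0
    return total


def count_rows_fast(stripes: List[Stripe]) -> int:
    """Count rows from stripe metadata only; no column data is touched."""
    return sum(stripe.num_rows for stripe in stripes)


stripes = [
    Stripe(3, {"a": [1, 2, 3], "b": [4, 5, 6]}),
    Stripe(2, {"a": [7, 8], "b": [9, 10]}),
]
fast_total = count_rows_fast(stripes)  # reads metadata only
slow_total = count_rows_slow(stripes)  # decodes both columns of each stripe
```

Both functions return the same total, but only the slow one increments `loads`, which is the cost the issue asks to avoid.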



--
This message was sent by Atlassian Jira
(v8.20.1#820001)