Posted to issues@orc.apache.org by "Penglei Shi (Jira)" <ji...@apache.org> on 2023/01/03 10:29:00 UTC

[jira] [Comment Edited] (ORC-350) Optionally disable/specify indexes for columns

    [ https://issues.apache.org/jira/browse/ORC-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653932#comment-17653932 ] 

Penglei Shi edited comment on ORC-350 at 1/3/23 10:28 AM:
----------------------------------------------------------

In some cases, especially for columns that hold large JSON or text values, predicate pushdown is unnecessary, and the column statistics and row index are barely used. Can ORC disable column statistics? That is, not only skip writing RowIndexEntry via `orc.create.index=false`, but also skip calling methods like `updateString`/`updateDouble`/`updateXXX`.
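
To make the request concrete, here is a minimal sketch of a per-column opt-out in which the statistics update becomes a no-op. It is only an illustration: `StringStatsSketch`, `StatsPolicySketch`, and the idea of passing a set of column names are hypothetical and are not part of the ORC API.

```java
import java.util.Set;

// Hypothetical sketch: none of these class or method names come from the ORC code base.
final class StringStatsSketch {
    private final boolean enabled;
    private String min;
    private String max;
    private long count;

    StringStatsSketch(boolean enabled) {
        this.enabled = enabled;
    }

    // Stand-in for an updateString-style call; returns immediately when statistics are off.
    void update(String value) {
        if (!enabled || value == null) {
            return;
        }
        count++;
        if (min == null || value.compareTo(min) < 0) {
            min = value;
        }
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
    }

    String getMin() { return min; }
    String getMax() { return max; }
    long getCount() { return count; }
}

// Hypothetical policy object: decides per column whether statistics are collected.
final class StatsPolicySketch {
    private final Set<String> columnsWithoutStats;

    StatsPolicySketch(Set<String> columnsWithoutStats) {
        this.columnsWithoutStats = columnsWithoutStats;
    }

    StringStatsSketch newStats(String columnName) {
        return new StringStatsSketch(!columnsWithoutStats.contains(columnName));
    }
}
```

With a policy built from the single column name `payload`, updates to that column would leave its min, max, and count untouched, while other columns would be tracked as usual.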


was (Author: penglei shi):
In some cases,  especially columns which are big json or text, predicates pushdown is needless, column statistics and row index are barely used, Can ORC disables column statistics? Not only does not write RowIndexEntry via `orc.create.index=false`, but also does not call methods like 'updateString/updateDouble/updateXXX'. 

> Optionally disable/specify indexes for columns
> ----------------------------------------------
>
>                 Key: ORC-350
>                 URL: https://issues.apache.org/jira/browse/ORC-350
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Prasanth Jayachandran
>            Priority: Major
>
> There are many cases where an entire XML document or a large JSON value is stored as a string column. If we autogenerate indexes on those columns, we often run into issues with protobuf stream explosion. The only workaround for now is to change the column type from string to binary. It would be good to have an option to disable indexes on specific columns.
> Regardless, I think we should have max limits on string column statistics. If that limit is exceeded, PPD should handle it accordingly (by returning YES_NO_NULL).
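
The limit proposed in the description could look roughly like the sketch below. This is an illustration under stated assumptions, not ORC's implementation: `BoundedStringStatsSketch`, `TruthSketch`, and the `maxLength` parameter are hypothetical names, the enum is a reduced stand-in that borrows only the `YES_NO_NULL` name from the issue, and null handling is ignored for brevity.

```java
// Hypothetical, reduced truth-value set; only the YES_NO_NULL name is taken from the issue text.
enum TruthSketch { NO, YES_NO_NULL }

// Hypothetical sketch of string statistics that give up once a value exceeds a length limit.
final class BoundedStringStatsSketch {
    private final int maxLength;
    private String min;
    private String max;
    private boolean overflowed;

    BoundedStringStatsSketch(int maxLength) {
        this.maxLength = maxLength;
    }

    void update(String value) {
        if (value == null || overflowed) {
            return;
        }
        if (value.length() > maxLength) {
            // Limit exceeded: drop min/max so they never grow without bound.
            overflowed = true;
            min = null;
            max = null;
            return;
        }
        if (min == null || value.compareTo(min) < 0) {
            min = value;
        }
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
    }

    boolean isOverflowed() { return overflowed; }

    // Evaluates "column = literal" against the statistics (nulls are not modelled here).
    TruthSketch evaluateEquals(String literal) {
        if (overflowed || min == null) {
            // No usable statistics: the reader cannot rule anything out.
            return TruthSketch.YES_NO_NULL;
        }
        if (literal.compareTo(min) < 0 || literal.compareTo(max) > 0) {
            return TruthSketch.NO;
        }
        return TruthSketch.YES_NO_NULL;
    }
}
```

The design point is that exceeding the limit only costs pruning opportunities: once the statistics are discarded, every predicate on that column falls back to "cannot decide", so results stay correct.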


