Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/08/05 15:42:00 UTC

[jira] [Resolved] (ORC-1232) [C++]Disable metrics collector by default

     [ https://issues.apache.org/jira/browse/ORC-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved ORC-1232.
--------------------------------
    Fix Version/s: 1.9.0
       Resolution: Fixed

Issue resolved by pull request 1206
[https://github.com/apache/orc/pull/1206]

> [C++]Disable metrics collector by default
> -----------------------------------------
>
>                 Key: ORC-1232
>                 URL: https://issues.apache.org/jira/browse/ORC-1232
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Xinyu Zeng
>            Assignee: ZhangXin
>            Priority: Major
>             Fix For: 1.9.0
>
>
> ORC-961 introduced a metrics collector for the reader. However, it can slow down reading ORC files, so it may be helpful to disable it by default.
>  
> Reproducible experiment result:
> Alibaba Cloud [ecs.s6-c1m4.xlarge|https://help.aliyun.com/document_detail/25378.html#s6], running Ubuntu 20.04, ESSD PL1 40GB
> The original file is a 4.1GB CSV file of generated strings with some degree of repetitiveness (the value of one column follows a Zipfian distribution). The ORC file, with dictionary encoding and no block compression, is 319MB.
>  
> Time of running orc-scan with metrics enabled: 7.5s
> Time of running orc-scan with metrics disabled: 1.5s
> Metrics were disabled by adding
> {code:cpp}
> readerOpts.setReaderMetrics(nullptr); {code}
> after [https://github.com/apache/orc/blob/02e48107b36b8ed868797dadcd7355a632519d48/tools/src/FileScan.cc#L26]
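
For illustration only, here is a minimal self-contained sketch of why passing a null collector can remove per-call overhead. It does not use the ORC library: SketchMetrics, ScopedTimer and decodeValues are hypothetical names invented for this example, and the real reader may gather its metrics differently. The assumption is that a hot path takes two clock readings and two atomic updates per call when a collector is attached, and skips all of that when the pointer is nullptr.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a reader metrics collector (not the ORC struct).
struct SketchMetrics {
  std::atomic<uint64_t> calls{0};
  std::atomic<uint64_t> latencyUs{0};
};

// RAII timer that touches the clock and the counters only when a
// collector is attached; with nullptr it reduces to two pointer checks.
class ScopedTimer {
 public:
  explicit ScopedTimer(SketchMetrics* metrics) : metrics_(metrics) {
    if (metrics_ != nullptr) {
      start_ = std::chrono::steady_clock::now();
    }
  }

  ~ScopedTimer() {
    if (metrics_ != nullptr) {
      auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::steady_clock::now() - start_);
      metrics_->calls.fetch_add(1);
      metrics_->latencyUs.fetch_add(static_cast<uint64_t>(elapsed.count()));
    }
  }

 private:
  SketchMetrics* metrics_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical hot path standing in for a per-value decode call.
// The result is the same with or without a collector.
uint64_t decodeValues(uint64_t count, SketchMetrics* metrics) {
  uint64_t sum = 0;
  for (uint64_t i = 0; i < count; ++i) {
    ScopedTimer timer(metrics);  // instrumented region: one "call"
    sum += i;
  }
  return sum;
}
```

Calling decodeValues(n, &metrics) records n calls in the collector, while decodeValues(n, nullptr) returns the same sum without reading the clock. How much time this saves in practice depends on how fine-grained the instrumented calls are.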



--
This message was sent by Atlassian Jira
(v8.20.10#820010)