Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/08/01 22:26:00 UTC

[jira] [Commented] (ORC-1232) [C++]Disable metrics collector by default

    [ https://issues.apache.org/jira/browse/ORC-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573946#comment-17573946 ] 

Dongjoon Hyun commented on ORC-1232:
------------------------------------

Thank you for reporting, [~xzeng].

> [C++]Disable metrics collector by default
> -----------------------------------------
>
>                 Key: ORC-1232
>                 URL: https://issues.apache.org/jira/browse/ORC-1232
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Xinyu Zeng
>            Priority: Major
>
> ORC-961 introduced a metrics collector for the reader. However, it may degrade the performance of reading ORC files, so it may be helpful to disable it by default.
>  
> Reproducible experiment result:
> Alibaba Cloud [ecs.s6-c1m4.xlarge|https://help.aliyun.com/document_detail/25378.html#s6], running Ubuntu 20.04, ESSD PL1 40GB
> The original file is a 4.1GB CSV file of generated strings with some degree of repetitiveness (the values of one column follow a Zipfian distribution). The ORC file, written with dictionary encoding and no block compression, is 319MB.
>  
> Time of running orc-scan with metrics enabled: 7.5s
> Time of running orc-scan with metrics disabled: 1.5s
> Metrics were disabled by adding 
> {code:java}
> readerOpts.setReaderMetrics(nullptr); {code}
> after [https://github.com/apache/orc/blob/02e48107b36b8ed868797dadcd7355a632519d48/tools/src/FileScan.cc#L26]
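
The null-pointer switch quoted above can be illustrated with a small standalone sketch. Everything in it (ScanMetrics, ScanOptions, scanBatch) is a hypothetical stand-in invented for illustration, not ORC's actual classes, and it does not attempt to reproduce the measured 7.5s vs 1.5s difference. It only shows the general pattern: when the options carry a null collector, the hot loop skips all counter updates.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a reader metrics object; not ORC's actual class.
struct ScanMetrics {
  std::atomic<uint64_t> valuesRead{0};
  std::atomic<uint64_t> decodeCalls{0};
};

// Hypothetical options holder. A null pointer means "collect nothing".
struct ScanOptions {
  ScanMetrics* metrics = nullptr;
};

// Sums a batch of values, updating counters only when a collector is attached.
uint64_t scanBatch(const std::vector<uint64_t>& batch, const ScanOptions& opts) {
  uint64_t sum = 0;
  for (uint64_t v : batch) {
    sum += v;
    if (opts.metrics != nullptr) {
      // Per-value atomic update: the kind of cost that can add up in a hot loop.
      opts.metrics->valuesRead.fetch_add(1, std::memory_order_relaxed);
    }
  }
  if (opts.metrics != nullptr) {
    // One update per batch, recorded only when metrics are enabled.
    opts.metrics->decodeCalls.fetch_add(1, std::memory_order_relaxed);
  }
  return sum;
}
```

With a default-constructed ScanOptions, scanBatch returns the same sum and touches no counters; pointing metrics at a ScanMetrics instance turns collection on. Whether the real collector's overhead comes from a similar per-value update is an assumption here, not something the report establishes.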



--
This message was sent by Atlassian Jira
(v8.20.10#820010)