Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2023/07/01 01:08:00 UTC

[jira] [Commented] (ORC-1458) reduce namenode getFileinfo rpc

    [ https://issues.apache.org/jira/browse/ORC-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739256#comment-17739256 ] 

Dongjoon Hyun commented on ORC-1458:
------------------------------------

Thank you for reporting, [~mamingchen]. Please feel free to make a PR for that.

> reduce namenode getFileinfo rpc 
> --------------------------------
>
>                 Key: ORC-1458
>                 URL: https://issues.apache.org/jira/browse/ORC-1458
>             Project: ORC
>          Issue Type: Wish
>          Components: Java, Reader
>            Reporter: Mingchen_Ma
>            Priority: Minor
>
> ReaderImpl.java contains the following logic:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   // No length supplied via ReaderOptions: ask the NameNode (one getFileInfo RPC)
>   FileStatus fileStatus = fs.getFileStatus(path);
>   size = fileStatus.getLen();
>   modificationTime = fileStatus.getModificationTime();
> }
> {code}
> This logic obtains the file length so that the ORC footer can be located and read. As a result, by default every open of an ORC file on HDFS issues an extra getFileInfo RPC to the NameNode, unless the file length is supplied through ReaderOptions (maxLength) before the file is opened.
> Since ReaderImpl has already opened the file, could we avoid this NameNode RPC as follows? On a heavily loaded cluster this could noticeably reduce the pressure on the NameNode:
> {code:java}
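> if (maxFileLength == Long.MAX_VALUE) {
>   // Reuse the length already known to the open stream; assumes it wraps an HDFS DFSInputStream
>   size = ((DFSInputStream) file.getWrappedStream()).getFileLength();
>   modificationTime = -1;
> }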
> {code}
>  
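The lookup order discussed in this issue can be sketched in plain Java. To let it run without Hadoop on the classpath, the Hadoop types are replaced by hypothetical stand-ins: FileLengthResolver, FileInfo, resolve and UNSET are illustrative names, not ORC or Hadoop APIs. The sketch assumes the caller can tell whether the open stream already knows the file length (for HDFS, roughly what DFSInputStream.getFileLength() reports). It keeps fs.getFileStatus(path) as the fallback, because the wrapped stream may not be a DFSInputStream on other file systems, or when HDFS layers another stream on top of it. In either case an unconditional cast would fail.

```java
import java.util.OptionalLong;
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative, not ORC or Hadoop APIs.
final class FileLengthResolver {
    // Mirrors the ReaderOptions default: Long.MAX_VALUE means "no length supplied".
    static final long UNSET = Long.MAX_VALUE;

    /** Stand-in for the two FileStatus fields ReaderImpl uses. */
    static final class FileInfo {
        final long length;
        final long modificationTime; // -1 when unknown

        FileInfo(long length, long modificationTime) {
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    /**
     * Resolves the file length, preferring sources that need no NameNode RPC.
     *
     * @param maxFileLength length supplied by the caller, or UNSET
     * @param streamLength  length already known to the open stream, if any
     * @param statusLookup  fallback standing in for fs.getFileStatus(path)
     */
    static FileInfo resolve(long maxFileLength, OptionalLong streamLength,
                            Supplier<FileInfo> statusLookup) {
        if (maxFileLength != UNSET) {
            // Caller-supplied length: no RPC, modification time unknown.
            return new FileInfo(maxFileLength, -1);
        }
        if (streamLength.isPresent()) {
            // Proposed path: reuse the open stream's length, no extra RPC.
            return new FileInfo(streamLength.getAsLong(), -1);
        }
        // Current default: one metadata lookup (getFileInfo RPC on HDFS).
        return statusLookup.get();
    }
}
```

Passing a Supplier that counts its invocations shows the effect of this order. Only the last case, with neither an explicit length nor a stream length, triggers the metadata lookup. Callers that already know the length can avoid the RPC with the existing reader option, e.g. OrcFile.readerOptions(conf).maxLength(len).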



--
This message was sent by Atlassian Jira
(v8.20.10#820010)