Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/06/29 01:12:00 UTC

[jira] [Commented] (SPARK-28208) When upgrading to ORC 1.5.6, the reader needs to be closed.

    [ https://issues.apache.org/jira/browse/SPARK-28208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875335#comment-16875335 ] 

Dongjoon Hyun commented on SPARK-28208:
---------------------------------------

As I commented on ORC-525, this is an unexpected behavior change for users in a bug-fix release.
> Why do we enforce such a behavior change at bug fix release from 1.5.5 to 1.5.6?

> When upgrading to ORC 1.5.6, the reader needs to be closed.
> -----------------------------------------------------------
>
>                 Key: SPARK-28208
>                 URL: https://issues.apache.org/jira/browse/SPARK-28208
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Owen O'Malley
>            Priority: Major
>
> As part of the ORC 1.5.6 release, we optimized the common pattern of:
> {code:java}
> Reader reader = OrcFile.createReader(...);
> RecordReader rows = reader.rows(...);{code}
> which used to open one file handle in the Reader and a second one in the RecordReader. Users were seeing this as a regression when moving from the old Spark ORC reader via Hive to the new native reader, because it opened twice as many files on the NameNode.
> In ORC 1.5.6, we changed the ORC library so that it keeps the file handle in the Reader until it is either closed or a RecordReader is created from it. This has cut the number of file open requests on the NameNode in half in typical Spark applications. (Hive's ORC code avoided this problem by putting the file footer into the input splits, but that has other problems.)
> To get the new optimization without leaking file handles, Spark needs to close the readers that aren't used to create RecordReaders.
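
For illustration only, the handle ownership described above can be modeled with a few stand-in classes. This is a rough sketch, not the ORC API: ModelReader, ModelRecordReader, HandleCounter and ReaderCloseSketch are hypothetical names, and the counter merely stands in for file handles open against the NameNode. It assumes the behavior stated in the description: the reader holds a handle until it is closed or a record reader is created from it.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins that only model the handle ownership described in the
// issue; they are NOT the org.apache.orc classes.
final class HandleCounter {
    // Number of "file handles" currently open in this model.
    static final AtomicInteger OPEN = new AtomicInteger();
}

final class ModelRecordReader implements Closeable {
    private boolean open = true;

    @Override
    public void close() {
        if (open) {
            open = false;
            HandleCounter.OPEN.decrementAndGet(); // release the handle it took over
        }
    }
}

final class ModelReader implements Closeable {
    private boolean holdsHandle = true;

    ModelReader() {
        HandleCounter.OPEN.incrementAndGet(); // the reader opens the file once
    }

    ModelRecordReader rows() {
        if (holdsHandle) {
            holdsHandle = false; // hand the existing handle over, no second open
        } else {
            HandleCounter.OPEN.incrementAndGet(); // in this model, a later call opens anew
        }
        return new ModelRecordReader();
    }

    @Override
    public void close() {
        if (holdsHandle) {
            holdsHandle = false;
            HandleCounter.OPEN.decrementAndGet(); // release only if still owned
        }
    }
}

final class ReaderCloseSketch {
    // Footer-only use (for example, reading a schema): no RecordReader is
    // created, so the handle stays open unless the reader is closed.
    static int footerOnlyWithoutClose() {
        HandleCounter.OPEN.set(0);
        ModelReader reader = new ModelReader();
        return HandleCounter.OPEN.get(); // 1: leaked handle
    }

    // Same use, but the reader is closed by try-with-resources.
    static int footerOnlyWithClose() {
        HandleCounter.OPEN.set(0);
        try (ModelReader reader = new ModelReader()) {
            // read metadata only
        }
        return HandleCounter.OPEN.get(); // 0
    }

    // Row reading: the record reader takes over the handle and releases it.
    static int readRows() {
        HandleCounter.OPEN.set(0);
        try (ModelReader reader = new ModelReader();
             ModelRecordReader rows = reader.rows()) {
            // iterate over rows
        }
        return HandleCounter.OPEN.get(); // 0
    }

    public static void main(String[] args) {
        System.out.println(footerOnlyWithoutClose()); // prints 1
        System.out.println(footerOnlyWithClose());    // prints 0
        System.out.println(readRows());               // prints 0
    }
}
```

In this model, closing the reader after the record reader has taken the handle is a no-op, so wrapping every reader in try-with-resources covers both the footer-only path and the row-reading path.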



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
