Posted to issues@orc.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/12/23 19:48:00 UTC

[jira] [Assigned] (ORC-1063) Avoid ORC Reader Max Length Confusion

     [ https://issues.apache.org/jira/browse/ORC-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor reassigned ORC-1063:
-----------------------------------


> Avoid ORC Reader Max Length Confusion
> -------------------------------------
>
>                 Key: ORC-1063
>                 URL: https://issues.apache.org/jira/browse/ORC-1063
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 1.7.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> I just came across this confusion in the wild (i.e., in a production system).
> {code:java|title=ReaderImpl.java}
>   @Override
>   public String toString() {
>     StringBuilder buffer = new StringBuilder();
>     buffer.append("ORC Reader(");
>     buffer.append(path);
>     if (maxLength != -1) {
>       buffer.append(", ");
>       buffer.append(maxLength);
>     }
>     buffer.append(")");
>     return buffer.toString();
>   }
> {code}
> {code:java|title=OrcConf.java}
>   MAX_FILE_LENGTH("orc.max.file.length", "orc.max.file.length", Long.MAX_VALUE,
>       "The maximum size of the file to read for finding the file tail. This\n" +
>           "is primarily used for streaming ingest to read intermediate\n" +
>           "footers while the file is still open"),
> {code}
> https://github.com/apache/orc/blob/883aae8757257a8314c0ece07e5ef0238600717c/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L1107-L1109
> There seems to be some confusion here about how to express "there is no maximum value."  The configuration uses {{Long.MAX_VALUE}} as the default to denote no maximum, but the {{toString()}} code expects "no maximum value" to be equal to -1.  I came across this because I saw some logging indicating that I had a file of length ~9000PB, which is consistent with {{Long.MAX_VALUE}} bytes being printed as the maximum length.  This did not make any sense and was confusing.
> I suggest changing this so that any value less than 0 denotes "no maximum", and using a Java {{Optional}} to avoid this confusion in the future.
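
As a rough illustration of the suggestion above, the following is a minimal sketch and not the actual ORC change. The class name MaxLengthSketch and the helpers toMaxLength and describe are hypothetical, and it assumes the configured default would become a negative value rather than Long.MAX_VALUE. It uses java.util.OptionalLong, the primitive counterpart of Optional, so that "no maximum" is an empty value instead of a magic number.

```java
import java.util.OptionalLong;

// Hypothetical sketch; none of these names come from the ORC code base.
class MaxLengthSketch {

    // Assumption: any negative configured value means "no maximum".
    static OptionalLong toMaxLength(long configured) {
        return configured < 0 ? OptionalLong.empty() : OptionalLong.of(configured);
    }

    // Same output shape as ReaderImpl.toString() quoted above, but the
    // length is appended only when a maximum is actually present.
    static String describe(String path, OptionalLong maxLength) {
        StringBuilder buffer = new StringBuilder();
        buffer.append("ORC Reader(");
        buffer.append(path);
        maxLength.ifPresent(len -> buffer.append(", ").append(len));
        buffer.append(")");
        return buffer.toString();
    }

    public static void main(String[] args) {
        // No maximum configured: the length is omitted entirely.
        System.out.println(describe("/tmp/a.orc", toMaxLength(-1)));
        // An explicit maximum of 1024 bytes is printed after the path.
        System.out.println(describe("/tmp/a.orc", toMaxLength(1024)));
    }
}
```

With this shape, a caller cannot mistake a sentinel for a real length, because the absent case has to be handled explicitly before the value can be read.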



--
This message was sent by Atlassian Jira
(v8.20.1#820001)