Posted to dev@orc.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/12/23 19:48:00 UTC

[jira] [Created] (ORC-1063) Avoid ORC Reader Max Length Confusion

David Mollitor created ORC-1063:
-----------------------------------

             Summary: Avoid ORC Reader Max Length Confusion
                 Key: ORC-1063
                 URL: https://issues.apache.org/jira/browse/ORC-1063
             Project: ORC
          Issue Type: Improvement
          Components: Java
    Affects Versions: 1.7.0
            Reporter: David Mollitor
            Assignee: David Mollitor


I just came across this confusion in the wild (i.e., on a production system).

{code:java|title=ReaderImpl.java}
  @Override
  public String toString() {
    StringBuilder buffer = new StringBuilder();
    buffer.append("ORC Reader(");
    buffer.append(path);
    if (maxLength != -1) {
      buffer.append(", ");
      buffer.append(maxLength);
    }
    buffer.append(")");
    return buffer.toString();
  }
{code}

{code:java|title=OrcConf.java}
  MAX_FILE_LENGTH("orc.max.file.length", "orc.max.file.length", Long.MAX_VALUE,
      "The maximum size of the file to read for finding the file tail. This\n" +
          "is primarily used for streaming ingest to read intermediate\n" +
          "footers while the file is still open"),
{code}

https://github.com/apache/orc/blob/883aae8757257a8314c0ece07e5ef0238600717c/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L1107-L1109

There seems to be some confusion here about how to express "there is no maximum value."  The configuration uses {{Long.MAX_VALUE}} as its default to mean "no maximum," but the {{toString()}} code expects "no maximum" to be equal to -1.  I came across this because I saw some logging that indicated I had a file of length ~9000PB.  This made no sense and was confusing.
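
To make the mismatch concrete, here is a minimal, self-contained sketch. {{MaxLengthDemo}} and the example path are hypothetical stand-ins, not ORC classes; {{describe}} only mirrors the {{toString()}} logic quoted above, and it assumes the configured default reaches that code unchanged.

```java
final class MaxLengthDemo {
    // Hypothetical stand-in mirroring the quoted ReaderImpl.toString() logic.
    static String describe(String path, long maxLength) {
        StringBuilder buffer = new StringBuilder("ORC Reader(").append(path);
        if (maxLength != -1) {          // only -1 is treated as "no maximum"
            buffer.append(", ").append(maxLength);
        }
        return buffer.append(")").toString();
    }

    public static void main(String[] args) {
        // Default of orc.max.file.length as quoted from OrcConf.java.
        long configuredDefault = Long.MAX_VALUE;
        // The sentinel is printed as if it were a real length.
        System.out.println(describe("/tmp/example.orc", configuredDefault));
        // Long.MAX_VALUE bytes expressed in decimal petabytes (10^15 bytes).
        System.out.println(configuredDefault / 1_000_000_000_000_000L + " PB");
    }
}
```

Because {{Long.MAX_VALUE}} is not -1, the check passes and the sentinel is appended as though it were a real length, which is roughly 9223 PB when read as bytes.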

I suggest changing this so that any value less than 0 denotes "no maximum," and using a Java {{Optional}} to avoid this confusion in the future.
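
A rough sketch of what that could look like follows. This is not a patch against ORC: {{MaxLengthSketch}}, {{toMaxLength}}, and {{describe}} are hypothetical names, and {{OptionalLong}} is used here as the primitive variant of {{Optional}}. It also assumes the configuration default would change from {{Long.MAX_VALUE}} to a negative value.

```java
import java.util.OptionalLong;

final class MaxLengthSketch {
    // Hypothetical helper: any negative configured value means "no maximum".
    static OptionalLong toMaxLength(long configured) {
        return configured < 0 ? OptionalLong.empty() : OptionalLong.of(configured);
    }

    // Hypothetical toString() replacement: the length is printed only when set.
    static String describe(String path, OptionalLong maxLength) {
        StringBuilder buffer = new StringBuilder("ORC Reader(").append(path);
        maxLength.ifPresent(len -> buffer.append(", ").append(len));
        return buffer.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("/tmp/example.orc", toMaxLength(-1L)));
        System.out.println(describe("/tmp/example.orc", toMaxLength(4096L)));
    }
}
```

With an empty {{OptionalLong}} there is no sentinel left to compare against, so the logging code cannot disagree with the configuration about what "no maximum" means.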



--
This message was sent by Atlassian Jira
(v8.20.1#820001)