Posted to issues@nifi.apache.org by "Brandon DeVries (JIRA)" <ji...@apache.org> on 2016/11/17 17:15:58 UTC

[jira] [Updated] (NIFI-3055) StandardRecordWriter can throw UTFDataFormatException

     [ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon DeVries updated NIFI-3055:
----------------------------------
    Description: 
StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2] without checking the length of the value to be written.  If the encoded (modified UTF-8) length is greater than 65535 bytes (2^16 - 1), you get a UTFDataFormatException "encoded string too long..."\[3].  Ultimately, this can 

Several of the field values being written in this way are pre-defined, and thus not likely an issue.  However, the "details" field can be populated by a processor, and can be of an arbitrary length.  Additionally, if the details field is indexed (which I didn't investigate, but I'm sure is easy enough to determine), then the length might be subject to the Lucene limit discussed in NIFI-2787.

\[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
\[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
\[3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
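
The limit described above can be reproduced outside NiFi. The following is a minimal sketch, not the NiFi code and not a proposed fix: the class WriteUtfLimitDemo and the helpers repeat, writeUtfFails and writeLengthPrefixed are hypothetical names introduced for illustration. It shows that writeUTF() rejects a value whose encoded form exceeds 65535 bytes (so multi-byte characters hit the limit at fewer than 65535 characters), and, as one assumed alternative, writes a 4-byte length followed by the UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of NiFi.
class WriteUtfLimitDemo {

    // Builds a string consisting of `count` copies of `c`.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns true if DataOutputStream.writeUTF() rejects the value
    // with a UTFDataFormatException, false if the write succeeds.
    static boolean writeUtfFails(String value) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed alternative encoding (not NiFi's on-disk format):
    // a 4-byte length prefix followed by the standard UTF-8 bytes.
    static byte[] writeLengthPrefixed(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // 65535 one-byte characters fit; one more does not.
        System.out.println("65535 x 'a' fails: " + writeUtfFails(repeat('a', 65535)));
        System.out.println("65536 x 'a' fails: " + writeUtfFails(repeat('a', 65536)));
        // U+00E9 encodes to 2 bytes, so 40000 characters are 80000 bytes.
        System.out.println("40000 x U+00E9 fails: " + writeUtfFails(repeat('\u00e9', 40000)));
        // The length-prefixed form has no 65535-byte ceiling.
        System.out.println("length-prefixed size: " + writeLengthPrefixed(repeat('a', 65536)).length);
    }
}
```

A reader of the length-prefixed form would use readInt() followed by readFully() into a byte array of that size. Whether to truncate the value, change the encoding, or reject oversized values is a design decision this sketch does not settle.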

  was:
StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2] without checking the length of the value to be written.  If this length is greater than 65535 (2^16 - 1), you get a UTFDataFormatException "encoded string too long..."\[3].  Several of the field values being written in this way are pre-defined, and thus not likely an issue.  However, the "details" field can be populated by a processor, and can be of an arbitrary length.  Additionally, if the details field is indexed (which I didn't investigate, but I'm sure is easy enough to determine), then the length might be subject to the Lucene limit discussed in NIFI-2787.

\[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
\[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
\[3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>
> StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2] without checking the length of the value to be written.  If the encoded (modified UTF-8) length is greater than 65535 bytes (2^16 - 1), you get a UTFDataFormatException "encoded string too long..."\[3].  Ultimately, this can 
> Several of the field values being written in this way are pre-defined, and thus not likely an issue.  However, the "details" field can be populated by a processor, and can be of an arbitrary length.  Additionally, if the details field is indexed (which I didn't investigate, but I'm sure is easy enough to determine), then the length might be subject to the Lucene limit discussed in NIFI-2787.
> \[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> \[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> \[3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)