Posted to dev@avro.apache.org by "Michael Coon (JIRA)" <ji...@apache.org> on 2016/09/15 14:46:20 UTC
[jira] [Created] (AVRO-1917) DataFileStream Skips Blocks with hasNext and nextBlock calls
Michael Coon created AVRO-1917:
----------------------------------
Summary: DataFileStream Skips Blocks with hasNext and nextBlock calls
Key: AVRO-1917
URL: https://issues.apache.org/jira/browse/AVRO-1917
Project: Avro
Issue Type: Bug
Components: java
Reporter: Michael Coon
We have a situation where there are potentially large segments of data embedded in an Avro data item. Sometimes an upstream system becomes corrupted and adds hundreds of thousands of array items to the structure. When I try to read such an item as a Datum record, it blows the heap immediately.
To catch this situation, I created a custom DatumReader that checks the size of arrays and byte[] values and, if a threshold is exceeded, throws a custom exception; I catch that exception and skip the corrupted item in the file. To implement this try-catch-skip flow, I had to call hasNext and then nextBlock to get the ByteBuffer to pass to my reader.

Unfortunately, calling "hasNext" followed by "nextBlock" actually skips a block in the underlying data stream. This is because "nextBlock" itself calls "hasNext", which reads the next block: my own call advanced the stream, and then nextBlock's internal call advanced it again, causing bytes to be skipped. My workaround is a do...while loop that catches "NoSuchElementException", but this is not intuitive and required reviewing the code to discover. The fix is to make hasNext and nextBlock share state so that a hasNext call does not advance the stream by reading the next block.
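To make the double-advance concrete, here is a minimal stdlib-only model of the interaction described above. It is not Avro's actual DataFileStream code; the class and block contents are illustrative, but the two behaviors it encodes (a non-idempotent hasNext that reads a block on every call, and a nextBlock that internally re-calls hasNext) match the report:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative model only -- mimics the reported DataFileStream behavior,
// with int "blocks" standing in for the real ByteBuffer blocks.
class BuggyBlockStream {
    private final int[] blocks;
    private int pos = 0;
    private int current;

    BuggyBlockStream(int[] blocks) { this.blocks = blocks; }

    // Not idempotent: every call reads (consumes) the next block.
    boolean hasNext() {
        if (pos >= blocks.length) return false;
        current = blocks[pos++];
        return true;
    }

    // Internally calls hasNext() again before returning, as the report
    // describes -- so a caller-side hasNext() + nextBlock() pair
    // consumes two blocks and returns only the second.
    int nextBlock() {
        if (!hasNext()) throw new NoSuchElementException();
        return current;
    }
}

public class BlockSkipDemo {
    public static void main(String[] args) {
        // The natural iterator-style loop silently loses every other block.
        BuggyBlockStream s1 = new BuggyBlockStream(new int[]{1, 2, 3, 4});
        List<Integer> skipped = new ArrayList<>();
        while (s1.hasNext()) {          // consumes one block...
            skipped.add(s1.nextBlock()); // ...and this consumes another
        }
        System.out.println(skipped);    // prints [2, 4]

        // Workaround from the report: drive reading with nextBlock() alone
        // in a do...while-style loop and treat NoSuchElementException as
        // the end-of-stream signal.
        BuggyBlockStream s2 = new BuggyBlockStream(new int[]{1, 2, 3, 4});
        List<Integer> all = new ArrayList<>();
        try {
            do {
                all.add(s2.nextBlock());
            } while (true);
        } catch (NoSuchElementException endOfStream) {
            // expected: no blocks remain
        }
        System.out.println(all);        // prints [1, 2, 3, 4]
    }
}
```

The proposed fix amounts to making hasNext idempotent: have it cache the block it reads and have nextBlock return that cached block rather than fetching a fresh one.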
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)