Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2016/09/16 17:04:20 UTC

[jira] [Commented] (AVRO-1917) DataFileStream Skips Blocks with hasNext and nextBlock calls

    [ https://issues.apache.org/jira/browse/AVRO-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496825#comment-15496825 ] 

Doug Cutting commented on AVRO-1917:
------------------------------------

Can you please post some code that illustrates this?  That way we can better evaluate how to improve things.  Ideally this would be a unit test with a custom DatumReader that rejects data with certain criteria.  Thanks!

> DataFileStream Skips Blocks with hasNext and nextBlock calls
> ------------------------------------------------------------
>
>                 Key: AVRO-1917
>                 URL: https://issues.apache.org/jira/browse/AVRO-1917
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Michael Coon
>
> We have a situation where there are potentially large segments of data embedded in an Avro data item. Sometimes, an upstream system will become corrupted and add hundreds of thousands of array items in the structure. When I try to read the item as a Datum record, it blows the heap immediately. 
> To catch this situation, I created a custom DatumReader that checks the size of arrays and byte[] values and, if a size exceeds a threshold, throws a custom exception that I catch so I can skip the corrupted item in the file. However, to get this try-catch-skip behavior, I had to use hasNext and nextBlock to obtain the ByteBuffer and pass it to my reader. Unfortunately, calling "hasNext" and then "nextBlock" skips the first block in the underlying data stream. This is because "nextBlock" itself calls "hasNext", which reads the next block: I called it once, then nextBlock called it again, causing bytes to be skipped. My workaround is a do...while loop that catches "NoSuchElementException", but this is not intuitive, and I had to review the code to find it. The fix is to introduce a condition that both hasNext and nextBlock agree on, so that a hasNext call does not advance to the next block when one has already been read.
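
For readers following the thread, here is a minimal, self-contained sketch of the behavior described in the report and of the proposed fix. It does not use Avro: BlockReaderSketch, its fields, and readAll are hypothetical names invented for illustration, and the unguarded mode only models the skipping as the reporter describes it, not Avro's actual DataFileStream code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a block-oriented reader; NOT Avro's DataFileStream. */
class BlockReaderSketch {
    private final Deque<String> source;  // blocks not yet read from the "stream"
    private final boolean guarded;       // true = apply the proposed fix
    private String current;              // block loaded by hasNext(), not yet handed out
    private boolean blockLoaded;         // condition shared by hasNext() and nextBlock()

    BlockReaderSketch(List<String> blocks, boolean guarded) {
        this.source = new ArrayDeque<>(blocks);
        this.guarded = guarded;
    }

    /** With the guard, repeated calls do not advance past a pending block. */
    boolean hasNext() {
        if (guarded && blockLoaded) {
            return true;                 // a block is already pending; do not read again
        }
        if (source.isEmpty()) {
            return false;
        }
        current = source.poll();         // reads (and consumes) the next block
        blockLoaded = true;
        return true;
    }

    /** Calls hasNext() internally, as the report says nextBlock does. */
    String nextBlock() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        blockLoaded = false;             // the pending block is now handed out
        String block = current;
        current = null;
        return block;
    }

    /** The "intuitive" loop from the report: hasNext() followed by nextBlock(). */
    static List<String> readAll(List<String> blocks, boolean guarded) {
        BlockReaderSketch reader = new BlockReaderSketch(blocks, guarded);
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            out.add(reader.nextBlock());
        }
        return out;
    }
}
```

In this model, readAll over the four blocks "a", "b", "c", "d" returns only "b" and "d" when the guard is off, because each nextBlock call reads past the block that the preceding hasNext call loaded. With the guard on, all four blocks come back in order.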


