Posted to dev@avro.apache.org by "Kevin Irwin (JIRA)" <ji...@apache.org> on 2013/05/08 17:49:16 UTC

[jira] [Created] (AVRO-1326) Files written with bzip2 codec cannot be read

Kevin Irwin created AVRO-1326:
---------------------------------

             Summary: Files written with bzip2 codec cannot be read
                 Key: AVRO-1326
                 URL: https://issues.apache.org/jira/browse/AVRO-1326
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.4
            Reporter: Kevin Irwin
            Priority: Minor


When attempting to read a file written using the bzip2 codec for compression, the following exception is thrown upon completion of the first encoded block:

Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
	at BzipTests.main(BzipTests.java:28)
Caused by: java.io.IOException: Block read partially, the data may be corrupt
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
	... 1 more

An inspection of BZip2Codec indicates the root cause is in the compress() method. It compresses the entire backing array of the supplied ByteBuffer, not just the valid portion of the buffer. On decompression, the resulting length is then larger than the recorded uncompressed block size.

On line 51:
outputStream.write(uncompressedData.array());

should be:
outputStream.write(uncompressedData.array(), uncompressedData.position(), uncompressedData.remaining());
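
The difference between the two calls can be illustrated without Avro or a bzip2 library. The following is only a minimal sketch: the class and method names (ByteBufferSliceDemo, writeWholeArray, writeValidPortion) are hypothetical, and a plain ByteArrayOutputStream stands in for the compressor stream. It also adds arrayOffset() to the start index, which is not part of the fix proposed above; that is an assumption made here so the sketch also handles sliced buffers, and it is zero for buffers created with ByteBuffer.wrap().

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class ByteBufferSliceDemo {

    // Mirrors the reported behavior: every byte of the backing array is
    // written, regardless of the buffer's position and limit.
    static byte[] writeWholeArray(ByteBuffer data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] backing = data.array();
        out.write(backing, 0, backing.length);
        return out.toByteArray();
    }

    // Writes only the valid portion, i.e. the bytes between position and limit.
    // arrayOffset() is an addition in this sketch (assumption, see lead-in).
    static byte[] writeValidPortion(ByteBuffer data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A 64-byte backing array of which only the first 10 bytes are valid.
        ByteBuffer buffer = ByteBuffer.wrap(new byte[64], 0, 10);
        System.out.println("whole array:   " + writeWholeArray(buffer).length);
        System.out.println("valid portion: " + writeValidPortion(buffer).length);
    }
}
```

With a buffer whose backing array is larger than its valid region, the first method emits more bytes than remaining() reports, which is consistent with the length mismatch described above.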




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira