Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/05/14 00:43:43 UTC

[jira] Created: (AVRO-541) Java: TestDataFileConcat sometimes fails

Java: TestDataFileConcat sometimes fails
----------------------------------------

                 Key: AVRO-541
                 URL: https://issues.apache.org/jira/browse/AVRO-541
             Project: Avro
          Issue Type: Bug
          Components: java
            Reporter: Doug Cutting
            Priority: Critical


TestDataFileConcat intermittently fails with:

{code}
Testcase: testConcateateFiles[5] took 0.032 sec
        Caused an ERROR
java.io.IOException: Block read partially, the data may be corrupt
org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:173)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:193)
        at org.apache.avro.TestDataFileConcat.testConcateateFiles(TestDataFileConcat.java:141)
Caused by: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:157)
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved AVRO-541.
-------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

I just committed this.  Thanks, Scott!


[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900550#action_12900550 ] 

Scott Carey commented on AVRO-541:
----------------------------------

Sounds good.  I will be offline and unable to get to this until Monday 8/23.  I can make those changes and commit it then if you have not gotten to it.


[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901658#action_12901658 ] 

Doug Cutting commented on AVRO-541:
-----------------------------------

+1 looks good to me!


[jira] Updated: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-541:
------------------------------

    Fix Version/s: 1.4.0


[jira] Updated: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-541:
-----------------------------

    Attachment: AVRO-541.patch

This patch addresses the issue here.

Furthermore, it cleans up and refactors DataFileStream and DataFileWriter a little bit, encapsulating block write, decode, and encode work in DataFileStream.DataBlock for consistency.

The bug here was caused by a quirk in how Inflater.java works.  This quirk ONLY affects deflate in 'nowrap' mode.  Simply changing nowrap to false stops this bug, but the output is then no longer up to spec.
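
For context, a minimal sketch of what the nowrap flag changes, using only java.util.zip.  NowrapDemo and its deflate() helper are illustrative names, not Avro code, and the input string is simply borrowed from the failing record quoted elsewhere in this issue.  With nowrap=false the Deflater wraps the raw RFC 1951 deflate data in the zlib format of RFC 1950 (a 2-byte header and a 4-byte Adler-32 trailer), which is assumed here to be what "not up to spec" refers to.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class NowrapDemo {
  /** Compresses data at level 6; nowrap=true emits raw deflate without the zlib wrapper. */
  static byte[] deflate(byte[] data, boolean nowrap) {
    Deflater deflater = new Deflater(6, nowrap);
    try {
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      // Drain the deflater until it reports that all output has been produced.
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib memory
    }
  }

  public static void main(String[] args) {
    byte[] data = "dwvpxfdknqocdbppkpjfkmkmppcowqcmw".getBytes(StandardCharsets.UTF_8);
    byte[] raw = deflate(data, true);
    byte[] zlib = deflate(data, false);
    System.out.printf("raw=%d bytes, zlib=%d bytes, first zlib byte=0x%02x%n",
        raw.length, zlib.length, zlib[0] & 0xff);
  }
}
```

The two outputs carry the same deflate data; only the wrapper differs, so a reader expecting raw deflate cannot consume the nowrap=false form unchanged.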

The simplest work-around was to use InflaterOutputStream instead of InflaterInputStream.  This also allows for sharing more code between compress() and decompress().

The OutputStream variations avoid the complexity of detecting the end of the stream, which comes with the read() methods of the InputStream interface.  That makes it all much simpler, both in our code and in the internals of InflaterOutputStream and DeflaterOutputStream compared to the InputStream variants.  It's just easier to 'push' to the Inflate and Deflate API than to pull.
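
A minimal sketch of the push approach, assuming heap-backed ByteBuffers and raw deflate at level 6.  PushCodecSketch, pushThrough() and the level are hypothetical choices for illustration and are not taken from the patch; only the java.util.zip classes are real.  It shows how compress() and decompress() can share one code path when both simply write into a wrapping OutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.function.Function;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

class PushCodecSketch {
  static ByteBuffer compress(ByteBuffer data) {
    Deflater deflater = new Deflater(6, true); // nowrap: raw deflate
    try {
      return pushThrough(data, sink -> new DeflaterOutputStream(sink, deflater));
    } finally {
      deflater.end();
    }
  }

  static ByteBuffer decompress(ByteBuffer data) {
    Inflater inflater = new Inflater(true); // must match the nowrap setting above
    try {
      return pushThrough(data, sink -> new InflaterOutputStream(sink, inflater));
    } finally {
      inflater.end();
    }
  }

  /** Shared path: push the buffer's window through whichever stream wraps the sink. */
  private static ByteBuffer pushThrough(ByteBuffer data,
      Function<OutputStream, OutputStream> wrap) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = wrap.apply(sink)) {
      // Only position()..limit() is written; bytes past the limit are never seen.
      out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ByteBuffer.wrap(sink.toByteArray());
  }

  public static void main(String[] args) {
    byte[] original = new byte[4000];
    for (int i = 0; i < original.length; i++) original[i] = (byte) ('a' + i % 16);
    ByteBuffer packed = compress(ByteBuffer.wrap(original));
    ByteBuffer unpacked = decompress(packed);
    System.out.println(original.length + " -> " + packed.remaining()
        + " -> " + unpacked.remaining() + " bytes");
  }
}
```

Closing the wrapping stream is what finishes the block, so there is no end-of-stream condition for the caller to detect.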

For some information on the sorts of things that were happening, see this Java bug: 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4795299

The work-arounds there do not work well for a case where the end of the array is not guaranteed to be the end of the stream, which it is not when abstracted through a ByteBuffer for input in decompress().



[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897566#action_12897566 ] 

Scott Carey commented on AVRO-541:
----------------------------------

Additionally, the last patch has a more robust test to expose these issues: it loops through the work 50 times with different buffer sizes (TestDataFileConcat).  I have successfully tested this patch with the unit test set to run with test.count = 300 and with the syncInterval at every integer between 100 and 3099 inclusive.
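
The idea of sweeping every block size can be sketched outside of Avro as follows.  This is only an analogue under stated assumptions: BlockSizeSweep is a hypothetical class, the block size stands in for syncInterval, and plain byte arrays stand in for Avro data file blocks.  Each block is round-tripped through raw deflate and compared with the original.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

class BlockSizeSweep {
  static byte[] deflate(byte[] block) {
    Deflater deflater = new Deflater(6, true);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
      out.write(block);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      deflater.end();
    }
    return sink.toByteArray();
  }

  static byte[] inflate(byte[] packed) {
    Inflater inflater = new Inflater(true);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (InflaterOutputStream out = new InflaterOutputStream(sink, inflater)) {
      out.write(packed);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      inflater.end();
    }
    return sink.toByteArray();
  }

  /** Returns how many blocks failed to round-trip for block sizes in [from, to]. */
  static int sweep(int from, int to, long seed) {
    Random random = new Random(seed);
    byte[] data = new byte[8192];
    for (int i = 0; i < data.length; i++) data[i] = (byte) ('a' + random.nextInt(26));
    int failures = 0;
    for (int size = from; size <= to; size++) {
      // Cut the same data at a different boundary for every block size.
      for (int start = 0; start < data.length; start += size) {
        byte[] block = Arrays.copyOfRange(data, start, Math.min(start + size, data.length));
        if (!Arrays.equals(block, inflate(deflate(block)))) failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("failures: " + sweep(100, 3099, System.currentTimeMillis()));
  }
}
```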

Interop tests for files pass between Java and Python. 


[jira] Updated: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-541:
-----------------------------

    Attachment: AVRO-541.patch

Updated patch fixing checkstyle issue and changing the data file concatenation test to use the current time as the random seed.  
I will commit this tomorrow afternoon if there are no objections.


[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900534#action_12900534 ] 

Doug Cutting commented on AVRO-541:
-----------------------------------

Scott, if I don't hear otherwise, I'll change the test back to use a random seed, fix the checkstyle warning and commit this.


[jira] Updated: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-541:
------------------------------

    Attachment: AVRO-541.patch

Here's a patch (not for commit) that forces this to fail every time, deterministically, by hardwiring the random seed to a value that triggers failures.
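
A small sketch of the hardwired-seed technique in general terms.  SeededTestData, chooseSeed() and the seed value 12345 are hypothetical and are not the names or the seed used in the patch.  The point is that a fixed seed replays the same generated records on every run, while a time-based seed should at least be printed so that a failing run can be replayed later.

```java
import java.util.Random;

class SeededTestData {
  /** Generates a lowercase string whose content depends only on the Random's state. */
  static String randomString(Random random, int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append((char) ('a' + random.nextInt(26)));
    }
    return sb.toString();
  }

  /** Uses the fixed seed when one is given, else the current time; always reports it. */
  static long chooseSeed(Long fixedSeed) {
    long seed = (fixedSeed != null) ? fixedSeed : System.currentTimeMillis();
    System.out.println("random seed = " + seed);
    return seed;
  }

  public static void main(String[] args) {
    long seed = chooseSeed(12345L); // arbitrary illustrative value
    System.out.println(randomString(new Random(seed), 33));
    System.out.println(randomString(new Random(seed), 33)); // same seed, same string
  }
}
```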

Some observations:
 - failures appear to happen roughly 1/16 times
 - failures observed were always when appending a compressed file to an uncompressed one
 - in this particular failure, the final bytes of an appended block are incorrect, just before the sync marker.  These bytes should be '355 205 335 356 236   r' but are instead 'w   y   b   p   d   D 215 252 270 335 324 335'.

I found this by looking for the value that fails the unit test:

expected:<{"stringField": "dwvpxfdknqocdbppkpjfkmkmppcowqcmw", "longField": -4115970600535328707}> but was:<{"stringField": "dwvpxfdknqocdbppkpjfkmkmppcowqcmw", "longField": -125568963}>

One can scan the file for "dwv..." to find where this should be.  Fortunately the bug is in an uncompressed file, build/test/test-null-A.avro.  To find what the bytes for "longField" should be, one can look for "dwv..." in build/test/test-null-A.avro.  Note that the sync marker, unique per file, is found following the null byte following the schema text at the head of the file.
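
To work out what the bytes for "longField" should be without scanning a file, one can encode the value by hand.  The sketch below assumes the Avro binary encoding of a long (zig-zag followed by a variable-length base-128 encoding); ZigZagVarint and its methods are hypothetical names written for illustration, not Avro's own encoder.

```java
import java.io.ByteArrayOutputStream;

class ZigZagVarint {
  /** Encodes a long as zig-zag varint, low-order 7-bit group first. */
  static byte[] encodeLong(long n) {
    long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small values
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
      z >>>= 7;
    }
    out.write((int) z);
    return out.toByteArray();
  }

  /** Decodes one zig-zag varint from the start of bytes. */
  static long decodeLong(byte[] bytes) {
    long z = 0;
    int shift = 0;
    for (byte b : bytes) {
      z |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) break; // high bit clear: last byte of this value
      shift += 7;
    }
    return (z >>> 1) ^ -(z & 1);
  }

  /** Formats bytes as space-separated three-digit octal values. */
  static String octalDump(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%03o", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(octalDump(encodeLong(-4115970600535328707L)));
  }
}
```

Under that assumption the expected value encodes to nine bytes, the last six of which are the '355 205 335 356 236 r' sequence quoted above ('r' is octal 162).  Replacing the fourth byte with 'w', whose high bit is clear, ends the varint early and decodes to -125568963, which matches the value in the failed assertion.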

So it appears that, for some reason, the uncompressed data buffer that's appended in this case is both too long and contains some incorrect data at its end.  I have no idea yet why.

Scott, as the author of much of this, do you have any idea?


[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897901#action_12897901 ] 

Doug Cutting commented on AVRO-541:
-----------------------------------

Thanks for fixing this!  Overall the patch looks good.  A few nits:
 - should the random seed still be a constant, or should we switch back to currentTimeMillis()?
 - checkstyle complains about an empty statement at line 120 of DeflateCodec.java

I switched back to a random seed and ran the test 50+ times without a failure.


[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896944#action_12896944 ] 

Scott Carey commented on AVRO-541:
----------------------------------

OK, this one is getting interesting.  I wish it wasn't so interesting...

So, here are some perplexing details noticed while stepping through in the debugger:
* Concatenate from gzip-6 >> null uncompresses the gzip blocks and writes uncompressed blocks.  This is where we are seeing the bug.
* Taking the exact same gzip-6 file, concatenating it to another gzip'ed file, and telling it to force compression causes the same code to execute on the front side as expected:  uncompress the blocks.
* In the first case above, the data sometimes comes out of a block corrupted.  The first 'chunk' from java.util.zip.Inflater is good, then it is junk after that.  This junk is always at the end of our block as the first chunk from Inflater is slightly smaller than our block size.
* In the second case, the data does not come out of the block corrupted!  That is, sometimes Inflater.java uncompresses the same block fine, and sometimes it does not.
* If you change the avro file block size (syncInterval) you will get different results.  A different block size will cause _different_ blocks of the file to be corrupt, or none of them corrupt.  So it seems that the random seed's influence on reproducing the bug comes not from a data block's contents, but from its data size in relation to the block size.
* If you change the schema from (string, long) to (long, string) you get different errors -- mostly hard exceptions rather than validation errors.
* The issue always corresponds with the InflaterInputStream not returning '-1' at the end of stream but throwing an apparently spurious exception (because the same data file can sometimes be uncompressed successfully).
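
The pull-style read path referred to in the last bullet can be sketched as follows, assuming a raw deflate (nowrap) block held in a byte array.  PullStyleInflate and its methods are hypothetical names, not the Avro code.  The loop depends entirely on read() returning -1 once the Inflater reports the end of the stream.  Note that the Inflater javadoc asks for an extra "dummy" input byte in nowrap mode, so whether this loop ends cleanly may depend on the JDK and zlib version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

class PullStyleInflate {
  /** Produces a raw deflate (nowrap) block to feed the reader below. */
  static byte[] rawDeflate(byte[] data) {
    Deflater deflater = new Deflater(6, true);
    try {
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  /** Pulls uncompressed bytes until the stream signals its end with -1. */
  static byte[] inflateByPulling(byte[] packed, int off, int len) {
    Inflater inflater = new Inflater(true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in =
        new InflaterInputStream(new ByteArrayInputStream(packed, off, len), inflater)) {
      byte[] buf = new byte[512];
      int n;
      // End-of-stream detection happens inside read(); an exception here instead
      // of -1 is the symptom described in the bullet above.
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = new byte[3000];
    for (int i = 0; i < data.length; i++) data[i] = (byte) ('a' + i % 7);
    byte[] packed = rawDeflate(data);
    System.out.println(inflateByPulling(packed, 0, packed.length).length + " bytes inflated");
  }
}
```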


The good news:
When you comment out the JUnit assertEquals(), the test still fails because Avro detects that the block is corrupted, or some other error occurs.  So most likely any user running into this in the real world would not have silent data corruption.

I currently use this feature quite a bit, but only concatenating from gzip to gzip without uncompressing the block (I can concatenate faster than the disks can handle this way -- from thousands of smaller files into larger ones).

I should have more time to look into this later today.  Some next steps:
Produce a reduced test case where decompression fails, side by side with one where it works on the same file.  Hopefully this will either help pinpoint a bug in our use of InflaterInputStream or in Inflater itself.
Possible work-arounds and code cleanup:
- Use Inflater directly instead of InflaterInputStream to reduce the layers between gzip compression and the JNI code in Inflater.java.
- Refactor DataFileStream and DataFileWriter to use the same codepath for block decompression.
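
A rough sketch of the first work-around, driving java.util.zip.Inflater directly over the window of a heap-backed ByteBuffer.  DirectInflaterSketch, inflateBlock() and the choice of IllegalStateException are assumptions made for illustration; this is not what was eventually committed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DirectInflaterSketch {
  /** Inflates one raw deflate block taken from the buffer's position..limit window. */
  static byte[] inflateBlock(ByteBuffer block) {
    Inflater inflater = new Inflater(true);
    try {
      inflater.setInput(block.array(), block.arrayOffset() + block.position(),
          block.remaining());
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          // All input was consumed but the deflate stream never reached its end.
          throw new IllegalStateException("deflate stream ended before block was complete");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt deflate block", e);
    } finally {
      inflater.end();
    }
  }

  /** Produces a raw deflate (nowrap) block for the example. */
  static byte[] rawDeflate(byte[] data) {
    Deflater deflater = new Deflater(6, true);
    try {
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[2500];
    for (int i = 0; i < data.length; i++) data[i] = (byte) ('a' + i % 11);
    byte[] packed = rawDeflate(data);
    System.out.println(inflateBlock(ByteBuffer.wrap(packed)).length + " bytes inflated");
  }
}
```

Here the caller decides what an incomplete stream means, rather than relying on a stream wrapper's end-of-stream handling.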




[jira] Commented: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896387#action_12896387 ] 

Scott Carey commented on AVRO-541:
----------------------------------

Thanks for getting it this far Doug, I'll dig into it.

If the failure is always when appending compressed to uncompressed, then either the extra data comes from when the block was compressed, or there is a flaw in handling buffers when we are decompressing.




[jira] Assigned: (AVRO-541) Java: TestDataFileConcat sometimes fails

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey reassigned AVRO-541:
--------------------------------

    Assignee: Scott Carey
