Posted to issues@commons.apache.org by "Dmitriy Smirnov (JIRA)" <ji...@apache.org> on 2011/07/24 02:01:09 UTC

[jira] [Created] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS
--------------------------------------------------------------------------------------------

                 Key: COMPRESS-146
                 URL: https://issues.apache.org/jira/browse/COMPRESS-146
             Project: Commons Compress
          Issue Type: Bug
          Components: Compressors
         Environment: all
            Reporter: Dmitriy Smirnov
            Priority: Critical


BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS
This error mostly occurs on large files, showing up as a sudden EOF somewhere in the middle of the file.

An example of data from archived file:
$ cat fastq.ax.bz2 | od -t x1 | grep -A 1 '17 72 45'
22711660 d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff
22711700 b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75
--
24637020 c5 49 ff 19 80 49 20 7f ff 17 72 45 38 50 90 a4
24637040 a8 ac bd 42 5a 68 39 31 41 59 26 53 59 0d 9a b4
--
40302720 ff b1 24 80 10 ff ff 17 72 45 38 50 90 24 cb c5
40302740 90 42 5a 68 39 31 41 59 26 53 59 42 05 ae 5e 05
.....
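Each excerpt follows the same pattern. First comes the end-of-stream magic 17 72 45 38 50 90, which is the BCD digits of sqrt(pi), 1.77245385090. Next come four bytes of combined CRC. Then 42 5a 68 39 ("BZh9") appears, followed by the block magic 31 41 59 26 53 59, which is the BCD digits of pi. So the file consists of several complete bzip2 streams concatenated together, and the decompressor stops at the end of the first one.

The sketch below is a hypothetical helper, not part of Commons Compress, that looks for such byte-aligned stream headers. It relies on the fact that each stream is padded to a whole byte, so a following stream always starts on a byte boundary. The end-of-stream magic itself is bit-aligned, so a byte-level grep like the one above can only find the occurrences that happen to fall on a byte boundary. A match inside compressed data is possible in principle, so treat the offsets as candidates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Commons Compress): finds byte-aligned
 * bzip2 stream headers, i.e. "BZh" + level digit + block magic 0x314159265359.
 */
public class Bz2StreamScanner {
    private static final byte[] SIGNATURE = {'B', 'Z', 'h'};
    // Block header magic: the BCD digits of pi, 3.14159265359
    private static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    /** Returns the offsets at which a stream whose first block is non-empty starts. */
    public static List<Integer> findStreamStarts(byte[] buf) {
        List<Integer> starts = new ArrayList<>();
        int headerLen = SIGNATURE.length + 1 + BLOCK_MAGIC.length; // 10 bytes
        for (int i = 0; i + headerLen <= buf.length; i++) {
            int level = buf[i + SIGNATURE.length];
            if (matches(buf, i, SIGNATURE)
                    && level >= '1' && level <= '9'
                    && matches(buf, i + SIGNATURE.length + 1, BLOCK_MAGIC)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /** Parses od-style hex such as "d0 ff b6" into bytes. */
    public static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    private static boolean matches(byte[] buf, int off, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (buf[off + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First excerpt from the od dump above (two 16-byte lines)
        byte[] dump = parseHex("d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff "
                + "b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75");
        System.out.println(findStreamStarts(dump)); // prints [18]
    }
}
```

In the first excerpt the next stream header starts at offset 18. That offset is the 6-byte magic at offset 8 plus the 4-byte CRC after it.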


Suggested solution:

    private void initBlock() throws IOException {
        char magic0 = bsGetUByte();
        char magic1 = bsGetUByte();
        char magic2 = bsGetUByte();
        char magic3 = bsGetUByte();
        char magic4 = bsGetUByte();
        char magic5 = bsGetUByte();

        if (magic0 == 0x17 && magic1 == 0x72 && magic2 == 0x45
            && magic3 == 0x38 && magic4 == 0x50 && magic5 == 0x90) {
            if (complete()) { // end of file
                return;
            } else {
                magic0 = bsGetUByte();
                magic1 = bsGetUByte();
                magic2 = bsGetUByte();
                magic3 = bsGetUByte();
                magic4 = bsGetUByte();
                magic5 = bsGetUByte();
            }
        }

        if (magic0 != 0x31 || // '1'
                   magic1 != 0x41 || // 'A'
                   magic2 != 0x59 || // 'Y'
                   magic3 != 0x26 || // '&'
                   magic4 != 0x53 || // 'S'
                   magic5 != 0x59 // 'Y'
                   ) {
            this.currentState = EOF;
            throw new IOException("bad block header");
        } else {
            this.storedBlockCRC = bsGetInt();
            this.blockRandomised = bsR(1) == 1;

            /**
             * Allocate data here instead of in the constructor, so we do not
             * allocate it if the input file is empty.
             */
            if (this.data == null) {
                this.data = new Data(this.blockSize100k);
            }

            // currBlockNo++;
            getAndMoveToFrontDecode();

            this.crc.initialiseCRC();
            this.currentState = START_BLOCK_STATE;
        }
    }


    private boolean complete() throws IOException {
        boolean result = false;
        this.storedCombinedCRC = bsGetInt();
        try {
            if (in.available() == 0) {
                throw new IOException("EOF");
            }
            checkMagicChar('B', "first");
            checkMagicChar('Z', "second");
            checkMagicChar('h', "third");

            int blockSize = this.in.read();
            if ((blockSize < '1') || (blockSize > '9')) {
                throw new IOException("Stream is not BZip2 formatted: illegal "
                                      + "blocksize " + (char) blockSize);
            }

            this.blockSize100k = blockSize - '0';
            this.bsLive = 0;
            this.bsBuff = 0;

        } catch (IOException e) {
            this.currentState = EOF;
            result = true;
        }

        this.data = null;
        if (this.storedCombinedCRC != this.computedCombinedCRC) {
            throw new IOException("BZip2 CRC error");
        }
        this.computedCombinedCRC = 0;
        return result;
    }
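complete() compares storedCombinedCRC with computedCombinedCRC. The stored value is the 32-bit word that follows the end-of-stream magic. After the check, complete() resets computedCombinedCRC to 0 because each concatenated stream carries its own combined CRC.

The sketch below shows how that stream-level value is built from the per-block CRCs. It follows the formula used by the reference bzip2 implementation: rotate the running value left by one bit, then XOR in the block CRC. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of the bzip2 combined (stream-level) CRC. The result is
 * what gets compared with the 32-bit value stored after the end-of-stream
 * magic 0x177245385090.
 */
public class CombinedCrc {
    /** Rotates the running value left by one bit, then XORs in the block CRC. */
    public static int combine(int combined, int blockCrc) {
        return Integer.rotateLeft(combined, 1) ^ blockCrc;
    }

    /** Combined CRC of one stream. It starts from 0, just as complete() resets it for the next stream. */
    public static int streamCrc(int... blockCrcs) {
        int combined = 0;
        for (int blockCrc : blockCrcs) {
            combined = combine(combined, blockCrc);
        }
        return combined;
    }

    public static void main(String[] args) {
        // 0x80000001 rotated left by one is 0x00000003; XOR with 0 leaves it unchanged
        System.out.printf("%08x%n", streamCrc(0x80000001, 0x00000000)); // prints 00000003
    }
}
```

If the combined CRC were not reset between streams, the second stream's check would fail even on valid input.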

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Lasse Collin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lasse Collin updated COMPRESS-146:
----------------------------------

    Attachment: bzip2-concatenated.patch

I attached a patch that adds support for concatenated .bz2 streams. There is a new constructor that allows getting the old behavior where the decompressor stops after the first .bz2 stream.

There is also a fix for another bug: the detail messages of the exceptions thrown in the init and checkMagicChar functions could contain control characters. Having that information in the exceptions isn't very useful, so I omitted it. I had to modify one of those messages anyway, so I thought I'd include a fix for the other too.
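The old messages were built with expressions like "illegal blocksize " + (char) blockSize. As a result, a corrupt byte such as 0x07 ended up as a raw control character in the message. An end of input, where read() returns -1, became '\uffff'. Lasse's patch simply omits the value.

An alternative, which is not what the patch does, is to format the byte in hex. The hypothetical helper below sketches this.

```java
/**
 * Hypothetical helper: formats a value returned by InputStream.read() for an
 * exception message without embedding raw control characters.
 */
public class ByteDescriber {
    public static String describe(int b) {
        if (b == -1) {
            return "end of input";
        }
        return String.format("0x%02x", b);
    }

    public static void main(String[] args) {
        int blockSize = 0x07; // BEL: "(char) blockSize" would put a control character in the message
        System.out.println("illegal block size " + describe(blockSize)); // prints illegal block size 0x07
    }
}
```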

> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS
> --------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-146
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-146
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>         Environment: all
>            Reporter: Dmitriy Smirnov
>            Priority: Critical
>              Labels: 0x177245385090
>         Attachments: bzip2-concatenated.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h

        

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Lasse Collin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145620#comment-13145620 ] 

Lasse Collin commented on COMPRESS-146:
---------------------------------------

XZCompressorInputStream decompresses concatenated streams by default, so doing it differently in BZip2CompressorInputStream is inconsistent. Maybe XZ needs to be changed too.

I think that in most cases concatenation support is wanted. Maybe the one-argument constructor should have a warning in its docs saying that usually one doesn't want to use it. It would still be easy to forget to use the right constructor, since the non-concatenated version works on most files. I wouldn't be surprised if most existing users of these classes were never updated to use the new constructor.
                
> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS
> --------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-146
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-146
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>         Environment: all
>            Reporter: Dmitriy Smirnov
>            Priority: Critical
>              Labels: 0x177245385090
>             Fix For: 1.4
>
>         Attachments: bzip2-concatenated.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h

        

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146180#comment-13146180 ] 

Stefan Bodewig commented on COMPRESS-146:
-----------------------------------------

Yes, we probably want all three formats to be consistent here.

I'm not sure what the danger of changing the default really would be. I vaguely recall people complaining about GzipInputStream after JDK7 added support for concatenated streams (I may be totally wrong on this, though).
                

        

[jira] [Resolved] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Stefan Bodewig (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig resolved COMPRESS-146.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

Applied Lasse's patch with a small modification: the default constructor does not support concatenated streams, for maximum backward compatibility.

This will need some documentation, which I'm going to add once I've also looked into Lasse's similar patch for gzip.

Thanks!
                

        

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Lasse Collin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088237#comment-13088237 ] 

Lasse Collin commented on COMPRESS-146:
---------------------------------------

I didn't test it, but I think the suggested fix has a few problems:

in.available() == 0 cannot be used to test whether the end of input has been reached.

It doesn't handle concatenated empty .bz2 streams.

The test for the magic bytes and some other code is duplicated.

An IOException thrown inside complete() is caught and lost, so a real error is silently treated as end of file.
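The first, second, and last points can be illustrated with a byte-level sketch. InputStream.available() only returns an estimate of how many bytes can be read without blocking. It may therefore return 0 on a pipe or socket even when more data follows. The reliable end-of-input test is read() returning -1.

The six bytes after a new "BZh[1-9]" header can be either the block magic or, for an empty stream, the end-of-stream magic again, so both cases must be accepted. The class, enum, and method names below are hypothetical. The sketch assumes the input is positioned on a byte boundary where a new stream may begin, i.e. the previous stream's end-of-stream magic, combined CRC, and padding bits have already been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the Commons Compress implementation): decides what
 * starts at a byte boundary where a new bzip2 stream may begin.
 */
public class NextStreamProbe {
    public enum Kind { END_OF_INPUT, STREAM_WITH_BLOCKS, EMPTY_STREAM }

    private static final int[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    private static final int[] EOS_MAGIC = {0x17, 0x72, 0x45, 0x38, 0x50, 0x90};

    public static Kind probe(InputStream in) throws IOException {
        // read() == -1 is a reliable end-of-input test; available() == 0 is not.
        int first = in.read();
        if (first == -1) {
            return Kind.END_OF_INPUT;
        }
        if (first != 'B' || readByte(in) != 'Z' || readByte(in) != 'h') {
            throw new IOException("unexpected data after bzip2 stream");
        }
        int level = readByte(in);
        if (level < '1' || level > '9') {
            throw new IOException("illegal block size in bzip2 stream header");
        }
        int[] magic = new int[6];
        for (int i = 0; i < magic.length; i++) {
            magic[i] = readByte(in);
        }
        if (Arrays.equals(magic, BLOCK_MAGIC)) {
            return Kind.STREAM_WITH_BLOCKS;
        }
        if (Arrays.equals(magic, EOS_MAGIC)) {
            // Empty stream: the header is immediately followed by end-of-stream magic.
            return Kind.EMPTY_STREAM;
        }
        throw new IOException("bad block header");
    }

    /** Reads one byte; truncation is reported as an error, not as end of file. */
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            throw new EOFException("truncated bzip2 stream header");
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // A complete empty .bz2 stream: "BZh9", end-of-stream magic, combined CRC 0.
        byte[] emptyStream = {'B', 'Z', 'h', '9', 0x17, 0x72, 0x45, 0x38, 0x50, (byte) 0x90, 0, 0, 0, 0};
        InputStream in = new ByteArrayInputStream(emptyStream);
        System.out.println(probe(in)); // prints EMPTY_STREAM
        for (int i = 0; i < 4; i++) {
            readByte(in); // skip the empty stream's combined CRC
        }
        System.out.println(probe(in)); // prints END_OF_INPUT
    }
}
```

An EMPTY_STREAM result means the caller still has to read that stream's 4-byte combined CRC and then probe again, as main does. Errors, including truncation, surface as IOException instead of being converted into end of file.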


        

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147139#comment-13147139 ] 

Stefan Bodewig commented on COMPRESS-146:
-----------------------------------------

I've updated the documentation with svn revision 1199823.

It would be good to have testcases with concatenated streams, I'll look into creating some.

Yes, I think we should change the defaults in 2.0. Deprecations won't help, given our factory, which people may use instead of calling the constructors directly. Adding a new flag to the factory method looks wrong, since the factory supports formats (pack200) that don't know anything about concatenated streams.
                

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

Posted by "Lasse Collin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146202#comment-13146202 ] 

Lasse Collin commented on COMPRESS-146:
---------------------------------------

The danger is that when you have compressed data inside another file format, you want the decompressor to stop after it has decompressed one compressed stream. For example, if you have bzip2 data in a .zip file, the decompressor must stop after the first stream; otherwise it will try to decode the bytes that follow, which belong to the .zip file, as another bzip2 stream.
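The same requirement applies to any compressed entry embedded in a container: the decoder has to stop at the end of one compressed stream and tell the container where that stream ended, so the bytes after it are left for the container's own parser. A minimal sketch of that contract, using raw deflate (the usual ZIP entry method) from `java.util.zip` because the JDK has no bzip2 decoder. `Inflater.getRemaining()` reports how many input bytes were not consumed once `finished()` is true. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: raw deflate stands in for a bzip2 entry inside a container.
final class EmbeddedStream {

    // Decoded bytes plus the offset just past the end of the compressed stream.
    static final class Decoded {
        final byte[] data;
        final int end;

        Decoded(byte[] data, int end) {
            this.data = data;
            this.end = end;
        }
    }

    // Compresses `input` as a raw deflate stream (no zlib or gzip wrapper).
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decodes exactly one compressed stream starting at `offset` and stops there,
    // whatever follows it in `container`.
    static Decoded inflateOne(byte[] container, int offset) throws DataFormatException {
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(container, offset, container.length - offset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("compressed stream is truncated");
                }
                out.write(buf, 0, n);
            }
            // Bytes the inflater did not consume belong to the container.
            return new Decoded(out.toByteArray(), container.length - inflater.getRemaining());
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] entry = rawDeflate("entry data".getBytes(StandardCharsets.UTF_8));
        byte[] next = "NEXT RECORD".getBytes(StandardCharsets.UTF_8);
        byte[] container = new byte[entry.length + next.length];
        System.arraycopy(entry, 0, container, 0, entry.length);
        System.arraycopy(next, 0, container, entry.length, next.length);

        Decoded d = inflateOne(container, 0);
        System.out.println(new String(d.data, StandardCharsets.UTF_8)); // entry data
        System.out.println(new String(container, d.end, container.length - d.end,
                StandardCharsets.UTF_8)); // NEXT RECORD
    }
}
```

A decoder that instead kept going after the end of the stream would swallow "NEXT RECORD", which is exactly the breakage described for bzip2 data inside a .zip file.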

If the default is changed, a few applications that need the stop-after-one-stream behavior (I don't know if there actually are any) will break badly. At the same time, it would fix many other applications that handle standalone .gz or .bz2 files and thus should support concatenated streams. Since concatenated streams aren't very common, many developers don't know that their application has such a bug.
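For standalone files, supporting concatenated streams means that after an end-of-stream marker the reader checks whether another stream header follows. Both formats make that cheap: a gzip member begins with the bytes 0x1f 0x8b (RFC 1952), and a bzip2 stream begins with `BZh` followed by a block-size digit `1` to `9`, which is what the suggested patch's `complete()` method checks. A bzip2 stream is padded to a byte boundary at its end, so the next stream starts on a byte boundary. The helper below is a hypothetical sketch, not Commons Compress code; it only tests magic bytes, so in the embedded case from the first paragraph of this comment it could still misfire on container bytes that happen to match.

```java
// Hypothetical helper: checks whether the bytes at `off` look like the start of another stream.
// `off` is assumed to be non-negative.
final class NextStreamProbe {

    // gzip member header: ID1 = 0x1f, ID2 = 0x8b (RFC 1952).
    static boolean looksLikeGzipMember(byte[] b, int off) {
        return b.length - off >= 2
                && (b[off] & 0xff) == 0x1f
                && (b[off + 1] & 0xff) == 0x8b;
    }

    // bzip2 stream header: 'B', 'Z', 'h', then the block size digit '1'..'9'.
    static boolean looksLikeBzip2Stream(byte[] b, int off) {
        return b.length - off >= 4
                && b[off] == 'B'
                && b[off + 1] == 'Z'
                && b[off + 2] == 'h'
                && b[off + 3] >= '1'
                && b[off + 3] <= '9';
    }

    public static void main(String[] args) {
        // Bytes that follow the end-of-stream marker and CRC in the report's hexdump:
        // 42 5a 68 39 = "BZh9", then the block magic 31 41 59 26 53 59.
        byte[] afterEos = {0x42, 0x5a, 0x68, 0x39, 0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        System.out.println(looksLikeBzip2Stream(afterEos, 0)); // true
        System.out.println(looksLikeGzipMember(afterEos, 0));  // false
    }
}
```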

Maybe the default should be changed in Compress 2.0, if API changes are OK at that point. Alternatively, the single-argument constructor could be marked as deprecated so that using it produces a compile-time warning, which would hopefully make people notice that there's a new two-argument constructor.
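The deprecation route keeps the old one-argument constructor as a thin delegate that pins the old default, so existing callers behave the same but get a compile-time deprecation warning pointing at the two-argument form. A minimal sketch with a hypothetical class; Commons Compress 1.4 did add a `BZip2CompressorInputStream(InputStream in, boolean decompressConcatenated)` constructor, but the class below is not that code, and decoding is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical stand-in for a decompressor stream; only the constructor pattern is shown,
// and no decoding is implemented.
class ConcatOptionInputStream extends FilterInputStream {

    private final boolean decompressConcatenated;

    /**
     * Keeps the old default: do not decompress concatenated streams.
     *
     * @deprecated pass the choice explicitly with
     *             {@link #ConcatOptionInputStream(InputStream, boolean)}
     */
    @Deprecated
    ConcatOptionInputStream(InputStream in) {
        this(in, false); // pin the historical default for existing callers
    }

    /**
     * @param in                     the compressed input
     * @param decompressConcatenated intended meaning: true to continue with the next stream
     *                               after an end-of-stream marker, false to stop there
     */
    ConcatOptionInputStream(InputStream in, boolean decompressConcatenated) {
        super(in);
        this.decompressConcatenated = decompressConcatenated;
    }

    boolean decompressesConcatenated() {
        return decompressConcatenated;
    }

    public static void main(String[] args) {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        System.out.println(new ConcatOptionInputStream(empty, true).decompressesConcatenated()); // true
    }
}
```

Callers that reach the class only through a factory never invoke the deprecated constructor themselves, so they would see no warning; that is the limitation raised in Stefan Bodewig's comment.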
                