Posted to issues@commons.apache.org by "Peter De Maeyer (JIRA)" <ji...@apache.org> on 2012/10/24 12:10:12 UTC

[jira] [Created] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Peter De Maeyer created COMPRESS-206:
----------------------------------------

             Summary: TarArchiveOutputStream sometimes writes garbage beyond the end of the archive
                 Key: COMPRESS-206
                 URL: https://issues.apache.org/jira/browse/COMPRESS-206
             Project: Commons Compress
          Issue Type: Bug
          Components: Compressors
    Affects Versions: 1.4.1, 1.0
         Environment: Linux x86
            Reporter: Peter De Maeyer
         Attachments: GarbageBeyondEndTest.java

For some combinations of file lengths, TarArchiveOutputStream writes garbage beyond the end of the TAR stream. TarArchiveInputStream can still read the archive without problems, but it does not read past the garbage. This is a problem for my use case because I write a checksum _after_ the TAR content: when I then try to read the checksum back, I read garbage instead.
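The checksum use case can be sketched with only java.util.zip.CRC32 and java.nio.ByteBuffer. This is a minimal illustration, not the reporter's code: the archive is a stand-in byte array of zero-padded 10240-byte blocks, and the 8-byte big-endian CRC32 trailer and the class name ChecksumTrailer are hypothetical. It shows that the trailer is only found if the reader has consumed exactly the bytes the writer produced.

{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumTrailer {

    /** CRC32 over the first len bytes of data. */
    static long crc32(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    /** Writer side: archive bytes followed by an 8-byte big-endian CRC32 trailer (assumed layout). */
    static byte[] writeWithTrailer(byte[] archive) {
        ByteBuffer buf = ByteBuffer.allocate(archive.length + Long.BYTES);
        buf.put(archive);
        buf.putLong(crc32(archive, archive.length));
        return buf.array();
    }

    /**
     * Reader side: after the archive reader has consumed {@code consumed} bytes,
     * the next 8 bytes are taken to be the checksum trailer.
     */
    static long readTrailer(byte[] stream, int consumed) {
        return ByteBuffer.wrap(stream, consumed, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] archive = new byte[2 * 10240]; // stand-in: two zero-padded 10240-byte blocks
        archive[0] = 'x';
        byte[] stream = writeWithTrailer(archive);

        long expected = crc32(archive, archive.length);
        // The reader consumed everything the writer wrote: the trailer is the checksum.
        System.out.println(readTrailer(stream, archive.length) == expected); // true
        // The reader stopped one block early: it reads padding instead of the checksum.
        System.out.println(readTrailer(stream, archive.length - 10240));     // 0
    }
}
{code}

Reading zeros instead of the checksum is consistent with the reported symptom, since tar padding is all-zero bytes.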

Functional impact:
* TarArchiveInputStream is asymmetrical with respect to TarArchiveOutputStream: it does not read everything that TarArchiveOutputStream writes.
* The output is unnecessarily large: the garbage adds ~10K of overhead compared to Linux command-line tar.
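One way to see how "some combinations of file lengths" could leave a whole extra block unread is a deliberately simplified layout model. It assumes 512-byte records, a 10240-byte block (blocking factor 20), a writer that emits two all-zero EOF records and pads to a whole block, and a reader that reads whole blocks and stops in the block holding the first EOF record. These are assumptions for illustration, not a description of the actual Commons Compress code, and the model ignores long-name and PAX headers. The class TarLayoutModel is hypothetical.

{code:java}
import java.util.Arrays;

public class TarLayoutModel {
    static final int RECORD_SIZE = 512;                             // tar record (header/data unit)
    static final int RECORDS_PER_BLOCK = 20;                        // assumed blocking factor
    static final int BLOCK_SIZE = RECORD_SIZE * RECORDS_PER_BLOCK;  // 10240 bytes

    /** Records used by the entries: one header plus the data rounded up to whole records. */
    static long entryRecords(long... entrySizes) {
        long records = 0;
        for (long size : entrySizes) {
            records += 1 + (size + RECORD_SIZE - 1) / RECORD_SIZE;
        }
        return records;
    }

    /** Writer model: entries, two EOF records, then padding up to a whole block. */
    static long bytesWritten(long... entrySizes) {
        long records = entryRecords(entrySizes) + 2;
        long blocks = (records + RECORDS_PER_BLOCK - 1) / RECORDS_PER_BLOCK;
        return blocks * BLOCK_SIZE;
    }

    /** Reader model: whole blocks, up to and including the block holding the first EOF record. */
    static long bytesConsumed(long... entrySizes) {
        long firstEofIndex = entryRecords(entrySizes); // 0-based index of the first EOF record
        long blocks = firstEofIndex / RECORDS_PER_BLOCK + 1;
        return blocks * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long[][] cases = { {1000}, {512 * 18} };
        for (long[] sizes : cases) {
            long unread = bytesWritten(sizes) - bytesConsumed(sizes);
            System.out.println(Arrays.toString(sizes) + " -> unread bytes: " + unread);
        }
    }
}
{code}

Under these assumptions the program prints "[1000] -> unread bytes: 0" and "[9216] -> unread bytes: 10240": only when the first EOF record lands on the last record of a block does the second EOF record spill into a fresh, fully padded block that such a reader never touches.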

This symptom is remarkably similar to COMPRESS-81, which was supposedly fixed in 1.1, yet the issue still exists: I have reproduced it with both 1.0 and 1.4.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated COMPRESS-206:
-------------------------------------

    Attachment: GarbageBeyondEndTest.java

Attached a stand-alone JUnit test that illustrates the problem.
                

[jira] [Updated] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated COMPRESS-206:
-------------------------------------

    Attachment:     (was: GarbageBeyondEndTest.java)
    

[jira] [Updated] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated COMPRESS-206:
-------------------------------------

    Attachment: COMPRESS-206.patch

I've attached a patch.

Note that both the test and the fix turned out slightly different from my earlier proposal (but even better ;)).
                

[jira] [Commented] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483185#comment-13483185 ] 

Peter De Maeyer commented on COMPRESS-206:
------------------------------------------

I think the root cause is that TarArchiveInputStream stops reading the input stream once it hits the first EOF record. Any lingering records after that are left unconsumed. It would be best to consume these as well.

In {{TarArchiveInputStream.getRecord()}}, I suggest replacing

{code}
  ...
  } else if (buffer.isEOFRecord(headerBuf)) {
    hasHitEOF = true;
  }
  ...
{code}

with

{code}
  } else if (buffer.isEOFRecord(headerBuf)) {
    while (buffer.readRecord() != null) { // Consume any lingering records
      ;
    }
    hasHitEOF = true;
  }
{code}

This fixes the test. It doesn't seem to break any other tests either, although I did not run all of them because they take a long time and I didn't have the patience.
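A more conservative variant of consuming the lingering records can be sketched over a plain InputStream. If {{readRecord()}} only returns null once the underlying stream is exhausted (an assumption about TarBuffer, not something stated in this thread), a drain-until-null loop would also swallow data appended after the archive, such as a trailing checksum. The sketch below instead consumes only the optional second EOF record and the padding up to the next block boundary. The 10240-byte block size, the rule "no second EOF record means no padding", and the names EndOfArchiveSketch and consumeEndOfArchive are all hypothetical.

{code:java}
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class EndOfArchiveSketch {
    static final int RECORD_SIZE = 512;
    static final int BLOCK_SIZE = 10240; // assumption: the writer pads to 20-record blocks

    /** Reads up to len bytes into buf; returns how many were read before end of stream. */
    static int readFully(InputStream in, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    static boolean isAllZero(byte[] rec) {
        for (byte b : rec) if (b != 0) return false;
        return true;
    }

    /**
     * Call right after the first all-zero EOF record, with {@code consumed}
     * archive bytes read so far. Consumes the second EOF record, if present,
     * plus the padding up to the next block boundary, and returns the new total.
     * Bytes that follow the padded archive (e.g. a checksum) are left unread.
     */
    static long consumeEndOfArchive(InputStream in, long consumed) throws IOException {
        if (!in.markSupported()) throw new IllegalArgumentException("mark/reset required");
        byte[] rec = new byte[RECORD_SIZE];
        in.mark(RECORD_SIZE);
        if (readFully(in, rec, RECORD_SIZE) == RECORD_SIZE && isAllZero(rec)) {
            consumed += RECORD_SIZE; // the second EOF record
        } else {
            in.reset();              // not a second EOF record: push it back
            return consumed;         // heuristic: no second EOF record, so assume no padding
        }
        int padding = (int) ((BLOCK_SIZE - consumed % BLOCK_SIZE) % BLOCK_SIZE);
        return consumed + readFully(in, new byte[padding], padding);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: 19 records of entries, the first EOF record closing
        // block 1, the second EOF record plus padding filling block 2, then an
        // 8-byte trailer of 7s standing in for a checksum.
        byte[] stream = new byte[2 * BLOCK_SIZE + 8];
        Arrays.fill(stream, 0, 19 * RECORD_SIZE, (byte) 'd');
        Arrays.fill(stream, 2 * BLOCK_SIZE, stream.length, (byte) 7);

        InputStream in = new BufferedInputStream(new ByteArrayInputStream(stream));
        readFully(in, new byte[BLOCK_SIZE], BLOCK_SIZE); // entries + first EOF record
        long consumed = consumeEndOfArchive(in, BLOCK_SIZE);
        System.out.println(consumed);  // 20480
        System.out.println(in.read()); // 7: the first trailer byte is still available
    }
}
{code}

Peeking with mark/reset keeps the method from eating a non-EOF record it cannot put back; the price is that the caller must supply a stream that supports marking, such as a BufferedInputStream.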
                

[jira] [Commented] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483223#comment-13483223 ] 

Gary Gregory commented on COMPRESS-206:
---------------------------------------

If you take care of your relationship with patience and package the test and fix into an SVN diff patch file, the likelihood of this issue being addressed quickly will increase tremendously!
                

[jira] [Comment Edited] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483185#comment-13483185 ] 

Peter De Maeyer edited comment on COMPRESS-206 at 10/24/12 12:51 PM:
---------------------------------------------------------------------

I think the root cause is that TarArchiveInputStream stops reading the input stream once it hits the first EOF record. Any lingering records after that are left unconsumed. It would be best to consume these as well.

In {{TarArchiveInputStream.getRecord()}}, I suggest replacing

{code}
  } else if (buffer.isEOFRecord(headerBuf)) {
    hasHitEOF = true;
  }
{code}

with

{code}
  } else if (buffer.isEOFRecord(headerBuf)) {
    while (buffer.readRecord() != null) { // Consume any lingering records
      ;
    }
    hasHitEOF = true;
  }
{code}

This fixes the test. It doesn't seem to break any other tests either, although I did not run all of them because they take a long time and I didn't have the patience.
                

[jira] [Comment Edited] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483185#comment-13483185 ] 

Peter De Maeyer edited comment on COMPRESS-206 at 10/24/12 12:55 PM:
---------------------------------------------------------------------

I think the root cause is that TarArchiveInputStream stops reading the input stream once it hits the first EOF record. Any lingering records after that are left unconsumed. It would be best to consume these as well.

In {{TarArchiveInputStream.getRecord()}}, I suggest replacing

{code}
} else if (buffer.isEOFRecord(headerBuf)) {
  hasHitEOF = true;
}
{code}

with

{code}
} else if (buffer.isEOFRecord(headerBuf)) {
  while (buffer.readRecord() != null) { // Consume any lingering records
    ;
  }
  hasHitEOF = true;
}
{code}

This fixes the test. It doesn't seem to break any other tests either, although I did not run all of them because they take a long time and I didn't have the patience.
                

[jira] [Updated] (COMPRESS-206) TarArchiveOutputStream sometimes writes garbage beyond the end of the archive

Posted by "Peter De Maeyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COMPRESS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated COMPRESS-206:
-------------------------------------

    Fix Version/s: 1.5
    