You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "unascribed (via GitHub)" <gi...@apache.org> on 2023/02/12 21:44:35 UTC

[GitHub] [commons-compress] unascribed opened a new pull request, #360: Fix FileBands misusing InputStream#read(byte[])

unascribed opened a new pull request, #360:
URL: https://github.com/apache/commons-compress/pull/360

   The Pack200 decompressor expects `read` to always return the amount of bytes it requested. This is *usually* true for raw input streams, such as File or ByteArray, but breaks down when combined with other block-based decompressors. For example, combining an LZMAInputStream with a Pack200 decompressor results in an error similar to the following if a file falls between block boundaries:
   
   ```
   Exception in thread "main" org.apache.commons.compress.harmony.pack200.Pack200Exception: Expected to read 48873 bytes but read 3274
   	at org.apache.commons.compress.harmony.unpack200.FileBands.processFileBits(FileBands.java:87)
   	at org.apache.commons.compress.harmony.unpack200.Segment.readSegment(Segment.java:460)
   	at org.apache.commons.compress.harmony.unpack200.Segment.unpackRead(Segment.java:514)
   	at org.apache.commons.compress.harmony.unpack200.Segment.unpack(Segment.java:484)
   	at org.apache.commons.compress.harmony.unpack200.Archive.unpack(Archive.java:198)
   	at com.unascribed.unpacktest.UnpackTest.main(UnpackTest.java:20)
   ```
   
   Pack200 is *intended* to be combined with other forms of compression, so this is a major issue in some cases.
   
   This reimplements the portion of the method to use `read` correctly, keeping track of how much has been read. I did not address the TODO as I don't believe there's a good way to implement it with the current architecture of the decompressor.
   
   The file I'm having issues with is a Pack200+LZMA file that does not contain any class files. Kind of a weird case, but it's what I'm stuck with, and Commons Compress is the only real extant Pack200 decompressor for modern Java.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] unascribed commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "unascribed (via GitHub)" <gi...@apache.org>.
unascribed commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1431969886

   I sure hope this isn't a common pattern throughout the codebase.
   
   I'd imagine this is an isolated incident due to this being a very seldom used feature of the library, and a case of inheriting poorly tested code from Harmony, which itself was written in something of a hurry and without thorough testing be a priority — just "get API compatibility with pre-open Java ASAP".
   
   A quick random look around the codebase guided by Open Call Hierarchy in Eclipse suggests this is done correctly elsewhere, but it was only a random sample.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] garydgregory merged pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory merged PR #360:
URL: https://github.com/apache/commons-compress/pull/360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] garydgregory commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1427165594

   @unascribed 
   Thank you for the PR. You'll need to update this PR with a unit test. Ty.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] unascribed commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "unascribed (via GitHub)" <gi...@apache.org>.
unascribed commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1429399685

   > Could one the method be reused from IOUtils?
   
   Yes, that readFully method does the same thing. I'm not familiar with this project's internals so I wasn't aware it existed — nowadays I use InputStream#readNBytes, but that's Java 11.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] jochenw commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "jochenw (via GitHub)" <gi...@apache.org>.
jochenw commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1429303474

   +1 for pulling in, even without a test, The user has obviously spend considerable work on this, and manual inspection looks fine.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] unascribed commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "unascribed (via GitHub)" <gi...@apache.org>.
unascribed commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1427171789

   The file I encountered this issue with is not redistributable, and I'm unsure of the best way to synthesize an equivalent test. I'll mess around when I have time.
   
   ----
   
   For passerby with the same issue, in the meantime this is published to [my Maven](https://repo.sleeping.town) with coordinate `com.unascribed:commons-compress:1.23-pack200fix`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] garydgregory commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1431964342

   Merged. TY.  Now, where else should we do the same kind of change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] codecov-commenter commented on pull request #360: Fix FileBands misusing InputStream#read(byte[])

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1427140840

   # [Codecov](https://codecov.io/gh/apache/commons-compress/pull/360?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#360](https://codecov.io/gh/apache/commons-compress/pull/360?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1d14914) into [master](https://codecov.io/gh/apache/commons-compress/commit/e6930d06cfebbdbee586b863481779b2c4b9b202?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e6930d0) will **decrease** coverage by `0.01%`.
   > The diff coverage is `83.33%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #360      +/-   ##
   ============================================
   - Coverage     80.43%   80.43%   -0.01%     
   - Complexity     6723     6724       +1     
   ============================================
     Files           343      343              
     Lines         25330    25335       +5     
     Branches       4106     4108       +2     
   ============================================
   + Hits          20374    20378       +4     
     Misses         3371     3371              
   - Partials       1585     1586       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/commons-compress/pull/360?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../commons/compress/harmony/unpack200/FileBands.java](https://codecov.io/gh/apache/commons-compress/pull/360?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29tcHJlc3MvaGFybW9ueS91bnBhY2syMDAvRmlsZUJhbmRzLmphdmE=) | `91.42% <83.33%> (-1.91%)` | :arrow_down: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] quat1024 commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "quat1024 (via GitHub)" <gi...@apache.org>.
quat1024 commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1429003923

   A little bit more code digging, just for fun:
   
   * If the provided `InputStream` does not support marking, [`unpack200.Segment` will wrap it in a `BufferedInputStream`](https://github.com/apache/commons-compress/blob/e6930d06cfebbdbee586b863481779b2c4b9b202/src/main/java/org/apache/commons/compress/harmony/unpack200/Segment.java#L499-L502), which does.
   * `BufferedInputStream`, "as a convenience", repeatedly calls `read` until it fills as many bytes of the user-provided input array as possible, using a driver loop not unlike the one in this PR. However, if the wrapped stream's `available` method returns `0` (signaling that further `read` calls will block), `BufferedInputStream` will stop early before filling the entire array (because it doesn't want to block).
   * `LZMACompressorInputStream` [delegates the `available` method](https://github.com/apache/commons-compress/blob/e6930d06cfebbdbee586b863481779b2c4b9b202/src/main/java/org/apache/commons/compress/compressors/lzma/LZMACompressorInputStream.java#L102-L106) through to `xz`'s `LZMAInputStream`.
   * [`LZMAInputStream` doesn't override `available`](https://git.tukaani.org/?p=xz-java.git;a=blob;f=src/org/tukaani/xz/LZMAInputStream.java;h=1432eb1037a513dcd997fb0a946cc978032565aa;hb=HEAD), so the default implementation is used.
   * The default implementation always returns 0,
   * -> causing `BufferedInputStream` to be okay with returning a half-filled buffer,
   * -> triggering the correctness error in the code that assumes `read(byte[])` fills the whole byte array,
   * -> breaking the file parse.
   
   A cheeky workaround is to wrap the stream provided to `Pack200CompressorInputStream` in one that returns `false` in `markSupported` and `Integer.MAX_VALUE` in `available`, this makes `BufferedInputStream` do your work for you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] garydgregory commented on pull request #360: pack200: Fix FileBands misusing InputStream#read(byte[])

Posted by "garydgregory (via GitHub)" <gi...@apache.org>.
garydgregory commented on PR #360:
URL: https://github.com/apache/commons-compress/pull/360#issuecomment-1429105640

   Could one the method be reused from https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/utils/IOUtils.java ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org