Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/08/12 08:52:13 UTC

[GitHub] [commons-compress] dweiss opened a new pull request, #306: COMPRESS-623: Speed up ZipFile's raw stream copying by not seeking for local headers until they're needed

dweiss opened a new pull request, #306:
URL: https://github.com/apache/commons-compress/pull/306

   See https://issues.apache.org/jira/browse/COMPRESS-623


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-compress] dweiss commented on pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1221371921

   That's fine. Not my first time.




[GitHub] [commons-compress] dweiss commented on pull request #306: COMPRESS-623: Speed up ZipFile's raw stream copying by not seeking for local headers until they're needed

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1212955649

   Eh. It wasn't me; it was IntelliJ, for some reason. That's why I like Spotless or the Google formatter: they make life much easier for folks using different tools. Give me a minute to revert those changes.




[GitHub] [commons-compress] garydgregory commented on pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
garydgregory commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1221371134

   You can expect bursts of activity and inactivity; we're all volunteers here. I'll hopefully take a look over the weekend.




[GitHub] [commons-compress] dweiss commented on a diff in pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #306:
URL: https://github.com/apache/commons-compress/pull/306#discussion_r944327197


##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -638,13 +646,11 @@ public InputStream getInputStream(final ZipArchiveEntry ze)
         }
         // cast validity is checked just above
         ZipUtil.checkRequestedFeatures(ze);
-        final long start = getDataOffset(ze);
 
         // doesn't get closed if the method is not supported - which
         // should never happen because of the checkRequestedFeatures
         // call above
-        final InputStream is =
-            new BufferedInputStream(createBoundedInputStream(start, ze.getCompressedSize())); //NOSONAR
+        final InputStream is = new BufferedInputStream(getRawInputStream(ze)); //NOSONAR
         switch (ZipMethod.getMethodByCode(ze.getMethod())) {

Review Comment:
   This replaces the duplicated code with a call that requests the raw compressed stream, which should always be available.
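
A minimal sketch of the layering described in this comment, not the actual ZipFile code: one method hands out the bounded raw bytes of an entry, and the decoding method builds on top of it instead of recomputing the offset. ToyEntry, rawStream, decodedStream, deflateRaw and readAll are hypothetical names made up for illustration; only the java.io and java.util.zip classes are real.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

final class LayeredStreamSketch {
    static final int STORED = 0;   // ZIP method code for "no compression"
    static final int DEFLATED = 8; // ZIP method code for raw deflate

    /** Hypothetical stand-in for an archive entry: where its data sits and how it is encoded. */
    static final class ToyEntry {
        final int method;
        final byte[] archive;
        final int offset;
        final int compressedSize;

        ToyEntry(int method, byte[] archive, int offset, int compressedSize) {
            this.method = method;
            this.archive = archive;
            this.offset = offset;
            this.compressedSize = compressedSize;
        }
    }

    /** The single place that knows how to bound the entry's compressed bytes. */
    static InputStream rawStream(ToyEntry e) {
        return new ByteArrayInputStream(e.archive, e.offset, e.compressedSize);
    }

    /** Decoding is layered on rawStream rather than duplicating the offset logic. */
    static InputStream decodedStream(ToyEntry e) throws IOException {
        InputStream raw = new BufferedInputStream(rawStream(e));
        switch (e.method) {
            case STORED:
                return raw;
            case DEFLATED:
                // Extra trailing byte, as the Inflater Javadoc suggests for nowrap mode.
                InputStream padded = new SequenceInputStream(raw, new ByteArrayInputStream(new byte[1]));
                return new InflaterInputStream(padded, new Inflater(true));
            default:
                throw new IOException("unsupported method " + e.method);
        }
    }

    /** Helper for the demo: produce raw deflate data (no zlib header). */
    static byte[] deflateRaw(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos, deflater)) {
            out.write(data);
        } finally {
            deflater.end();
        }
        return bos.toByteArray();
    }

    /** Helper for the demo: drain and close a stream. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream closing = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = closing.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

With this shape, a caller that only copies entries between archives can use the raw stream and never pay for decompression, while the decoding path stays a thin wrapper around it.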





[GitHub] [commons-compress] garydgregory merged pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
garydgregory merged PR #306:
URL: https://github.com/apache/commons-compress/pull/306




[GitHub] [commons-compress] dweiss commented on pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1221356359

   Ping. There's not much for me to do here; I've reverted the whitespace changes as requested. @garydgregory, you probably looked at an outdated diff when you requested changes?





[GitHub] [commons-compress] dweiss commented on a diff in pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #306:
URL: https://github.com/apache/commons-compress/pull/306#discussion_r945132989


##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -613,7 +621,7 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
      * @throws IOException on error
      */
     public void copyRawEntries(final ZipArchiveOutputStream target, final ZipArchiveEntryPredicate predicate)
-            throws IOException {
+        throws IOException {

Review Comment:
   I think that's an outdated diff; I have already reverted these changes. I really think code formatting should be done automatically. Life is so much easier that way (https://issues.apache.org/jira/browse/LUCENE-9564).





[GitHub] [commons-compress] garydgregory commented on a diff in pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
garydgregory commented on code in PR #306:
URL: https://github.com/apache/commons-compress/pull/306#discussion_r945131683


##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -613,7 +621,7 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
      * @throws IOException on error
      */
     public void copyRawEntries(final ZipArchiveOutputStream target, final ZipArchiveEntryPredicate predicate)
-            throws IOException {
+        throws IOException {

Review Comment:
   Just get your IDE to behave ;-) The more noise, the longer it takes to review and it's annoying as well.





[GitHub] [commons-compress] dweiss commented on pull request #306: COMPRESS-623: Speed up ZipFile's raw stream copying by not seeking for local headers until they're needed

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1212962568

   Removed those whitespace changes.




[GitHub] [commons-compress] codecov-commenter commented on pull request #306: COMPRESS-623: Speed up ZipFile's raw stream copying by not seeking for local headers until they're needed

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1212884617

   # [Codecov](https://codecov.io/gh/apache/commons-compress/pull/306?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#306](https://codecov.io/gh/apache/commons-compress/pull/306?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3c6b0b0) into [master](https://codecov.io/gh/apache/commons-compress/commit/7cae698b1b593de908722b5089cf8506a9e67057?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7cae698) will **decrease** coverage by `0.03%`.
   > The diff coverage is `53.12%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #306      +/-   ##
   ============================================
   - Coverage     80.05%   80.01%   -0.04%     
   + Complexity     6621     6620       -1     
   ============================================
     Files           339      339              
     Lines         25416    25420       +4     
     Branches       4199     4200       +1     
   ============================================
   - Hits          20346    20341       -5     
   - Misses         3473     3480       +7     
   - Partials       1597     1599       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/commons-compress/pull/306?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...apache/commons/compress/archivers/zip/ZipFile.java](https://codecov.io/gh/apache/commons-compress/pull/306/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29tcHJlc3MvYXJjaGl2ZXJzL3ppcC9aaXBGaWxlLmphdmE=) | `77.00% <53.12%> (-1.77%)` | :arrow_down: |
   
   




[GitHub] [commons-compress] dweiss commented on a diff in pull request #306: COMPRESS-623: Speed up ZipFile's raw stream copying by not seeking for local headers until they're needed

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #306:
URL: https://github.com/apache/commons-compress/pull/306#discussion_r944253806


##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -595,11 +598,16 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
         if (!(ze instanceof Entry)) {
             return null;
         }
-        final long start = ze.getDataOffset();
-        if (start == EntryStreamOffsets.OFFSET_UNKNOWN) {
-            return null;
+
+        try {
+            final long start = getDataOffset(ze);
+            if (start == EntryStreamOffsets.OFFSET_UNKNOWN) {
+                return null;
+            }
+            return createBoundedInputStream(start, ze.getCompressedSize());
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);

Review Comment:
   It would probably be cleaner to change the signature of getRawInputStream to throw IOException, but this would be backward-incompatible.
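
A small sketch of the trade-off raised in this comment, assuming a hypothetical resolveOffset helper in place of the real offset lookup: the method keeps a signature without a throws clause by wrapping the checked exception, and a caller that wants the checked exception back unwraps it. None of these names come from commons-compress; only java.io.UncheckedIOException and its getCause() are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class RawStreamSketch {
    /** Hypothetical stand-in for an offset lookup that may have to read a local header. */
    static long resolveOffset(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("could not read local file header");
        }
        return 42L;
    }

    /** Signature stays free of 'throws IOException'; the failure is wrapped instead. */
    static InputStream openRaw(boolean fail) {
        try {
            long start = resolveOffset(fail);
            // Toy payload: a single byte carrying the offset, just to have something to read.
            return new ByteArrayInputStream(new byte[] {(byte) start});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A caller that prefers the checked exception can unwrap it again. */
    static InputStream openRawChecked(boolean fail) throws IOException {
        try {
            return openRaw(fail);
        } catch (UncheckedIOException e) {
            throw e.getCause(); // getCause() on UncheckedIOException is typed as IOException
        }
    }
}
```

The wrapping keeps existing callers compiling, at the cost that an I/O failure now surfaces as a runtime exception that callers may not expect.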



##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -595,11 +598,16 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
         if (!(ze instanceof Entry)) {
             return null;
         }
-        final long start = ze.getDataOffset();
-        if (start == EntryStreamOffsets.OFFSET_UNKNOWN) {
-            return null;
+
+        try {
+            final long start = getDataOffset(ze);
+            if (start == EntryStreamOffsets.OFFSET_UNKNOWN) {
+                return null;
+            }
+            return createBoundedInputStream(start, ze.getCompressedSize());
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);

Review Comment:
   I'm also not sure, but it seems that the offset returned from getDataOffset should never be unknown, so the check there may be removed.



##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -613,7 +621,7 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
      * @throws IOException on error
      */
     public void copyRawEntries(final ZipArchiveOutputStream target, final ZipArchiveEntryPredicate predicate)
-            throws IOException {
+        throws IOException {

Review Comment:
   Sorry for the whitespace changes; I can revert these. (Is there any way to reformat automatically to conform to the convention used?)





[GitHub] [commons-compress] dweiss commented on pull request #306: COMPRESS-623: make ZipFile's getRawInputStream usable when local headers are not read

Posted by GitBox <gi...@apache.org>.
dweiss commented on PR #306:
URL: https://github.com/apache/commons-compress/pull/306#issuecomment-1213893701

   I was a bit lazy and wanted to piggyback too many changes, sorry about this. I've dropped the linked-list removal for now and left only the changes related to getRawInputStream; I hope that makes them clearer to read. I've run `mvn test -Prun-zipit` and everything passed. I'm not sure why getRawInputStream wasn't used inside getInputStream itself, but if you take a look at the patch, reusing it there seems like the better decision to me.
   
   The only significant backward-incompatible change is the throws IOException clause added to getRawInputStream. I think this is more transparent in the end than hiding the exception inside an unchecked exception (or swallowing it). People who use the API will most likely not even notice the change, because they already catch or handle IOException from other methods of ZipFile.
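
To illustrate the compatibility argument, here is a hedged sketch with a made-up ToyArchive interface (not the ZipFile API): a caller that already declares or handles IOException keeps compiling once the raw-stream method gains a throws clause, and a failure simply propagates as the checked exception.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CallerCompatSketch {
    /** Hypothetical archive API whose raw-stream method declares IOException. */
    interface ToyArchive {
        InputStream getRawInputStream(String name) throws IOException;
    }

    /** Typical caller: it already propagates IOException because reading can fail anyway. */
    static long countRawBytes(ToyArchive archive, String name) throws IOException {
        try (InputStream in = archive.getRawInputStream(name)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    /** Toy archive that serves the same bytes for every entry name. */
    static ToyArchive inMemory(byte[] data) {
        return name -> new ByteArrayInputStream(data);
    }

    /** Toy archive whose header lookup always fails. */
    static ToyArchive failing() {
        return name -> {
            throw new IOException("local header unreadable for " + name);
        };
    }
}
```

Callers that invoke such a method from a context that cannot throw checked exceptions, such as a lambda passed to a stream pipeline, are the ones that would need a source change.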

