Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2022/08/31 21:46:24 UTC

[GitHub] [jena] afs opened a new pull request, #1503: GH-1501: Buffer bz2 decompression

afs opened a new pull request, #1503:
URL: https://github.com/apache/jena/pull/1503

   GitHub issue resolved: #1501
   
   Pull request Description: #1501
   
   
   
   ----
   
    - [x] Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)
   
   By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the [Contributor's Agreement](https://www.apache.org/licenses/contributor-agreements.html).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a diff in pull request #1503: GH-1501: Buffer bz2 decompression

Posted by GitBox <gi...@apache.org>.
afs commented on code in PR #1503:
URL: https://github.com/apache/jena/pull/1503#discussion_r960390956


##########
jena-base/src/main/java/org/apache/jena/atlas/io/IO.java:
##########
@@ -86,13 +86,34 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = filename.substring("file:".length());
             filename = IRILib.decodeHex(filename);
         }
-        InputStream in = new FileInputStream(filename);
+        InputStream in0 = new FileInputStream(filename);
+        InputStream in = IO.ensureBuffered(in0);
         String ext = getExtension(filename);
+
+        // Input is a file stream.
+        // https://commons.apache.org/proper/commons-compress/examples.html#Buffering :
+        // """
+        // The stream classes all wrap around streams provided by the calling
+        // code and they work on them directly without any additional
+        // buffering. On the other hand most of them will benefit from
+        // buffering so it is highly recommended that users wrap their stream
+        // in Buffered(In|Out)putStreams before using the Commons Compress
+        // API.
+        // """
+        // GZip and Snappy have internal buffering.
+        // BZip2 does not.
         switch ( ext ) {
-            case "":        return in;
-            case ext_gz:    return new GZIPInputStream(in);
-            case ext_bz2:   return new BZip2CompressorInputStream(in, true);
-            case ext_sz:    return new SnappyCompressorInputStream(in);
+            case "":
+                return in;
+            case ext_gz:
+                // Makes a small improvement (<5%) to use 8K.
+                return new GZIPInputStream(in, 8*1024);
+            case ext_bz2:
+                // Make a huge improvement. x10 faster.
+                in = IO.ensureBuffered(in);

Review Comment:
   Good catch. I'll fix it. The first one isn't intended; it's left over from trying both approaches. GZip doesn't need the buffering, and it has its own internal workspace that performs the same task (yea - open source!). Bumping its workspace to 8K (from the default 512) is better.
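
A minimal sketch of the workspace-size point, using only the JDK: `GZIPInputStream` has a two-argument constructor whose second parameter sets the size of its internal input buffer, so no extra `BufferedInputStream` is needed for gzip. The class name `GzipBufferSketch` and its helper methods are hypothetical and exist only for illustration; this is not the code in `IO.java`, and it makes no claim about the speed-up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Jena.
public class GzipBufferSketch {

    /** Compress bytes in memory so the example needs no files. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    /** Decompress, asking GZIPInputStream for an internal buffer of the given size. */
    static byte[] gunzip(byte[] compressed, int workspaceSize) throws IOException {
        try (InputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed), workspaceSize)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some RDF data".getBytes(StandardCharsets.UTF_8);
        // 8K workspace instead of the 512-byte default used by the one-argument constructor.
        byte[] roundTrip = gunzip(gzip(original), 8 * 1024);
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

The buffer size changes only how many compressed bytes are pulled from the underlying stream per read; the decompressed output is the same for any positive size.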





[GitHub] [jena] afs merged pull request #1503: GH-1501: Buffer bz2 decompression

Posted by GitBox <gi...@apache.org>.
afs merged PR #1503:
URL: https://github.com/apache/jena/pull/1503




[GitHub] [jena] kinow commented on a diff in pull request #1503: GH-1501: Buffer bz2 decompression

Posted by GitBox <gi...@apache.org>.
kinow commented on code in PR #1503:
URL: https://github.com/apache/jena/pull/1503#discussion_r960054966


##########
jena-base/src/main/java/org/apache/jena/atlas/io/IO.java:
##########
@@ -86,13 +86,34 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = filename.substring("file:".length());
             filename = IRILib.decodeHex(filename);
         }
-        InputStream in = new FileInputStream(filename);
+        InputStream in0 = new FileInputStream(filename);
+        InputStream in = IO.ensureBuffered(in0);
         String ext = getExtension(filename);
+
+        // Input is a file stream.
+        // https://commons.apache.org/proper/commons-compress/examples.html#Buffering :
+        // """
+        // The stream classes all wrap around streams provided by the calling
+        // code and they work on them directly without any additional
+        // buffering. On the other hand most of them will benefit from
+        // buffering so it is highly recommended that users wrap their stream
+        // in Buffered(In|Out)putStreams before using the Commons Compress
+        // API.
+        // """
+        // GZip and Snappy have internal buffering.
+        // BZip2 does not.
         switch ( ext ) {
-            case "":        return in;
-            case ext_gz:    return new GZIPInputStream(in);
-            case ext_bz2:   return new BZip2CompressorInputStream(in, true);
-            case ext_sz:    return new SnappyCompressorInputStream(in);
+            case "":
+                return in;
+            case ext_gz:
+                // Makes a small improvement (<5%) to use 8K.
+                return new GZIPInputStream(in, 8*1024);
+            case ext_bz2:
+                // Make a huge improvement. x10 faster.
+                in = IO.ensureBuffered(in);

Review Comment:
   @afs this part is looking weird.
   
   We have already called `InputStream in = IO.ensureBuffered(in0);` earlier. Then, if the extension is bz2, we call `in = IO.ensureBuffered(in);` again? So `IO.ensureBuffered` is called twice?





[GitHub] [jena] rvesse commented on a diff in pull request #1503: GH-1501: Buffer bz2 decompression

Posted by GitBox <gi...@apache.org>.
rvesse commented on code in PR #1503:
URL: https://github.com/apache/jena/pull/1503#discussion_r960378950


##########
jena-base/src/main/java/org/apache/jena/atlas/io/IO.java:
##########
@@ -86,13 +86,34 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = filename.substring("file:".length());
             filename = IRILib.decodeHex(filename);
         }
-        InputStream in = new FileInputStream(filename);
+        InputStream in0 = new FileInputStream(filename);
+        InputStream in = IO.ensureBuffered(in0);
         String ext = getExtension(filename);
+
+        // Input is a file stream.
+        // https://commons.apache.org/proper/commons-compress/examples.html#Buffering :
+        // """
+        // The stream classes all wrap around streams provided by the calling
+        // code and they work on them directly without any additional
+        // buffering. On the other hand most of them will benefit from
+        // buffering so it is highly recommended that users wrap their stream
+        // in Buffered(In|Out)putStreams before using the Commons Compress
+        // API.
+        // """
+        // GZip and Snappy have internal buffering.
+        // BZip2 does not.
         switch ( ext ) {
-            case "":        return in;
-            case ext_gz:    return new GZIPInputStream(in);
-            case ext_bz2:   return new BZip2CompressorInputStream(in, true);
-            case ext_sz:    return new SnappyCompressorInputStream(in);
+            case "":
+                return in;
+            case ext_gz:
+                // Makes a small improvement (<5%) to use 8K.
+                return new GZIPInputStream(in, 8*1024);
+            case ext_bz2:
+                // Make a huge improvement. x10 faster.
+                in = IO.ensureBuffered(in);

Review Comment:
   `ensureBuffered()` checks whether the passed input stream is already buffered and, if so, is a no-op, so I don't think it matters if we call it twice.
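
The no-op behaviour described in this comment can be sketched as follows. This is an assumption about the general shape of such a helper, written against the JDK only; the class name `EnsureBufferedSketch` is hypothetical, and the real `IO.ensureBuffered` in Jena may check for more stream types than this.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical illustration class; not Jena's IO.ensureBuffered.
public class EnsureBufferedSketch {

    /** Wrap the stream in a BufferedInputStream unless it already is one. */
    static InputStream ensureBuffered(InputStream in) {
        if (in instanceof BufferedInputStream) {
            return in;                      // already buffered: return the same object
        }
        return new BufferedInputStream(in); // otherwise add one layer of buffering
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream once = ensureBuffered(raw);
        InputStream twice = ensureBuffered(once);
        // The second call does not add another wrapper.
        System.out.println(once == twice);
    }
}
```

Under this assumption a second call costs one `instanceof` check and never stacks buffers, which is why calling it twice is harmless even though the first call was unintended.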


