Posted to dev@jena.apache.org by afs <gi...@git.apache.org> on 2018/06/03 09:55:05 UTC
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
GitHub user afs opened a pull request:
https://github.com/apache/jena/pull/427
JENA-1554, JENA-1555: Support bz2 compressed files directly from Java.
JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress
JENA-1554: Add bz2 compression/decompression
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/afs/jena compressed
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/427.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #427
----
commit eb9ba394f59ae5f827a54db718b032d797d1bafb
Author: Andy Seaborne <an...@...>
Date: 2018-06-03T08:51:44Z
JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress
commit f88fbc578d02ed8925104bf5d4a03795470d9275
Author: Andy Seaborne <an...@...>
Date: 2018-06-03T09:11:13Z
JENA-1554: Add bz2 compression/decompression
Add Snappy
default 32k block
decompress only; compressor not available
Update javadoc (RDFLanguages, BinRDF) that mentions gz.
----
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192794776
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
             filename = IRILib.decode(filename) ;
         }
         OutputStream out = new FileOutputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            out = new GZIPOutputStream(out) ;
+        String ext = FileOps.extension(filename);
--- End diff --
Good idea, but as a separate "clean up FileOps/FileUtils" item; let this PR go in now. Got to finish sometime!
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/jena/pull/427
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192694578
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = IRILib.decode(filename) ;
         }
         InputStream in = new FileInputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            in = new GZIPInputStream(in) ;
+        String ext = FileOps.extension(filename);
+        switch ( ext ) {
+            case "": return in;
+            case "gz": return new GZIPInputStream(in) ;
+            case "bz2": return new BZip2CompressorInputStream(in);
+            case "sz": return new SnappyCompressorInputStream(in);
+        }
         return in ;
     }
+
+    private static String[] extensions = { ".gz", ".bz2", ".sz" };
+
+    /** The filename without any compression extension, or the original filename.
+     * It tests for compression types handled by {@link #openFileEx}.
+     */
+    static public String filenameNoCompression(String filename) {
+        for ( String ext : extensions ) {
+            if ( filename.endsWith(ext) )
+                return filename.substring(0, filename.length()-ext.length());
+        }
+        return filename;
+    }
--- End diff --
Maybe instead
```java
/** The filename without any compression extension, or the original filename.
* It tests for compression types handled by {@link #openFileEx}.
*/
static public String filenameNoCompression(String filename) {
// Note: isExtension() compares against extensions without the leading dot ("gz", not ".gz").
if ( FilenameUtils.isExtension(filename, extensions) ) {
return FilenameUtils.removeExtension(filename);
}
return filename;
}
```
I believe commons-io is already in the dependencies list. The extension check there also does an extra check for null bytes, but that's not so important. It's just simpler, I think.
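For comparison, here is a minimal JDK-only sketch of the same idea: take the extension without its leading dot, and strip it only if it is one of the known compression types. The class name `CompressionSuffixSketch` and the helper `extension` are hypothetical names used for illustration; this is not the code in the PR, and it only approximates what `FilenameUtils.getExtension` / `removeExtension` do.

```java
import java.util.Arrays;
import java.util.List;

public class CompressionSuffixSketch {
    // Assumption: the same three compression types as in the diff, written without the dot.
    private static final List<String> COMPRESSION_EXTS = Arrays.asList("gz", "bz2", "sz");

    /** Text after the last dot of the last path segment, or "" if there is none. */
    static String extension(String filename) {
        int lastSep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int lastDot = filename.lastIndexOf('.');
        if (lastDot <= lastSep)
            return "";
        return filename.substring(lastDot + 1);
    }

    /** The filename without a trailing compression extension, or the original filename. */
    static String filenameNoCompression(String filename) {
        String ext = extension(filename);
        if (COMPRESSION_EXTS.contains(ext))
            // Drop the extension and the dot in front of it.
            return filename.substring(0, filename.length() - ext.length() - 1);
        return filename;
    }
}
```

Only the last extension is removed, so `data.ttl.gz` becomes `data.ttl`, and a name such as `data.ttl` is returned unchanged.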
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192794317
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = IRILib.decode(filename) ;
         }
         InputStream in = new FileInputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            in = new GZIPInputStream(in) ;
+        String ext = FileOps.extension(filename);
+        switch ( ext ) {
+            case "": return in;
+            case "gz": return new GZIPInputStream(in) ;
+            case "bz2": return new BZip2CompressorInputStream(in);
+            case "sz": return new SnappyCompressorInputStream(in);
+        }
         return in ;
     }
+
+    private static String[] extensions = { ".gz", ".bz2", ".sz" };
+
+    /** The filename without any compression extension, or the original filename.
+     * It tests for compression types handled by {@link #openFileEx}.
+     */
+    static public String filenameNoCompression(String filename) {
+        for ( String ext : extensions ) {
+            if ( filename.endsWith(ext) )
+                return filename.substring(0, filename.length()-ext.length());
+        }
+        return filename;
+    }
--- End diff --
Done.
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192605754
--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
<ver.commonslang3>3.4</ver.commonslang3>
<ver.commonscsv>1.5</ver.commonscsv>
<ver.commons-codec>1.11</ver.commons-codec>
+ <ver.commons-compress>1.16.1</ver.commons-compress>
--- End diff --
1.17 was just released... maybe worth using it instead? Just received Stefan's announcement message about it in the commons mailing list.
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192678383
--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
<ver.commonslang3>3.4</ver.commonslang3>
<ver.commonscsv>1.5</ver.commonscsv>
<ver.commons-codec>1.11</ver.commons-codec>
+ <ver.commons-compress>1.16.1</ver.commons-compress>
--- End diff --
Yes! Thanks for the pointer.
---
[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:
https://github.com/apache/jena/pull/427#discussion_r192701731
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
             filename = IRILib.decode(filename) ;
         }
         OutputStream out = new FileOutputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            out = new GZIPOutputStream(out) ;
+        String ext = FileOps.extension(filename);
--- End diff --
Digressing, but since we have `FilenameUtils.getExtension()` from commons-io on the classpath, perhaps `FileOps.extension` could later be marked as deprecated?
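To illustrate the extension-based dispatch discussed in this thread, here is a rough sketch that uses only the JDK, so it covers gzip only. The bz2 and Snappy cases in the PR rely on Apache Commons Compress classes and are left as comments. The class `ExtensionDispatchSketch` and its methods are hypothetical names for illustration, not the Jena `IO` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ExtensionDispatchSketch {
    /** Text after the last dot of the last path segment, or "" if there is none. */
    static String extension(String filename) {
        int lastSep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int lastDot = filename.lastIndexOf('.');
        return (lastDot <= lastSep) ? "" : filename.substring(lastDot + 1);
    }

    /** Wrap an output stream according to the filename extension. */
    static OutputStream wrapOutput(String filename, OutputStream out) throws IOException {
        switch (extension(filename)) {
            case "gz": return new GZIPOutputStream(out);
            // "bz2" would be handled here with a Commons Compress stream (not shown).
            default:   return out;
        }
    }

    /** Wrap an input stream according to the filename extension. */
    static InputStream wrapInput(String filename, InputStream in) throws IOException {
        switch (extension(filename)) {
            case "gz": return new GZIPInputStream(in);
            // "bz2" and "sz" would be handled here with Commons Compress streams (not shown).
            default:   return in;
        }
    }

    /** Write text through wrapOutput into memory, then read it back through wrapInput. */
    static String roundTrip(String filename, String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (OutputStream out = wrapOutput(filename, bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (InputStream in = wrapInput(filename, new ByteArrayInputStream(bytes.toByteArray()))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1)
                    result.write(buf, 0, n);
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the extension lookup in one helper means the input and output paths dispatch on the same value, which is the property the `FileOps.extension` / `FilenameUtils.getExtension()` discussion is about.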
---