Posted to dev@jena.apache.org by afs <gi...@git.apache.org> on 2018/06/03 09:55:05 UTC

[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...

GitHub user afs opened a pull request:

    https://github.com/apache/jena/pull/427

    JENA-1554, JENA-1555: Support bz2 compressed files directly from Java.

    JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress
    JENA-1554: Add bz2 compression/decompression


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afs/jena compressed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #427
    
----
commit eb9ba394f59ae5f827a54db718b032d797d1bafb
Author: Andy Seaborne <an...@...>
Date:   2018-06-03T08:51:44Z

    JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress

commit f88fbc578d02ed8925104bf5d4a03795470d9275
Author: Andy Seaborne <an...@...>
Date:   2018-06-03T09:11:13Z

    JENA-1554: Add bz2 compression/decompression
    
    Add Snappy
      default 32k block
      decompress only; compressor not available
    
    Update javadoc (RDFLanguages, BinRDF) that mentions gz.

----
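The commits replace the single `.gz` test in `IO` with a switch on the file extension. Below is a minimal, stdlib-only sketch of that dispatch for both reading and writing. The class and method names are hypothetical and are not Jena's `IO` API. Only gzip is wired up, because bz2 and Snappy need Commons Compress (`BZip2CompressorInputStream`/`BZip2CompressorOutputStream`, `SnappyCompressorInputStream`); those cases are shown as comments only.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not org.apache.jena.atlas.io.IO: pick a (de)compressor by file extension.
final class CompressionByExtension {
    private CompressionByExtension() {}

    /** Text after the last '.' of the last path segment, without the dot; "" if none. */
    static String extension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? filename.substring(dot + 1) : "";
    }

    static InputStream openInput(String filename) throws IOException {
        InputStream in = new FileInputStream(filename);
        try {
            switch (extension(filename)) {
                case "gz":  return new GZIPInputStream(in);
                // "bz2" -> BZip2CompressorInputStream, "sz" -> SnappyCompressorInputStream (Commons Compress)
                default:    return in;   // no recognised compression: raw bytes
            }
        } catch (IOException ex) {
            in.close();                  // e.g. not gzip data: don't leak the file handle
            throw ex;
        }
    }

    static OutputStream openOutput(String filename) throws IOException {
        OutputStream out = new FileOutputStream(filename);
        try {
            switch (extension(filename)) {
                case "gz":  return new GZIPOutputStream(out);
                // "bz2" -> BZip2CompressorOutputStream (Commons Compress); no Snappy writer in this PR
                default:    return out;
            }
        } catch (IOException ex) {
            out.close();
            throw ex;
        }
    }
}
```

On the write side, closing the returned stream matters: `GZIPOutputStream` only writes the gzip trailer on `finish()` or `close()`.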


---

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192794776
  
    --- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
    @@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
                 filename = IRILib.decode(filename) ;
             }
             OutputStream out = new FileOutputStream(filename) ;
    -        if ( filename.endsWith(".gz") )
    -            out = new GZIPOutputStream(out) ;
    +        String ext = FileOps.extension(filename);
    --- End diff --
    
    Good idea, as a separate "clean up FileOps/FileUtils" item; let's let this PR go in now. Got to finish sometime!



---

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/427


---

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192694578
  
    --- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
    @@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
                 filename = IRILib.decode(filename) ;
             }
             InputStream in = new FileInputStream(filename) ;
    -        if ( filename.endsWith(".gz") )
    -            in = new GZIPInputStream(in) ;
    +        String ext = FileOps.extension(filename);
    +        switch ( ext ) {
    +            case "":        return in;
    +            case "gz":      return new GZIPInputStream(in) ;
    +            case "bz2":     return new BZip2CompressorInputStream(in);
    +            case "sz":      return new SnappyCompressorInputStream(in);
    +        }
             return in ;
         }
    +
    +    private static String[] extensions = { ".gz", ".bz2", ".sz" }; 
    +    
    +    /** The filename without any compression extension, or the original filename.
    +     *  It tests for compression types handled by {@link #openFileEx}.
    +     */
    +    static public String filenameNoCompression(String filename) {
    +        for ( String ext : extensions ) {
    +            if ( filename.endsWith(ext) )
    +                return filename.substring(0, filename.length()-ext.length());
    +        }
    +        return filename;
    +    }
    --- End diff --
    
    Maybe this instead:
    
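    ```java
        import org.apache.commons.io.FilenameUtils;

        // FilenameUtils.isExtension() compares against extensions without the leading dot.
        private static String[] extensions = { "gz", "bz2", "sz" };

        /** The filename without any compression extension, or the original filename.
         *  It tests for compression types handled by {@link #openFileEx}.
         */
        static public String filenameNoCompression(String filename) {
            if ( FilenameUtils.isExtension(filename, extensions) ) {
                return FilenameUtils.removeExtension(filename);
            }
            return filename;
        }
    ```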
    
    I believe commons-io is already in the dependency list. There's an extra null-byte check in the `FilenameUtils` extension test, but that's not so important. It's just simpler, I think.
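One detail worth checking in the suggestion: per the commons-io javadoc, `FilenameUtils.getExtension` returns the text after the last dot *without* the dot, and `isExtension(String, String[])` compares against that value, so the array it is given needs dotless entries such as "gz" rather than ".gz". The stdlib-only sketch below mirrors that documented contract without calling commons-io; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the documented FilenameUtils contract
// (getExtension / isExtension / removeExtension); it does not call commons-io.
final class NoCompressionName {
    private NoCompressionName() {}

    // Dotless, because the extension of "data.nt.gz" is "gz", not ".gz".
    private static final List<String> EXTENSIONS = Arrays.asList("gz", "bz2", "sz");

    /** Position of the extension's dot, or -1 if the last path segment has none. */
    private static int indexOfExtension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? dot : -1;
    }

    /** Like getExtension: text after the last dot, without the dot; "" if none. */
    static String extension(String filename) {
        int i = indexOfExtension(filename);
        return i < 0 ? "" : filename.substring(i + 1);
    }

    /** Like isExtension followed by removeExtension: strips one known compression suffix. */
    static String filenameNoCompression(String filename) {
        if (EXTENSIONS.contains(extension(filename)))
            return filename.substring(0, indexOfExtension(filename));
        return filename;
    }
}
```

Because only the last extension is removed, "data.nt.gz" becomes "data.nt", which is what the original loop over suffixes also does for a single compression suffix.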


---

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192794317
  
    
    Done.


---

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192605754
  
    --- Diff: pom.xml ---
    @@ -68,6 +68,7 @@
         <ver.commonslang3>3.4</ver.commonslang3>
         <ver.commonscsv>1.5</ver.commonscsv>
         <ver.commons-codec>1.11</ver.commons-codec>
    +    <ver.commons-compress>1.16.1</ver.commons-compress>
    --- End diff --
    
    1.17 was just released; maybe it's worth using it instead? I just received Stefan's announcement about it on the Commons mailing list.


---

Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192678383
  
    
    Yes! Thanks for the pointer.


---

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/427#discussion_r192701731
  
    --- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
    @@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
                 filename = IRILib.decode(filename) ;
             }
             OutputStream out = new FileOutputStream(filename) ;
    -        if ( filename.endsWith(".gz") )
    -            out = new GZIPOutputStream(out) ;
    +        String ext = FileOps.extension(filename);
    --- End diff --
    
    Digressing, but since commons-io's `FilenameUtils.getExtension()` is already on the classpath, perhaps `FileOps.extension()` could later be marked as `@Deprecated`?


---