Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2013/12/04 15:14:36 UTC

[jira] [Created] (JENA-601) Provide better support for compressed input formats

Rob Vesse created JENA-601:
------------------------------

             Summary: Provide better support for compressed input formats
                 Key: JENA-601
                 URL: https://issues.apache.org/jira/browse/JENA-601
             Project: Apache Jena
          Issue Type: Improvement
          Components: RIOT
    Affects Versions: Jena 2.11.0
            Reporter: Rob Vesse


Currently Jena has little or no support for compressed input formats. There are a few odd cases where some consideration is given, e.g.:

- {{RDFLanguages.filenameToLang()}} strips off {{.gz}} extensions to help it correctly detect file types
- Compressed HTTP responses can be handled by virtue of Apache HttpClient

It would be nice to have a better strategy for handling compressed inputs. For example, a registry of known compression extensions (e.g. {{.gz}}, {{.bz2}}, {{.deflate}}) that ARQ would strip off when trying to deduce the format from a filename.
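A minimal sketch of such a registry, assuming plain Java with no Jena dependency. The class {{CompressionExtensions}} and its methods are hypothetical (not Jena or ARQ API), and the extension-to-syntax table is a small illustrative subset; inside Jena the final lookup would more likely be delegated to {{RDFLanguages}}.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical sketch (not Jena API): a registry of compression extensions
 * that are stripped from a filename before the RDF syntax is guessed.
 */
final class CompressionExtensions {
    // Known compression suffixes, matched case-insensitively
    private static final List<String> KNOWN = List.of(".gz", ".bz2", ".deflate");

    private CompressionExtensions() {}

    /** Returns the compression suffix of the filename, if it has one. */
    static Optional<String> compressionSuffix(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (String ext : KNOWN) {
            if (lower.endsWith(ext)) {
                return Optional.of(ext);
            }
        }
        return Optional.empty();
    }

    /** Strips one known compression suffix, e.g. "foo.nt.gz" becomes "foo.nt". */
    static String stripCompression(String filename) {
        return compressionSuffix(filename)
                .map(ext -> filename.substring(0, filename.length() - ext.length()))
                .orElse(filename);
    }

    /** Illustrative syntax guess from whatever extension remains. */
    static String guessSyntax(String filename) {
        String base = stripCompression(filename).toLowerCase(Locale.ROOT);
        if (base.endsWith(".nt")) return "N-Triples";
        if (base.endsWith(".nq")) return "N-Quads";
        if (base.endsWith(".ttl")) return "Turtle";
        if (base.endsWith(".rdf")) return "RDF/XML";
        return "unknown";
    }
}
```

In this sketch {{guessSyntax("data.ttl.bz2")}} strips {{.bz2}} and reports Turtle. Only one compression suffix is removed, so an odd name such as {{foo.gz.gz}} falls through to "unknown" rather than being guessed.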

It would also be useful if the various locator implementations took compression into account when opening input streams. I'm fairly sure that if you asked ARQ to open a {{foo.nt.gz}} file, it would just open a raw input stream and the read would then fail.
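A minimal sketch of a compression-aware opener, using only the JDK's {{java.util.zip}} classes. {{DecompressingOpener}} is a hypothetical name, not an existing Jena locator. Two assumptions are worth flagging: {{.deflate}} files are taken to be zlib-wrapped (the format {{InflaterInputStream}} expects by default), and {{.bz2}} is rejected because the JDK has no bzip2 codec, so real support would need a third-party library such as Apache Commons Compress.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/**
 * Hypothetical sketch (not Jena API): opens a file and wraps it in a
 * decompressing stream chosen from the filename extension.
 */
final class DecompressingOpener {
    private DecompressingOpener() {}

    static InputStream open(Path path) throws IOException {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        InputStream raw = new BufferedInputStream(Files.newInputStream(path));
        try {
            if (name.endsWith(".gz")) {
                // gzip framing; the constructor reads and checks the header
                return new GZIPInputStream(raw);
            }
            if (name.endsWith(".deflate")) {
                // Assumes zlib-wrapped deflate data (default Inflater settings)
                return new InflaterInputStream(raw);
            }
            if (name.endsWith(".bz2")) {
                // No bzip2 codec in the JDK; a third-party library would be needed
                throw new IOException("bzip2 is not supported by the JDK: " + path);
            }
            return raw; // not compressed: hand back the raw stream
        } catch (IOException e) {
            raw.close(); // do not leak the file handle on failure
            throw e;
        }
    }
}
```

With this in place, reading {{foo.nt.gz}} yields the decompressed N-Triples text rather than gzip bytes, so a parser downstream would not fail.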



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Re: [jira] [Created] (JENA-601) Provide better support for compressed input formats

Posted by Andy Seaborne <an...@apache.org>.
On 04/12/13 14:14, Rob Vesse (JIRA) wrote:
> It would also be useful if the various locator implementations took compression into account when opening input streams

I would not build that in at the lowest level - LocatorFile should set 
an appropriate content type for the compressed stream.  The caller may 
just want to do a byte copy of the stream, leaving it compressed.
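A minimal sketch of that alternative, in which the locator only labels the stream with a media type and a content encoding and leaves decompression to the caller. {{LabelledStream}} and its methods are hypothetical, not Jena API, and only {{.gz}} is recognised to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical sketch (not Jena API): a stream labelled with its media type
 * and content encoding; the bytes themselves are never decompressed here.
 */
final class LabelledStream {
    final InputStream bytes;       // raw, possibly compressed, bytes
    final String contentType;      // e.g. "application/n-triples"
    final String contentEncoding;  // "gzip", or null when uncompressed

    LabelledStream(InputStream bytes, String contentType, String contentEncoding) {
        this.bytes = bytes;
        this.contentType = contentType;
        this.contentEncoding = contentEncoding;
    }

    /** What a locator might do: inspect the name, label the stream, nothing more. */
    static LabelledStream label(String filename, InputStream raw) {
        String name = filename.toLowerCase(Locale.ROOT);
        String encoding = null;
        if (name.endsWith(".gz")) {
            encoding = "gzip";
            name = name.substring(0, name.length() - 3);
        }
        String type = name.endsWith(".nt") ? "application/n-triples"
                : name.endsWith(".ttl") ? "text/turtle"
                : "application/octet-stream";
        return new LabelledStream(raw, type, encoding);
    }

    /** For a caller that wants to parse: decompress only if the label says so. */
    InputStream decoded() throws IOException {
        return "gzip".equals(contentEncoding) ? new GZIPInputStream(bytes) : bytes;
    }

    /** For a caller that only copies bytes: the data stays compressed. */
    long copyRaw(OutputStream out) throws IOException {
        return bytes.transferTo(out);
    }
}
```

A parser would call {{decoded()}}, while code that only moves files around would call {{copyRaw()}} and never pay for decompression.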

> I'm fairly sure if you asked ARQ to open a {{foo.nt.gz}} file it would just open a raw input stream and then the reading would fail.

If true, surely the fix is to make ARQ go through RIOT, which does handle 
at least .gz.

	Andy