Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2013/12/04 15:14:36 UTC
[jira] [Created] (JENA-601) Provide better support for compressed input formats
Rob Vesse created JENA-601:
------------------------------
Summary: Provide better support for compressed input formats
Key: JENA-601
URL: https://issues.apache.org/jira/browse/JENA-601
Project: Apache Jena
Issue Type: Improvement
Components: RIOT
Affects Versions: Jena 2.11.0
Reporter: Rob Vesse
Currently Jena has little or no support for compressed input formats. There are the odd cases where some consideration is given, e.g.
- {{RDFLanguages.filenameToLang()}} strips off {{.gz}} extensions to help it correctly detect file types
- HTTP responses can deal with compressed responses by virtue of Apache HttpClient
What would be nice is a better strategy for handling compressed inputs: for example, a registry of known compression extensions (e.g. {{.gz}}, {{.bz2}}, {{.deflate}}) which ARQ would strip off when trying to deduce the format from the filename.
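A minimal sketch of such a registry, using only the JDK. The class name {{CompressionExtensions}} and the method {{stripCompression}} are hypothetical and not part of Jena; the three extensions are just the ones listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CompressionExtensions {

    // Hypothetical registry of known compression extensions (without the dot).
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("gz", "bz2", "deflate"));

    /**
     * Returns the filename with one trailing compression extension removed,
     * or the filename unchanged if its last extension is not registered.
     */
    public static String stripCompression(String filename) {
        int dot = filename.lastIndexOf('.');
        // No extension, a leading dot only, or a trailing dot: nothing to strip.
        if (dot <= 0 || dot == filename.length() - 1) {
            return filename;
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN.contains(ext) ? filename.substring(0, dot) : filename;
    }
}
```

The stripped name (e.g. {{foo.nt}} for {{foo.nt.gz}}) could then be passed on to whatever does the format detection.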
It would also be useful if the various locator implementations took compression into account when opening input streams as I'm fairly sure if you asked ARQ to open a {{foo.nt.gz}} file it would just open a raw input stream and then the reading would fail.
Re: [jira] [Created] (JENA-601) Provide better support for compressed input formats
Posted by Andy Seaborne <an...@apache.org>.
On 04/12/13 14:14, Rob Vesse (JIRA) wrote:
> It would also be useful if the various locator implementations took compression into account when opening input streams
I would not build that in at the lowest level - LocatorFile should set
an appropriate content type for the compressed stream. The caller may
just want to do a byte-copy of a stream, leaving it compressed.
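To illustrate the separation, here is a rough sketch in which opening the bytes and decompressing them are distinct steps, so a caller that only wants a byte-copy can skip the second one. Everything here is hypothetical ({{OptInDecompression}}, {{decompressIfGzip}} and the two demo helpers are not Jena classes or methods), and it only covers {{.gz}} because that is all the JDK handles out of the box; it also keys off the filename rather than a content type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class OptInDecompression {

    /** Hypothetical opt-in step: wrap the raw stream only if the name ends in ".gz". */
    public static InputStream decompressIfGzip(String filename, InputStream raw) throws IOException {
        if (filename.toLowerCase(Locale.ROOT).endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw; // not recognised as compressed: hand back unchanged
    }

    /** Demo helper: gzip-compress bytes in memory, standing in for a file on disk. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Demo helper: read stored bytes as UTF-8 text, decompressing if the name says so. */
    public static String readUtf8(String filename, byte[] stored) {
        try (InputStream in = decompressIfGzip(filename, new ByteArrayInputStream(stored))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller doing a plain copy would use the raw stream directly and never call {{decompressIfGzip}}; a parser would call it first.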
> I'm fairly sure if you asked ARQ to open a {{foo.nt.gz}} file it would just open a raw input stream and then the reading would fail.
If true, surely the fix is to make ARQ go through RIOT which does handle
at least .gz.
Andy