Posted to dev@jena.apache.org by "Stian Soiland-Reyes (JIRA)" <ji...@apache.org> on 2015/06/08 15:59:00 UTC

[jira] [Issue Comment Deleted] (JENA-959) riot: gzip output option

     [ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stian Soiland-Reyes updated JENA-959:
-------------------------------------
    Comment: was deleted

(was: Yeah, either should work. It might also be worth having explicit compression support for input formats. For instance, right now it works with:

{code}
    riot --syntax=turtle chembl_20.0_target_targetcmpt_ls.ttl.gz

<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7619> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7612> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7611> .

{code}

But it is still guessing the .gz from the filename, so I can't do the same if I pipe in a gzipped stream or the file doesn't have a valid extension:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=nquads fred
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle fred
Exception in thread "main" org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
	at org.apache.jena.atlas.io.IO.exception(IO.java:222)
	at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
	at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
	at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
	at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:241)
	at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:235)
	at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:157)
	at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:98)
	at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
	at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:138)
	at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:180)
	at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:267)
	at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:185)
	at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:175)
	at riotcmd.CmdLangParse.exec(CmdLangParse.java:148)
	at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
	at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
	at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
	at riotcmd.riot.main(riot.java:35)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.Read
{code}


So for this I would appreciate it if --syntax supported the same compression option:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle.gz fred
Can not detemine the synatx from 'turtle.gz'
{code})
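
The deleted comment above hinges on riot detecting gzip from the file name only. A minimal sketch of content-based detection follows, assuming the input is available as a java.io.InputStream. The class GzipSniffer and its methods are hypothetical names for illustration and are not part of Jena or riot; the only facts relied on are the gzip magic bytes 0x1f 0x8b and the standard java.util.zip classes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a Jena class): detect gzip by content, not by file name.
class GzipSniffer {

    /** Returns a stream that decompresses the input if it starts with the gzip magic bytes. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        // mark/reset lets us peek at the first two bytes and then rewind.
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        // A gzip stream begins with 0x1f 0x8b (RFC 1952).
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    /** Reads the whole stream as UTF-8 text (demo helper only). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] triple = "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
                .getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(triple);
        }
        // The compressed and the plain bytes both come back as the same text.
        System.out.print(readAll(maybeGunzip(new ByteArrayInputStream(compressed.toByteArray()))));
        System.out.print(readAll(maybeGunzip(new ByteArrayInputStream(triple))));
    }
}
```

With something along these lines, a file such as 'fred' or a piped stream would not need a .gz extension. Whether riot should sniff content or require an explicit option like --syntax=turtle.gz is a separate design question.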

> riot: gzip output option
> ------------------------
>
>                 Key: JENA-959
>                 URL: https://issues.apache.org/jira/browse/JENA-959
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: RIOT
>            Reporter: Stian Soiland-Reyes
>            Priority: Trivial
>
> The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats.
> This can of course also be achieved by piping through gzip, but that easily becomes platform-specific. Writing *.format.gz files is probably as much within the remit of someone using riot on the command line as reading them.
> So my suggestion is to support the extension .gz in the various --output options to enable output via a GZIPOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html
> For example:
> {code}
> stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz 
> Not recognized as an RDF language : 'nquads.gz'
> {code}
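
As a rough sketch of the suggestion in the description, the option value could be split into a language name and a compression flag before the usual language lookup, with the output stream wrapped in a GZIPOutputStream when the flag is set. The class CompressedFormat below is hypothetical and not part of Jena or riot; how riot actually resolves --output values is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a Jena class): splits "nquads.gz" into "nquads" + gzip flag.
class CompressedFormat {
    final String language; // e.g. "nquads", to be handed to the normal language lookup
    final boolean gzip;    // true when the option value ended in ".gz"

    CompressedFormat(String language, boolean gzip) {
        this.language = language;
        this.gzip = gzip;
    }

    /** Parses an option value such as "nquads.gz" or "turtle". */
    static CompressedFormat parse(String optionValue) {
        String suffix = ".gz";
        if (optionValue.toLowerCase(Locale.ROOT).endsWith(suffix)) {
            String base = optionValue.substring(0, optionValue.length() - suffix.length());
            return new CompressedFormat(base, true);
        }
        return new CompressedFormat(optionValue, false);
    }

    /** Wraps the target stream in a GZIPOutputStream only when compression was requested. */
    OutputStream wrap(OutputStream out) throws IOException {
        return gzip ? new GZIPOutputStream(out) : out;
    }

    public static void main(String[] args) throws IOException {
        CompressedFormat fmt = CompressedFormat.parse("nquads.gz");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper matters: it writes the gzip trailer.
        try (OutputStream out = fmt.wrap(sink)) {
            out.write("<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
                    .getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(fmt.language + " gzip=" + fmt.gzip + " bytes=" + sink.size());
    }
}
```

The same split would apply to an input option value such as 'turtle.gz'. Note that the wrapped stream has to be closed (or finished) by the caller, otherwise the compressed output is truncated.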


