Posted to commits@jena.apache.org by an...@apache.org on 2014/09/03 16:59:46 UTC

svn commit: r1622276 - in /jena/site/trunk/content/documentation/io: index.mdtext streaming-io.mdtext

Author: andy
Date: Wed Sep  3 14:59:45 2014
New Revision: 1622276

URL: http://svn.apache.org/r1622276
Log:
Documentation on using StreamRDF for stream processing

Added:
    jena/site/trunk/content/documentation/io/streaming-io.mdtext
Modified:
    jena/site/trunk/content/documentation/io/index.mdtext

Modified: jena/site/trunk/content/documentation/io/index.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/io/index.mdtext?rev=1622276&r1=1622275&r2=1622276&view=diff
==============================================================================
--- jena/site/trunk/content/documentation/io/index.mdtext (original)
+++ jena/site/trunk/content/documentation/io/index.mdtext Wed Sep  3 14:59:45 2014
@@ -6,6 +6,7 @@ This page details the setup of RDF I/O t
 * [Commands](#command-line-tools)
 * [Reading RDF in Jena](rdf-input.html)
 * [Writing RDF in Jena](rdf-output.html)
+* [Working with RDF Streams](streaming-io.html)
 * [Additional details on working with RDF/XML](rdfxml_howto.html)
 
 ## Formats

Added: jena/site/trunk/content/documentation/io/streaming-io.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/io/streaming-io.mdtext?rev=1622276&view=auto
==============================================================================
--- jena/site/trunk/content/documentation/io/streaming-io.mdtext (added)
+++ jena/site/trunk/content/documentation/io/streaming-io.mdtext Wed Sep  3 14:59:45 2014
@@ -0,0 +1,103 @@
+Title: Working with RDF Streams in Apache Jena
+
+Jena has operations useful in processing RDF in a streaming
+fashion. Streaming can be used for manipulating RDF at scale.  Jena
+provides high performance readers and writers for all standard RDF formats,
+and it can be extended with custom formats.
+
+The [RDF Thrift](http://afs.github.io/rdf-thrift) format provides the highest
+input parsing performance overall.  N-Triples and N-Quads provide the highest
+input parsing performance among the W3C standard formats.
+
+Files ending in `.gz` are assumed to be gzip-compressed. Input and output
+to such files takes this into account, and the syntax is chosen from the
+remaining file extension: `data.nt.gz` is parsed as gzip-compressed N-Triples.
+
+## `StreamRDF`
+
+The central abstraction is `StreamRDF`, an interface for streamed
+RDF data.  It covers triples and quads, and also parser events such as
+prefix settings and base URI declarations.
+
+```
+public interface StreamRDF
+{
+    /** Start parsing */
+    public void start() ;
+   
+    /** Triple emitted */
+    public void triple(Triple triple) ;
+
+    /** Quad emitted */
+    public void quad(Quad quad) ;
+
+    /** base declaration seen */
+    public void base(String base) ;
+
+    /** prefix declaration seen */
+    public void prefix(String prefix, String iri) ;
+
+    /** Finish parsing */
+    public void finish() ;
+}
+```
+
+There are utilities to help:
+
+* `StreamRDFLib` -- create `StreamRDF` objects
+* `StreamOps` -- helpers for sending RDF data to `StreamRDF` objects
+
+## Reading data
+
+All parsers for RDF syntaxes provided by RIOT are streaming, with the
+exception of JSON-LD.  A JSON object can have members in any order so the
+parser may need the whole top-level object in order to have the information
+needed for parsing.
+
+The `parse` functions of `RDFDataMgr` direct the output of the parser to a
+`StreamRDF`.  For example:
+
+    StreamRDF destination = ...
+    RDFDataMgr.parse(destination, "http://example/data.ttl") ;
+
+reads the remote URL, with content negotiation, and sends the triples to the
+`destination`.
+
+## Writing data
+
+Not all RDF formats are suitable for writing as a stream.  Formats that
+provide pretty printing (for example the default `RDFFormat` for each of
+Turtle, TriG and RDF/XML) require analysis of the whole of a model in order
+to determine nestable structures of blank nodes and for using specific
+syntax for RDF lists.
+
+These languages can be used for streaming output but with an appearance
+that is necessarily "less pretty".
+See ["Streamed Block Formats"](rdf-output.html#streamed-block-formats) 
+for details.
+
+The `StreamRDFWriter` class has functions that write graphs and datasets
+using a streaming writer, and it also provides for the creation of
+a `StreamRDF` backed by a stream-based writer.  For example:
+
+    StreamRDFWriter.write(output, model.getGraph(), lang) ;
+
+which can also be written as:
+
+    StreamRDF writer = StreamRDFWriter.getWriterStream(output, lang) ;
+    StreamOps.graphToStream(model.getGraph(), writer) ;
+
+
+N-Triples and N-Quads are always streamed.  The streaming `RDFFormat`s are:
+
+| Lang             | RDFFormat                  |
+|------------------|----------------------------|
+| `Lang.TURTLE`    | `RDFFormat.TURTLE_BLOCKS`  |
+|                  | `RDFFormat.TURTLE_FLAT`    |
+| `Lang.TRIG`      | `RDFFormat.TRIG_BLOCKS`    |
+|                  | `RDFFormat.TRIG_FLAT`      |
+| `Lang.NTRIPLES`  | `RDFFormat.NTRIPLES_UTF8`  |
+|                  | `RDFFormat.NTRIPLES_ASCII` |
+| `Lang.NQUADS`    | `RDFFormat.NQUADS_UTF8`    |
+|                  | `RDFFormat.NQUADS_ASCII`   |
+| `Lang.RDFTHRIFT` | `RDFFormat.RDF_THRIFT`     |
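
Note on the `.gz` rule described in the new page: the file name convention can be illustrated with plain Java. This is only a minimal sketch and not Jena's implementation; the class `GzipAwareName`, its two methods and the small extension table are hypothetical names chosen for illustration. It shows the idea that `.gz` is dropped first and the remaining extension then selects the syntax.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Jena) mimicking the ".gz" file name convention. */
final class GzipAwareName {
    // Illustrative subset of extensions; the table RIOT actually uses is larger.
    private static final Map<String, String> SYNTAX_BY_EXTENSION = Map.of(
        "nt", "N-Triples",
        "nq", "N-Quads",
        "ttl", "Turtle",
        "trig", "TriG",
        "rdf", "RDF/XML");

    /** True if the file name ends in ".gz" (case-insensitive). */
    static boolean isGzip(String filename) {
        return filename.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Syntax name chosen from the extension left after dropping ".gz"; null if unknown. */
    static String syntaxFor(String filename) {
        String name = filename.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz")) {
            // Look past the compression suffix to the "other" extension.
            name = name.substring(0, name.length() - ".gz".length());
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension left to decide on
        }
        return SYNTAX_BY_EXTENSION.get(name.substring(dot + 1));
    }

    public static void main(String[] args) {
        String file = "data.nt.gz";
        System.out.println(file + " gzip=" + isGzip(file) + " syntax=" + syntaxFor(file));
    }
}
```

With this sketch, `data.nt.gz` is reported as gzip-compressed N-Triples, while a name such as `data.gz` has no remaining extension and yields no syntax.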