Posted to commits@jena.apache.org by an...@apache.org on 2013/03/09 15:07:00 UTC

svn commit: r1454711 - /jena/site/trunk/content/documentation/io/output.mdtext

Author: andy
Date: Sat Mar  9 14:06:59 2013
New Revision: 1454711

URL: http://svn.apache.org/r1454711
Log:
Add draft RIOT writer documentation

Added:
    jena/site/trunk/content/documentation/io/output.mdtext

Added: jena/site/trunk/content/documentation/io/output.mdtext
URL: http://svn.apache.org/viewvc/jena/site/trunk/content/documentation/io/output.mdtext?rev=1454711&view=auto
==============================================================================
--- jena/site/trunk/content/documentation/io/output.mdtext (added)
+++ jena/site/trunk/content/documentation/io/output.mdtext Sat Mar  9 14:06:59 2013
@@ -0,0 +1,280 @@
+Title: Reading and Writing RDF in Apache Jena
+
+This page describes the RIOT (RDF I/O technology) output capabilities.
+
+See [Reading RDF](index.html) for details of the RIOT Reader system.
+
+See [Advanced RDF/XML Output](rdfxml_howto.html#advanced-rdfxml-output) 
+for details of the Jena RDF/XML writer.
+
+- [API](#api)
+- [RDFFormat](#rdfformat)
+- [`RDFFormat`s and Jena syntax names](#rdfformats-and-jena-syntax-names)
+- [Formats](#formats)
+  - [Normal Printing](#normal-printing)
+  - [Pretty Printed Languages](#pretty-printed-languages)
+  - [Streamed Block Formats](#streamed-block-formats)
+  - [Line printed formats](#line-printed-formats)
+  - [N-Triples and N-Quads](#n-triples-and-n-quads)
+  - [RDF/XML](#rdfxml)
+- [Examples](#examples)
+- [Notes](#notes)
+
+## API
+
+There are two ways to write RDF data using Apache Jena RIOT, 
+either via the `RDFDataMgr` 
+
+    RDFDataMgr.write(OutputStream, Model, RDFFormat) ;
+    RDFDataMgr.write(OutputStream, Dataset, RDFFormat) ;
+
+or using the `model` API:
+
+    model.write(output, "format") ;
+
+The `format` names are described below; they are a superset of the
+names Jena supported before RIOT.
+
+Many variations of these methods exist.  See the full javadoc for details.
+
+## `RDFFormat`
+
+Output using RIOT depends on the format, which involves both the language (syntax)
+being written and the variant of that syntax. 
+
+The RIOT writer architecture is extensible.  The following languages
+are available as part of the standard setup.
+
+* Turtle
+* N-Triples
+* RDF/XML
+* RDF/JSON
+* TriG
+* NQuads
+
+In addition, there are variants of Turtle and TriG for pretty printing, 
+streamed output and flat output.  RDF/XML has variants for pretty printing 
+and plain output.  Jena RIOT uses `org.apache.jena.riot.RDFFormat` as a way
+to identify the language and variant to be written.  The class contains constants
+for the standard supported formats.
+
+Note:
+
+* RDF/JSON is not JSON-LD. See the [description of RDF/JSON](rdf-json.html).
+* N3 is treated as Turtle for output.
+
+## `RDFFormat`s and Jena syntax names
+
+The string name traditionally used in `model.write` is mapped to RIOT `RDFFormat`
+as follows:
+
+| Jena writer name     | RIOT RDFFormat   |
+|----------------------|------------------|
+| `"TURTLE"`           | `TURTLE`         |
+| `"TTL"`              | `TURTLE`         |
+| `"Turtle"`           | `TURTLE`         |
+| `"N-TRIPLES"`        | `NTRIPLES`       |
+| `"N-TRIPLE"`         | `NTRIPLES`       |
+| `"NT"`               | `NTRIPLES`       |
+| `"RDF/XML-ABBREV"`   | `RDFXML`         |
+| `"RDF/XML"`          | `RDFXML_PLAIN`   |
+| `"N3"`               | `N3`             |
+| `"RDF/JSON"`         | `RDFJSON`        |
+
+## Formats
+
+### Normal Printing
+
+A `Lang` can be used for the writer format, in which case it is mapped to
+an `RDFFormat` internally.  The normal writers are:
+
+| RDFFormat or Lang | Output                  |
+|-------------------|-------------------------|
+| TURTLE            | Turtle, pretty printed  |
+| TTL               | Same as TURTLE          |
+| NTRIPLES          | N-Triples               |
+| TRIG              | TriG, pretty printed    |
+| NQUADS            | N-Quads                 |
+| RDFXML            | RDF/XML, pretty printed |
+
+Pretty printed RDF/XML is also known as RDF/XML-ABBREV.
+
+### Pretty Printed Languages
+
+All Turtle and TriG formats use
+prefix names, and short forms for literals.
+
+The pretty printed versions of Turtle and TriG print 
+data with the same subject in the same graph together.
+All the properties for a given subject are sorted 
+into a predefined order. RDF lists are printed as
+`(...)` and `[...]` is used for blank nodes where possible.  
+
+The analysis for determining what can be pretty printed requires
+temporary data structures and also a scan of the whole graph before
+writing begins.  Therefore, pretty printed formats are not suitable
+for writing persistent graphs and datasets.
+
+When writing at scale, use either a "blocked" version of Turtle or TriG, 
+or write N-Triples/N-Quads.
+
+Example:
+
+    @prefix :      <http://example/> .
+    @prefix dc:    <http://purl.org/dc/elements/1.1/> .
+    @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
+    @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+    
+    :book   dc:author  ( :a :b ) .
+    
+    :a      a           foaf:Person ;
+            foaf:knows  [ foaf:name  "Bob" ] ;
+            foaf:name   "Alice" .
+    
+    :b      foaf:knows  :a .
+
+Pretty printed formats:
+
+| RDFFormat      | Same as               |
+|----------------|-----------------------|
+| TURTLE_PRETTY  | TURTLE, TTL           |
+| TRIG_PRETTY    | TRIG                  |
+| RDFXML_PRETTY  | RDFXML_ABBREV, RDFXML |
+
+### Streamed Block Formats
+
+The streamed formats write triples or quads as given.  
+They group together data by adjacent subject or graph/subject
+in the output stream.
+
+The written data is like the pretty printed forms but without
+RDF lists being written in the `(...)` form, and it does not
+use the blank node form `[...]`.
+
+This gives some degree of readability while not requiring
+excessive temporary data structures. Data larger than the size of RAM 
+can be written but blank node labels need to be tracked in order
+to use the short label form.
+
+Example:
+
+    @prefix :  <http://example/> .
+    @prefix dc:  <http://purl.org/dc/elements/1.1/> .
+    @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
+    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+    
+    :book   dc:author  _:b0 .
+    
+    _:b0    rdf:rest   _:b1 ;
+            rdf:first  :a .
+    
+    :a      foaf:knows  _:b2 ;
+            foaf:name   "Alice" ;
+            rdf:type    foaf:Person .
+    
+    _:b2    foaf:name  "Bob" .
+    
+    :b      foaf:knows  :a .
+    
+    _:b1    rdf:rest   rdf:nil ;
+            rdf:first  :b .
+ 
+Formats:
+
+| RDFFormat      |
+|----------------|
+| TURTLE_BLOCKS  |
+| TRIG_BLOCKS    |
+
+### Line printed formats
+
+There are writers for Turtle and TriG that use the abbreviated formats for
+prefix names and short forms for literals. They write each triple or quad
+on a single line.
+
+The regularity of the output can be useful for text processing of the data.  
+These formats do not offer more scalability than the streamed forms.
+
+Example:
+
+The FLAT writers abbreviate IRIs, literals and blank node labels
+but always write one complete triple on one line (no use of `;`).
+
+    @prefix :  <http://example/> .
+    @prefix dc:  <http://purl.org/dc/elements/1.1/> .
+    @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
+    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+    _:b0 foaf:name "Bob" .
+    :book dc:author _:b1 .
+    _:b2 rdf:rest rdf:nil .
+    _:b2 rdf:first :b .
+    :a foaf:knows _:b0 .
+    :a foaf:name "Alice" .
+    :a rdf:type foaf:Person .
+    _:b1 rdf:rest _:b2 .
+    _:b1 rdf:first :a .
+    :b foaf:knows :a .
+
+
+| RDFFormat   |
+|-------------|
+| TURTLE_FLAT |
+| TRIG_FLAT   |
+
+### N-Triples and N-Quads
+
+These provide the formats that are fastest to write, 
+and data of any size can be output.  They do not use any
+internal state. They maximise the 
+interoperability with other systems and are useful
+for database dumps. They are not human readable, 
+even at moderate scale.
+
+The files can be large but they compress well with gzip.
+Compression ratios of 8x to 10x can often be obtained.
+
+Example:
+
+The N-Triples writer makes no attempt to make its output readable.
+It uses internal blank node labels to ensure correct labeling without
+needing any writer state.
+
+    _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7fff <http://xmlns.com/foaf/0.1/name> "Bob" .
+    <http://example/book> <http://purl.org/dc/elements/1.1/author> _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe .
+    _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
+    _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://example/b> .
+    <http://example/a> <http://xmlns.com/foaf/0.1/knows> _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7fff .
+    <http://example/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
+    <http://example/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
+    _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd .
+    _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://example/a> .
+    <http://example/b> <http://xmlns.com/foaf/0.1/knows> <http://example/a> .
+
+
+| RDFFormat | Other names     |
+|-----------|-----------------|
+| NTRIPLES  | NTRIPLE, NT     |
+| NQUADS    | NQ              |
+
+### RDF/XML
+
+RIOT supports output in RDF/XML. The RIOT `RDFFormat` defaults to pretty printed RDF/XML,
+while the Jena writer name defaults to a streaming plain output.
+
+| RDFFormat    | Other names                  | Jena writer name   |
+|--------------|------------------------------|--------------------|
+| RDFXML       | RDFXML_PRETTY, RDFXML_ABBREV | "RDF/XML-ABBREV"   |
+| RDFXML_PLAIN |                              | "RDF/XML"          |
+
+## Examples
+
+@@TODO
+
+## Notes
+
+Using `OutputStream`s is strongly encouraged.  This allows the writers
+to manage the character encoding and write UTF-8.  Using `java.io.Writer`
+does not allow this; on platforms such as MS Windows, the default
+configuration of a `Writer` is not suitable for Turtle because
+the character set is the platform default, and not UTF-8.
+The only case where a writer is useful is `java.io.StringWriter`.
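
The encoding point in the Notes can be illustrated with the Java standard library alone. The sketch below is not Jena code: `EncodingSketch` and `encode` are hypothetical names, and ISO-8859-1 is only an assumed stand-in for a platform default that is not UTF-8. It shows that the bytes produced depend on the charset bound to the `Writer`, which an API that is handed an already-constructed `Writer` cannot change.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Encode text through a Writer whose charset is fixed by the caller.
    // Code that owns the OutputStream can always choose UTF-8 here;
    // code that is handed a ready-made Writer cannot.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // One triple with a non-ASCII character (U+00EB) in the literal.
        String turtle = "<http://example/a> <http://example/name> \"Zo\u00EB\" .\n";

        byte[] utf8 = encode(turtle, StandardCharsets.UTF_8);
        // Assumed stand-in for a non-UTF-8 platform default.
        byte[] latin1 = encode(turtle, StandardCharsets.ISO_8859_1);

        // A Turtle parser reads the bytes as UTF-8, so decode both that way.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromLatin1 = new String(latin1, StandardCharsets.UTF_8);

        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 output round-trips: " + turtle.equals(fromUtf8));
        System.out.println("ISO-8859-1 output round-trips: " + turtle.equals(fromLatin1));
    }
}
```

A `java.io.StringWriter` avoids the problem because it never converts characters to bytes; the encoding decision is deferred to whoever later serialises the string.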