Posted to commits@jena.apache.org by an...@apache.org on 2021/09/26 08:54:11 UTC
[jena-site] branch main updated: Protobuf (#69)
This is an automated email from the ASF dual-hosted git repository.
andy pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/jena-site.git
The following commit(s) were added to refs/heads/main by this push:
new 4a04f19 Protobuf (#69)
4a04f19 is described below
commit 4a04f19e4f3f2b568d21276216913b56cdb131dd
Author: Andy Seaborne <an...@apache.org>
AuthorDate: Sun Sep 26 09:54:06 2021 +0100
Protobuf (#69)
* Remove Elephas from the 'Learn' dropdown
* Doc for RDF Binary with Protobuf
---
layouts/_default/baseof.html | 1 -
source/documentation/io/__index.md | 28 +++---
source/documentation/io/rdf-binary.md | 161 ++++++++++++++++++++++++++++++--
source/documentation/io/rdf-input.md | 18 ++--
source/documentation/io/rdf-output.md | 27 +++---
source/documentation/io/streaming-io.md | 5 +-
6 files changed, 195 insertions(+), 45 deletions(-)
diff --git a/layouts/_default/baseof.html b/layouts/_default/baseof.html
index 9067cd4..f683f7f 100644
--- a/layouts/_default/baseof.html
+++ b/layouts/_default/baseof.html
@@ -83,7 +83,6 @@
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
- <li><a href="/documentation/hadoop/index.html">Elephas - tools for RDF on Hadoop</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
diff --git a/source/documentation/io/__index.md b/source/documentation/io/__index.md
index 2f34b40..59a76da 100644
--- a/source/documentation/io/__index.md
+++ b/source/documentation/io/__index.md
@@ -34,7 +34,7 @@ See "[Reading JSON-LD 1.1](json-ld-11.html)" for additional setup and use for
reading JSON-LD 1.1. JSON-LD 1.0 is the current default in Jena.
RDF Binary is a binary encoding of RDF (graphs and datasets) that can be useful
-for fast parsing. See [RDF Binary using Apache Thrift](rdf-binary.html).
+for fast parsing. See [RDF Binary](rdf-binary.html).
## Command line tools
@@ -49,18 +49,20 @@ These can be called directly as Java programs:
The file extensions understood are:
| Extension | Language |
-|-----------|------------|
-| `.ttl` | Turtle |
-| `.nt` | N-Triples |
-| `.nq` | N-Quads |
-| `.trig` | TriG |
-| `.rdf` | RDF/XML |
-| `.owl` | RDF/XML |
-| `.jsonld` | JSON-LD |
-| `.trdf` | RDF Thrift |
-| `.rt` | RDF Thrift |
-| `.rj` | RDF/JSON |
-| `.trix` | TriX |
+|-----------|--------------|
+| `.ttl` | Turtle |
+| `.nt` | N-Triples |
+| `.nq` | N-Quads |
+| `.trig` | TriG |
+| `.rdf` | RDF/XML |
+| `.owl` | RDF/XML |
+| `.jsonld` | JSON-LD |
+| `.trdf` | RDF Thrift |
+| `.rt` | RDF Thrift |
+| `.rpb` | RDF Protobuf |
+| `.pbrdf` | RDF Protobuf |
+| `.rj` | RDF/JSON |
+| `.trix` | TriX |
`.n3` is supported but only as a synonym for Turtle.
diff --git a/source/documentation/io/rdf-binary.md b/source/documentation/io/rdf-binary.md
index 8f36982..7b1fa89 100644
--- a/source/documentation/io/rdf-binary.md
+++ b/source/documentation/io/rdf-binary.md
@@ -3,7 +3,9 @@ title: RDF Binary using Apache Thrift
---
"RDF Binary" is an efficient format for RDF and RDF-related data using
-[Apache Thrift](https://thrift.apache.org/) as the binary encoding.
+[Apache Thrift](https://thrift.apache.org/)
+or [Google Protocol Buffers](https://developers.google.com/protocol-buffers)
+as the binary data encoding.
The W3C standard RDF syntaxes are text or XML based. These incur costs in
parsing; the most human-readable formats also incur high costs to write, and
@@ -16,14 +18,14 @@ terms, then builds data formats for RDF graphs, RDF datasets, and for
SPARQL result sets. This gives a basis for high-performance linked data
systems.
-[Apache Thrift](https://thrift.apache.org/) provides an efficient,
-wide-used binary encoding layer with a large number of language bindings.
+[Thrift](https://thrift.apache.org/) and
+[Protobuf](https://developers.google.com/protocol-buffers) provide efficient,
+widely used binary encoding layers, each with a large number of language
+bindings.
For more details, see [RDF Thrift](http://afs.github.io/rdf-thrift).
-This pages gives the details of RDF Binary encoding in [Apache Thrift](http://thrift.apache.org/).
-
-## Thrift encoding of RDF Terms {#encoding-terms}
+## Thrift encoding of RDF Terms {#encoding-terms-thrift}
RDF Thrift uses the Thrift compact protocol.
@@ -84,7 +86,7 @@ Source: [BinaryRDF.thrift](https://github.com/apache/jena/blob/main/jena-arq/Gra
12: RDF_Decimal valDecimal
}
-### Thrift encoding of Triples, Quads and rows. {#encoding-tuples}
+### Thrift encoding of Triples, Quads and rows. {#encoding-thrift-tuples}
struct RDF_Triple {
1: required RDF_Term S
@@ -104,7 +106,7 @@ Source: [BinaryRDF.thrift](https://github.com/apache/jena/blob/main/jena-arq/Gra
2: required string uri ;
}
-### Thrift encoding of RDF Graphs and RDF Datasets {#encoding-graphs-datasets}
+### Thrift encoding of RDF Graphs and RDF Datasets {#encoding-thrift-graphs-datasets}
union RDF_StreamRow {
1: RDF_PrefixDecl prefixDecl
@@ -116,7 +118,7 @@ RDF Graphs are encoded as a stream of `RDF_Triple` and `RDF_PrefixDecl`.
RDF Datasets are encoded as a stream of `RDF_Triple`, `RDF_Quad` and `RDF_PrefixDecl`.
-### Thrift encoding of SPARQL Result Sets {#encoding-result-sets}
+### Thrift encoding of SPARQL Result Sets {#encoding-thrift-result-sets}
A SPARQL Result Set is encoded as a list of variables (the header), then
a stream of rows (the results).
@@ -128,3 +130,144 @@ a stream of rows (the results).
struct RDF_DataTuple {
1: list<RDF_Term> row
}
+
+## Protobuf encoding of RDF Terms {#encoding-terms-protobuf}
+
+The Protobuf schema is similar.
+
+Source:
+[binary-rdf.proto](https://github.com/apache/jena/blob/main/jena-arq/Grammar/RDF-Protobuf/binary-rdf.proto)
+
+Streaming is used to allow for graphs of arbitrary size. Therefore the stream items
+(`RDF_StreamRow` below) are written with an initial length (`writeDelimitedTo`
+in the Java API).
+
+See
+[Protobuf Techniques Streaming](https://developers.google.com/protocol-buffers/docs/techniques#streaming).
+
+```
+syntax = "proto3";
+
+option java_package = "org.apache.jena.riot.protobuf.wire" ;
+
+// Prefer one file with static inner classes.
+option java_outer_classname = "PB_RDF" ;
+// Optimize for speed (default)
+option optimize_for = SPEED ;
+
+//option java_multiple_files = true;
+// ==== RDF Term Definitions
+
+message RDF_IRI {
+ string iri = 1 ;
+}
+
+// A prefix name (abbrev for an IRI)
+message RDF_PrefixName {
+ string prefix = 1 ;
+ string localName = 2 ;
+}
+
+message RDF_BNode {
+ string label = 1 ;
+ // 2 * fixed64
+}
+
+// Common abbreviations for datatypes and other URIs?
+// union with additional values.
+
+message RDF_Literal {
+ string lex = 1 ;
+ oneof literalKind {
+ bool simple = 9 ;
+ string langtag = 2 ;
+ string datatype = 3 ;
+ RDF_PrefixName dtPrefix = 4 ;
+ }
+}
+
+message RDF_Decimal {
+ sint64 value = 1 ;
+ sint32 scale = 2 ;
+}
+
+message RDF_Var {
+ string name = 1 ;
+}
+
+message RDF_ANY { }
+
+message RDF_UNDEF { }
+
+message RDF_REPEAT { }
+
+message RDF_Term {
+ oneof term {
+ RDF_IRI iri = 1 ;
+ RDF_BNode bnode = 2 ;
+ RDF_Literal literal = 3 ;
+ RDF_PrefixName prefixName = 4 ;
+ RDF_Var variable = 5 ;
+ RDF_Triple tripleTerm = 6 ;
+ RDF_ANY any = 7 ;
+ RDF_UNDEF undefined = 8 ;
+ RDF_REPEAT repeat = 9 ;
+
+ // Value forms of literals.
+ sint64 valInteger = 20 ;
+ double valDouble = 21 ;
+ RDF_Decimal valDecimal = 22 ;
+ }
+}
+
+// === StreamRDF items
+
+message RDF_Triple {
+ RDF_Term S = 1 ;
+ RDF_Term P = 2 ;
+ RDF_Term O = 3 ;
+}
+
+message RDF_Quad {
+ RDF_Term S = 1 ;
+ RDF_Term P = 2 ;
+ RDF_Term O = 3 ;
+ RDF_Term G = 4 ;
+}
+
+// Prefix declaration
+message RDF_PrefixDecl {
+ string prefix = 1;
+ string uri = 2 ;
+}
+
+// StreamRDF
+message RDF_StreamRow {
+ oneof row {
+ RDF_PrefixDecl prefixDecl = 1 ;
+ RDF_Triple triple = 2 ;
+ RDF_Quad quad = 3 ;
+ RDF_IRI base = 4 ;
+ }
+}
+
+message RDF_Stream {
+ repeated RDF_StreamRow row = 1 ;
+}
+
+// ==== SPARQL Result Sets
+
+message RDF_VarTuple {
+ repeated RDF_Var vars = 1 ;
+}
+
+message RDF_DataTuple {
+ repeated RDF_Term row = 1 ;
+}
+
+// ==== RDF Graph
+
+message RDF_Graph {
+ repeated RDF_Triple triple = 1 ;
+}
+```
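
The length-prefixed streaming described in the `rdf-binary.md` change above can be illustrated without the Protobuf library. The following is a minimal sketch, assuming the framing that Protobuf's delimited methods use: each item is preceded by its byte length as a base-128 varint. The class and method names (`DelimitedFraming`, `writeFrame`, `readFrames`) are hypothetical and are not part of Jena or Protobuf; the payloads stand in for serialized `RDF_StreamRow` messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of varint length-prefixed framing; not Jena code.
public class DelimitedFraming {

    /** Write a non-negative int as a base-128 varint, low 7 bits first. */
    public static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);   // continuation bit set
            value >>>= 7;
        }
        out.write(value);                       // final byte, continuation bit clear
    }

    /** Read a varint; returns -1 if the stream ends cleanly before the first byte. */
    public static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) return -1;
                throw new EOFException("truncated varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("varint too long");
    }

    /** Write one stream item: its length, then its bytes. */
    public static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        writeVarint(out, payload.length);
        out.write(payload);
    }

    /** Read items until end of stream; the total number of items need not be known up front. */
    public static List<byte[]> readFrames(InputStream in) throws IOException {
        List<byte[]> frames = new ArrayList<>();
        int len;
        while ((len = readVarint(in)) >= 0) {
            byte[] buf = in.readNBytes(len);
            if (buf.length != len) throw new EOFException("truncated frame");
            frames.add(buf);
        }
        return frames;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFrame(out, "row-1".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, "row-2".getBytes(StandardCharsets.UTF_8));
        for (byte[] frame : readFrames(new ByteArrayInputStream(out.toByteArray()))) {
            System.out.println(new String(frame, StandardCharsets.UTF_8));
        }
    }
}
```

Because every item carries its own length, a reader can process one row at a time and never needs the whole graph in memory, which is the point of writing rows individually rather than as a single enclosing message.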
diff --git a/source/documentation/io/rdf-input.md b/source/documentation/io/rdf-input.md
index feaf8d7..46ee2df 100644
--- a/source/documentation/io/rdf-input.md
+++ b/source/documentation/io/rdf-input.md
@@ -67,18 +67,18 @@ as:
The following is a suggested Apache httpd .htaccess file:
- AddType text/turtle .ttl
- AddType application/rdf+xml .rdf
- AddType application/n-triples .nt
+ AddType text/turtle .ttl
+ AddType application/rdf+xml .rdf
+ AddType application/n-triples .nt
- AddType application/ld+json .jsonld
- AddType application/owl+xml .owl
+ AddType application/ld+json .jsonld
- AddType text/trig .trig
- AddType application/n-quads .nq
+ AddType text/trig .trig
+ AddType application/n-quads .nq
- AddType application/trix+xml .trix
- AddType application/rdf+thrift .trdf
+ AddType application/trix+xml .trix
+ AddType application/rdf+thrift .rt
+ AddType application/rdf+protobuf .rpb
### Example 1 : Using the RDFDataMgr {#using-rdfdatamgr}
diff --git a/source/documentation/io/rdf-output.md b/source/documentation/io/rdf-output.md
index 4f078cf..fe75c46 100644
--- a/source/documentation/io/rdf-output.md
+++ b/source/documentation/io/rdf-output.md
@@ -17,7 +17,7 @@ See [Reading RDF](rdf-input.html) for details of the RIOT Reader system.
- [Turtle and Trig format options](#opt-turtle-trig)
- [N-Triples and N-Quads](#n-triples-and-n-quads)
- [JSON-LD](#json-ld)
- - [RDF Binary](#rdf-thrift)
+ - [RDF Binary](#rdf-binary)
- [RDF/XML](#rdfxml)
- [Examples](#examples)
- [Notes](#notes)
@@ -110,9 +110,10 @@ an `RDFFormat` internally. The normal writers are:
| RDFXML | RDF/XML, pretty printed |
| RDFJSON | |
| TRIX | |
-| RDFTHRFT | RDF Thrift |
+| RDFTHRIFT | RDF Binary Thrift |
+| RDFPROTO | RDF Binary Protobuf |
-Pretty printed RDF/XML is also known as RDF/XML-ABBREV
+Pretty printed RDF/XML is also known as RDF/XML-ABBREV.
### Pretty Printed Languages
@@ -369,21 +370,25 @@ cases.
What can be done, and how it can be, is explained in the
[sample code](https://github.com/apache/jena/tree/main/jena-arq/src-examples/arq/examples/riot/Ex_WriteJsonLD.java).
-### RDF Binary {#rdf-thrift}
+### RDF Binary {#rdf-binary}
[This is a binary encoding](rdf-binary.html) using
-[Apache Thrift](https://thrift.apache.org/) for RDF Graphs
+[Apache Thrift](https://thrift.apache.org/) or
+[Google Protocol Buffers](https://developers.google.com/protocol-buffers)
+for RDF Graphs
and RDF Datasets, as well as SPARQL Result Sets, and it provides faster parsing
compared to the text-based standardised syntax such as N-triples, Turtle or RDF/XML.
-| RDFFormat |
-|------------------|
-| RDFTHRIFT |
-| RDFTHRIFT_VALUES |
+| RDFFormat |
+|-------------------|
+| RDF_THRIFT |
+| RDF_THRIFT_VALUES |
+| RDF_PROTO |
+| RDF_PROTO_VALUES |
-`RDFTHRIFT_VALUES` is a variant where numeric values are written as values,
+`RDF_THRIFT_VALUES` and `RDF_PROTO_VALUES` are variants where numeric values are written as values,
not as lexical format and datatype. See the
-[description of RDF Thrift](http://afs.github.io/rdf-thrift)
+[description of RDF Binary](rdf-binary.html)
for discussion.
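
As a rough illustration of "written as values", the sketch below splits a decimal's lexical form into the two integer fields that the `RDF_Decimal` message declares (`value` and `scale`). It assumes the pair is interpreted the way `java.math.BigDecimal` interprets an unscaled value and scale, that is, as value × 10^(-scale); that interpretation, and the names `DecimalValueForm` and `WireDecimal`, are assumptions made for this example and are not taken from the Jena sources.

```java
import java.math.BigDecimal;

// Hypothetical illustration; assumes value/scale follow BigDecimal's convention.
public class DecimalValueForm {

    /** Stand-in for the two fields of RDF_Decimal. */
    public static final class WireDecimal {
        public final long value;
        public final int scale;
        public WireDecimal(long value, int scale) {
            this.value = value;
            this.scale = scale;
        }
    }

    /** Lexical form to (value, scale); throws ArithmeticException if it does not fit in 64 bits. */
    public static WireDecimal encode(String lexicalForm) {
        BigDecimal d = new BigDecimal(lexicalForm);
        return new WireDecimal(d.unscaledValue().longValueExact(), d.scale());
    }

    /** (value, scale) back to a decimal, preserving trailing zeros. */
    public static BigDecimal decode(WireDecimal w) {
        return BigDecimal.valueOf(w.value, w.scale);
    }

    public static void main(String[] args) {
        WireDecimal w = encode("12.50");
        System.out.println(w.value + " " + w.scale + " " + decode(w).toPlainString());
    }
}
```

A reader of the value form receives two integers and skips parsing a lexical string altogether; a decimal whose unscaled value does not fit in 64 bits cannot be written this way and would need the lexical form and datatype instead.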
### RDF/XML {#rdfxml}
diff --git a/source/documentation/io/streaming-io.md b/source/documentation/io/streaming-io.md
index dc866d8..73842a3 100644
--- a/source/documentation/io/streaming-io.md
+++ b/source/documentation/io/streaming-io.md
@@ -7,8 +7,8 @@ fashion. Streaming can be used for manipulating RDF at scale. Jena
provides high performance readers and writers for all standard RDF formats,
and it can be extended with custom formats.
-The [RDF Binary using Apache Thrift](rdf-binary.html) provides the highest
-input parsing performance. N-Triples/N-Quads provide the highest
+[RDF Binary](rdf-binary.html) provides the highest
+input parsing performance. N-Triples/N-Quads provide the highest
input parsing performance using W3C Standards.
Files ending in `.gz` are assumed to be gzip-compressed. Input and output
@@ -105,3 +105,4 @@ N-Triples and N-Quads are always written as a stream.
| `RDFFormat.NQUADS_ASCII` | |
| `RDFFormat.TRIX` | `Lang.TRIX` |
| `RDFFormat.RDF_THRIFT` | `Lang.RDFTHRIFT` |
+| `RDFFormat.RDF_PROTO` | `Lang.RDFPROTO` |