Posted to commits@jena.apache.org by gi...@apache.org on 2022/02/20 23:23:15 UTC

[jena-site] branch asf-site updated: Updated site from main (16bc5feeda9e30004b5142ce50af431388533bba)

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/jena-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 040cc39  Updated site from main (16bc5feeda9e30004b5142ce50af431388533bba)
040cc39 is described below

commit 040cc395ce78eb764e01d9bb5ce7cac7b059869f
Author: jenkins <bu...@apache.org>
AuthorDate: Sun Feb 20 23:23:12 2022 +0000

    Updated site from main (16bc5feeda9e30004b5142ce50af431388533bba)
---
 content/documentation/index.xml                 | 12 ++--
 content/documentation/tdb/architecture.html     | 93 ++++++++++++++++++-------
 content/documentation/tdb/tdb_transactions.html | 73 ++++++++-----------
 content/index.xml                               | 12 ++--
 content/sitemap.xml                             |  4 +-
 5 files changed, 110 insertions(+), 84 deletions(-)
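
Note on the TDB2 inline-value layout added to architecture.html in this commit: the following is a minimal sketch of the three-way split that page describes (bit 63 clear: pointer into the node table; bits 63 and 62 set: a double in 62 bits; bit 63 set and bit 62 clear: 6 bits of type plus 56 bits of value). It is not Jena code. The function names are hypothetical, the type codes are left as raw numbers because the page does not list them, and two's-complement is an assumption for the "signed 56 bit number".

```python
MASK64 = (1 << 64) - 1
MASK56 = (1 << 56) - 1


def classify_node_id(node_id: int):
    """Split a 64-bit NodeId into (kind, type_code, payload)."""
    node_id &= MASK64
    if (node_id >> 63) == 0:
        # High bit 0: the remaining 63 bits are a position in the node table.
        return ("PTR", None, node_id & ((1 << 63) - 1))
    if (node_id >> 62) & 1:
        # Bits 63 and 62 both set: the low 62 bits hold an encoded double.
        return ("DOUBLE", None, node_id & ((1 << 62) - 1))
    # Bit 63 set, bit 62 clear: 6 bits of type, then 56 bits of value.
    type_code = (node_id >> 56) & 0x3F
    return ("INLINE", type_code, node_id & MASK56)


def signed56(value: int) -> int:
    """Read a 56-bit field as a signed integer (two's complement assumed)."""
    value &= MASK56
    return value - (1 << 56) if value & (1 << 55) else value


def make_inline(type_code: int, value: int) -> int:
    """Build an inline NodeId from a hypothetical type code and a value."""
    return (1 << 63) | ((type_code & 0x3F) << 56) | (value & MASK56)
```

For example, classify_node_id(make_inline(5, -1)) reports kind "INLINE" with type code 5, and signed56 applied to its payload recovers -1. A value that needs more than 56 bits cannot be built this way, which matches the page's remark that such values are stored externally.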

diff --git a/content/documentation/index.xml b/content/documentation/index.xml
index dd4ee85..418d99b 100644
--- a/content/documentation/index.xml
+++ b/content/documentation/index.xml
@@ -1674,8 +1674,8 @@ Setting Store Parameters In TDB, there is exactly one internal object for each d
       <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
       
       <guid>https://jena.apache.org/documentation/tdb/architecture.html</guid>
-      <description>This page gives an overview of the TDB architecture. Specific details refer to TDB 0.8.
-Contents  Terminology Design  The Node Table Triple and Quad indexes Prefixes Table TDB B+Trees   Inline values Query Processing Caching on 32 and 64 bit Java systems  Terminology Terms like &amp;ldquo;table&amp;rdquo; and &amp;ldquo;index&amp;rdquo; are used in this description. They don&amp;rsquo;t directly correspond to concepts in SQL, For example, in SQL terms, there is no triple table; that can be seen as just having indexes for the table or, alternatively, there are 3 tables, each [...]
+      <description>This page gives an overview of the TDB architecture. It applies to TDB1 and TDB2 with differences noted.
+Contents  Terminology Design  The Node Table Triple and Quad indexes Prefixes Table TDB B+Trees Transactions   Inline values Query Processing Caching on 32 and 64 bit Java systems  Terminology Terms like &amp;ldquo;table&amp;rdquo; and &amp;ldquo;index&amp;rdquo; are used in this description. They don&amp;rsquo;t directly correspond to concepts in SQL, For example, in SQL terms, there is no triple table; that can be seen as just having indexes for the table or, alternatively, there are 3 [...]
     </item>
     
     <item>
@@ -1787,10 +1787,10 @@ On 32 bit platforms, TDB uses in-heap caching of file data. In practice, the JVM
       <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
       
       <guid>https://jena.apache.org/documentation/tdb/tdb_transactions.html</guid>
-      <description>TDB provides ACID transaction support through the use of write-ahead-logging.
-Use of transactions protects a TDB dataset against data corruption, unexpected process termination and system crashes and therefore use of transactions is strongly recommended.
-This feature is part of version TDB 0.9.0 and later. Databases created with version of TDB 0.8.X can be used with 0.9.X to add transactional capability.
-Contents  Overview Limitations API for Transactions  Read transactions Write transactions   Multi-threaded use Bulk loading Multi JVM Migration from TDB 0.</description>
+      <description>TDB provides ACID transaction support through the use of write-ahead-logging in TDB1 and copy-on-write MVCC structures in TDB2.
+Use of transactions protects a TDB dataset against data corruption, unexpected process termination and system crashes.
+Non-transactional use of TDB1 should be avoided; TDB2 only operates with transactions.
+Contents  Overview Limitations API for Transactions  Read transactions Write transactions   Multi-threaded use Bulk loading Multi JVM  Overview TDB2 uses MVCC via a copy-on-write mechanism.</description>
     </item>
     
     <item>
diff --git a/content/documentation/tdb/architecture.html b/content/documentation/tdb/architecture.html
index 9da83a2..fafd8fc 100644
--- a/content/documentation/tdb/architecture.html
+++ b/content/documentation/tdb/architecture.html
@@ -180,8 +180,8 @@
             </div>
             <h1 class="title">TDB Architecture</h1>
             
-	<p>This page gives an overview of the TDB architecture. Specific
-details refer to TDB 0.8.</p>
+	<p>This page gives an overview of the TDB architecture.
+It applies to TDB1 and TDB2 with differences noted.</p>
 <h2 id="contents">Contents</h2>
 <ul>
 <li><a href="#terminology">Terminology</a></li>
@@ -191,6 +191,7 @@ details refer to TDB 0.8.</p>
 <li><a href="#triple-and-quad-indexes">Triple and Quad indexes</a></li>
 <li><a href="#prefixes-table">Prefixes Table</a></li>
 <li><a href="#tdb-btrees">TDB B+Trees</a></li>
+<li><a href="#tdb-transactions">Transactions</a></li>
 </ul>
 </li>
 <li><a href="#inline-values">Inline values</a></li>
@@ -215,8 +216,7 @@ filing system. A dataset consists of</p>
 <h3 id="the-node-table">The Node Table</h3>
 <p>The node table stores the representation of RDF terms (except for
 inlined value - see below). It provides two mappings from Node to
-NodeId and from NodeId to Node. This is sometimes called a
-dictionary.</p>
+NodeId and from NodeId to Node.</p>
 <p>The Node to NodeId mapping is used during data loading and when
 converting constant terms in queries from their Jena Node
 representation to the TDB-specific internal ids.</p>
@@ -251,18 +251,59 @@ or
 <a href="http://www.w3.org/TeamSubmission/turtle/" title="http://www.w3.org/TeamSubmission/turtle/">Turtle</a>.</p>
 <h3 id="tdb-btrees">TDB B+Trees</h3>
 <p>Many of the persistent data structures in TDB use a custom
-implementation of threaded
+implementation of
 <a href="http://en.wikipedia.org/wiki/B+_tree" title="http://en.wikipedia.org/wiki/B%2B_tree">B+Trees</a>.
 The TDB implementation only provides for fixed length key and fixed
-length value. There is no use of the value part in triple indexes.</p>
-<p>The threaded nature means that long scans of indexes proceeds
-without needing to traverse the branches of the tree.</p>
-<p>See the description of index caching below.</p>
+length value. There is no use of the value part in triple and quad indexes.</p>
+<h3 id="tdb-transactions">Transactions</h3>
+<p>Both TDB1 and TDB2 provide database transactions.
+The API is described on the <a href="/documentation/txn/" title="Jena Transactions">Jena Transactions page</a>.</p>
+<p>When running with transactions, TDB1 and TDB2 provide support for multiple read
+and write transactions without application involvement. There can be multiple
+readers active alongside a single active writer (referred to as &ldquo;MR+SW&rdquo;). TDB
+itself manages multiple writers, queuing them as necessary.</p>
+<p>To support transactions, TDB2 uses copy-on-write MVCC data structures internally.</p>
+<p>TDB1 can run non-transactionally but the application is responsible for ensuring
+that there is one writer or several readers, not both. This is referred to as
+&ldquo;MRSW&rdquo;. Misuse of TDB1 in non-transactional mode can corrupt the database.</p>
 <h2 id="inline-values">Inline values</h2>
-<p>Values of certain datatypes are held as part of the NodeId in the
-bottom 56 bits. The top 8 bits indicates the type - external NodeId
-or the value space.</p>
-<p>The value spaces handled are (TDB 0.8): xsd:decimal, xsd:integer,
+<p>Values of certain datatypes are held as part of the NodeId.
+The top bit indicates whether the remaining 63 bits are a position in the stored
+RDF terms file (high bit is 0) or an encoded value (high bit 1).</p>
+<p>By storing the value, the exact lexical form is not recorded. The
+integers 01 and 1 will both be treated as the value 1.</p>
+<h3 id="tdb2">TDB2</h3>
+<p>The TDB2 encoding is as follows:</p>
+<ul>
+<li>High bit (bit 63) 0 means the node is in the object table (PTR).</li>
+<li>High bit (bit 63) 1, bit 62 1: double as 62 bits.</li>
+<li>High bit (bit 63) 1, bit 62 0: 6 bits of type, 56 bits of value.</li>
+</ul>
+<p>If a value does not fit, it is stored externally, so there is no
+guarantee that all integers, say, are stored inline.</p>
+<ul>
+<li>Integer format: signed 56 bit number, the type field has the XSD type.</li>
+<li>Derived types of integer, each with their own datatype.</li>
+<li>Decimal format: 8 bits of scale, 48 bits of signed value.</li>
+<li>Date and DateTime</li>
+<li>Boolean</li>
+<li>Float</li>
+</ul>
+<p>In the case of xsd:double, the standard Java 64 bit format is used except that the range
+of the exponent is reduced by 2 bits.</p>
+<ul>
+<li>bit  63    : sign bit</li>
+<li>bits 52-62 : exponent, 11 bits, the power of 2, bias -1023.</li>
+<li>bits 0-51  : mantissa (significand) 52 bits (the leading one is not stored).</li>
+</ul>
+<p>Exponents are 11 bits, with values -1022 to +1023 held as 1 to 2046 (11 bits, bias -1023).
+Exponents 0x000 and 0x7ff have a special meaning: 0x000 encodes zero and subnormal numbers, 0x7ff encodes infinities and NaN.</p>
+<p>The xsd:dateTime and xsd:date ranges cover about 8000 years from
+year zero with a precision down to 1 millisecond. Timezone
+information is retained to an accuracy of 15 minutes with special
+timezones for Z and for no explicit timezone.</p>
+<h3 id="tdb1">TDB1</h3>
+<p>The value spaces handled are: xsd:decimal, xsd:integer,
 xsd:dateTime, xsd:date and xsd:boolean. Each has its own encoding
 to fit in 56 bits. If a node falls outside of the range of values
 that can be represented in the 56 bit encoding.</p>
@@ -270,24 +311,22 @@ that can be represented in the 56 bit encoding.</p>
 year zero with a precision down to 1 millisecond. Timezone
 information is retained to an accuracy of 15 minutes with special
 timezones for Z and for no explicit timezone.</p>
-<p>By storing the value, the exact lexical form is not recorded. The
-integers 01 and 1 will both be treated as the value 1.</p>
 <p>Derived XSD datatypes are held as their base type. The exact
-datatype is not retained; the value of the RDF term is.</p>
+datatype is not retained; the value of the RDF term is.
+An input of <code>xsd:int</code> will become <code>xsd:integer</code>.</p>
 <h2 id="query-processing">Query Processing</h2>
-<p>TDB uses the
-<a href="TODO">OpExecutor extension point of ARQ</a>.
+<p>TDB uses quad execution, rewriting SPARQL algebra <code>(graph...)</code> into blocks of quads
+where possible. It extends <code>OpExecutor</code>.
 TDB provides low level optimization of basic graph patterns using a
-<a href="optimizer.html" title="TDB/Optimizer">statistics based optimizer</a>.</p>
+<a href="optimizer.html">statistics based optimizer</a>.</p>
 <h2 id="caching-on-32-and-64-bit-java-systems">Caching on 32 and 64 bit Java systems</h2>
-<p>TDB runs on both 32-bit and 64-bit Java Virtual Machines. The same
-file formats are used on both systems and database files can be
-transferred between architectures (no TDB system should be running
-for the database at the time of copy). What differs is the file
-access mechanism used.</p>
-<p>TDB is faster on a 64 bit JVM because more memory is available for
-file caching.</p>
-<p>The node table caches are always in the Java heap.</p>
+<p>TDB runs on both 32-bit and 64-bit Java Virtual Machines.  A 64-bit Java Virtual
+Machine is the normal mode of use.  The same file formats are used on both
+systems and database files can be transferred between architectures (no TDB
+system should be running for the database at the time of copy). What differs is
+the file access mechanism used.</p>
+<p>The node table caches are always in the Java heap but otherwise the OS file
+system plays an important part in index caching.</p>
 <p>The file access mechanism can be set explicitly, but this is not a
 good idea for production usage, only for experimentation - see the
 <a href="configuration.html#File_Access_Mode" title="TDB/Configuration">File Access mode option</a>.</p>
diff --git a/content/documentation/tdb/tdb_transactions.html b/content/documentation/tdb/tdb_transactions.html
index ea8f571..f94c2bf 100644
--- a/content/documentation/tdb/tdb_transactions.html
+++ b/content/documentation/tdb/tdb_transactions.html
@@ -183,11 +183,11 @@
 	<p>TDB provides
 <a href="http://en.wikipedia.org/wiki/ACID">ACID</a>
 transaction support through the use of
-<a href="http://en.wikipedia.org/wiki/Write-ahead_logging">write-ahead-logging</a>.</p>
-<p>Use of transactions protects a TDB dataset
-against data corruption, unexpected process termination and system crashes and therefore use of transactions is <strong>strongly</strong> recommended.</p>
-<p>This feature is part of version TDB 0.9.0 and later.  Databases created with version of TDB 0.8.X can be used with 0.9.X
-to add transactional capability.</p>
+<a href="http://en.wikipedia.org/wiki/Write-ahead_logging">write-ahead-logging</a> in TDB1
+and copy-on-write MVCC structures in TDB2.</p>
+<p>Use of transactions protects a TDB dataset against data corruption, unexpected
+process termination and system crashes.</p>
+<p>Non-transactional use of TDB1 should be avoided; TDB2 only operates with transactions.</p>
 <h2 id="contents">Contents</h2>
 <ul>
 <li><a href="#overview">Overview</a></li>
@@ -201,17 +201,17 @@ to add transactional capability.</p>
 <li><a href="#multi-threaded-use">Multi-threaded use</a></li>
 <li><a href="#bulk-loading">Bulk loading</a></li>
 <li><a href="#multi-jvm">Multi JVM</a></li>
-<li><a href="#migration-from-tdb-08x">Migration from TDB 0.8.X</a></li>
-<li><a href="#reverting-to-tdb-08x">Reverting to TDB 0.8.X</a></li>
 </ul>
 <h2 id="overview">Overview</h2>
-<p>The transaction mechanism in TDB is based on
-<a href="http://en.wikipedia.org/wiki/Write-ahead_logging">write-ahead-logging</a>.
-All changes made inside a write-transaction are written to
-<a href="http://en.wikipedia.org/wiki/Journaling_file_system">journals</a>,
-then propagated to the main database at a suitable moment. This
-design allows for read-transactions to proceed without locking or
-other overhead over the base database.</p>
+<p>TDB2 uses <a href="https://en.wikipedia.org/wiki/Multiversion_concurrency_control">MVCC</a>
+via a copy-on-write mechanism. Update transactions can be of any size.</p>
+<p>The TDB1 transaction mechanism is based on
+<a href="http://en.wikipedia.org/wiki/Write-ahead_logging">write-ahead-logging</a>.  All
+changes made inside a write-transaction are written to
+<a href="http://en.wikipedia.org/wiki/Journaling_file_system">journals</a>, then propagated
+to the main database at a suitable moment.  Transactions in TDB1 are limited in
+size to a few tens of millions of triples because they retain data in-memory until
+indexes can be updated.</p>
 <p>Transactional TDB supports one active write transaction, and
 multiple read transactions at the same time. Read-transactions
 started before a write-transaction commits see the database in a
@@ -231,20 +231,23 @@ transactions, the highest
 <p>(some of these limitations may be removed in later versions)</p>
 <ul>
 <li>Bulk loads: the TDB bulk loader is not transactional</li>
-<li><a href="http://en.wikipedia.org/wiki/Nested_transaction">Nested transactions</a> are not supported.</li>
+<li><a href="http://en.wikipedia.org/wiki/Nested_transaction">Nested transactions</a>
+are not supported.</li>
+</ul>
+<p>TDB2 removed the limitations of TDB1:</p>
+<ul>
 <li>Some active transaction state is held exclusively in-memory,
 limiting scalability.</li>
 <li>Long-running transactions. Read-transactions cause a build-up
 of pending changes;</li>
 </ul>
 <p>If a single read transaction runs for a long time when there are
-many updates, the system will consume a lot of temporary
+many updates, the TDB1 system will consume a lot of temporary
 resources.</p>
 <h2 id="api-for-transactions">API for Transactions</h2>
-<p>TDB supports the general Jena API for transactions on RDF datasets
-(introduced in Jena 2.7.0, ARQ 2.9.0).</p>
-<p>A TDB-backed dataset can be used non-transactionally but once used in a transaction,
-it must be used transactionally after that.</p>
+<p>This section uses the primitives of the transaction mechanism.</p>
+<p>Better APIs are described in <a href="/documentation/txn/">the transaction API
+documentation</a>.</p>
 <h3 id="read-transactions">Read transactions</h3>
 <p>These are used for SPARQL queries and code using the Jena API
 actions that do not change the data.  The general pattern is:</p>
@@ -367,31 +370,15 @@ same storage. in both cases, the transactions are independent.</p>
 <p>Multiple applications, running in multiple JVMs, using the same
 file databases is not supported and has a high risk of data corruption.  Once corrupted a database cannot be repaired
 and must be rebuilt from the original source data. Therefore there <strong>must</strong> be a single JVM
-controlling the database directory and files.  From 1.1.0 onwards TDB includes automatic prevention against multi-JVM
+controlling the database directory and files. TDB includes automatic protection against multi-JVM use
 which prevents this under most circumstances.</p>
-<p>Use our <a href="../fuseki2/">Fuseki</a> component to provide a
-database server for multiple applications. Fuseki supports
-<a href="http://www.w3.org/TR/sparql11-query/">SPARQL Query</a>,
-<a href="http://www.w3.org/TR/sparql11-update/">SPARQL Update</a> and the
-<a href="http://www.w3.org/TR/sparql11-http-rdf-update/">SPARQL Graph Store protocol</a>.</p>
+<p>Use <a href="../fuseki2/">Fuseki</a> to provide a database server for multiple
+applications. Fuseki supports <a href="http://www.w3.org/TR/sparql11-query/">SPARQL
+Query</a>, <a href="http://www.w3.org/TR/sparql11-update/">SPARQL
+Update</a> and the <a href="http://www.w3.org/TR/sparql11-http-rdf-update/">SPARQL Graph Store
+protocol</a>.</p>
 <h2 id="bulk-loading">Bulk loading</h2>
-<p>The bulk loader is not transactional.</p>
-<h2 id="migration-from-tdb-08x">Migration from TDB 0.8.X</h2>
-<p>The database files used by TDB 0.9.0 are fully compatible with TDB
-0.8.X; there are no file format changes and application code using
-the interface provided by <code>TDBFactory</code> will continue to work as
-before, without transaction capabilities. The only addition is the
-presence of journal files.</p>
-<p>Transactions use a new API: the <code>TDBFactory</code> API is still present.
-If an application simply uses the TDB 0.9 codebase, it will work as
-before without transactions.</p>
-<p>Applications can start using transaction by coding to the new API.</p>
-<h2 id="reverting-to-tdb-08x">Reverting to TDB 0.8.X</h2>
-<p>A database can be reverted to TDB 0.8.X by running <code>tdb.tdbrecover</code></p>
-<ul>
-<li>this program recovers any committed transaction with pending
-actions. The database can then be used with TDB 0.8.X.</li>
-</ul>
+<p>Bulk loaders are not transactional.</p>
 
 
         </div>
diff --git a/content/index.xml b/content/index.xml
index 7b1f12a..68fd474 100644
--- a/content/index.xml
+++ b/content/index.xml
@@ -1957,8 +1957,8 @@ Setting Store Parameters In TDB, there is exactly one internal object for each d
       <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
       
       <guid>https://jena.apache.org/documentation/tdb/architecture.html</guid>
-      <description>This page gives an overview of the TDB architecture. Specific details refer to TDB 0.8.
-Contents  Terminology Design  The Node Table Triple and Quad indexes Prefixes Table TDB B+Trees   Inline values Query Processing Caching on 32 and 64 bit Java systems  Terminology Terms like &amp;ldquo;table&amp;rdquo; and &amp;ldquo;index&amp;rdquo; are used in this description. They don&amp;rsquo;t directly correspond to concepts in SQL, For example, in SQL terms, there is no triple table; that can be seen as just having indexes for the table or, alternatively, there are 3 tables, each [...]
+      <description>This page gives an overview of the TDB architecture. It applies to TDB1 and TDB2 with differences noted.
+Contents  Terminology Design  The Node Table Triple and Quad indexes Prefixes Table TDB B+Trees Transactions   Inline values Query Processing Caching on 32 and 64 bit Java systems  Terminology Terms like &amp;ldquo;table&amp;rdquo; and &amp;ldquo;index&amp;rdquo; are used in this description. They don&amp;rsquo;t directly correspond to concepts in SQL, For example, in SQL terms, there is no triple table; that can be seen as just having indexes for the table or, alternatively, there are 3 [...]
     </item>
     
     <item>
@@ -2070,10 +2070,10 @@ On 32 bit platforms, TDB uses in-heap caching of file data. In practice, the JVM
       <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
       
       <guid>https://jena.apache.org/documentation/tdb/tdb_transactions.html</guid>
-      <description>TDB provides ACID transaction support through the use of write-ahead-logging.
-Use of transactions protects a TDB dataset against data corruption, unexpected process termination and system crashes and therefore use of transactions is strongly recommended.
-This feature is part of version TDB 0.9.0 and later. Databases created with version of TDB 0.8.X can be used with 0.9.X to add transactional capability.
-Contents  Overview Limitations API for Transactions  Read transactions Write transactions   Multi-threaded use Bulk loading Multi JVM Migration from TDB 0.</description>
+      <description>TDB provides ACID transaction support through the use of write-ahead-logging in TDB1 and copy-on-write MVCC structures in TDB2.
+Use of transactions protects a TDB dataset against data corruption, unexpected process termination and system crashes.
+Non-transactional use of TDB1 should be avoided; TDB2 only operates with transactions.
+Contents  Overview Limitations API for Transactions  Read transactions Write transactions   Multi-threaded use Bulk loading Multi JVM  Overview TDB2 uses MVCC via a copy-on-write mechanism.</description>
     </item>
     
     <item>
diff --git a/content/sitemap.xml b/content/sitemap.xml
index defe19f..f1f55aa 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -929,7 +929,7 @@
   
   <url>
     <loc>https://jena.apache.org/documentation/tdb/architecture.html</loc>
-    <lastmod>2020-02-28T13:09:12+01:00</lastmod>
+    <lastmod>2022-01-06T14:44:32+00:00</lastmod>
   </url>
   
   <url>
@@ -984,7 +984,7 @@
   
   <url>
     <loc>https://jena.apache.org/documentation/tdb/tdb_transactions.html</loc>
-    <lastmod>2021-11-05T16:04:36+00:00</lastmod>
+    <lastmod>2022-01-06T14:44:32+00:00</lastmod>
   </url>
   
   <url>