Posted to commits@jena.apache.org by an...@apache.org on 2021/11/12 12:33:45 UTC

[jena-site] 01/01: Documentation for xloader

This is an automated email from the ASF dual-hosted git repository.

andy pushed a commit to branch xloader
in repository https://gitbox.apache.org/repos/asf/jena-site.git

commit 5291aa37f945cba24c11aac3658cd3f6cc592a70
Author: Andy Seaborne <an...@apache.org>
AuthorDate: Fri Nov 12 12:33:33 2021 +0000

    Documentation for xloader
---
 source/documentation/tdb/tdb-xloader.md | 51 +++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/source/documentation/tdb/tdb-xloader.md b/source/documentation/tdb/tdb-xloader.md
new file mode 100644
index 0000000..443e18e
--- /dev/null
+++ b/source/documentation/tdb/tdb-xloader.md
@@ -0,0 +1,51 @@
+---
+title: TDB xloader
+---
+
+TDB xloader ("x" for external) is a bulk loader for very large datasets. The goal
+is stability and reliability for long-running loads on modest hardware.
+
+xloader is not a replacement for the regular TDB1 and TDB2 loaders.
+
+`tdb1.xloader` was previously called `tdbloader2`; it has some improvements.
+
+It is not as fast as the other TDB loaders on datasets where the general loaders
+work without encountering progressive slowdown.
+
+The xloaders for TDB1 and TDB2 are not identical. The TDB2 xloader is more capable;
+it is based on the same design approach, with further refinements to building the
+node table and to reducing the total amount of temporary file space used.
+
+The xloader does not run on MS Windows. It uses an external sort program from
+Unix, `sort(1)`.
+
+The xloader only builds a fresh database, starting from empty.
+It cannot be used to load data into an existing database.
+
+### Running xloader
+
+`tdb2.xloader --loc DIRECTORY FILE...`
+
+or
+
+`tdb1.xloader --loc DIRECTORY FILE...`
+
+Additionally, there is an argument `--tmpdir` to use a different directory for
+temporary files.
+
+`FILE` is a file in any RDF syntax supported by Jena.
+
+### Advice
+
+`xloader` uses a lot of temporary disk space. 
+
+To avoid a load failing due to a syntax or other data error, it is advisable to
+run `riot --check` on the data first. Parsing is faster than loading.
+
+If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
+this stage by adding `--stream rdf-thrift` to the riot checking run.
+Parsing RDF Thrift is faster than parsing N-Triples, although the bulk of the loading process is not limited by parser speed.
+
+
+Do not capture the bulk loader output in a file on the same disk as the database
+or temporary directory; it slows loading down.