Posted to commits@jena.apache.org by an...@apache.org on 2021/11/15 17:05:36 UTC

[jena-site] branch more-doc created (now 8ea1cfb)

This is an automated email from the ASF dual-hosted git repository.

andy pushed a change to branch more-doc
in repository https://gitbox.apache.org/repos/asf/jena-site.git.


      at 8ea1cfb  Query building substitution()

This branch includes the following new commits:

     new 018c0a8  More on xloader
     new 8ea1cfb  Query building substitution()

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


[jena-site] 02/02: Query building substitution()

Posted by an...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

andy pushed a commit to branch more-doc
in repository https://gitbox.apache.org/repos/asf/jena-site.git

commit 8ea1cfb48a416119238be0d43658555c3569f9d6
Author: Andy Seaborne <an...@apache.org>
AuthorDate: Mon Nov 15 17:04:25 2021 +0000

    Query building substitution()
---
 .gitignore                                   |  3 +-
 source/documentation/query/arq-query-eval.md |  1 -
 source/documentation/sparql-apis/__index.md  | 47 +++++++++++++++++++++++-----
 3 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/.gitignore b/.gitignore
index 3fb554e..2495e8f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@
 target/
 
 .vscode
+.#*
 .hugo*
 
 # IntelliJ generated
@@ -11,4 +12,4 @@ target/
 .java-version
 
 # Hugo
-.hugo_build.lock
\ No newline at end of file
+.hugo_build.lock
diff --git a/source/documentation/query/arq-query-eval.md b/source/documentation/query/arq-query-eval.md
index 4527ea3..51e4ceb 100644
--- a/source/documentation/query/arq-query-eval.md
+++ b/source/documentation/query/arq-query-eval.md
@@ -446,7 +446,6 @@ custom query engine and overriding `QueryEngineMain.modifyOp`:
       @Override
       protected Op modifyOp(Op op)
       {
-         // Cope with initial bindings.
          op = Substitute.substitute(op, initialInput) ;
          // Use standard optimizations.
          op = super.modifyOp(op) ;
diff --git a/source/documentation/sparql-apis/__index.md b/source/documentation/sparql-apis/__index.md
index d26d8d3..67e148f 100644
--- a/source/documentation/sparql-apis/__index.md
+++ b/source/documentation/sparql-apis/__index.md
@@ -3,7 +3,7 @@ title: Apache Jena SPARQL APIs
 slug: index
 ---
 
-Jump to "[Changes](#changes)".
+Jump to the "[Changes](#changes)" section.
 
 ## Overview
 
@@ -89,12 +89,10 @@ objects have been removed.
 
 * Deprecate modifying `QueryExecution` after it is built.
 
-* Parameterization for remote queries.
-  Parameterization - replacing variables by values before sending
-  a query - makes the query into a template. The same applies to updates.
-  This is also provided uniformly for local queries and should be used in
-  preference to the local-only "initial binding" approach which is
-  similarly but not identical.
+* Substitution of variables for concrete values in query and update execution.
+  This is a form of parameterization that works in both local and remote usage
+  (unlike "initial bindings", which are only available for local query execution).
+  See the [substitution](#substitution) section below.
 
 * `HttpOp`, using `java.net.http.HttpClient`, is split into `HttpRDF` for
   GET/POST/PUT/DELETE of graphs and datasets and new `HttpOp` for packaged-up
@@ -109,6 +107,41 @@ objects have been removed.
 `ModelStore` are the replacement for remote operations. `RDFConnection` and
 `RDFLink` provide APIs.
 
+## Substitution
+
+All query and update builders provide operations to take a query and substitute
+variables with concrete RDF terms at execution time.
+
+Unlike "initial bindings", substitution is provided in query and update builders
+for both local and remote cases.
+
+Substitution is always "replace variable with RDF term" in a query or update
+that is syntactically correct. This means it does not apply to `INSERT DATA` or
+`DELETE DATA`, but it can be used with `INSERT { ?s ?p ?o } WHERE {}` and
+`DELETE { ?s ?p ?o } WHERE {}`.
+
+Full example:
+[ExQuerySubstitute_01.java](https://github.com/afs/jena/tree/main/jena-arq/src-examples/arq/examples/ExQuerySubstitute_01.java).
+
+``` 
+    ResultSet resultSet1 = QueryExecution.dataset(dataset)
+            .query(prefixes+"SELECT * { ?person foaf:name ?name }")
+            .substitution("name", name1)
+            .select();
+    ResultSetFormatter.out(resultSet1);
+```
+
+Substitution is to be preferred over "initial bindings" because it is clearly
+defined and applies to both query and update in both local and remote uses.
+
+"Substitution" and "initial bindings" are similar but not identical.
+
+See also 
+* [Parameterized Queries](documentation/query/parameterized-sparql-strings.html) 
+* [Jena Query Builder](https://jena.apache.org/documentation/extras/querybuilder/index.html)
+
+which provide different ways to build a query.
+
 ## <tt>RDFConnection</tt>
 
 [RDFConnection](../rdfconnection/)

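The substitution() builder operation documented in the patch above can be sketched end-to-end as follows. This is a minimal illustration, not part of the commit: the dataset contents and resource URIs are hypothetical, and it assumes the Jena `QueryExecution.dataset(...)` builder with `query(...)`, `substitution(...)` and `select()` as shown in the diff's own example. It requires Jena on the classpath.

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class SubstituteSketch {
    public static void main(String[] args) {
        String prefixes = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n";

        // Hypothetical data: one person with a foaf:name.
        Dataset dataset = DatasetFactory.create();
        Model m = dataset.getDefaultModel();
        Resource alice = m.createResource("http://example/alice");
        m.add(alice, m.createProperty("http://xmlns.com/foaf/0.1/name"), "Alice");

        // Substitution: replace ?name with a concrete RDF term before execution.
        RDFNode name1 = m.createLiteral("Alice");
        ResultSet resultSet1 = QueryExecution.dataset(dataset)
                .query(prefixes + "SELECT * { ?person foaf:name ?name }")
                .substitution("name", name1)
                .select();
        ResultSetFormatter.out(resultSet1);
    }
}
```

The same builder-style substitution is what makes the template usable against a remote endpoint, where "initial bindings" would not be available.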
[jena-site] 01/02: More on xloader

Posted by an...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

andy pushed a commit to branch more-doc
in repository https://gitbox.apache.org/repos/asf/jena-site.git

commit 018c0a824a738cd9fd8732b5d62cd70a00744525
Author: Andy Seaborne <an...@apache.org>
AuthorDate: Mon Nov 15 17:03:29 2021 +0000

    More on xloader
---
 source/documentation/tdb/commands.md    | 53 +++++++++++++++++++--------------
 source/documentation/tdb/faqs.md        | 26 ++++++++++++++--
 source/documentation/tdb/tdb-xloader.md | 29 ++++++++++--------
 3 files changed, 71 insertions(+), 37 deletions(-)

diff --git a/source/documentation/tdb/commands.md b/source/documentation/tdb/commands.md
index baa2e27..9a9af88 100644
--- a/source/documentation/tdb/commands.md
+++ b/source/documentation/tdb/commands.md
@@ -13,7 +13,7 @@ title: TDB Command-line Utilities
 -   [TDB Commands](#tdb-commands)
     -   [Store description](#store-description)
     -   [tdbloader](#tdbloader)
-    -   [tdbloader2](#tdbloader2)
+    -   [TDB xloader](#tdb-xloader)
     -   [tdbquery](#tdbquery)
     -   [tdbdump](#tdbdump)
     -   [tdbstats](#tdbstats)
@@ -98,10 +98,37 @@ are loaded into the dataset according to the name or the default graph.
 Bulk loader and index builder. Performs bulk load operations more
 efficiently than simply reading RDF into a TDB-back model.
 
+### TDB xloader
+
+`tdb1.xloader` and `tdb2.xloader` are bulk loaders for very large datasets for
+TDB1 and TDB2.
+
+See [TDB xloader](./tdb-xloader.html) for more information. These loaders only
+work on Linux and Mac OS/X since they rely on some Unix system utilities.
+
+### `tdbquery`
+
+Invoke a SPARQL query on a store. Use `--time` for timing
+information. The store is attached on each run of this command so
+timing includes some overhead not present in a running system.
+
+Details about query execution can be obtained -- see notes on the
+[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
+
+### `tdbdump`
+
+Dump the store in
+[N-Quads](http://www.w3.org/TR/n-quads/)
+format.
+
+### `tdbstats`
+
+Produce a statistics file for the dataset. See the
+[TDB Optimizer description](optimizer.html#statistics-rule-file).
+
 ### `tdbloader2`
 
-Bulk loader and index builder. Faster than `tdbloader` but only works
-on Linux and Mac OS/X since it relies on some Unix system utilities.
+*This has been replaced by [TDB xloader](./tdb-xloader.html).*
 
 This bulk loader can only be used to create a database. It may
 overwrite existing data. It requires accepts the `--loc` argument and a
@@ -130,23 +157,3 @@ If you are building a large dataset (i.e. gigabytes of input data) you may
 wish to have the [PipeViewer](http://www.ivarch.com/programs/pv.shtml)
 tool installed on your system as this will provide extra progress information 
 during the indexing phase of the build.
-
-### `tdbquery`
-
-Invoke a SPARQL query on a store. Use `--time` for timing
-information. The store is attached on each run of this command so
-timing includes some overhead not present in a running system.
-
-Details about query execution can be obtained -- see notes on the
-[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
-
-### `tdbdump`
-
-Dump the store in
-[N-Quads](http://www.w3.org/TR/n-quads/)
-format.
-
-### tdbstats
-
-Produce a statistics for the dataset. See the
-[TDB Optimizer description.](optimizer.html#statistics-rule-file).
diff --git a/source/documentation/tdb/faqs.md b/source/documentation/tdb/faqs.md
index b7f9f19..e7479c8 100644
--- a/source/documentation/tdb/faqs.md
+++ b/source/documentation/tdb/faqs.md
@@ -4,6 +4,8 @@ title: TDB FAQs
 
 ## FAQs
 
+
-   [What are TDB1 and TDB2?](#tdb1-tdb2)
 -   [Does TDB support Transactions?](#transactions)
 -   [Can I share a TDB dataset between multiple applications?](#multi-jvm)
 -   [What is the *Impossibly Large Object* exception?](#impossibly-large-object)
@@ -18,6 +20,15 @@ title: TDB FAQs
 -   [What is the *Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database* error?](#tdb2-lock)
 -   [My question isn't answered here?](#not-answered)
 
+<a name="tdb1-tdb2"></a>
+## TDB1 and TDB2
+
+TDB2 is a later generation of database for Jena. It is more robust and can
+handle large update transactions.
+
+These are different database systems - they have different on-disk file formats
+and databases for one are not compatible with the other database engine.
+
 <a name="transactions"></a>
 ## Does TDB support transactions?
 
@@ -37,11 +48,11 @@ transactionally.
 ## Can I share a TDB dataset between multiple applications?
 
 Multiple applications, running in multiple JVMs, using the same
-file databases is **not** supported and has a high risk of data corruption.  Once corrupted a database cannot be repaired
+file databases is **not** supported and has a high risk of data corruption.  Once corrupted, a database cannot be repaired
 and must be rebuilt from the original source data. Therefore there **must** be a single JVM
 controlling the database directory and files.
 
-From 1.1.0 onwards TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
+TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
 protect your data from corruption.
 
 If you wish to share a TDB dataset between applications use our [Fuseki](../fuseki2/) component which provides a 
@@ -77,11 +88,22 @@ As noted above to resolve this problem you **must** rebuild your database from t
 be repaired. This is why we **strongly** recommend you use [transactions](tdb_transactions.html) since this protects your dataset against 
 corruption.
 
+## What is `tdb.xloader`?
+
+`tdb1.xloader` and `tdb2.xloader` are bulk loaders for very large datasets that
+take several hours to load.
+
+See [TDB xloader](./tdb-xloader.html) for more information.
+
 <a name="tdbloader-vs-tdbloader2"></a>
 ## What is the different between `tdbloader` and `tdbloader2`?
 
+`tdbloader2` has been replaced by `tdb1.xloader` and `tdb2.xloader` for TDB1 and TDB2 respectively.
+
+
 `tdbloader` and `tdbloader2` differ in how they build databases.
 
+
 `tdbloader` is Java based and uses the same TDB APIs that you would use in your own Java code to perform the data load.  The advantage of this is that
 it supports incremental loading of data into a TDB database.  The downside is that the loader will be slower for initial database builds.
 
diff --git a/source/documentation/tdb/tdb-xloader.md b/source/documentation/tdb/tdb-xloader.md
index 443e18e..c6feaec 100644
--- a/source/documentation/tdb/tdb-xloader.md
+++ b/source/documentation/tdb/tdb-xloader.md
@@ -7,16 +7,19 @@ is stability and reliability for long running loading, running on modest and
 
 xloader is not a replacement for regular TDB1 and TDB2 loaders.
 
-"tdb1.xloader" was called "tdbloader2" and has some improvements.
+There are two scripts to load data using the xloader subsystem.
+
+"tdb1.xloader" was previously called "tdbloader2" and has some improvements.
 
 It is not as fast as other TDB loaders on dataset where the general loaders work
 on without encountering progressive slowdown.
 
-The xloaders for TDB1 and TDB2 are not identical. The TDB2 is more capable; it
-is based on the same design approach with further refinements to building the
-node table and to reduce the total amount of temporary file space used.
+The xloaders for TDB1 and TDB2 are not identical. The TDB2 xloader is more
+capable; it is based on the same design approach with further refinements to
+building the node table and to reduce the total amount of temporary file space
+used.
 
-The xloader does not run on MS Windows. It uses and external sort program from
+The xloader does not run on MS Windows. It uses an external sort program from
 unix - `sort(1)`.
 
 The xloader only builds a fresh database from empty.
@@ -30,22 +33,24 @@ or
 
 `tdb1.xloader --loc DIRECTORY` FILE...
 
-Additioally, there is an argument `--tmpdir` to use a different directory for
+Additionally, there is an argument `--tmpdir` to use a different directory for
 temporary files.
 
-`FILE` is any RDF syntax supported by Jena.
+`FILE` is any RDF syntax supported by Jena. Syntax is determined by file
+extension and can include an additional ".gz" or ".bz2" for compressed files.
 
 ### Advice
 
-`xloader` uses a lot of temporary disk space. 
-
 To avoid a load failing due to a syntax or other data error, it is advisable to
 run `riot --check` on the data first. Parsing is faster than loading.
 
-If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
-this stage by adding `--stream rdf-thrift` to the riot checking run.
-Parsing RDF Thrift is faster than parsing N-Triples although the bulk of the loading process is not limited by parser speed.
+The TDB databases will take up a lot of disk space and, in addition, during
+loading `xloader` uses a significant amount of temporary disk space.
 
+If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
+this stage by adding `--stream rdf-thrift` to the riot checking run.  Parsing
+RDF Thrift is faster than parsing N-Triples although the bulk of the loading
+process is not limited by parser speed.
 
 Do not capture the bulk loader output in a file on the same disk as the database
 or temporary directory; it slows loading down.
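The check-then-load workflow described in the advice above can be sketched as a short session. This is an illustration, not part of the commit: the file names are hypothetical, and it assumes the `riot` and `tdb2.xloader` commands with the `--check`, `--stream`, `--loc` and `--tmpdir` options as described in the patched documentation.

```shell
# Validate the data first - parsing is faster than loading, and a syntax
# error found now saves an aborted multi-hour load later.
riot --check data.nt.gz

# Optionally re-emit the checked data as RDF Thrift in the same pass;
# RDF Thrift is faster to parse than N-Triples on the subsequent load.
riot --check --stream rdf-thrift data.nt.gz > data.rt

# Bulk load into a fresh TDB2 database. Keep --tmpdir (and any output
# logging) on a different disk from the database to avoid slowing the load.
tdb2.xloader --loc DB2 --tmpdir /scratch/tmp data.rt
```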