Posted to commits@jena.apache.org by ij...@apache.org on 2011/09/13 01:39:35 UTC
svn commit: r1169979 [2/2] - in /incubator/jena/site/trunk: content/jena/documentation/notes/ templates/

Added: incubator/jena/site/trunk/content/jena/documentation/notes/schemagen.mdtext
URL: http://svn.apache.org/viewvc/incubator/jena/site/trunk/content/jena/documentation/notes/schemagen.mdtext?rev=1169979&view=auto
==============================================================================
--- incubator/jena/site/trunk/content/jena/documentation/notes/schemagen.mdtext (added)
+++ incubator/jena/site/trunk/content/jena/documentation/notes/schemagen.mdtext Mon Sep 12 23:39:35 2011
@@ -0,0 +1,850 @@
+Title: Jena schemagen HOWTO
+
+The `schemagen` tool provided with Jena is used to convert an OWL,
+DAML or RDFS vocabulary into a Java source file that contains static
+constants for the terms in the vocabulary. This document outlines
+the use of schemagen, and the various options and templates that
+may be used to control the output.
+
+Schemagen is typically invoked from the command line or from a
+build script (such as Ant). Synopsis of the command:
+
+    java jena.schemagen -i <input> [-a <namespaceURI>] [-o <output file>] [-c <config uri>] [-e <encoding>] ...
+
+Schemagen is highly configurable, either with command line options
+or by RDF information read from a configuration file. **Many**
+other options are defined, and these are described in detail below.
+Note that the `CLASSPATH` environment variable must be set to
+include the Jena `.jar` libraries.
+
+## Summary of configuration options
+
+For quick reference, here is a list of all of the schemagen options
+(both command line and configuration file). The use of these
+options is explained in detail below.
+
+Table 1: schemagen options
+
+Command line option | RDF config file property | Meaning
+------------------- | ------------------------ | -------
+-a <uri\> | sgen:namespace | The namespace URI for the vocabulary. Names with this URI as prefix are automatically included in the generated vocabulary. If not specified, the base URI of the ontology is used as a default (but note that some ontology documents don't define a base URI).
+-c <filename\><br />-c <url\> | | Specify an alternative config file.
+--classdec <string\> | sgen:classdec | Additional decoration for class header (such as `implements`)
+--classnamesuffix <string\> | sgen:classnamesuffix | Option for adding a suffix to the generated class name, e.g. "Vocab".
+--classSection <string\> |  sgen:classSection | Section declaration comment for class section.
+--classTemplate <string\>  | sgen:classTemplate | Template for writing out declarations of class resources.
+--daml | sgen:daml | Specify that the language of the source ontology is DAML+OIL.
+--declarations <string\> | sgen:declarations | Additional declarations to add at the top of the class.
+--dos | sgen:dos | Use MSDOS-style line endings (i.e. \\r\\n). Default is Unix-style line endings.
+-e <string\> | sgen:encoding | The surface syntax of the input file (e.g. RDF/XML, N3). Defaults to RDF/XML.
+--footer <string\> | sgen:footer | Template for standard text to add to the end of the file.
+--header <string\> | sgen:header | Template for the file header, including the class comment.
+-i <filename\> <br />-i <url\> | sgen:input | Specify the input document to load
+--include <uri\> | sgen:include | Option for including non-local URIs in the vocabulary
+--individualsSection <string\> | sgen:individualsSection | Section declaration comment for individuals section.
+--individualTemplate <string\> | sgen:individualTemplate | Template for writing out declarations of individuals.
+--inference | sgen:inference | Causes the model that loads the document prior to being processed to apply inference rules appropriate to the language. E.g. OWL inference rules will be used on a `.owl` file.
+--marker <string\> | sgen:marker | Specify the marker string for substitutions, default is '%'
+-n <string\> | sgen:classname | The name of the generated class. The default is to synthesise a name based on input document name.
+--noclasses | sgen:noclasses | Option to suppress classes in the generated vocabulary file.
+--nocomments | sgen:noComments | Turn off all comment output in the generated vocabulary
+--noheader | sgen:noHeader | Prevent the output of a file header, with class comment etc.
+--noindividuals | sgen:noindividuals | Option to suppress individuals in the generated vocabulary file.
+--noproperties | sgen:noproperties | Option to suppress properties in the generated vocabulary file.
+-o <filename\>  <br /> -o <dir\> | sgen:output | Specify the destination for the output. If the given value evaluates to a directory, the generated class will be placed in that directory with a file name formed from the generated (or given) class name with ".java" appended.
+--nostrict | sgen:noStrict | Option to turn off strict checking for ontology classes and properties (prevents `ConversionExceptions`).
+--ontology | sgen:ontology | The generated vocabulary will use the ontology API terms, in preference to RDF model API terms.
+--owl | sgen:owl | Specify that the language of the source is OWL (the default). Note that RDFS is a subset of OWL, so this setting also suffices for RDFS.
+--package <string\> | sgen:package | Specify the Java package name and directory.
+--propSection <string\> | sgen:propSection | Section declaration comment for properties section.
+--propTemplate <string\> | sgen:propTemplate | Template for writing out declarations of property resources.
+-r <uri\> | | Specify the uri of the root node in the RDF configuration model.
+--rdfs | sgen:rdfs | Specify that the language of the source ontology is RDFS.
+--strictIndividuals | sgen:strictIndividuals | When selecting the individuals to include in the output class, schemagen will normally include those individuals whose `rdf:type` is in the included namespaces for the vocabulary. However, if `strictIndividuals` is turned on, then all individuals in the output class must themselves have a URI in the included namespaces.
+--uppercase  | sgen:uppercase | Option for mapping constant names to uppercase (like Java constants). Default is to leave the case of names unchanged.
+
+## What does schemagen do?
+
+RDFS, OWL and DAML+OIL provide a very convenient means to define a
+controlled vocabulary or ontology. For general ontology processing,
+Jena provides various APIs to allow the source files to be read in
+and manipulated. However, when developing an application, it is
+frequently convenient to refer to the controlled vocabulary terms
+directly from Java code. This leads typically to the declaration of
+constants, such as:
+
+        public static final Resource A_CLASS = new ResourceImpl( "http://example.org/schemas#a-class" );
+
+When these constants are defined manually, it is tedious and
+error-prone to maintain them in sync with the source ontology
+file. Schemagen automates the production of Java constants that
+correspond to terms in an ontology document. By automating the step
+from source vocabulary to Java constants, a source of error and
+inconsistency is removed.
+
+### Example
+
+Perhaps the easiest way to explain the detail of what schemagen
+does is to show an example. Consider the following mini-RDF
+vocabulary:
+
+    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
+             xmlns="http://example.org/eg#"
+             xml:base="http://example.org/eg">
+      <rdfs:Class rdf:ID="Dog">
+          <rdfs:comment>A class of canine companions</rdfs:comment>
+      </rdfs:Class>
+      <rdf:Property rdf:ID="petName">
+          <rdfs:comment>The name that everyone calls a dog</rdfs:comment>
+          <rdfs:domain rdf:resource="http://example.org/eg#Dog" />
+      </rdf:Property>
+      <rdf:Property rdf:ID="kennelName">
+          <rdfs:comment>Posh dogs have a formal name on their KC certificate</rdfs:comment>
+      </rdf:Property>
+      <Dog rdf:ID="deputy">
+          <rdfs:comment>Deputy is a particular Dog</rdfs:comment>
+          <kennelName>Deputy Dawg of Chilcompton</kennelName>
+      </Dog>
+    </rdf:RDF>
+
+We process this document with a command something like:
+`java jena.schemagen -i deputy.rdf -a http://example.org/eg#`
+to produce the following generated class:
+
+    /* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */
+
+    import com.hp.hpl.jena.rdf.model.*;
+
+    /**
+     * Vocabulary definitions from deputy.rdf
+     * @author Auto-generated by schemagen on 01 May 2003 21:49
+     */
+    public class Deputy {
+        /** <p>The RDF model that holds the vocabulary terms</p> */
+        private static Model m_model = ModelFactory.createDefaultModel();
+
+        /** <p>The namespace of the vocabulary as a string {@value}</p> */
+        public static final String NS = "http://example.org/eg#";
+
+        /** <p>The namespace of the vocabulary as a resource {@value}</p> */
+        public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
+
+        /** <p>The name that everyone calls a dog</p> */
+        public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
+
+        /** <p>Posh dogs have a formal name on their KC certificate</p> */
+        public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
+
+        /** <p>A class of canine companions</p> */
+        public static final Resource Dog = m_model.createResource( "http://example.org/eg#Dog" );
+
+        /** <p>Deputy is a particular Dog</p> */
+        public static final Resource deputy = m_model.createResource( "http://example.org/eg#deputy" );
+
+    }
+
+Some things to note in this example. All of the named classes,
+properties and individuals from the source document are translated
+to Java constants (below we show how to be more selective than
+this). The properties of the named resources are *not* translated:
+schemagen is for giving access to the names in the vocabulary or
+schema, not to perform a general translation of RDF to Java. The
+RDFS comments from the source document are translated to Javadoc
+comments. Finally, note that the generated code does not call
+`new ResourceImpl` directly: this idiom is no longer recommended by
+the Jena team.
+
+We noted earlier that schemagen is highly configurable. One
+additional argument generates a vocabulary file that uses Jena's
+ontology API, rather than the RDF model API. We change `rdfs:Class`
+to `owl:Class`, and invoke
+`java jena.schemagen -i deputy.rdf -a http://example.org/eg# --ontology`
+to get:
+
+    /* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */
+
+    import com.hp.hpl.jena.rdf.model.*;
+    import com.hp.hpl.jena.ontology.*;
+    /**
+     * Vocabulary definitions from deputy.rdf
+     * @author Auto-generated by schemagen on 01 May 2003 22:03
+     */
+    public class Deputy {
+        /** <p>The ontology model that holds the vocabulary terms</p> */
+        private static OntModel m_model = ModelFactory.createOntologyModel( ProfileRegistry.OWL_LANG );
+
+        /** <p>The namespace of the vocabulary as a string {@value}</p> */
+        public static final String NS = "http://example.org/eg#";
+
+        /** <p>The namespace of the vocabulary as a resource {@value}</p> */
+        public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
+
+        /** <p>The name that everyone calls a dog</p> */
+        public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
+
+        /** <p>Posh dogs have a formal name on their KC certificate</p> */
+        public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
+
+        /** <p>A class of canine companions</p> */
+        public static final OntClass Dog = m_model.createClass( "http://example.org/eg#Dog" );
+
+        /** <p>Deputy is a particular Dog</p> */
+        public static final Individual deputy = m_model.createIndividual( Dog, "http://example.org/eg#deputy" );
+
+    }
+
+## General principles
+
+In essence, schemagen will load a single vocabulary file (imports
+processing is switched off in DAML and OWL), and generate a Java
+class that contains static constants for the named classes,
+properties and instances of the vocabulary. Most of the generated
+components of the output Java file can be controlled by option
+flags, and formatted with a template. Default templates are
+provided for all elements, so the minimum amount of necessary
+information is actually very small.
+
+Options can be specified on the command line (when invoking
+schemagen), or may be preset in an RDF file. Any mixture of command
+line and RDF option specification is permitted. Where a given
+option is specified both in an RDF file and on the command line,
+the command line setting takes precedence. Thus the options in the
+RDF file can be seen as defaults.
+
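The precedence rule amounts to a simple map merge. The following is a minimal sketch, assuming options are held as plain name/value strings; `OptionPrecedence` and `effectiveOptions` are hypothetical names for illustration and are not part of schemagen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not schemagen's own code): options read from the RDF
// configuration file act as defaults, and any option also given on the
// command line overrides them.
class OptionPrecedence {
    static Map<String, String> effectiveOptions(Map<String, String> configFile,
                                                Map<String, String> commandLine) {
        Map<String, String> result = new HashMap<>(configFile); // start from the config-file defaults
        result.putAll(commandLine);                              // command line wins on any clash
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("input", "file:vocabs/trading.owl");
        config.put("package", "org.example.vocab");

        Map<String, String> cli = new HashMap<>();
        cli.put("package", "org.example.other");

        Map<String, String> opts = effectiveOptions(config, cli);
        System.out.println(opts.get("input"));   // file:vocabs/trading.owl
        System.out.println(opts.get("package")); // org.example.other
    }
}
```

The input maps are left untouched, so the same configuration-file defaults can be reused across several invocations.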
+### Specifying command line options
+
+To specify a command line option, add its name (and optional value)
+to the command line when invoking the schemagen tool. For example:
+`java jena.schemagen -i myvocab.owl --ontology --uppercase`
+
+### Specifying options in an RDF file
+
+To specify an option in an RDF file, create a resource of type
+`sgen:Config`, with properties corresponding to the option names
+listed in Table 1. The following fragment shows a small options
+file. A complete example configuration file is shown in
+[appendix A](#appendixA).
+
+By default, schemagen will look for a configuration file named
+`schemagen.rdf` in the current directory. To specify another
+configuration, use the `-c` option with a URL to reference the
+configuration. Multiple configurations (i.e. multiple `sgen:Config`
+nodes) can be placed in one RDF document. In this case, each
+configuration node must be named, and the URI specified in the `-r`
+command line option. If there is no `-r` option, schemagen will
+look for a node with `rdf:type sgen:Config`. If there are
+multiple such nodes in the model, it is indeterminate which one
+will be used.
+
+### Using templates
+
+We have several times referred to a template being used to
+construct part of the generated file. What is a template? Simply
+put, it is a fragment of the output file. Some templates will be used
+at most once (for example the file header template), some will be
+used many times (such as the template used to generate a class
+constant). In order to make the templates adaptable to the job
+they're doing, before it is written out a template has
+*keyword substitution* performed on it. This looks for certain
+keywords delimited by a pair of special characters (% by default),
+and replaces them with the current binding for that keyword. Some
+keyword bindings stay the same throughout the processing of the
+file, and some are dependent on the language element being
+processed. The substitutions are:
+
+Table 2: Substitutable keywords in templates
+
+Keyword  | Meaning | Typical value
+------- | -------- | -------------
+classname | The name of the Java class being generated | Automatically defined from the document name, or given with the `-n` option
+date | The date and time the class was generated
+imports | The Java imports for this class
+nl | The newline character for the current platform
+package  | The Java package name | As specified by an option. The option just gives the package name, schemagen turns the name into a legal Java statement.
+sourceURI | The source of the document being processed | As given by the `-i` option or in the config file.
+valclass | The Java class of the value being defined | E.g. Property for vocabulary properties, Resource for classes in RDFS, or OntClass for classes using the ontology API
+valcreator | The method used to generate an instance of the Java representation | E.g. `createResource` or `createClass`
+valname | The name of the Java constant being generated | This is generated from the name of the resource in the source file, adjusted to be a legal Java identifier. By default, this will preserve the case of the RDF constant, but setting `--uppercase` will map all constants to upper-case names (a common convention in Java code).
+valtype | The rdf:type for an individual | The class name or URI used when creating an individual in the ontology API
+valuri | The full URI of the value being defined | From the RDF, without adjustment.
+
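To make the substitution step concrete, here is a minimal sketch of marker-delimited keyword replacement. It assumes a keyword is a run of word characters between two markers and that keywords with no binding are left as they are; `TemplateSketch` and the example template string are hypothetical and are not schemagen's built-in templates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of template keyword substitution: every
// <marker>keyword<marker> whose keyword has a binding is replaced by that
// binding; keywords without a binding are left untouched.
class TemplateSketch {
    static String substitute(String template, Map<String, String> bindings, String marker) {
        String m = Pattern.quote(marker);
        Matcher matcher = Pattern.compile(m + "(\\w+)" + m).matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = bindings.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group(0);
            // quoteReplacement stops '$' or '\' in values being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> b = new HashMap<>();
        b.put("valclass", "Property");
        b.put("valname", "petName");
        b.put("valcreator", "createProperty");
        b.put("valuri", "http://example.org/eg#petName");
        // An illustrative template, not schemagen's built-in default
        String t = "public static final %valclass% %valname% = m_model.%valcreator%( \"%valuri%\" );";
        System.out.println(substitute(t, b, "%"));
        // prints: public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
    }
}
```

Passing a different marker string (compare the `--marker` option) changes the delimiter, so the same binding could be written as `$valname$`.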
+## Details of schemagen options
+
+
+We now go through each of the configuration options in detail.
+
+**Note**: for brevity, we assume a standard prefix `sgen` is
+defined for resource URIs in the schemagen namespace. The
+expansion for `sgen` is:
+`http://jena.hpl.hp.com/2003/04/schemagen#`, thus:
+
+    xmlns:sgen="http://jena.hpl.hp.com/2003/04/schemagen#"
+
+### Note on legal Java identifiers
+
+Schemagen will attempt to ensure that all generated code will
+compile as legal Java. Occasionally, this means that identifiers
+from input documents, which are legal components of RDF URI
+identifiers, have to be modified to be legal Java identifiers.
+Specifically, any character in an identifier name that is not a
+legal Java identifier character will be replaced with the character
+'\_' (underscore). Thus the name `trading-price` might become
+`trading_price`. In addition, Java requires that identifiers be
+distinct. If a name clash is detected (for example, `trading-price`
+and `trading+price` both map to the same Java identifier),
+schemagen will add disambiguators to the second and subsequent
+uses. These will be based on the role of the identifier; for
+example property names are disambiguated by appending `_PROPn` for
+increasing values of `n`. In a well-written ontology, identifiers
+are typically made distinct for clarity and ease-of-use by the
+ontology users, so the use of the disambiguation tactic is rare.
+Indeed, it may be taken as a hint that refactoring the ontology
+itself is desirable.
+
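The two steps above (character replacement, then role-based disambiguation) can be sketched as follows. This is an illustration under stated assumptions, not schemagen's actual code: the helper names are hypothetical, a leading character that cannot start a Java identifier is simply replaced, and Java reserved words such as `class` are not handled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: replace characters that are illegal in Java
// identifiers with '_', then disambiguate clashes by appending a
// role-specific suffix such as _PROP1, _PROP2, ...
// Reserved words (e.g. "class") are not handled here.
class JavaIdSketch {
    static String toJavaId(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = (i == 0) ? Character.isJavaIdentifierStart(c)
                                  : Character.isJavaIdentifierPart(c);
            sb.append(ok ? c : '_');
        }
        return sb.toString();
    }

    /** Returns an identifier for name that is not yet in used, and records it. */
    static String uniqueId(String name, String roleSuffix, Set<String> used) {
        String base = toJavaId(name);
        String candidate = base;
        for (int n = 1; used.contains(candidate); n++) {
            candidate = base + roleSuffix + n;   // e.g. trading_price_PROP1
        }
        used.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(uniqueId("trading-price", "_PROP", used)); // trading_price
        System.out.println(uniqueId("trading+price", "_PROP", used)); // trading_price_PROP1
    }
}
```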
+### Specifying the configuration file
+
+Command line | Config file
+------------ | -----------
+-c <*config-file-path*\><br />-c <*config-file-URL*\> | n/a
+
+The default configuration file name is `schemagen.rdf` in the
+current directory. To specify a different configuration file,
+either as a file name on the local file system, or as a URL (e.g.
+an `http:` address), the config file location is passed with the
+`-c` option. If no `-c` option is given, and there is no
+configuration file in the current directory, schemagen will
+continue and use default values (plus the other command line
+options) to configure the tool. If a file name or URL is given with
+`-c`, and that file cannot be located, schemagen will stop with an
+error.
+
+Schemagen assumes that the syntax of the configuration file is
+implied by the file name or URL suffix: ".n3" means N3, ".nt" means
+N-Triples, and ".rdf" or ".owl" mean RDF/XML. Otherwise it assumes
+RDF/XML.
+
+### Specifying the configuration root in the configuration file
+
+Command line | Config file
+------------ | -----------
+-r <*config-root-URI*\> | n/a
+
+It is possible to have more than one set of configuration options
+in one configuration file. If there is only one set of
+configuration options, schemagen will locate the root by searching
+for a resource of rdf:type sgen:Config. If there is more than one,
+and no root is specified on the command line, it is not specified
+which set of configuration options will be used. The root URI given
+as a command line option must match exactly with the URI given in
+the configuration file. For example:
+
+    java jena.schemagen -c config/localconf.rdf -r http://example.org/sg#project1
+
+matches:
+
+    ...
+     <sgen:Config rdf:about="http://example.org/sg#project1">
+       ....
+     </sgen:Config>
+
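The root-selection rule can be sketched in plain Java. This is a hypothetical model rather than the Jena API: each configuration node is reduced to its URI plus a flag saying whether it has `rdf:type sgen:Config`. Because `String.equals` is case-sensitive, the `-r` value and the `rdf:about` URI must match character for character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of choosing the configuration root; not the Jena API.
class ConfigRootSketch {
    static final class Node {
        final String uri;
        final boolean isConfig;   // true if the node has rdf:type sgen:Config
        Node(String uri, boolean isConfig) { this.uri = uri; this.isConfig = isConfig; }
    }

    static Optional<Node> selectRoot(List<Node> nodes, String rootUri) {
        if (rootUri != null) {
            // -r given: the URI must match exactly (String.equals is case-sensitive)
            return nodes.stream().filter(n -> n.uri.equals(rootUri)).findFirst();
        }
        // No -r: take a node typed sgen:Config; with several, which one is
        // chosen is unspecified (here simply the first encountered).
        return nodes.stream().filter(n -> n.isConfig).findFirst();
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("http://example.org/sg#project1", true),
                new Node("http://example.org/sg#project2", true));
        System.out.println(selectRoot(nodes, "http://example.org/sg#project1")
                .map(n -> n.uri).orElse("none"));   // http://example.org/sg#project1
    }
}
```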
+### Specifying the input document
+
+Command line | Config file
+------------ | -----------
+-i <*input-file-path*\><br />-i <*input-URL*\> | <sgen:input rdf:resource="*inputURL*" /\>
+
+The only mandatory argument to schemagen is the input document to
+process. It can be specified in the configuration file, although
+this means that the same configuration cannot then be shared across
+several different input files. However, by specifying the input
+document in the default configuration file, schemagen can be
+invoked with the minimum of command line typing. For other means
+of automating schemagen, see [using schemagen with Ant](#ant).
+
+### Specifying the output location
+
+Command line | Config file
+------------ | -----------
+-o <*output-file-path*\><br />-o <*output-dir*\> | <sgen:output rdf:datatype="&xsd;string"\>*output-path-or-dir*</sgen:output\>
+
+Schemagen must know where to write the generated Java file. By
+default, the output is written to the standard output. Various
+options exist to change this. The output location can be specified
+either on the command line, or in the configuration file. If
+specified in the configuration file, the value must be a string
+literal denoting the file path. If the path given resolves to an
+existing directory, then the output file name will be based on the
+[name](#class_name) of the generated class (i.e. it will be the
+class name with `.java` appended). Otherwise, the path is assumed
+to point to a file. Any existing file that has the given path name
+will be overwritten.
+
+By default, schemagen will create files that have the Unix
+convention for line-endings (i.e. '\\n'). To switch to DOS-style
+line endings, use `--dos`.
+
+Command line | Config file
+------------ | -----------
+--dos | <sgen:dos rdf:datatype="&xsd;boolean"\>true</sgen:dos\>
+
+
+### Specifying the class name
+
+Command line | Config file
+------------ | -----------
+-n <*class-name*\> | <sgen:classname rdf:datatype="&xsd;string"\>*classname*</sgen:classname\>
+
+By default, the name of the class will be based on the name of the
+input file. Specifically, the last component of the input
+document's path name, with its file extension removed, becomes the class
+name. By default, the initial letter is adjusted to a capital to
+conform to standard Java usage. Thus `file:vocabs/trading.owl`
+becomes `Trading.java`. To override this default algorithm, a class
+name specified by `-n` or in the config file is used exactly as
+given.
+
+Sometimes it is convenient to have all vocabulary files
+distinguished by a common suffix, for example `xyzSchema.java` or
+`xyzVocabs.java`. This can be achieved by the classname-suffix
+option:
+
+Command line | Config file
+------------ | -----------
+--classnamesuffix <*suffix*\> | <sgen:classnamesuffix rdf:datatype="&xsd;string"\>*suffix*</sgen:classnamesuffix\>
+
+
+See also the [note on legal Java identifiers](#java_ids), which
+applies to generated class names.
+
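A minimal sketch of the default naming rule, assuming the last path component is whatever follows the final `/`, `\` or `:` of the input location, and that illegal characters are replaced as described in the note on legal Java identifiers. `ClassNameSketch` is a hypothetical name; the real tool may treat edge cases (such as a leading digit) differently, and a name given with `-n` bypasses this derivation entirely.

```java
// Hypothetical sketch of deriving the default class name from the input location.
class ClassNameSketch {
    static String defaultClassName(String input, String suffix) {
        // last path component: after the final '/', '\' or ':' (e.g. "file:trading.owl")
        int cut = Math.max(input.lastIndexOf('/'),
                           Math.max(input.lastIndexOf('\\'), input.lastIndexOf(':')));
        String name = input.substring(cut + 1);
        int dot = name.lastIndexOf('.');
        if (dot > 0) {
            name = name.substring(0, dot);                 // drop the file extension
        }
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        if (sb.length() > 0) {
            sb.setCharAt(0, Character.toUpperCase(sb.charAt(0))); // Java class-name convention
        }
        if (suffix != null) {
            sb.append(suffix);                             // --classnamesuffix, e.g. "Vocab"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(defaultClassName("file:vocabs/trading.owl", null));    // Trading
        System.out.println(defaultClassName("file:vocabs/trading.owl", "Vocab")); // TradingVocab
        System.out.println(defaultClassName("deputy.rdf", null));                 // Deputy
    }
}
```

Under these assumptions, the Ant example later in this document (`example.owl` with `--classnamesuffix Vocab`) yields `ExampleVocab`.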
+
+
+### Specifying the vocabulary namespace
+
+Command line | Config file
+------------ | -----------
+-a <*namespace-URI*\> | <sgen:namespace rdf:datatype="&xsd;string"\>*namespace*</sgen:namespace\>
+
+Since ontology files are often modularised, it is not the case that
+all of the resource names appearing in a given document are being
+defined by that ontology. They may appear simply as part of the
+definitions of other terms. Schemagen assumes that there is one
+primary namespace for each document, and it is names from that
+namespace that will appear in the generated Java file.
+
+In an OWL or DAML+OIL ontology, this namespace is computed by
+finding the owl:Ontology or daml:Ontology element, and using its
+namespace as the primary namespace of the ontology. This may not be
+available (it is not, for example, a part of RDFS) or correct, so
+the namespace may be specified directly with the `-a` option or in
+the configuration file.
+
+### Including constants from other namespaces
+
+Schemagen does not, in the present version, permit more than one
+primary namespace per generated Java class. However, constants from
+namespaces other than the primary namespace may be included in the
+generated Java class by the include option:
+
+Command line | Config file
+------------ | -----------
+--include <*namespace-URI*\> | <sgen:include rdf:datatype="&xsd;string"\>*namespace*</sgen:include\>
+
+
+The include option may be repeated to include constants from
+several other namespaces in the output class.
+
+### Including individuals
+
+Since OWL and RDFS ontologies may include individuals that are
+named instances of declared classes, schemagen will include
+individuals among the constants that it generates in Java. By
+default, an individual will be included if its class has a URI that
+is in one of the permitted namespaces for the vocabulary, even if
+the individual itself is not in that namespace. If the option
+`strictIndividuals` is set, individuals are **only** included if
+they have a URI that is in the permitted namespaces for the
+vocabulary.
+
+Command line | Config file
+------------ | -----------
+--strictIndividuals | <sgen:strictIndividuals /\>
+
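The selection rule for individuals can be sketched as a predicate. This is a hypothetical illustration: the namespace test is a simple string-prefix check, and we assume that `strictIndividuals` adds the URI test on top of the default type test, which is one reading of the description in Table 1.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the individual-selection rule. "namespaces" holds
// the primary namespace (-a) plus any --include values; a URI counts as
// "in" a namespace if it starts with it (a simplification).
class IndividualFilterSketch {
    static boolean inNamespaces(String uri, List<String> namespaces) {
        for (String ns : namespaces) {
            if (uri.startsWith(ns)) {
                return true;
            }
        }
        return false;
    }

    static boolean include(String individualUri, List<String> typeUris,
                           List<String> namespaces, boolean strictIndividuals) {
        boolean typeInNamespace = false;
        for (String type : typeUris) {
            typeInNamespace |= inNamespaces(type, namespaces);
        }
        // Assumption: strictIndividuals narrows the default set rather than replacing the test.
        return strictIndividuals ? typeInNamespace && inNamespaces(individualUri, namespaces)
                                 : typeInNamespace;
    }

    public static void main(String[] args) {
        List<String> ns = Arrays.asList("http://example.org/eg#");
        List<String> dog = Arrays.asList("http://example.org/eg#Dog");
        System.out.println(include("http://example.org/eg#deputy", dog, ns, true));   // true
        System.out.println(include("http://other.example/pets#rex", dog, ns, false)); // true
        System.out.println(include("http://other.example/pets#rex", dog, ns, true));  // false
    }
}
```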
+
+### Specifying the syntax (encoding) of the input document
+
+Command line | Config file
+------------ | -----------
+-e <*encoding*\> | <sgen:encoding rdf:datatype="&xsd;string"\>*encoding*</sgen:encoding\>
+
+Jena can parse a number of different presentation syntaxes for RDF
+documents, including RDF/XML, N3 and N-Triples. By default, the
+encoding will be derived from the name of the input document (e.g.
+a document `xyz.n3` will be parsed in N3 format); if the extension
+is not recognised, RDF/XML is assumed. The encoding, and hence the
+parser, to use on the input document may be specified explicitly
+by the encoding configuration option.
+
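A minimal sketch of the syntax choice, assuming that an explicit `-e` value always wins and that only the suffixes mentioned in this document are recognised. The returned strings are illustrative labels, and `SyntaxSketch` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical sketch: pick the input syntax from an explicit -e value if
// one is given, otherwise from the file suffix, defaulting to RDF/XML.
class SyntaxSketch {
    static String inputSyntax(String input, String explicitEncoding) {
        if (explicitEncoding != null) {
            return explicitEncoding;                 // -e / sgen:encoding takes priority
        }
        String lower = input.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".n3")) {
            return "N3";
        }
        if (lower.endsWith(".nt")) {
            return "N-TRIPLE";
        }
        return "RDF/XML";                            // .rdf, .owl and unrecognised suffixes
    }

    public static void main(String[] args) {
        System.out.println(inputSyntax("xyz.n3", null));     // N3
        System.out.println(inputSyntax("deputy.rdf", null)); // RDF/XML
        System.out.println(inputSyntax("deputy.rdf", "N3")); // N3
    }
}
```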
+### Choosing the style of the generated class: ontology or plain RDF
+
+Command line | Config file
+------------ | -----------
+--ontology | <sgen:ontology rdf:datatype="&xsd;boolean"\>*true or false*</sgen:ontology\>
+
+By default, the Java class generated by schemagen will contain
+constants that are plain RDF Resource, Property or Literal
+constants. When working with OWL, DAML, or RDFS ontologies, it may
+be more convenient to have constants that are OntClass,
+ObjectProperty, DatatypeProperty and Individual Java objects. To
+generate these ontology constants, rather than plain RDF constants,
+set the ontology configuration option.
+
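The effect of `--ontology` on the generated constants can be summarised as a choice of `valclass` and `valcreator` (see Table 2). The sketch below is hypothetical and covers only the combinations visible in the two Deputy examples above; with the ontology API, schemagen may also choose more specific types such as `ObjectProperty` or `DatatypeProperty`.

```java
// Hypothetical sketch: the Java type and factory method used for each kind
// of term, limited to what the two Deputy examples show.
class ValClassSketch {
    enum Kind { CLASS, PROPERTY, INDIVIDUAL }

    /** Returns {valclass, valcreator} for a term of the given kind. */
    static String[] valClassAndCreator(Kind kind, boolean ontology) {
        switch (kind) {
            case CLASS:
                return ontology ? new String[] {"OntClass", "createClass"}
                                : new String[] {"Resource", "createResource"};
            case PROPERTY:
                // both examples declare properties as Property / createProperty
                return new String[] {"Property", "createProperty"};
            case INDIVIDUAL:
            default:
                return ontology ? new String[] {"Individual", "createIndividual"}
                                : new String[] {"Resource", "createResource"};
        }
    }

    public static void main(String[] args) {
        String[] vc = valClassAndCreator(Kind.CLASS, true);
        System.out.println(vc[0] + " " + vc[1]); // OntClass createClass
    }
}
```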
+Furthermore, since Jena can handle input ontologies in DAML+OIL,
+OWL (the default), and RDFS, it is necessary to be able to specify
+which language is being processed. This will affect both the
+parsing of the input documents, and the language profile selected
+for the constants in the generated Java class.
+
+Command line | Config file
+------------ | -----------
+--daml | <sgen:daml rdf:datatype="&xsd;boolean"\>true</sgen:daml\>
+--owl | <sgen:owl rdf:datatype="&xsd;boolean"\>true</sgen:owl\>
+--rdfs | <sgen:rdfs rdf:datatype="&xsd;boolean"\>true</sgen:rdfs\>
+
+Prior to Jena 2.2, schemagen used a Jena model to load the input
+document that also applied some *rules of inference* to the input
+data. So, for example, a resource that is mentioned as the
+`rdfs:range` of a property can be inferred to be
+`rdf:type owl:Class`, and hence listed in the class constants in
+the generated Java class, even if that fact is not directly
+asserted in the input model. From Jena 2.2 onwards, inference is
+**off by default**. If correct handling of an input document by
+schemagen requires the use of inference rules, this must be
+specified by the `inference` option. In particular, some DAML+OIL
+input files may require the use of this option, to ensure that
+synonyms such as `daml:Class` and `rdfs:Class` are recognised
+correctly.
+
+
+
+Command line | Config file
+------------ | -----------
+--inference | <sgen:inference rdf:datatype="&xsd;boolean"\>true</sgen:inference\>
+
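To illustrate what a single inference rule adds, here is a toy forward rule over triples held as string arrays. It is a hypothetical sketch, not Jena's reasoner: it implements only "the object of an `rdfs:range` statement is a class", and the `owner` and `Person` terms are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy rule, not Jena's reasoner: with inference enabled, the
// object of any rdfs:range statement is also reported as a class.
class RangeInferenceSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range";
    static final String OWL_CLASS = "http://www.w3.org/2002/07/owl#Class";

    /** Triples are {subject, predicate, object}; returns the URIs listed as classes. */
    static Set<String> classes(List<String[]> triples, boolean inference) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] t : triples) {
            if (t[1].equals(RDF_TYPE) && t[2].equals(OWL_CLASS)) {
                result.add(t[0]);                    // asserted: C rdf:type owl:Class
            } else if (inference && t[1].equals(RDFS_RANGE)) {
                result.add(t[2]);                    // inferred: p rdfs:range C
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"http://example.org/eg#owner", RDFS_RANGE, "http://example.org/eg#Person"});
        System.out.println(classes(triples, false)); // []
        System.out.println(classes(triples, true));  // [http://example.org/eg#Person]
    }
}
```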
+
+### Specifying the Java package
+
+Command line | Config file
+------------ | -----------
+--package <*package-name*\> | <sgen:package rdf:datatype="&xsd;string"\>*package-name*</sgen:package\>
+
+By default, the Java class generated by schemagen will not be in a
+Java package. Set the package configuration option to specify the
+Java package name. **Change from Jena 2.6.4-SNAPSHOT onwards:**
+Setting the package name will affect the directory into which the
+generated class will be written: directories will be appended to
+the [output directory](#output) to match the Java package.
+
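Putting the `-o` and `--package` rules together, the destination of the generated file can be sketched as below. This is a hypothetical helper, assuming that a `null` result stands for standard output and that the package only affects the path when `-o` names an existing directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the output-location rules for -o and --package.
// A null result stands for "write to standard output".
class OutputPathSketch {
    static Path outputFile(String output, String packageName, String className) {
        if (output == null) {
            return null;                               // no -o: standard output
        }
        Path out = Paths.get(output);
        if (!Files.isDirectory(out)) {
            return out;                                // a file path; an existing file is overwritten
        }
        Path dir = out;
        if (packageName != null && !packageName.isEmpty()) {
            for (String segment : packageName.split("\\.")) {
                dir = dir.resolve(segment);            // org.example.vocab -> org/example/vocab
            }
        }
        return dir.resolve(className + ".java");       // class name with .java appended
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir"); // any existing directory will do
        System.out.println(outputFile(tmp, "org.example.vocab", "ExampleVocab"));
    }
}
```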
+### Additional decorations on the main class declaration
+
+Command line | Config file
+------------ | -----------
+--classdec <*class-declaration*\> | <sgen:classdec rdf:datatype="&xsd;string"\>*class-declaration*</sgen:classdec\>
+
+In some applications, it may be convenient to add additional
+information to the declaration of the Java class, for example that
+the class implements a given interface (such as
+`java.io.Serializable`). Any string given as the value of the
+class-declaration option will be written immediately after
+`public class` *ClassName*.
+
+### Adding general declarations within the generated class
+
+Command line | Config file
+------------ | -----------
+--declarations <*declarations*\> | <sgen:declarations rdf:datatype="&xsd;string"\>*declarations*</sgen:declarations\>
+
+Some more complex vocabularies may require access to static
+constants, or other Java objects or factories to fully declare the
+constants defined by the given templates. Any text given by the
+declarations option will be included in the generated class after
+the class declaration but before the body of the declared
+constants. The value of the option should be fully legal Java code
+(though the [template](#templates) substitutions will be performed
+on the code). Although this option can be declared as a command
+line option, it is typically easier to specify as a value in a
+configuration options file.
+
+### Omitting sections of the generated vocabulary
+
+Command line | Config file
+------------ | -----------
+--noclasses | <sgen:noclasses rdf:datatype="&xsd;boolean"\>true</sgen:noclasses\>
+--noproperties | <sgen:noproperties rdf:datatype="&xsd;boolean"\>true</sgen:noproperties\>
+--noindividuals | <sgen:noindividuals rdf:datatype="&xsd;boolean"\>true</sgen:noindividuals\>
+
+By default, the vocabulary class generated from a given ontology
+will include constants for each of the included classes, properties
+and individuals in the ontology. To omit any of these groups, use
+the corresponding *noXYZ* configuration option. For example,
+specifying `--noproperties` means that the generated class will not
+contain any constants corresponding to predicate names from the
+ontology, irrespective of what is in the input document.
+
+### Section header comments
+
+Command line | Config file
+------------ | -----------
+--classSection *<section heading\>* | <sgen:classSection rdf:datatype="&xsd;string"\>*section heading*</sgen:classSection\>
+--propSection *<section heading\>* | <sgen:propSection rdf:datatype="&xsd;string"\>*section heading*</sgen:propSection\>
+--individualSection *<section heading\>* | <sgen:individualSection rdf:datatype="&xsd;string"\>*section heading*</sgen:individualSection\>
+--header *<file header section\>* | <sgen:header rdf:datatype="&xsd;string"\>*file header*</sgen:header\>
+--footer *<file footer section\>* | <sgen:footer rdf:datatype="&xsd;string"\>*file footer*</sgen:footer\>
+
+Some coding styles use block comments to delineate different
+sections of a class. The section options allow the introduction of
+arbitrary Java code, though typically this will be a comment block,
+at the head of the sections of class constant declarations,
+property constant declarations, and individual constant
+declarations. The header and footer options similarly supply the
+text written at the start and end of the generated file.
+
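The options in the last few sections control where fixed pieces of text appear in the generated file. The sketch below assembles a skeleton to show those positions. It is hypothetical: it orders the sections properties, classes, individuals as in the Deputy example, assumes a single space between the class name and the `classdec` text, and does not reproduce schemagen's real templates.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of where header, classdec, declarations, section
// comments and footer land in the generated file; not schemagen's templates.
class SkeletonSketch {
    static final class Options {
        String header = "";            // --header
        String classdec = "";          // --classdec
        String declarations = "";      // --declarations
        String propSection = "";       // --propSection
        String classSection = "";      // --classSection
        String individualSection = ""; // --individualSection
        String footer = "";            // --footer
        boolean noProperties;          // --noproperties
        boolean noClasses;             // --noclasses
        boolean noIndividuals;         // --noindividuals
        String nl = "\n";              // "\r\n" when --dos is given
    }

    static String generate(String className, Options o, List<String> props,
                           List<String> classes, List<String> individuals) {
        StringBuilder sb = new StringBuilder();
        line(sb, o.header, o.nl);
        // Assumption: one space separates the class name from the classdec text.
        String decl = o.classdec.isEmpty() ? "" : " " + o.classdec;
        sb.append("public class ").append(className).append(decl).append(" {").append(o.nl);
        line(sb, o.declarations, o.nl);
        if (!o.noProperties) section(sb, o.propSection, props, o.nl);
        if (!o.noClasses) section(sb, o.classSection, classes, o.nl);
        if (!o.noIndividuals) section(sb, o.individualSection, individuals, o.nl);
        sb.append("}").append(o.nl);
        line(sb, o.footer, o.nl);
        return sb.toString();
    }

    private static void section(StringBuilder sb, String heading, List<String> decls, String nl) {
        line(sb, heading, nl);          // section comment, if any
        for (String d : decls) {
            line(sb, d, nl);            // one constant declaration per line
        }
    }

    private static void line(StringBuilder sb, String text, String nl) {
        if (!text.isEmpty()) {
            sb.append(text).append(nl); // empty options produce no output
        }
    }

    public static void main(String[] args) {
        Options o = new Options();
        o.classdec = "implements java.io.Serializable";
        o.classSection = "// Classes";
        o.noIndividuals = true;
        System.out.print(generate("Deputy", o,
                Arrays.asList("public static final Property petName = null;"),
                Arrays.asList("public static final Resource Dog = null;"),
                Arrays.asList("public static final Resource deputy = null;")));
    }
}
```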
+## Using schemagen with Ant
+
+[Apache Ant](http://ant.apache.org/) is a tool for automating build
+steps in Java (and other language) projects. For example, it is the
+tool used to compile the Jena sources to the jena.jar file, and to
+prepare the Jena distribution prior to download. Although it would
+be quite possible to create an Ant *taskdef* to automate the
+production of Java classes from input vocabularies, we have not yet
+done this. Nevertheless, it is straightforward to use schemagen
+from an Ant build script by means of Ant's built-in `java`
+task, which can execute an arbitrary Java program.
+
+The following example shows an Ant target definition for
+generating `ExampleVocab.java` from `example.owl`. It ensures that the
+generation step is only performed when `example.owl` has been updated
+more recently than `ExampleVocab.java` (e.g. if the definitions in
+the OWL file have recently been changed).
+
+      <!-- properties (rdf.dir and the "classpath" path reference are assumed to be defined elsewhere in the build file) -->
+      <property name="vocab.dir"       value="src/org/example/vocabulary" />
+      <property name="vocab.template"  value="${rdf.dir}/exvocab.rdf" />
+      <property name="vocab.tool"      value="jena.schemagen" />
+
+      <!-- Section: vocabulary generation -->
+      <target name="vocabularies" depends="exVocab" />
+
+      <target name="exVocab.check">
+        <uptodate
+           property="exVocab.nobuild"
+           srcFile="${rdf.dir}/example.owl"
+           targetFile="${vocab.dir}/ExampleVocab.java" />
+      </target>
+
+      <target name="exVocab" depends="exVocab.check" unless="exVocab.nobuild">
+        <java classname="${vocab.tool}" classpathref="classpath" fork="yes">
+          <arg value="-i" />
+          <arg value="file:${rdf.dir}/example.owl" />
+          <arg value="-c" />
+          <arg value="${vocab.template}" />
+          <arg value="--classnamesuffix" />
+          <arg value="Vocab" />
+          <arg value="--include" />
+          <arg value="http://example.org/2004/01/services#" />
+          <arg value="--ontology" />
+        </java>
+      </target>
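The `uptodate` check in the target above decides whether generation can be skipped. The following is a rough plain-Java equivalent of that decision. It is a minimal sketch that assumes "up to date" simply means the generated file exists and is no older than the source ontology. Ant's own task supports more than this, such as nested file sets and timestamp granularity, and this is not its implementation. The `VocabUpToDate` class and the default paths in `main` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper mirroring the timestamp test done by Ant's uptodate task. */
final class VocabUpToDate {

    private VocabUpToDate() {}

    /**
     * Returns true when the generated Java file exists and was modified no
     * earlier than the source ontology, i.e. when schemagen need not run.
     */
    static boolean isUpToDate(Path source, Path target) throws IOException {
        if (!Files.exists(target)) {
            return false; // nothing generated yet, so schemagen must run
        }
        return Files.getLastModifiedTime(target)
                    .compareTo(Files.getLastModifiedTime(source)) >= 0;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative default paths only; pass real ones as arguments.
        Path source = Path.of(args.length > 0 ? args[0] : "rdf/example.owl");
        Path target = Path.of(args.length > 1 ? args[1]
                : "src/org/example/vocabulary/ExampleVocab.java");
        System.out.println(isUpToDate(source, target) ? "skip generation" : "run schemagen");
    }
}
```

In the Ant version the same result is stored in the `exVocab.nobuild` property, and the `unless` attribute of the generating target consults it.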
+
+Clearly it is up to each developer to find the appropriate balance
+between options that are specified via the command line options,
+and those that are specified in the configuration options file
+(`exvocab.rdf` in the above example). This is not the only, nor
+necessarily the "right" way to use schemagen from Ant, but if it
+points readers in the appropriate direction to produce a custom
+target for their own application it will have served its purpose.
+
+## Appendix A: Complete example configuration file
+
+The source of this example is provided in the Jena download as
+`etc/schemagen.rdf`.
+
+    <?xml version='1.0'?>
+
+    <!DOCTYPE rdf:RDF [
+        <!ENTITY jena    'http://jena.hpl.hp.com/'>
+
+        <!ENTITY rdf     'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
+        <!ENTITY rdfs    'http://www.w3.org/2000/01/rdf-schema#'>
+        <!ENTITY owl     'http://www.w3.org/2002/07/owl#'>
+        <!ENTITY xsd     'http://www.w3.org/2001/XMLSchema#'>
+        <!ENTITY base    '&jena;2003/04/schemagen'>
+        <!ENTITY sgen    '&base;#'>
+    ]>
+
+    <rdf:RDF
+      xmlns:rdf   ="&rdf;"
+      xmlns:rdfs  ="&rdfs;"
+      xmlns:owl   ="&owl;"
+      xmlns:sgen  ="&sgen;"
+      xmlns       ="&sgen;"
+      xml:base    ="&base;"
+    >
+
+    <!--
+        Example schemagen configuration for use with jena.schemagen
+        Not all possible options are used in this example; see the Javadoc and HOWTO for full details.
+
+        Author: Ian Dickinson, mailto:ian.dickinson@hp.com
+        CVS:    $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $
+    -->
+
+    <sgen:Config>
+        <!-- specifies that the source document uses OWL -->
+        <sgen:owl rdf:datatype="&xsd;boolean">true</sgen:owl>
+
+        <!-- specifies that we want the generated vocab to use OntClass, OntProperty, etc., not Resource and Property -->
+        <sgen:ontology rdf:datatype="&xsd;boolean">true</sgen:ontology>
+
+        <!-- specifies that we want names mapped to uppercase (as standard Java constants) -->
+        <sgen:uppercase rdf:datatype="&xsd;boolean">true</sgen:uppercase>
+
+        <!-- append Vocab to class name, so input beer.owl becomes BeerVocab.java -->
+        <sgen:classnamesuffix rdf:datatype="&xsd;string">Vocab</sgen:classnamesuffix>
+
+        <!-- the Java package that the vocabulary is in -->
+        <sgen:package rdf:datatype="&xsd;string">com.example.vocabulary</sgen:package>
+
+        <!-- the directory or file to write the results out to -->
+        <sgen:output rdf:datatype="&xsd;string">src/com/example/vocabulary</sgen:output>
+
+     Â Â Â <!-- the template for the file header -->
+    <sgen:header rdf:datatype="&xsd;string">/*****************************************************************************
+     * Source code information
+     * -----------------------
+     * Original author    Jane Smart, example.com
+     * Author email       jane.smart@example.com
+     * Package            @package@
+     * Web site           @website@
+     * Created            %date%
+     * Filename           $RCSfile: schemagen.html,v $
+     * Revision           $Revision: 1.16 $
+     * Release status     @releaseStatus@ $State: Exp $
+     *
+     * Last modified on   $Date: 2010-06-11 00:08:23 $
+     *               by   $Author: ian_dickinson $
+     *
+     * @copyright@
+     *****************************************************************************/
+
+
+    // Package
+    ///////////////////////////////////////
+    %package%
+
+
+    // Imports
+    ///////////////////////////////////////
+    %imports%
+
+
+
+    /**
+     * Vocabulary definitions from %sourceURI%
+     * @author Auto-generated by schemagen on %date%
+     */</sgen:header>
+
+    <!-- the template for the file footer (note @footer@ is an Ant-ism, and will not be processed by SchemaGen) -->
+    <sgen:footer rdf:datatype="&xsd;string">
+    /*
+    @footer@
+    */
+    </sgen:footer>
+
+    <!-- template for extra declarations at the top of the class file -->
+    <sgen:declarations rdf:datatype="&xsd;string">
+        /** Factory for generating symbols */
+        private static KsValueFactory s_vf = new DefaultValueFactory();
+    </sgen:declarations>
+
+    <!-- template for introducing the properties in the vocabulary -->
+    <sgen:propSection rdf:datatype="&xsd;string">
+        // Vocabulary properties
+        ///////////////////////////
+    </sgen:propSection>
+
+    <!-- template for introducing the classes in the vocabulary -->
+    <sgen:classSection rdf:datatype="&xsd;string">
+        // Vocabulary classes
+        ///////////////////////////
+    </sgen:classSection>
+
+    <!-- template for introducing the individuals in the vocabulary -->
+    <sgen:individualsSection rdf:datatype="&xsd;string">
+        // Vocabulary individuals
+        ///////////////////////////
+    </sgen:individualsSection>
+
+    <!-- template for doing fancy declarations of individuals -->
+    <sgen:individualTemplate rdf:datatype="&xsd;string">public static final KsSymbol %valname% = s_vf.newSymbol( "%valuri%" );
+
+        /** Ontology individual corresponding to {@link #%valname%} */
+        public static final %valclass% _%valname% = m_model.%valcreator%( %valtype%, "%valuri%" );
+    </sgen:individualTemplate>
+
+    </sgen:Config>
+
+    </rdf:RDF>
+
+
+* * * * *
+
+CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian\_dickinson
+Exp $
+
+
+

Added: incubator/jena/site/trunk/content/jena/documentation/notes/typed-literals.mdtext
URL: http://svn.apache.org/viewvc/incubator/jena/site/trunk/content/jena/documentation/notes/typed-literals.mdtext?rev=1169979&view=auto
==============================================================================
--- incubator/jena/site/trunk/content/jena/documentation/notes/typed-literals.mdtext (added)
+++ incubator/jena/site/trunk/content/jena/documentation/notes/typed-literals.mdtext Mon Sep 12 23:39:35 2011
@@ -0,0 +1,401 @@
+Title:Typed literals how-to
+
+## What are typed literals?
+
+In the original RDF specifications there were two types of literal
+values defined - plain literals (which are basically strings with
+an optional language tag) and XML literals (which are more or less
+plain literals plus a "well-formed-xml" flag).
+
+Part of the remit for the current
+[RDF Core](http://www.w3.org/2001/sw/RDFCore/) working group was to
+add support for typed values, i.e. things like numbers, to RDF. At
+the time of writing, the core specification for these has been
+published in the last call documents, though some modifications to
+the way xml:lang tags are treated have been proposed in response to
+last call comments.
+
+These notes describe the support for typed literals built into
+Jena2 at present. Some of the details have changed recently in
+response to working group decisions. We will now attempt to keep
+the API as stable as we can, unless some major shift in the
+specifications occurs.
+
+Before going into the Jena details here are some informal reminders
+of how typed literals work in RDF. We refer readers to the RDF core
+[semantics](http://www.w3.org/TR/rdf-mt/),
+[syntax](http://www.w3.org/TR/rdf-syntax-grammar) and
+[concepts](http://www.w3.org/TR/rdf-concepts/) documents for more
+precise details.
+
+In RDF, typed literal values comprise a string (the lexical form of
+the literal) and a datatype (identified by a URI). The datatype is
+supposed to denote a mapping from lexical forms to some space of
+values. The pair comprising the literal then denotes an element of
+the value space of the datatype. For example, a typed literal
+comprising ("true", xsd:boolean) would denote the abstract true
+value T.
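The following plain-Java sketch makes the lexical-to-value mapping concrete. It maps the lexical space of xsd:boolean onto Java's two boolean values. XML Schema defines that lexical space as the four forms "true", "false", "1" and "0", so the typed literals ("true", xsd:boolean) and ("1", xsd:boolean) denote the same value T. The class name is hypothetical, and this is not the Jena API.

```java
import java.util.Optional;

/** Hypothetical illustration of the xsd:boolean lexical-to-value mapping. */
final class XsdBooleanMapping {

    private XsdBooleanMapping() {}

    /** Maps a lexical form to its value, or empty if it is not in the lexical space. */
    static Optional<Boolean> toValue(String lexicalForm) {
        switch (lexicalForm) {
            case "true":
            case "1":
                return Optional.of(Boolean.TRUE);
            case "false":
            case "0":
                return Optional.of(Boolean.FALSE);
            default:
                return Optional.empty(); // e.g. "TRUE" or "yes" are not legal forms
        }
    }

    public static void main(String[] args) {
        // ("true", xsd:boolean) and ("1", xsd:boolean) denote the same value
        System.out.println(toValue("true").equals(toValue("1"))); // true
        System.out.println(toValue("yes").isPresent());           // false
    }
}
```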
+
+In the RDF/XML syntax typed literals are notated with syntax such
+as:
+
+    <age rdf:datatype="http://www.w3.org/2001/XMLSchema#int">13</age>
+
+In N-Triples syntax the notation is:
+
+    "13"^^<http://www.w3.org/2001/XMLSchema#int>
+
+and this `^^` notation will appear in literals printed by Jena.
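The N-Triples form consists of the quoted lexical form, then `^^`, then the datatype URI in angle brackets. Below is a minimal plain-Java sketch that produces this notation. It assumes that only the common string escapes (backslash, double quote, newline, carriage return and tab) are needed and passes every other character through unchanged. It is therefore an illustration rather than a complete N-Triples writer. The `NTriplesLiteral` helper is hypothetical and is not how Jena itself prints literals.

```java
/** Hypothetical helper that writes a typed literal in N-Triples notation. */
final class NTriplesLiteral {

    private NTriplesLiteral() {}

    static String format(String lexicalForm, String datatypeUri) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : lexicalForm.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break; // escape backslash
                case '"':  sb.append("\\\""); break; // escape quote
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        // closing quote, ^^ marker, then the datatype URI in angle brackets
        return sb.append("\"^^<").append(datatypeUri).append('>').toString();
    }

    public static void main(String[] args) {
        // prints "13"^^<http://www.w3.org/2001/XMLSchema#int>
        System.out.println(format("13", "http://www.w3.org/2001/XMLSchema#int"));
    }
}
```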
+
+Note that a literal is either typed or plain (an old style literal)
+and which it is can be determined statically. There is no way to
+define a literal as having a lexical value of, say, "13" but leave
+its datatype open and then infer the datatype from some schema or
+ontology definition.
+
+In the new scheme of things well-formed XML literals are treated as
+typed literals whose datatype is the special type
+"rdf:XMLLiteral".
+
+## Basic API operations
+
+Jena2 will correctly parse typed literals within RDF/XML, N-Triples
+and N3 source files. The same Java interface,
+[`Literal`](../javadoc/com/hp/hpl/jena/rdf/model/Literal.html),
+represents both "plain" and "typed" literals. Literal now supports
+some new methods:
+
+- `getDatatype()` returns null for a plain literal, or a Java object
+  which represents the datatype of a typed literal.
+- `getDatatypeURI()` returns null for a plain literal, or the URI of
+  the datatype of a typed literal.
+- `getValue()` returns a Java object representing the value of the
+  literal; for example, for an xsd:int this will be a
+  `java.lang.Integer`, and for plain literals it will be a String.
+
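The hypothetical plain-Java stand-in below shows how these accessors divide the work. It is not Jena's `Literal` and converts only xsd:int. A plain literal has no datatype URI and its value is its own string, while an xsd:int literal yields a `java.lang.Integer`.

```java
import java.util.Objects;

/** Hypothetical stand-in mimicking the Literal accessors described above. */
final class LiteralSketch {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    private final String lexicalForm;
    private final String datatypeUri; // null for a plain literal

    LiteralSketch(String lexicalForm, String datatypeUri) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeUri = datatypeUri;
    }

    /** Null for a plain literal, otherwise the datatype URI. */
    String getDatatypeURI() {
        return datatypeUri;
    }

    /** A String for plain literals; only xsd:int is converted in this sketch. */
    Object getValue() {
        if (datatypeUri == null) {
            return lexicalForm;
        }
        if (datatypeUri.equals(XSD + "int")) {
            return Integer.valueOf(lexicalForm);
        }
        throw new UnsupportedOperationException("datatype not handled in this sketch: " + datatypeUri);
    }
}
```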
+The converse operation of creating a Java object to represent a
+typed literal in a model can be achieved using:
+
+> `model.createTypedLiteral(value, datatype)`
+
+This allows the `value` to be specified by a lexical form (i.e. a
+String) or by a Java object representing the typed value; the
+`datatype` can be specified by a URI string or a Java object
+representing the datatype.
+
+In addition, there is a built-in mapping from standard Java wrapper
+objects to XSD datatypes (see later) so that the simpler call:
+
+> `model.createTypedLiteral(Object)`
+
+will create a typed literal with the datatype appropriate for
+representing that Java object. For example,
+
+    Literal l = model.createTypedLiteral(new Integer(25));
+
+will create a typed literal with the lexical value "25", of type
+xsd:int.
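The default Java-to-XSD mapping can be pictured as a lookup keyed on the value's class. The sketch below is plain Java with a hypothetical `DefaultXsdMapping` class. It hard-codes the default pairs listed in the table of registered types in this note and is not Jena's `TypeMapper`. The lexical form comes from `toString()`, except that `BigDecimal` uses `toPlainString()` because xsd:decimal has no exponent notation. Values such as floating-point infinity would need special handling that this sketch omits.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup from a Java value's class to its default XSD datatype. */
final class DefaultXsdMapping {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Default pairs as listed in this note's table of registered Java classes.
    private static final Map<Class<?>, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put(Float.class, "float");
        DEFAULTS.put(Double.class, "double");
        DEFAULTS.put(Integer.class, "int");
        DEFAULTS.put(Long.class, "long");
        DEFAULTS.put(Short.class, "short");
        DEFAULTS.put(Byte.class, "byte");
        DEFAULTS.put(BigInteger.class, "integer");
        DEFAULTS.put(BigDecimal.class, "decimal");
        DEFAULTS.put(Boolean.class, "boolean");
        DEFAULTS.put(String.class, "string");
    }

    private DefaultXsdMapping() {}

    /** Returns (lexical form, datatype URI) for a value of a supported class. */
    static Map.Entry<String, String> toTypedLiteral(Object value) {
        String localName = DEFAULTS.get(value.getClass());
        if (localName == null) {
            throw new IllegalArgumentException("no default XSD type for " + value.getClass().getName());
        }
        // toPlainString avoids exponent notation, which xsd:decimal does not allow
        String lexical = (value instanceof BigDecimal)
                ? ((BigDecimal) value).toPlainString()
                : value.toString();
        return Map.entry(lexical, XSD + localName);
    }
}
```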
+
+Note that there are also functions which look similar but do not
+use typed literals. For example:
+
+    Literal l = model.createLiteral(25);
+    int age = l.getInt();
+
+These work by converting the primitive to a string and storing
+the resulting string as a plain literal. The inverse operation then
+attempts to parse the string of the plain literal (as an int in
+this example). These are for backward compatibility with earlier
+versions of Jena and older datasets. In normal circumstances
+`createTypedLiteral` is preferable.
+
+### Equality issues
+
+There is a well-defined notion of when two typed literals should be
+equal, based on the equality defined for the datatype in question.
+Jena2 implements this equality function by using the method
+`sameValueAs`. Thus two literals ("13", xsd:int) and ("13",
+xsd:decimal) will test as sameValueAs each other but neither will
+test sameValueAs ("13", xsd:string).
+
+Note that this is a different function from the Java `equals`
+method. Had we changed the equals method to test for semantic
+equality, problems would have arisen because the two objects are not
+substitutable in the Java sense (for example they return different
+values from a getDatatype() call). This would, for example, have
+made it impossible to cache literals in a hash table.
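The following small hypothetical class, in plain Java rather than Jena, illustrates the distinction. Its `equals` compares the lexical form and datatype URI exactly, which is what Java-level substitutability requires. Its `sameValueAs` compares values instead. As a simplifying assumption, only xsd:decimal and a few integer-like XSD types are treated as sharing one numeric value space. Every other datatype falls back to comparing lexical forms, which is only correct for types such as xsd:string where each value has a single lexical form.

```java
import java.math.BigDecimal;
import java.util.Objects;
import java.util.Set;

/** Hypothetical typed literal contrasting Java equals() with value-based sameValueAs(). */
final class TypedLit {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Assumption for this sketch: these types all draw on the xsd:decimal value space.
    private static final Set<String> DECIMAL_FAMILY = Set.of(
            XSD + "decimal", XSD + "integer", XSD + "long",
            XSD + "int", XSD + "short", XSD + "byte");

    final String lexicalForm;
    final String datatypeUri;

    TypedLit(String lexicalForm, String datatypeUri) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeUri = Objects.requireNonNull(datatypeUri);
    }

    /** Semantic test: do the two literals denote the same value? */
    boolean sameValueAs(TypedLit other) {
        boolean thisNumeric = DECIMAL_FAMILY.contains(datatypeUri);
        boolean otherNumeric = DECIMAL_FAMILY.contains(other.datatypeUri);
        if (thisNumeric && otherNumeric) {
            // compareTo ignores scale, so 13 and 13.0 compare as equal
            return new BigDecimal(lexicalForm).compareTo(new BigDecimal(other.lexicalForm)) == 0;
        }
        if (thisNumeric || otherNumeric) {
            return false; // a number never equals a non-number in this sketch
        }
        // Fallback: only right for types with one lexical form per value, e.g. xsd:string
        return datatypeUri.equals(other.datatypeUri) && lexicalForm.equals(other.lexicalForm);
    }

    /** Java equality: lexical form and datatype must both match exactly. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TypedLit)) return false;
        TypedLit t = (TypedLit) o;
        return lexicalForm.equals(t.lexicalForm) && datatypeUri.equals(t.datatypeUri);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lexicalForm, datatypeUri);
    }
}
```

Because `equals` and `hashCode` only look at the two components, instances can safely be used as hash table keys. The looser `sameValueAs` relation is kept as a separate method.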
+
+## How datatypes are represented
+
+Datatypes for typed literals are represented by instances of the
+interface
+[`com.hp.hpl.jena.datatypes.RDFDatatype`](../javadoc/com/hp/hpl/jena/datatypes/RDFDatatype.html).
+Instances of this interface can be used to parse and serialize
+typed data, test for equality and test if a typed or lexical value
+is a legal value for this datatype.
+
+Prebuilt instances of this interface are included for all the main
+XSD datatypes (see the section on XSD data types below).
+
+In addition, it is possible for an application to define new
+datatypes and register them against some URI (see the sections on
+user defined data types below).
+
+### Error detection
+
+When Jena parses a typed literal whose lexical form is not legal for
+the declared datatype, it does not immediately throw an error. This
+is because the RDF Core working group has defined that illegal
+datatype values are errors but not syntactic errors, so we try
+to avoid having parsers break at this point. Instead, a literal is
+created which is marked internally as ill-formed, and the first time
+an application attempts to access its value (with `getValue()`) an
+error will be thrown.
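The hypothetical plain-Java class below illustrates this deferred error reporting. Parsing happens when the literal is built, but a failure is only recorded at that point and is reported when `getValue()` is called. Jena itself reports the problem with a `DatatypeFormatException`. This sketch throws `IllegalStateException` instead so that it stays free of Jena dependencies.

```java
import java.util.function.Function;

/** Hypothetical literal that defers reporting an ill-formed lexical form until getValue(). */
final class LazyLiteral<T> {

    private final String lexicalForm;
    private final T value;                  // null when the lexical form is ill-formed
    private final RuntimeException problem; // parse failure kept for later

    LazyLiteral(String lexicalForm, Function<String, T> parser) {
        T parsed = null;
        RuntimeException failure = null;
        try {
            parsed = parser.apply(lexicalForm);
        } catch (RuntimeException e) {
            failure = e; // do not fail now: an illegal value is not a syntax error
        }
        this.lexicalForm = lexicalForm;
        this.value = parsed;
        this.problem = failure;
    }

    boolean isWellFormed() {
        return problem == null;
    }

    String getLexicalForm() {
        return lexicalForm;
    }

    /** Reports the recorded problem only when the value is actually requested. */
    T getValue() {
        if (problem != null) {
            throw new IllegalStateException("ill-formed lexical form: " + lexicalForm, problem);
        }
        return value;
    }
}
```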
+
+When Jena is reading a file there is also the issue of what to do
+when it encounters a typed value whose datatype URI is not one that
+it knows about. The default behaviour is to create a new datatype
+object (whose value space is the same as its lexical space). Again
+this behaviour seems in keeping with the working group preference
+that illegal datatypes are semantic but not syntactic errors.
+
+However, both of these behaviours can mean that simple common
+errors (such as misspelling the xsd namespace) may go unnoticed
+until very late on. To overcome this we have hidden some global
+switches that allow you to force Jena to report such syntactic
+errors earlier. These are static boolean parameters:
+
+    com.hp.hpl.jena.shared.impl.JenaParameters.enableEagerLiteralValidation
+    com.hp.hpl.jena.shared.impl.JenaParameters.enableSilentAcceptanceOfUnknownDatatypes
+
+They are placed here in an impl package (and thus only visible in
+the full javadoc, not the API javadoc) because they should not be
+regarded as stable. We plan to develop a cleaner way of setting
+mode switches for Jena and these switches will migrate there in due
+course, if they prove to be useful.
+
+## XSD data types
+
+Jena includes prebuilt, and pre-registered, instances of
+`RDFDatatype` for all of the relevant XSD types:
+
+> float double int long short byte unsignedByte unsignedShort
+> unsignedInt unsignedLong decimal integer nonPositiveInteger
+> nonNegativeInteger positiveInteger negativeInteger boolean string
+> normalizedString anyURI token Name QName language NMTOKEN ENTITIES
+> NMTOKENS ENTITY ID NCName IDREF IDREFS NOTATION hexBinary
+> base64Binary date time dateTime duration gDay gMonth gYear
+> gYearMonth gMonthDay
+
+These are all available as static member variables from
+[`com.hp.hpl.jena.datatypes.xsd.XSDDatatype`](../javadoc/com/hp/hpl/jena/datatypes/xsd/XSDDatatype.html).
+
+Of these types, the following are registered as the default type to
+use to represent certain Java classes:
+
+Java class | xsd type
+---------- | --------
+Float | float
+Double | double
+Integer | int
+Long | long
+Short | short
+Byte | byte
+BigInteger | integer
+BigDecimal | decimal
+Boolean | boolean
+String | string
+
+Thus when creating a typed literal from a Java `BigInteger` then
+`xsd:integer` will be used. The converse mapping is more adaptive.
+When parsing an xsd:integer the Java value object used will be an
+Integer, Long or BigInteger depending on the size of the specific
+value being represented.
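Here is one possible sketch of a size-adaptive choice like this. It is a plain-Java illustration with a hypothetical class name, not Jena's code. It parses the lexical form as a `BigInteger` and narrows the result to `Integer` or `Long` whenever the value fits in 32 or 64 bits.

```java
import java.math.BigInteger;

/** Hypothetical size-adaptive conversion of an xsd:integer lexical form to a Java value. */
final class XsdIntegerValues {

    private XsdIntegerValues() {}

    /**
     * Parses an xsd:integer lexical form and returns the smallest of
     * Integer, Long or BigInteger that can hold the value.
     */
    static Number parse(String lexicalForm) {
        BigInteger big = new BigInteger(lexicalForm); // accepts an optional leading '+' or '-'
        if (big.bitLength() < Integer.SIZE) {
            return big.intValue();  // fits in 32-bit two's complement
        }
        if (big.bitLength() < Long.SIZE) {
            return big.longValue(); // fits in 64-bit two's complement
        }
        return big;
    }
}
```

`bitLength()` excludes the sign bit, so a value fits in an `int` exactly when its bit length is below 32. That is why the boundary value -2147483648 still becomes an `Integer`.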
+
+## User defined XSD data types
+
+XML schema allows derived types to be defined in which a base type
+is modified through some facet restriction such as limiting the
+min/max of an integer or restricting a string to a regular
+expression. It also allows new types to be created by unioning
+other types or by constructing lists of other types.
+
+Jena provides support for derived and union types but not for list
+types.
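The following hypothetical plain-Java class shows what a facet restriction amounts to: an integer type narrowed by a `minExclusive` facet. For illustration, assume that a type such as the `over12` example used later in this note admits only integers greater than 12. Under that assumption, `new RestrictedInteger(12)` accepts "15" and rejects "12". Jena derives the facets from the loaded schema rather than hard-coding them as this sketch does.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

/** Hypothetical derived type: an integer base type narrowed by a minExclusive facet. */
final class RestrictedInteger {

    // Lexical space of xsd:integer: an optional sign followed by decimal digits.
    private static final Pattern INTEGER_LEXICAL = Pattern.compile("[+-]?[0-9]+");

    private final BigInteger minExclusive;

    RestrictedInteger(long minExclusive) {
        this.minExclusive = BigInteger.valueOf(minExclusive);
    }

    /** True if the lexical form is a legal integer whose value satisfies the facet. */
    boolean isValid(String lexicalForm) {
        if (!INTEGER_LEXICAL.matcher(lexicalForm).matches()) {
            return false; // not even in the base type's lexical space
        }
        return new BigInteger(lexicalForm).compareTo(minExclusive) > 0;
    }
}
```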
+
+These are supported through the `XSDDatatype.loadUserDefined`
+method which allows an XML schema datatype file to be loaded. This
+registers a new `RDFDatatype` that can be used to create, parse,
+serialize and test instances of that datatype.
+
+There is one difficult issue here: what URI should be given to a
+user-defined datatype? This is not defined by XML Schema, RDF or
+OWL. Jena2 adopts the position taken by DAML that the defined
+datatype should have the base URI of the schema file with a
+fragment identifier given by the datatype name.
+
+Thus the DAML example
+file `http://www.daml.org/2001/03/daml+oil-ex-dt` (a corrected copy
+of which is stored in `$JENA/testing/xsd/daml+oil-ex-dt.xsd`, where
+`$JENA` is your Jena install directory) defines several types such
+as "over12". The following code fragment will load this file and
+register the newly defined types:
+
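+    import java.io.FileReader;
+    import java.util.Iterator;
+    import java.util.List;
+    import com.hp.hpl.jena.datatypes.TypeMapper;
+    import com.hp.hpl.jena.datatypes.xsd.XSDDatatype;
+
+    String uri = "http://www.daml.org/2001/03/daml+oil-ex-DT";
+    String filename = "../jena2/testing/xsd/daml+oil-ex-dt.xsd";
+    TypeMapper tm = TypeMapper.getInstance();
+    // new FileReader(...) throws FileNotFoundException, which the enclosing method must handle
+    List typenames = XSDDatatype.loadUserDefined(uri, new FileReader(filename), null, tm);
+    System.out.println("Defined types are:");
+    for (Iterator i = typenames.iterator(); i.hasNext(); ) {
+        System.out.println(" - " + i.next());
+    }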
+
+This produces the following output:
+
+    Defined types are:
+     - http://www.daml.org/2001/03/daml+oil-ex-DT#XSDEnumerationHeight
+     - http://www.daml.org/2001/03/daml+oil-ex-DT#over12
+     - http://www.daml.org/2001/03/daml+oil-ex-DT#over17
+     - http://www.daml.org/2001/03/daml+oil-ex-DT#over59
+     - http://www.daml.org/2001/03/daml+oil-ex-DT#clothingsize
+
+To illustrate working with the defined types, the following code
+then tries to create and use two instances of the over12 type:
+
+    Model m = ModelFactory.createDefaultModel();
+    RDFDatatype over12Type = tm.getSafeTypeByName(uri + "#over12");
+    Object value = null;
+    try {
+        value = "15";
+        m.createTypedLiteral((String)value, over12Type).getValue();
+        System.out.println("Over 12 value of " + value + " is OK");
+        value = "12";
+        m.createTypedLiteral((String)value, over12Type).getValue();
+        System.out.println("Over 12 value of " + value + " is OK");
+    } catch (DatatypeFormatException e) {
+        System.out.println("Over 12 value of " + value + " is illegal");
+    }
+
+which produces the output:
+
+    Over 12 value of 15 is OK
+    Over 12 value of 12 is illegal
+
+## User defined non-XSD data types
+
+RDF allows any URI to be used as a datatype but provides no
+standard for how to map the datatype URI to a datatype definition.
+
+Within Jena2 we allow new datatypes to be created and registered by
+using the
+[`TypeMapper`](../javadoc/com/hp/hpl/jena/datatypes/TypeMapper.html)
+class.
+
+The easiest way to define a new RDFDatatype is to subclass
+BaseDatatype and define implementations for parse, unparse and
+isEqual.
+
+For example here is the outline of a type used to represent
+rational numbers:
+
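+    import com.hp.hpl.jena.datatypes.BaseDatatype;
+    import com.hp.hpl.jena.datatypes.DatatypeFormatException;
+    import com.hp.hpl.jena.datatypes.RDFDatatype;
+    import com.hp.hpl.jena.graph.impl.LiteralLabel;
+
+    // Rational is the application's own value class; it is not part of Jena.
+    class RationalType extends BaseDatatype {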
+        public static final String theTypeURI = "urn:x-hp-dt:rational";
+        public static final RDFDatatype theRationalType = new RationalType();
+
+        /** private constructor - single global instance */
+        private RationalType() {
+            super(theTypeURI);
+        }
+
+        /**
+         * Convert a value of this datatype out
+         * to lexical form.
+         */
+        public String unparse(Object value) {
+            Rational r = (Rational) value;
+            return Integer.toString(r.getNumerator()) + "/" + r.getDenominator();
+        }
+
+        /**
+         * Parse a lexical form of this datatype to a value
+         * @throws DatatypeFormatException if the lexical form is not legal
+         */
+        public Object parse(String lexicalForm) throws DatatypeFormatException {
+            int index = lexicalForm.indexOf("/");
+            if (index == -1) {
+                throw new DatatypeFormatException(lexicalForm, theRationalType, "");
+            }
+            try {
+                int numerator = Integer.parseInt(lexicalForm.substring(0, index));
+                int denominator = Integer.parseInt(lexicalForm.substring(index+1));
+                return new Rational(numerator, denominator);
+            } catch (NumberFormatException e) {
+                throw new DatatypeFormatException(lexicalForm, theRationalType, "");
+            }
+        }
+
+        /**
+         * Compares two instances of values of the given datatype.
+         * This does not allow rationals to be compared to other number
+         * formats; the language tag is not significant.
+         */
+        public boolean isEqual(LiteralLabel value1, LiteralLabel value2) {
+            return value1.getDatatype() == value2.getDatatype()
+                 && value1.getValue().equals(value2.getValue());
+        }
+    }
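The outline above relies on a `Rational` value class that it does not show. One possible sketch follows. It assumes the value should be immutable and normalized to lowest terms, so that "3/5" and "6/10" produce equal values under the `equals` call in `isEqual`. The class is hypothetical. It rejects a zero denominator with an `IllegalArgumentException`, which the `parse` method above does not catch. If "1/0" should be reported as a `DatatypeFormatException`, that catch clause would need to be widened.

```java
/** Hypothetical immutable rational value class assumed by the RationalType outline. */
final class Rational {

    private final int numerator;
    private final int denominator;

    Rational(int numerator, int denominator) {
        if (denominator == 0) {
            throw new IllegalArgumentException("denominator must not be zero");
        }
        // Normalize to lowest terms with a positive denominator so that
        // equal values (3/5 and 6/10) end up with identical fields.
        // Overflow at Integer.MIN_VALUE is ignored in this sketch.
        int sign = denominator < 0 ? -1 : 1;
        int g = gcd(Math.abs(numerator), Math.abs(denominator));
        this.numerator = sign * numerator / g;
        this.denominator = sign * denominator / g;
    }

    public int getNumerator() {
        return numerator;
    }

    public int getDenominator() {
        return denominator;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        return numerator == r.numerator && denominator == r.denominator;
    }

    @Override
    public int hashCode() {
        return 31 * numerator + denominator;
    }

    @Override
    public String toString() {
        return numerator + "/" + denominator;
    }

    /** Euclid's algorithm on non-negative arguments. */
    private static int gcd(int a, int b) {
        while (b != 0) {
            int t = b;
            b = a % b;
            a = t;
        }
        return a;
    }
}
```

With normalization in the constructor, `unparse` in the outline writes the reduced form. A literal created from "6/10" would therefore serialize as "3/5".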
+
+To register and use this type you simply need the call:
+
+    RDFDatatype rtype = RationalType.theRationalType;
+    TypeMapper.getInstance().registerDatatype(rtype);
+    ...
+    // Create a rational literal
+    Literal l1 = m.createTypedLiteral("3/5", rtype);
+
+Note that whilst any serialization of RDF containing such user
+defined literals will be perfectly legal, a client application has
+no standard way of looking up the datatype URI you have chosen.
+This has to be done "out of band" as they say.
+
+## A note on xml:lang
+
+Plain literals have an xml:lang tag as well as a string value. Two
+plain literals with the same string but different language tags are
+not equal.
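In code, this rule means the language tag takes part in equality. The hypothetical plain-Java sketch below is not based on Jena's literal classes. It compares tags case-insensitively, because language tags are defined as case-insensitive, and it treats a missing tag as the empty string.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical plain literal: a string plus an optional language tag. */
final class PlainLiteral {

    private final String text;
    private final String lang; // "" when there is no language tag

    PlainLiteral(String text, String lang) {
        this.text = Objects.requireNonNull(text);
        // Language tags are case-insensitive, so normalize them to lower case.
        this.lang = lang == null ? "" : lang.toLowerCase(Locale.ROOT);
    }

    /** Equal only if both the string and the (normalized) language tag match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PlainLiteral)) return false;
        PlainLiteral p = (PlainLiteral) o;
        return text.equals(p.text) && lang.equals(p.lang);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, lang);
    }
}
```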
+
+XML Schema states that xml:lang is not meaningful on xsd
+datatypes.
+
+Thus for almost all typed literals there is no xml:lang tag.
+
+At the time of last call the RDF specifications allowed the special
+case that `rdf:XMLLiteral`s could have a language tag that would be
+significant in equality testing. Thus in preview releases of Jena2
+the `createTypedLiteral` calls took an extra language tag argument.
+
+However, at the time of writing that specification has been changed
+so that language tags will never be significant on typed literals
+(whether this means that xml:lang is not significant on XMLLiterals
+or means that XMLLiteral will cease to be a typed literal is not
+completely certain).
+
+For this reason we have removed the language tag from the
+`createTypedLiteral` calls and deprecated the `createLiteral` call
+which allowed both wellFormedXML and language tag to be specified.
+
+We do not expect to need to change the API even if the working
+group decision changes again; the most we might expect to do would
+be to undeprecate the 3-argument version of `createLiteral`.
+

Modified: incubator/jena/site/trunk/templates/sidenav.mdtext
URL: http://svn.apache.org/viewvc/incubator/jena/site/trunk/templates/sidenav.mdtext?rev=1169979&r1=1169978&r2=1169979&view=diff
==============================================================================
--- incubator/jena/site/trunk/templates/sidenav.mdtext (original)
+++ incubator/jena/site/trunk/templates/sidenav.mdtext Mon Sep 12 23:39:35 2011
@@ -48,6 +48,8 @@
   - [Assembler](/jena/documentation/assembler/index.html)
     - [Assembler how-to](/jena/documentation/assembler/assembler-howto.html)
     - [Inside assemblers](/jena/documentation/assembler/inside-assemblers.html)
+  - [Notes](/jena/documentation/notes/index.html)
+    - [Concurrency how-to](/jena/documentation/notes/concurrency-howto.html)
   - [Tools](/jena/documentation/tools/index.html)
     - [schemagen](/jena/documentation/tools/schemagen.html)