Posted to users@jena.apache.org by Olivier Mesnard <ol...@cea.fr> on 2011/09/13 14:23:29 UTC
side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete
Dear Everyone,
I have some difficulties with the load() operation of class
com.hp.hpl.jena.rdf.arp.DOM2Model.
The operation seems to perform its task incompletely and I don't know what is
wrong with my code. Perhaps I am missing some parser options.
I use JAXB to create Java classes for my model (I enclose the XSD file which
describes the model).
This model contains Document, Resource, Annotation, PieceOfKnowledge and
Text.
A Resource is an abstract class which represents an object with a URI and
which can hold annotations.
An Annotation (or PieceOfKnowledge) encapsulates RDF triples as a literal within
a data element of type xsd:anyType.
A MediaUnit is a Resource.
A Text is a specialized MediaUnit with some content of type string.
To complete the model, a Document is a Resource and is composed of several
MediaUnits.
So, my application has to manage XML documents which contain text with
embedded annotations about that text, as XML-serialized RDF. I chose to decode
these annotations into a com.hp.hpl.jena.rdf.model.Model so that I can run
queries on it.
I tried unsuccessfully to load this model from the document with the DOM2Model
class.
Here is the code:
====================================
package sampleforjena;
// file IO
import java.io.File;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
// data model
import model.MediaUnit;
import model.Annotation;
import model.Resource;
import model.Document;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.JAXBElement;
import javax.xml.transform.stream.StreamSource;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.rdf.arp.DOM2Model;
public class RdfIo {
    private com.hp.hpl.jena.rdf.model.Model _model;
    ....
    // read XML Serialized document file and extract RDF model
    public void decodeSampleDocFile(File annotatedRegularDoc ) {
        Resource resource = new Resource();
        // Unmarshall the XML document
        try {
            JAXBContext jContext = JAXBContext.newInstance( "model" );
            Unmarshaller unmarshaller = jContext.createUnmarshaller();
            JAXBElement<Resource> jroot = unmarshaller.unmarshal(
                new StreamSource(annotatedRegularDoc), Resource.class);
            resource = (Resource)jroot.getValue();
        } catch (JAXBException e) {
            System.out.println("RdfIo.decodeDocFile: Error!!! unable to read annotatedWebLabDoc...");
            e.printStackTrace();
        }
        // Create a model
        _model = com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel();
        Document d = (Document)resource;
        // get XML serialized rdf triples from annotation in document
        for (MediaUnit mu: ((Document)resource).getMediaUnit()) {
            for (Annotation annot: mu.getAnnotation()) {
                org.apache.xerces.dom.ElementImpl dataElement =
                    (org.apache.xerces.dom.ElementImpl)annot.getData();
                try
                {
                    System.out.println("==rdfModelFromDomElement: createD2M ...");
                    // DOM2Model.createD2M("", _model).load(dataElement);
                    DOM2Model arp = DOM2Model.createD2M("", _model);
                    arp.allowRelativeURIs();
                    arp.load(dataElement);
                }
                catch (org.xml.sax.SAXParseException e)
                {
                    e.printStackTrace();
                }
                // To check if the whole set of rdf statements have been loaded
                // print the size of the rdf store
                String annotSize = Integer.toString(_model.getGraph().size());
                System.out.println("RDFIO.decodeDocFile: annot contains " + annotSize + " triples");
            }
        }
    }
    ....
}
====================================
At run time, I get errors like:
13 sept. 2011 13:36:08 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler warning
ATTENTION: unknown-source: {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler error
GRAVE: unknown-source: {E205} rdf:RDF is not allowed as an element tag here.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler error
GRAVE: unknown-source: {E201} Multiple children of property element
I cannot explain these errors, but the program is not interrupted.
It behaves as if the complete model were not loaded (only 3 triples are read),
although the full data seems to be contained in the dataElement variable.
I must add that my documents seem correct, because when I isolate the RDF part
as an XML-serialized file, I can read the whole set of statements (9
triples are read) with the following decodeRdfFile() operation:
// test reading XML Serialized RDF File
public void decodeRdfFile(File serializedRdf ) {
    FileInputStream is;
    try {
        is = new FileInputStream(serializedRdf);
        _model = com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel();
        _model.read(is,null);
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
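One way to check what a DOM node such as dataElement really contains is to serialize it back to text before handing it to the parser. The following is only a minimal sketch using the standard JAXP classes; the class name DomDump and the helpers toXml and parseRoot are hypothetical names chosen for illustration and are not part of Jena or of the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical debugging helper: serializes a DOM subtree to a string.
public class DomDump {

    // Serialize the given node (and its descendants) with an identity transform.
    public static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    // Parse a small XML string, only to have a node to dump in this demo.
    public static Node parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the <data> element returned by annot.getData().
        Node data = parseRoot(
            "<data><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/></data>");
        // The dump shows the wrapper element as well as the rdf:RDF inside it.
        System.out.println(toXml(data));
    }
}
```

Printing toXml(dataElement) just before the load() call would show whether the node is the rdf:RDF element itself or a wrapper element around it.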
Another remark: I also have a problem with the following operation when I try
to "pretty print" the model, which does not make debugging any easier.
// test writing XML Serialized RDF model
public void writeModel() {
    try
    {
        //print
        java.io.FileOutputStream of = new java.io.FileOutputStream("fffModel.xml");
        _model.write(of);
    }
    catch(IOException ie) {
        ie.printStackTrace();
    }
}
Exception in thread "main" com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs can be included in RDF/XML output: <d> Code: 58/REQUIRED_COMPONENT_MISSING in SCHEME: A component that is required by the scheme is missing.
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.checkURI(BaseXMLWriter.java:768)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.xmlnsDecl(BaseXMLWriter.java:300)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFHeader(Basic.java:56)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:39)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:452)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:424)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:410)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.write(ModelCom.java:270)
    at sampleforjena.RdfIo.writeModel(RdfIo.java:112)
Sorry for the long message.
Olivier Mesnard
Re: side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete
Posted by Olivier Mesnard <ol...@cea.fr>.
Thank you Damian,
I did not know about that option.
Unfortunately, I tried it and it does not change the result.
I agree with your second piece of advice and followed it:
I had already tried to pass the DOM node corresponding to the sub-tree to the
load operation, but I had not navigated far enough down my object model to
reach the right level:
Document d = (Document)resource;
for (MediaUnit mu: ((Document)resource).getMediaUnit()) {
for (Annotation annot: mu.getAnnotation()) {
org.apache.xerces.dom.ElementImpl dataElement =
(org.apache.xerces.dom.ElementImpl)annot.getData();
// load only the sub tree <data><rdf:RDF ...
arp.load(dataElement); // wrong!
I just added the following line:
org.w3c.dom.Node rdfNode = dataElement.getFirstChild();
And now
// load only the sub tree <rdf:RDF ...
arp.load(rdfNode); // all right!
I now reach the right level and it works! All 9 triples are read.
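Note that getFirstChild() returns the first child node of any type, so if the data element is indented, the first child can be a whitespace text node rather than rdf:RDF. A slightly more defensive variant is sketched below, assuming the RDF is a direct child of the data element; the class name RdfRootFinder and the helpers findRdfRoot and parse are hypothetical and only use the standard org.w3c.dom API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: locate the rdf:RDF child of a wrapper element.
public class RdfRootFinder {

    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns the first direct child that is an rdf:RDF element, or null if none.
    public static Element findRdfRoot(Node container) {
        for (Node n = container.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && RDF_NS.equals(n.getNamespaceURI())
                    && "RDF".equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Namespace-aware parse of an XML string, used only for the demo below.
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Indented wrapper: the first child of <data> is a whitespace text node.
        String xml = "<data>\n  <rdf:RDF xmlns:rdf=\"" + RDF_NS + "\"/>\n</data>";
        Element data = parse(xml).getDocumentElement();

        Node first = data.getFirstChild();
        System.out.println("first child is text: " + (first.getNodeType() == Node.TEXT_NODE));

        Element rdfRoot = findRdfRoot(data);
        System.out.println("found element: " + rdfRoot.getNodeName());
        // In the code above, rdfRoot would then be passed on: arp.load(rdfRoot);
    }
}
```

If findRdfRoot returns null, the annotation has no RDF directly under its data element and there is nothing to load for it.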
Thanks
Olivier
On Tuesday 13 September 2011 15:01:26, Damian Steer wrote:
> On 13 Sep 2011, at 13:23, Olivier Mesnard wrote:
> > Dear Everyone,
>
> Hi Olivier,
>
> > I have some difficulties with the load() operation of class
> > com.hp.hpl.jena.rdf.arp.DOM2Model.
> > The operation seems to perform its task incompletely and I don't know
> > what is wrong with my code. Perhaps I am missing some parser options.
>
> I think there is an option for this
>
> > Here is the code:
>
> <snip>
>
> > DOM2Model arp = DOM2Model.createD2M("", _model);
> > arp.allowRelativeURIs();
>
> Try:
>
> arp.getOptions().setEmbedding(true);
>
> From the javadoc: [1]
>
> "Sets whether the XML document is only RDF, or contains RDF embedded in
> other XML. The default is non-embedded mode. Embedded mode also matches
> RDF documents that use the rdf:RDF tag at the top-level. Non-embeded mode
> matches RDF documents which omit that optional tag, and consist of a
> single rdf:Description or typed node. To find embedded RDF it is necessary
> to setEmbedding(true)."
>
> Which sounds like what you want.
>
> (As an alternative you could pass in the DOM node corresponding to the
> sub-tree you want to parse. Your issue is that you're trying to parse the
> whole thing as RDF/XML)
>
> Damian
>
> [1]
> <http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdf/arp/ARPOptions.html#setEmbedding(boolean)>
>
Re: side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete
Posted by Damian Steer <d....@bristol.ac.uk>.
On 13 Sep 2011, at 13:23, Olivier Mesnard wrote:
> Dear Everyone,
Hi Olivier,
> I have some difficulties with the load() operation of class
> com.hp.hpl.jena.rdf.arp.DOM2Model.
> The operation seems to perform its task incompletely and I don't know what is
> wrong with my code. Perhaps I am missing some parser options.
I think there is an option for this
> Here is the code:
<snip>
> DOM2Model arp = DOM2Model.createD2M("", _model);
> arp.allowRelativeURIs();
Try:
arp.getOptions().setEmbedding(true);
From the javadoc: [1]
"Sets whether the XML document is only RDF, or contains RDF embedded in other XML. The default is non-embedded mode. Embedded mode also matches RDF documents that use the rdf:RDF tag at the top-level. Non-embeded mode matches RDF documents which omit that optional tag, and consist of a single rdf:Description or typed node. To find embedded RDF it is necessary to setEmbedding(true)."
Which sounds like what you want.
(As an alternative you could pass in the DOM node corresponding to the sub-tree you want to parse. Your issue is that you're trying to parse the whole thing as RDF/XML)
Damian
[1] <http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdf/arp/ARPOptions.html#setEmbedding(boolean)>