Posted to users@jena.apache.org by Olivier Mesnard <ol...@cea.fr> on 2011/09/13 14:23:29 UTC

side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete

Dear Everyone,

I have some difficulties with the load() operation of class
com.hp.hpl.jena.rdf.arp.DOM2Model.
The operation seems to perform its task only partially and I don't know what is
wrong with my code. Perhaps I am missing some parser options.

I use JAXB to create Java classes for my model (I enclose the XSD file which
describes the model).
This model contains Document, Resource, Annotation, PieceOfKnowledge and
Text.
A Resource is an abstract class which represents an object with a URI and
which can hold annotations.
An Annotation (or PieceOfKnowledge) encapsulates RDF triples as a literal within
an element named data, of type xsd:anyType.
A MediaUnit is a Resource.
A Text is a specialized MediaUnit with some content of type string.
To complete the model, a Document is a Resource and is composed of several
MediaUnits.

So, my application has to manage XML documents which contain text with
embedded annotations about that text, serialized as RDF/XML. I chose to decode
these annotations into a com.hp.hpl.jena.rdf.model.Model to be able to run
queries on it.
I tried unsuccessfully to load this model from the document with the DOM2Model
class.

Here is the code:

====================================
package sampleforjena;

// file IO
import java.io.File;
import java.io.IOException;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

// data model
import model.MediaUnit;
import model.Annotation;
import model.Resource;
import model.Document;

import javax.xml.bind.Unmarshaller;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.JAXBElement;

import javax.xml.transform.stream.StreamSource;

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.rdf.arp.DOM2Model;

public class RdfIo {
    private
      com.hp.hpl.jena.rdf.model.Model _model;
....
    // read XML Serialized document file and extract RDF model 
    public void decodeSampleDocFile(File annotatedRegularDoc ) {

      Resource resource = new Resource();
      // Unmarshall the XML document
      try {
        JAXBContext jContext = JAXBContext.newInstance( "model" );
        Unmarshaller unmarshaller = jContext.createUnmarshaller();
        JAXBElement<Resource> jroot = unmarshaller.unmarshal(new StreamSource(
                          annotatedRegularDoc), Resource.class);
         resource = (Resource)jroot.getValue();
      } catch (JAXBException e) {
        System.out.println("RdfIo.decodeDocFile: Error!!! unable to read annotatedWebLabDoc...");
        e.printStackTrace();
      }

      // Create a model
      _model = com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel();
      Document d = (Document)resource;
      // get XML serialized rdf triples from annotation in document
      for (MediaUnit mu: ((Document)resource).getMediaUnit()) {
        for (Annotation annot: mu.getAnnotation()) {
          org.apache.xerces.dom.ElementImpl dataElement = 
(org.apache.xerces.dom.ElementImpl)annot.getData();
          try
          {
            System.out.println("==rdfModelFromDomElement: createD2M ...");
//            DOM2Model.createD2M("", _model).load(dataElement);
            DOM2Model arp = DOM2Model.createD2M("", _model);
            arp.allowRelativeURIs();
            arp.load(dataElement);
          }
          catch (org.xml.sax.SAXParseException e)
          {
            e.printStackTrace();
          }
          // To check if the whole set of rdf statements have been loaded
          // print the size of the rdf store
          String annotSize = Integer.toString(_model.getGraph().size());
          System.out.println("RDFIO.decodeDocFile: annot contains " + 
annotSize + " triples");
        }
      }
    }
....
  }

====================================

At run time, I get errors like:

13 sept. 2011 13:36:08 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler 
warning
ATTENTION: unknown-source: {W104} Unqualified typed nodes are not allowed. 
Type treated as a relative URI.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler 
error
GRAVE: unknown-source: {E205} rdf:RDF is not allowed as an element tag here.
13 sept. 2011 13:36:09 com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler 
error
GRAVE: unknown-source: {E201} Multiple children of property element

I cannot explain these, but the program is not interrupted.

It behaves as if the complete model is not loaded (only 3 triples are read),
although the full data seems to be contained within the dataElement variable.

I must add that my documents seem correct, because when I isolate the RDF part
in a separate RDF/XML file, I can read the whole set of statements (9
triples are read) with the following decodeRdfFile() operation:

    // test reading XML Serialized RDF File
    public void decodeRdfFile(File serializedRdf ) {
      FileInputStream is;
      try {
        is = new FileInputStream(serializedRdf);
        _model = com.hp.hpl.jena.rdf.model.ModelFactory.createDefaultModel();
        _model.read(is,null);
      }
      catch (FileNotFoundException e) {
        e.printStackTrace();
      }
    }

Another remark: I also have a problem with the following operation when I try
to "pretty print" the model, which does not ease the debugging task.

    // test writing XML Serialized RDF model
    public void writeModel() {
      try
      {
        // write the model to a file as RDF/XML
        java.io.FileOutputStream of = new java.io.FileOutputStream("fffModel.xml");
        _model.write(of);
        of.close();
      }
      catch (IOException ie) {
        ie.printStackTrace();
      }
    }


Exception in thread "main" com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs can be included in RDF/XML output: <d> Code: 58/REQUIRED_COMPONENT_MISSING in SCHEME: A component that is required by the scheme is missing.
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.checkURI(BaseXMLWriter.java:768)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.xmlnsDecl(BaseXMLWriter.java:300)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFHeader(Basic.java:56)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:39)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:452)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:424)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:410)
        at com.hp.hpl.jena.rdf.model.impl.ModelCom.write(ModelCom.java:270)
        at sampleforjena.RdfIo.writeModel(RdfIo.java:112)
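
The <d> in the exception message looks like a relative URI reference with no
scheme. That would be consistent with the W104 warning and with the empty base
URI passed to createD2M, but it is a guess and not something confirmed in this
thread. A minimal sketch, using only java.net.URI, of how one might check
whether a URI string is absolute and what it would resolve to against a base.
The class name UriCheck and the example.org base are made up for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriCheck {
    // True if the string parses as a URI that has a scheme (e.g. "http:").
    public static boolean isAbsolute(String uri) {
        try {
            return new URI(uri).isAbsolute();
        } catch (URISyntaxException e) {
            return false; // not even a well-formed URI reference
        }
    }

    // Resolve a (possibly relative) reference against an absolute base URI.
    public static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // "d" stands in for the URI reported in the exception;
        // the base below is a hypothetical placeholder.
        String base = "http://example.org/base/";
        for (String s : new String[] { "d", "http://example.org/ns#d" }) {
            System.out.println(s + " absolute=" + isAbsolute(s)
                    + " resolved=" + resolve(base, s));
        }
    }
}
```

If the relative reference is indeed the cause, supplying a non-empty absolute
base when the parser is created may be worth trying.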

Sorry for the long message.


Olivier Mesnard

Re: side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete

Posted by Olivier Mesnard <ol...@cea.fr>.
Thank you Damian,

I did not know the option.
Unfortunately, I tried it and it does not change the result.

I agree with your second piece of advice and followed it:
I had already tried to pass the DOM node corresponding to the sub-tree to the
load operation, but I had not navigated far enough down my object model to
reach the right level:

      Document d = (Document)resource;
      for (MediaUnit mu: ((Document)resource).getMediaUnit()) {
        for (Annotation annot: mu.getAnnotation()) {
          org.apache.xerces.dom.ElementImpl dataElement = 
(org.apache.xerces.dom.ElementImpl)annot.getData();
          // load only the sub tree  <data><rdf:RDF ...
          arp.load(dataElement); // wrong!

I just added the following line:
          org.w3c.dom.Node rdfNode = dataElement.getFirstChild();

And now:
          // load only the sub tree  <rdf:RDF ...
            arp.load(rdfNode); // all right!

I reach the right level and it works! It now reads the 9 triples.
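
One caveat with getFirstChild(): if the data element contains whitespace or a
comment before rdf:RDF, the first child is a text or comment node rather than
the element. A minimal sketch, using only the JDK DOM API, of a helper that
skips to the first element child. The class and method names are made up for
illustration, and the sample XML is a stand-in for the real document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstElementChild {
    // Return the first child of 'parent' that is an element, or null if none.
    public static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Parse an XML string (namespace-aware) and return its root element.
    public static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical <data> element with indentation before rdf:RDF.
        String xml = "<data>\n  <rdf:RDF xmlns:rdf="
                + "\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>\n</data>";
        Element data = parse(xml);
        // Here getFirstChild() is the whitespace text node, not rdf:RDF.
        System.out.println(data.getFirstChild().getNodeType() == Node.TEXT_NODE);
        System.out.println(firstElementChild(data).getNodeName());
    }
}
```

The element returned by such a helper could then be passed to load() in place
of the result of getFirstChild().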

Thanks

Olivier 
 


On Tuesday 13 September 2011 15:01:26, Damian Steer wrote:
> On 13 Sep 2011, at 13:23, Olivier Mesnard wrote:
> > Dear Everyone,
> 
> Hi Olivier,
> 
> > I have some difficulties with the load() operation of class
> > com.hp.hpl.jena.rdf.arp.DOM2Model.
> > The operation seems to performs incompletely its task and I don't know
> > what is wrong with my code. perhaps I miss some options of the parser.
> 
> I think there is an option for this
> 
> > Here is the code:
> 
> <snip>
> 
> >            DOM2Model arp = DOM2Model.createD2M("", _model);
> >            arp.allowRelativeURIs();
> 
> Try:
> 
> arp.getOptions().setEmbedding(true);
> 
> From the javadoc: [1]
> 
> "Sets whether the XML document is only RDF, or contains RDF embedded in
>  other XML. The default is non-embedded mode. Embedded mode also matches
>  RDF documents that use the rdf:RDF tag at the top-level. Non-embeded mode
>  matches RDF documents which omit that optional tag, and consist of a
>  single rdf:Description or typed node. To find embedded RDF it is necessary
>  to setEmbedding(true)."
> 
> Which sounds like what you want.
> 
> (As an alternative you could pass in the DOM node corresponding to the
>  sub-tree you want to parse. Your issue is that you're trying to parse the
>  whole thing as RDF/XML)
> 
> Damian
> 
> [1]
>  <http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdf/arp/ARPOptions.html#setEmbedding(boolean)>
> 

Re: side effect of com.hp.hpl.jena.rdf.arp.DOM2Model.load() is incomplete

Posted by Damian Steer <d....@bristol.ac.uk>.
On 13 Sep 2011, at 13:23, Olivier Mesnard wrote:

> Dear Everyone,

Hi Olivier,

> I have some difficulties with the load() operation of class  
> com.hp.hpl.jena.rdf.arp.DOM2Model.
> The operation seems to performs incompletely its task and I don't know what is 
> wrong with my code. perhaps I miss some options of the parser.

I think there is an option for this

> Here is the code:

<snip>

>            DOM2Model arp = DOM2Model.createD2M("", _model);
>            arp.allowRelativeURIs();

Try:

arp.getOptions().setEmbedding(true);

From the javadoc: [1]

"Sets whether the XML document is only RDF, or contains RDF embedded in other XML. The default is non-embedded mode. Embedded mode also matches RDF documents that use the rdf:RDF tag at the top-level. Non-embeded mode matches RDF documents which omit that optional tag, and consist of a single rdf:Description or typed node. To find embedded RDF it is necessary to setEmbedding(true)."

Which sounds like what you want.

(As an alternative you could pass in the DOM node corresponding to the sub-tree you want to parse. Your issue is that you're trying to parse the whole thing as RDF/XML)

Damian

[1] <http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdf/arp/ARPOptions.html#setEmbedding(boolean)>
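
For the alternative of passing in the sub-tree, a minimal sketch of how the
rdf:RDF elements might be located at any depth in a DOM document by namespace
and local name, using only the JDK DOM API. The class name, method names and
the sample XML are made up for illustration, and the document must be parsed
namespace-aware for the lookup to match:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindEmbeddedRdf {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Collect every rdf:RDF element in the document, in document order.
    public static List<Element> findRdfRoots(Document doc) {
        NodeList hits = doc.getElementsByTagNameNS(RDF_NS, "RDF");
        List<Element> roots = new ArrayList<Element>();
        for (int i = 0; i < hits.getLength(); i++) {
            roots.add((Element) hits.item(i));
        }
        return roots;
    }

    // Parse an XML string into a namespace-aware DOM document.
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document with two annotations, each wrapping rdf:RDF.
        String xml = "<doc xmlns:rdf=\"" + RDF_NS + "\">"
                + "<annotation><data><rdf:RDF/></data></annotation>"
                + "<annotation><data><rdf:RDF/></data></annotation>"
                + "</doc>";
        for (Element root : findRdfRoots(parse(xml))) {
            System.out.println(root.getNodeName() + " inside <"
                    + root.getParentNode().getNodeName() + ">");
        }
    }
}
```

Each element found this way is a candidate node to hand to the DOM2Model
load() call shown earlier in the thread.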