Posted to users@jena.apache.org by Eric Feliksik <fe...@gmail.com> on 2011/09/30 11:55:36 UTC

parsing RDF/XML sub-part with stax, using Jena?

Dear Jena fanatics,

I am parsing an XML document that can contain RDF/XML as a subtree, as
follows:

<?xml version='1.0' encoding='utf-8'?>
<myFormat>
  <myGraphAnnouncer>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:j.0="http://dbpedia.org/property/">
      <rdf:Description rdf:about="http://dbpedia.org/resource/Geert_den_Ouden">
        <j.0:cityofbirth rdf:resource="http://dbpedia.org/resource/Delft"/>
        <j.0:placeOfBirth rdf:resource="http://dbpedia.org/resource/Delft"/>
      </rdf:Description>
    </rdf:RDF>
  </myGraphAnnouncer>
</myFormat>

I am using the Stax parser to process my document. As soon as I find the
<myGraphAnnouncer> tag, I want to hand the responsibility of parsing to Jena
as it already implements this.
Is there a Stax implementation that I could directly pass the stax
javax.xml.stream.XMLStreamReader to? Sharing the inputStream seems to quickly
become complex, as various parsers may buffer the input in their own
particular ways. I prefer a streaming solution as the document may become
many megabytes in size.

Does Jena have a Stax RDF parsing implementation? If not, do you have any
advice? Thanks in advance!
Regards,
Eric

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 30/09/11 16:17, Dave Reynolds wrote:
> On Fri, 2011-09-30 at 11:55 +0200, Eric Feliksik wrote:
>> Dear Jena fanatics,
>>
>> I am parsing an xml document and RDF/XML can be contained as a subtree, as
>> follows:
>>
>> <?xml version='1.0' encoding='utf-8'?>
>> <myFormat>
>>    <myGraphAnnouncer>
>>        <rdf:RDF
>>          xmlns:j.0="http://dbpedia.org/property/">
>>        <rdf:Description rdf:about="http://dbpedia.org/resource/Geert_den_Ouden">
>>          <j.0:cityofbirth rdf:resource="http://dbpedia.org/resource/Delft"/>
>>          <j.0:placeOfBirth rdf:resource="http://dbpedia.org/resource/Delft"/>
>>        </rdf:Description>
>>      </rdf:RDF>
>>    </myGraphAnnouncer>
>> </myFormat>
>>
>> I am using the Stax parser to process my document. As soon as I find the
>> <myGraphAnnouncer> tag, I want to hand the responsibility of parsing to Jena
>> as it already implements this.
>> Is there a Stax implementation that I could directly pass the stax
>> javax.xml.stream.XMLStreamReader to? Sharing the inputStream seems to quickly
>> become complex, as various parsers may buffer the input in their own
>> particular ways. I prefer a streaming solution as the document may become
>> many megabytes in size.
>>
>> Does Jena have a Stax RDF parsing implementation? If not, do you have any
>> advice? Thanks in advance!
>
> Unless there's something in RIOT I don't think there is a direct Stax
> RDF parser.
>
> However, there is support for parsing from SAX event streams [1].
> Code to pull on your Stax stream (up to the closing </rdf:RDF>) passing
> the events directly to a SAX2Model probably wouldn't be too bad.

All RIOT does with RDF/XML is wrap ARP into the standard RIOT interfaces.

	Andy

>
> Dave
>
> [1] http://jena.sourceforge.net/ARP/sax.html
>


Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Dave Reynolds <da...@gmail.com>.
On Fri, 2011-09-30 at 11:55 +0200, Eric Feliksik wrote: 
> Dear Jena fanatics,
> 
> I am parsing an xml document and RDF/XML can be contained as a subtree, as
> follows:
> 
> <?xml version='1.0' encoding='utf-8'?>
> <myFormat>
>   <myGraphAnnouncer>
>       <rdf:RDF
>         xmlns:j.0="http://dbpedia.org/property/" >
>       <rdf:Description rdf:about="http://dbpedia.org/resource/Geert_den_Ouden">
>         <j.0:cityofbirth rdf:resource="http://dbpedia.org/resource/Delft"/>
>         <j.0:placeOfBirth rdf:resource="http://dbpedia.org/resource/Delft"/>
>       </rdf:Description>
>     </rdf:RDF>
>   </myGraphAnnouncer>
> </myFormat>
> 
> I am using the Stax parser to process my document. As soon as I find the
> <myGraphAnnouncer> tag, I want to hand the responsibility of parsing to Jena
> as it already implements this.
> Is there a Stax implementation that I could directly pass the stax
> javax.xml.stream.XMLStreamReader to? Sharing the inputStream seems to quickly
> become complex, as various parsers may buffer the input in their own
> particular ways. I prefer a streaming solution as the document may become
> many megabytes in size.
> 
> Does Jena have a Stax RDF parsing implementation? If not, do you have any
> advice? Thanks in advance!

Unless there's something in RIOT I don't think there is a direct Stax
RDF parser.

However, there is support for parsing from SAX event streams [1].
Code to pull on your Stax stream (up to the closing </rdf:RDF>) passing
the events directly to a SAX2Model probably wouldn't be too bad.
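
A minimal sketch of that idea, using only the JDK, might look like the
following. It is not the code that ended up in Jena. The names
StaxSubtreeToSax, forwardSubtree and demo, the <trailer/> element and the
prefix "p" are all made up for illustration, and a small recording handler
stands in for the SAX2Model so the example runs without Jena on the
classpath. In real use one would presumably pass the SAX2Model as the
ContentHandler, assuming it accepts SAX events as the page at [1] describes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pulls StAX events for one subtree and pushes them to SAX.
final class StaxSubtreeToSax {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    private static String nz(String s) { return s == null ? "" : s; }

    private static String qName(String prefix, String local) {
        return nz(prefix).isEmpty() ? local : prefix + ":" + local;
    }

    /** Forwards the element the reader is positioned on, and its content, as SAX events. */
    static void forwardSubtree(XMLStreamReader in, ContentHandler out)
            throws XMLStreamException, SAXException {
        if (in.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("reader must be on a START_ELEMENT");
        }
        out.startDocument();
        int depth = 0;
        for (int event = in.getEventType(); ; event = in.next()) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT: {
                    // SAX reports namespace declarations before the element itself.
                    for (int i = 0; i < in.getNamespaceCount(); i++) {
                        out.startPrefixMapping(nz(in.getNamespacePrefix(i)), nz(in.getNamespaceURI(i)));
                    }
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        String local = in.getAttributeLocalName(i);
                        atts.addAttribute(nz(in.getAttributeNamespace(i)), local,
                                qName(in.getAttributePrefix(i), local),
                                in.getAttributeType(i), in.getAttributeValue(i));
                    }
                    out.startElement(nz(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()), atts);
                    depth++;
                    break;
                }
                case XMLStreamConstants.END_ELEMENT: {
                    out.endElement(nz(in.getNamespaceURI()), in.getLocalName(),
                            qName(in.getPrefix(), in.getLocalName()));
                    for (int i = 0; i < in.getNamespaceCount(); i++) {
                        out.endPrefixMapping(nz(in.getNamespacePrefix(i)));
                    }
                    if (--depth == 0) {   // matching end tag of the subtree root
                        out.endDocument();
                        return;           // reader is left on that END_ELEMENT
                    }
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    out.characters(in.getTextCharacters(), in.getTextStart(), in.getTextLength());
                    break;
                default:
                    break; // comments and processing instructions are dropped in this sketch
            }
        }
    }

    /** Finds rdf:RDF inside a wrapper document, forwards it, then resumes the outer parse. */
    static List<String> demo() throws Exception {
        String xml = "<myFormat><myGraphAnnouncer>"
                + "<rdf:RDF xmlns:rdf='" + RDF_NS + "' xmlns:p='http://dbpedia.org/property/'>"
                + "<rdf:Description rdf:about='http://dbpedia.org/resource/Geert_den_Ouden'>"
                + "<p:cityofbirth rdf:resource='http://dbpedia.org/resource/Delft'/>"
                + "</rdf:Description></rdf:RDF>"
                + "</myGraphAnnouncer><trailer/></myFormat>";
        final List<String> seen = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {   // stand-in for SAX2Model
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                seen.add(qName);
            }
        };
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (in.hasNext()) {
            if (in.next() == XMLStreamConstants.START_ELEMENT
                    && "RDF".equals(in.getLocalName()) && RDF_NS.equals(in.getNamespaceURI())) {
                forwardSubtree(in, recorder);
                break;
            }
        }
        in.nextTag(); // the outer StAX parse simply carries on after </rdf:RDF>
        seen.add("resumed at </" + in.getLocalName() + ">");
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

One limitation to be aware of: the sketch only replays namespace
declarations made inside the forwarded subtree. If a prefix such as rdf is
declared on an enclosing element, the caller would need to send the
corresponding startPrefixMapping calls itself before forwarding.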

Dave

[1] http://jena.sourceforge.net/ARP/sax.html


Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 19/10/11 12:34, Andy Seaborne wrote:

> There are just 4 non-missing @Override warnings in the codebase.
>
> I'll commit the code with @Override (378 files) and with the Java level
> as environment "JavaSE-1.6". As it's one update, we can reverse it.

OK - sorry about all the commit messages - most projects updated now.
Thankfully, I wasn't making all those changes myself; Eclipse did the work.

	Andy

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 19/10/11 12:17, Andy Seaborne wrote:
> On 18/10/11 23:16, Damian Steer wrote:
>>
>> On 18 Oct 2011, at 11:36, Andy Seaborne wrote:
>>
>>> On 17/10/11 23:09, Damian Steer wrote:
>>>>
>>>> On 17 Oct 2011, at 22:33, Ian Dickinson wrote:
>>>>
>>>>> On 17/10/11 22:12, Damian Steer wrote:
>>>>>> Any objections to me adding this to jena?
>>>>> No objection. I would +1, but given the framing of the question I
>>>>> suspect I'd be objecting :)
>>>>>
>>>>> Ian
>>>>
>>>>
>>>> What could be clearer than a hearty -1 of agreement ;-)
>>>>
>>>> Damian
>>>
>>> I agree that adding it would be good (whether that is +1, -1 or -i !)
>>>
>>> Andy
>>
>> Added, with tests derived from SAX2RDF.
>>
>> Damian
>
> StAX2SAX requires Java6 for javax.xml.stream.events.* This is the straw.
>
> I'll reset the checked-in Eclipse settings to be compatible and start
> updating all the other projects.
>
>
> I get 3 warnings in Eclipse in StAX2SAX. Having a warning-free codebase
> is really helpful to a release manager.
>
>
> In Eclipse, I see 2732 warnings - all but 136 are missing @Override on
> interfaces (I'm not sure what the standard Eclipse settings are). Does
> anyone mind if I go and fix these and set the compiler level to warn on
> missing @Override for interface implementations?

Eclipse "quick fix" wasn't perfect for fixing missing @Overrides all in 
one go.  Nearly, but not quite.

There are just 4 non-missing @Override warnings in the codebase.

I'll commit the code with @Override (378 files) and with the Java level 
as environment "JavaSE-1.6".  As it's one update, we can reverse it.

	Andy

>
> Andy
>
>


Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 19/10/11 12:37, Damian Steer wrote:
>
> On 19 Oct 2011, at 12:17, Andy Seaborne wrote:
>
>> StAX2SAX requires Java6 for javax.xml.stream.events.*  This is the straw.
>
> Bah, sorry. The project is set for source 1.5, but it won't catch that.
>
> It's ARQ that requires StAX, of course.

Woodstox depends on the stax-api (from 2006!), which has an advanced 
copy in it.

>> I'll reset the checked-in Eclipse settings to be compatible and start updating all the other projects.
>>
>>
>> I get 3 warnings in Eclipse in StAX2SAX.  Having a warning-free codebase is really helpful to a release manager.
>
> All @Override? That's a 1.6 change, isn't it?

It became allowable at Java6 to have @Override on a method implementing 
an interface, not just overriding an (abstract) method.  It makes the 
treatment of interfaces and abstract classes more samey.
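
To make the difference concrete, here is a small illustrative sketch. The
types Shape, Named and Square are invented for this example and have
nothing to do with the Jena codebase.

```java
// Hypothetical types, nested in one class so the example is a single unit.
final class OverrideDemo {

    interface Shape {
        double area();
    }

    abstract static class Named {
        abstract String name();
    }

    static final class Square extends Named implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        // Overrides an abstract superclass method: accepted since Java 5.
        @Override
        String name() {
            return "square";
        }

        // Implements an interface method: a Java 5 compiler rejects the
        // annotation here, while Java 6 and later accept it.
        @Override
        public double area() {
            return side * side;
        }
    }

    public static void main(String[] args) {
        Square s = new Square(3.0);
        System.out.println(s.name() + " " + s.area());
    }
}
```

With the annotation on both methods, renaming or removing either the
abstract method or the interface method becomes a compile error in Square
rather than a silently orphaned method.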

	Andy

> /me sees nothing in netbeans.
>
> Damian


Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.
On 19 Oct 2011, at 12:17, Andy Seaborne wrote:

> StAX2SAX requires Java6 for javax.xml.stream.events.*  This is the straw.

Bah, sorry. The project is set for source 1.5, but it won't catch that.

It's ARQ that requires StAX, of course.

> I'll reset the checked-in Eclipse settings to be compatible and start updating all the other projects.
> 
> 
> I get 3 warnings in Eclipse in StAX2SAX.  Having a warning-free codebase is really helpful to a release manager.

All @Override? That's a 1.6 change, isn't it?

/me sees nothing in netbeans.

Damian

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 18/10/11 23:16, Damian Steer wrote:
>
> On 18 Oct 2011, at 11:36, Andy Seaborne wrote:
>
>> On 17/10/11 23:09, Damian Steer wrote:
>>>
>>> On 17 Oct 2011, at 22:33, Ian Dickinson wrote:
>>>
>>>> On 17/10/11 22:12, Damian Steer wrote:
>>>>> Any objections to me adding this to jena?
>>>> No objection. I would +1, but given the framing of the question I suspect I'd be objecting :)
>>>>
>>>> Ian
>>>
>>>
>>> What could be clearer than a hearty -1 of agreement ;-)
>>>
>>> Damian
>>
>> I agree that adding it would be good (whether that is +1, -1 or -i !)
>>
>> 	Andy
>
> Added, with tests derived from SAX2RDF.
>
> Damian

StAX2SAX requires Java6 for javax.xml.stream.events.*  This is the straw.

I'll reset the checked-in Eclipse settings to be compatible and start 
updating all the other projects.


I get 3 warnings in Eclipse in StAX2SAX.  Having a warning-free codebase
is really helpful to a release manager.


In Eclipse, I see 2732 warnings - all but 136 are missing @Override on 
interfaces (I'm not sure what the standard Eclipse settings are).  Does 
anyone mind if I go and fix these and set the compiler level to warn on 
missing @Override for interface implementations?

	Andy



Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.
On 18 Oct 2011, at 11:36, Andy Seaborne wrote:

> On 17/10/11 23:09, Damian Steer wrote:
>> 
>> On 17 Oct 2011, at 22:33, Ian Dickinson wrote:
>> 
>>> On 17/10/11 22:12, Damian Steer wrote:
>>>> Any objections to me adding this to jena?
>>> No objection. I would +1, but given the framing of the question I suspect I'd be objecting :)
>>> 
>>> Ian
>> 
>> 
>> What could be clearer than a hearty -1 of agreement ;-)
>> 
>> Damian
> 
> I agree that adding it would be good (whether that is +1, -1 or -i !)
> 
> 	Andy

Added, with tests derived from SAX2RDF.

Damian

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Andy Seaborne <an...@apache.org>.
On 17/10/11 23:09, Damian Steer wrote:
>
> On 17 Oct 2011, at 22:33, Ian Dickinson wrote:
>
>> On 17/10/11 22:12, Damian Steer wrote:
>>> Any objections to me adding this to jena?
>> No objection. I would +1, but given the framing of the question I suspect I'd be objecting :)
>>
>> Ian
>
>
> What could be clearer than a hearty -1 of agreement ;-)
>
> Damian

I agree that adding it would be good (whether that is +1, -1 or -i !)

	Andy

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.
On 17 Oct 2011, at 22:33, Ian Dickinson wrote:

> On 17/10/11 22:12, Damian Steer wrote:
>> Any objections to me adding this to jena?
> No objection. I would +1, but given the framing of the question I suspect I'd be objecting :)
> 
> Ian


What could be clearer than a hearty -1 of agreement ;-)

Damian

Re: Fwd: parsing RDF/XML sub-part with stax, using Jena?

Posted by Ian Dickinson <ia...@epimorphics.com>.
On 17/10/11 22:12, Damian Steer wrote:
> Any objections to me adding this to jena?
No objection. I would +1, but given the framing of the question I 
suspect I'd be objecting :)

Ian

Fwd: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.

Begin forwarded message:
> 
> So here's a mostly working (for ARP) version I wrote. [1] Use it in the following way:
> 
> SAX2Model s2m = SAX2Model.create(baseuri, model);  // ARP
> StAX2SAX converter = new StAX2SAX(s2m);
> converter.parse(xmlStreamReader);
> 
> Hope it works ok. Might be worth adding to jena.
> 
> Damian
> 
> [1] <https://gist.github.com/1257922>

[Moving this to jena-dev]

Any objections to me adding this to jena? It's a simple, self-contained class. I should be able to use the existing SAX2Model tests.

Damian

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Eric Feliksik <fe...@gmail.com>.
Hi Damian,

Thanks a lot, it's working great! This is exactly what I needed and your
help is much appreciated.

Regards,
Eric

On Thu, Oct 6, 2011 at 2:31 PM, Damian Steer <d....@bristol.ac.uk> wrote:

>
> On 6 Oct 2011, at 13:18, Eric Feliksik wrote:
>
> > Hi Damian,
> >
> > Thanks a lot for this contribution! I will have time to test this code next
> > week -- It looks good, I almost gave up hope :-)
>
> No problem.
>
> > The comment says you'd be happier to use stream reader, now it is
> > using XMLEventReader -- but isn't this also streaming? It's still not like
> > DOM, right? Maybe I'll be able to escape the event stream (when I have finished
> > the relevant RDF/XML sub-part) by throwing an exception -- what do you
> > think? (I'm not very experienced with XML processing).
>
> Oh, ignore that comment. The code would have been a bit cleaner using
> XMLStreamReader, but in order to support both I had to use XMLEventReader.
> (Streams can be converted to event readers, but not the other way)
>
> You are correct, both stream. Event reader encapsulates the low level
> events of stream reader in event objects, which is nicer in many situations.
>
> Damian
>
>

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.
On 6 Oct 2011, at 13:18, Eric Feliksik wrote:

> Hi Damian,
> 
> Thanks a lot for this contribution! I will have time to test this code next
> week -- It looks good, I almost gave up hope :-)

No problem.

> The comment says you'd be happier to use stream reader, now it is
> using XMLEventReader -- but isn't this also streaming? It's still not like
> DOM, right? Maybe I'll be able to escape the event stream (when I have finished
> the relevant RDF/XML sub-part) by throwing an exception -- what do you
> think? (I'm not very experienced with XML processing).

Oh, ignore that comment. The code would have been a bit cleaner using XMLStreamReader, but in order to support both I had to use XMLEventReader. (Streams can be converted to event readers, but not the other way)

You are correct, both stream. Event reader encapsulates the low level events of stream reader in event objects, which is nicer in many situations.
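
For readers who have not used both APIs, the one-way conversion mentioned
above can be sketched with the standard XMLInputFactory call. The class
and method names below (StreamToEventReader, startElementNames) are made
up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class: wraps a cursor-style reader in an event reader.
final class StreamToEventReader {

    /** Returns the local names of all start elements, read through an XMLEventReader. */
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader stream = factory.createXMLStreamReader(new StringReader(xml));

        // The factory can wrap an existing stream reader in an event reader.
        // From here on only the event reader should be used to advance the parse.
        XMLEventReader events = factory.createXMLEventReader(stream);

        List<String> names = new ArrayList<>();
        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();   // each low-level event becomes an object
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        events.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startElementNames("<a><b/><c>text</c></a>"));
    }
}
```

Both readers pull events on demand, so neither builds a DOM-style tree of
the whole document in memory.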

Damian


Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Eric Feliksik <fe...@gmail.com>.
On Sun, Oct 2, 2011 at 10:47 PM, Damian Steer <d....@bristol.ac.uk> wrote:

>
> On 30 Sep 2011, at 10:55, Eric Feliksik wrote:
>
> > I am using the Stax parser to process my document. As soon as I find the
> > <myGraphAnnouncer> tag, I want to hand the responsibility of parsing to Jena
> > as it already implements this.
> > Is there a Stax implementation that I could directly pass the stax
> > javax.xml.stream.XMLStreamReader ?
>
>
> > Does Jena have a Stax RDF parsing implementation? If not, do you have any
> > advice? Thanks in advance!
> > Regards,
> > Eric
>
> Hi Eric,
>
> ARP (Jena's RDF/XML parser) uses SAX, not StAX. However it ought to be
> fairly easy to move between the two, since you just need to pull all the
> events and pass them on to a content handler.
>
> Looking around there seem to be quite a few StAX / SAX converters in various
> projects, however the ones I could track down didn't seem to work. (I also
> tried a favourite XSLT identity transform trick, but it exploded)
>
> So here's a mostly working (for ARP) version I wrote. [1] Use it in the
> following way:
>
> SAX2Model s2m = SAX2Model.create(baseuri, model);  // ARP
> StAX2SAX converter = new StAX2SAX(s2m);
> converter.parse(xmlStreamReader);
>
> Hope it works ok. Might be worth adding to jena.
>
> Damian
>
> [1] <https://gist.github.com/1257922>

Hi Damian,

Thanks a lot for this contribution! I will have time to test this code next
week -- It looks good, I almost gave up hope :-)

The comment says you'd be happier to use stream reader, now it is
using XMLEventReader -- but isn't this also streaming? It's still not like
DOM, right? Maybe I'll be able to escape the event stream (when I have finished
the relevant RDF/XML sub-part) by throwing an exception -- what do you
think? (I'm not very experienced with XML processing).

Any suggestions are more than welcome!
Cheers
Eric

Re: parsing RDF/XML sub-part with stax, using Jena?

Posted by Damian Steer <d....@bristol.ac.uk>.
On 30 Sep 2011, at 10:55, Eric Feliksik wrote:

> I am using the Stax parser to process my document. As soon as I find the
> <myGraphAnnouncer> tag, I want to hand the responsibility of parsing to Jena
> as it already implements this.
> Is there a Stax implementation that I could directly pass the stax
> javax.xml.stream.XMLStreamReader ? 


> Does Jena have a Stax RDF parsing implementation? If not, do you have any
> advice? Thanks in advance!
> Regards,
> Eric

Hi Eric,

ARP (Jena's RDF/XML parser) uses SAX, not StAX. However it ought to be fairly easy to move between the two, since you just need to pull all the events and pass them on to a content handler.

Looking around there seem to be quite a few StAX / SAX converters in various projects, however the ones I could track down didn't seem to work. (I also tried a favourite XSLT identity transform trick, but it exploded)

So here's a mostly working (for ARP) version I wrote. [1] Use it in the following way:

SAX2Model s2m = SAX2Model.create(baseuri, model);  // ARP
StAX2SAX converter = new StAX2SAX(s2m);
converter.parse(xmlStreamReader);

Hope it works ok. Might be worth adding to jena.

Damian

[1] <https://gist.github.com/1257922>