Posted to users@jena.apache.org by "Enrico Daga (enridaga)" <en...@gmail.com> on 2015/10/15 00:39:09 UTC

Streaming a ResultSet as RDF using a custom vocabulary

Hi,

In my use case I need to take a ResultSet obtained by querying a remote endpoint and stream it out, converted into an RDF output format.
I know Jena provides the ResultSetFormatter.toModel facility for that; however, I have the following constraints:
- I want to use a different representation/vocabulary, not the one provided by Jena.
- I don't want to load the data in memory. In other words, I don't want to create a Model and fill it with the ResultSet; I want to stream out the triples while I iterate over it, to control memory consumption.
- I still want to benefit from the Jena serializers.

I have seen the StreamRDF interface, but it is not clear to me how to use it effectively.
What would be a correct approach in this scenario?

Thank you,

Enrico

—
Enrico Daga (enridaga)
http://www.enridaga.net
Il budda e’ nel parco.






Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "Enrico Daga (enridaga)" <en...@gmail.com>.
FYI, I opened a question on Stack Overflow as well:
http://stackoverflow.com/questions/33136916/streaming-a-resultset-as-rdf-using-a-custom-vocabulary









Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by Andy Seaborne <an...@apache.org>.
On 16/10/15 17:31, Enrico Daga (enridaga) wrote:
...
> However, Jena does not seem to support some of the RDF serializations
> for streaming, namely XML and JSON formats, resulting in a
> org.apache.jena.riot.RiotException: No serialization for language
> Lang:rdf/null, for example. Is this right or I am mistaking/missing
> something? I would really like this same code to support all
> available serialisation formats!

I just added Lang:rdf/null support (i.e. /dev/null) - obviously things 
do not round trip through rdf/null.

RDF/XML and JSON-LD don't stream.  These need to be written 
non-streaming.

JSON-LD is not a streaming format:
1/ The processor Jena uses, jsonld-java, simply does not work that way.
2/ @context must be calculated before writing.

RDF/XML is not a streaming format: the namespace declarations for 
properties need to be found or generated before writing triples.  It 
would be possible to add an RDF/XML writer that puts namespaces on each 
rdf:Description element.  The output would be large.

RDF/JSON streams.
As does TriX.

	Andy
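
To illustrate the split described above between formats that stream and formats that have to be written non-streaming, here is a minimal sketch in plain Java. It does not use the Jena API: TripleSink, StreamingSink, BufferingSink and SinkDemo are hypothetical stand-ins, with a start/triple/finish lifecycle loosely modelled on StreamRDF, and the output is a simplified line-per-triple form with IRIs only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Entry point and shared helpers for the sketch (all names are hypothetical). */
final class SinkDemo {
    static String line(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .\n";
    }

    static void flush(Writer out) {
        try { out.flush(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** The calling code stays the same; only the choice of sink depends on the format. */
    static TripleSink sinkFor(boolean formatStreams, Writer out) {
        return formatStreams ? new StreamingSink(out) : new BufferingSink(out);
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        TripleSink sink = sinkFor(true, out);
        sink.start();
        sink.triple("http://example.org/row1", "http://example.org/vocab#knows", "http://example.org/row2");
        sink.finish();
        System.out.print(out);
    }
}

/** Stand-in for a triple sink with a start/triple/finish lifecycle. */
interface TripleSink {
    void start();
    void triple(String s, String p, String o); // IRIs only, to keep the sketch short
    void finish();
}

/** For line-based formats: every triple is written the moment it arrives. */
final class StreamingSink implements TripleSink {
    private final Writer out;
    StreamingSink(Writer out) { this.out = out; }
    @Override public void start() { }
    @Override public void triple(String s, String p, String o) {
        try {
            out.write(SinkDemo.line(s, p, o));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
    @Override public void finish() { SinkDemo.flush(out); }
}

/** For whole-document formats: triples are held back until finish(). */
final class BufferingSink implements TripleSink {
    private final Writer out;
    private final List<String> held = new ArrayList<>();
    BufferingSink(Writer out) { this.out = out; }
    @Override public void start() { }
    @Override public void triple(String s, String p, String o) {
        held.add(SinkDemo.line(s, p, o));
    }
    @Override public void finish() {
        // A real writer would work out namespaces or an @context here, before any output.
        try {
            for (String line : held) out.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        SinkDemo.flush(out);
    }
}
```

With this shape the code that iterates the ResultSet does not change per format; the memory cost of the buffering path is what makes it unsuitable for large results.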

Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "Enrico Daga (enridaga)" <en...@gmail.com>.
Thank you for the insight and the suggestion about compacting the code.

About streaming block formats: keeping track of blocks in memory should not be a problem for my use case, as I expect the number of columns in SELECT queries not to be large.
And as far as I know, none of the RDF formats really requires loading *all* the data in memory (you can define local prefixes in XML, and repeat them in Turtle, if you really want them).
But you are right that in general these are bad formats to use for large data streams, as they end up being very verbose.

However, Jena does not seem to support some of the RDF serializations for streaming, namely the XML and JSON formats, resulting in, for example, an org.apache.jena.riot.RiotException: No serialization for language Lang:rdf/null. Is this right, or am I mistaken or missing something?
I would really like this same code to support all available serialisation formats!

Thanks,
Enrico
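
The remark above about defining local prefixes in XML can be sketched as follows. This is plain Java string building, not a Jena writer: the class LocalNamespaceRdfXml, its methods and the prefix "p" are hypothetical, objects are assumed to be IRIs, and each predicate IRI is assumed to contain a '#' or '/' followed by a valid XML local name.

```java
/** Sketch: RDF/XML where every statement carries its own namespace declaration. */
final class LocalNamespaceRdfXml {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Written once, before the first statement. */
    static String header() {
        return "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">\n";
    }

    /** Written once, after the last statement. */
    static String footer() {
        return "</rdf:RDF>\n";
    }

    /** One self-contained rdf:Description; nothing needs to be known about other triples. */
    static String description(String subject, String predicate, String object) {
        // Split the predicate IRI after its last '#' or '/' (assumed to be present).
        int cut = Math.max(predicate.lastIndexOf('#'), predicate.lastIndexOf('/')) + 1;
        String namespace = predicate.substring(0, cut);
        String localName = predicate.substring(cut);
        return "  <rdf:Description rdf:about=\"" + escape(subject) + "\">\n"
             + "    <p:" + localName + " xmlns:p=\"" + escape(namespace) + "\""
             + " rdf:resource=\"" + escape(object) + "\"/>\n"
             + "  </rdf:Description>\n";
    }

    /** Minimal escaping for XML attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder(header());
        out.append(description("http://example.org/row1",
                "http://example.org/vocab#name", "http://example.org/alice"));
        out.append(description("http://example.org/row1",
                "http://example.org/vocab#knows", "http://example.org/row2"));
        out.append(footer());
        System.out.print(out);
    }
}
```

Each statement repeats both the rdf:Description wrapper and the namespace IRI, which is where the verbosity mentioned above comes from.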




> On 15 Oct 2015, at 17:34, A. Soroka <aj...@virginia.edu> wrote:
> 
> I just re-read your message more carefully and realized that you are using a version of Jena <3. In this case, I believe you will want to use, instead of the type Function<>, the older type Map1<> if you want to use my suggestion. I am sorry for any confusion.
> 
> ---
> A. Soroka
> The University of Virginia Library


Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "A. Soroka" <aj...@virginia.edu>.
I just re-read your message more carefully and realized that you are using a version of Jena earlier than 3. In that case, if you want to use my suggestion, I believe you will need the older type Map1<> instead of Function<>. I am sorry for any confusion.

---
A. Soroka
The University of Virginia Library

> On Oct 15, 2015, at 12:00 PM, Enrico Daga (enridaga) <en...@gmail.com> wrote:
> 
> Thank you for your reply.
> Actually the problem is not really about the representation - for example I might use the DataCube vocabulary - but is more about how to use the Jena serialisers to stream custom triples adapted from a ResultSet efficiently.
> The ResultSetFormatter.toModel approach is not the one I like, as it requires the RDF to be generated in memory before serialisation. 
> I posted my solution to SO: http://stackoverflow.com/questions/33136916/streaming-a-resultset-as-rdf-using-a-custom-vocabulary/33153024#33153024
> (Are there better ways of doing that?)
> 
> However, it looks like the streaming features do not support all RDF syntax, as I got a RIOT exception when I ask for RDF/XML or RDF/JSON formats.
> So now my problem is how to support all serialisations.
> Or maybe my version of Jena is outdated (2.12.1) and I should use Jena 3?
> 
> Thanks,
> 
> Enrico


Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "A. Soroka" <aj...@virginia.edu>.
It seems to me that your solution might be shortened by using Jena’s built-in facilities for this kind of task. E.g.

Function<QuerySolution, Iterator<Triple>> myCustomTransform = qs -> {
	// calculate triples from a query solution qs
	return triplesForASolution;
};
Iterator<Triple> triples = WrappedIterator.createIteratorIterator( Iter.map( myResultSet, myCustomTransform ));
// or, as an alternative to the line above:
// Iterator<Triple> triples = Iterators.concat( Iter.map( myResultSet, myCustomTransform ));

I can tell you that some formats (and I believe RDF/XML is a good example) are inherently impossible to stream fully, because they require building up state along the way. Some information about the available options is here:

https://jena.apache.org/documentation/io/rdf-output.html#streamed-block-formats

---
A. Soroka
The University of Virginia Library
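
The flattening step in the suggestion above can also be written without any library, which may help when the Jena version in use lacks the Function-based helpers. The following is only a sketch under stated assumptions: FlattenDemo, flatMap and transform are hypothetical names, a query solution is stood in for by a Map from variable name to value, triples are plain strings, and the vocabulary IRI is made up. Literal escaping is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class FlattenDemo {
    /** Lazily concatenates the iterators obtained by applying f to each row. */
    static <R, T> Iterator<T> flatMap(Iterator<R> rows, Function<R, Iterator<T>> f) {
        return new Iterator<T>() {
            private Iterator<T> current = Collections.emptyIterator();

            @Override public boolean hasNext() {
                // Pull the next row only when the triples of the current one are used up.
                while (!current.hasNext() && rows.hasNext()) {
                    current = f.apply(rows.next());
                }
                return current.hasNext();
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    /** Hypothetical custom vocabulary: one triple per binding, one blank node per row. */
    static Function<Map<String, String>, Iterator<String>> transform() {
        int[] rowNumber = {0};
        return row -> {
            String subject = "_:row" + (++rowNumber[0]);
            List<String> triples = new ArrayList<>(); // holds one row only
            for (Map.Entry<String, String> binding : row.entrySet()) {
                triples.add(subject + " <http://example.org/vocab#" + binding.getKey()
                        + "> \"" + binding.getValue() + "\" .");
            }
            return triples.iterator();
        };
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("name", "Alice");
        first.put("age", "30");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("name", "Bob");

        Iterator<String> triples = flatMap(Arrays.asList(first, second).iterator(), transform());
        while (triples.hasNext()) {
            System.out.println(triples.next());
        }
    }
}
```

Only the triples of the row being processed are held at any time, so memory use is bounded by the number of columns rather than the number of rows.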

> On Oct 15, 2015, at 12:00 PM, Enrico Daga (enridaga) <en...@gmail.com> wrote:
> 
> Thank you for your reply.
> Actually the problem is not really about the representation - for example I might use the DataCube vocabulary - but is more about how to use the Jena serialisers to stream custom triples adapted from a ResultSet efficiently.
> The ResultSetFormatter.toModel approach is not the one I like, as it requires the RDF to be generated in memory before serialisation. 
> I posted my solution to SO: http://stackoverflow.com/questions/33136916/streaming-a-resultset-as-rdf-using-a-custom-vocabulary/33153024#33153024
> (Are there better ways of doing that?)
> 
> However, it looks like the streaming features do not support all RDF syntax, as I got a RIOT exception when I ask for RDF/XML or RDF/JSON formats.
> So now my problem is how to support all serialisations.
> Or maybe my version of Jena is outdated (2.12.1) and I should use Jena 3?
> 
> Thanks,
> 
> Enrico


Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "Enrico Daga (enridaga)" <en...@gmail.com>.
Thank you for your reply.
Actually, the problem is not really about the representation (for example, I might use the DataCube vocabulary); it is more about how to use the Jena serialisers to efficiently stream custom triples adapted from a ResultSet.
The ResultSetFormatter.toModel approach is not one I like, as it requires the RDF to be generated in memory before serialisation.
I posted my solution to SO: http://stackoverflow.com/questions/33136916/streaming-a-resultset-as-rdf-using-a-custom-vocabulary/33153024#33153024
(Are there better ways of doing that?)

However, it looks like the streaming features do not support all RDF syntaxes, as I get a RIOT exception when I ask for the RDF/XML or RDF/JSON formats.
So now my problem is how to support all serialisations.
Or maybe my version of Jena is outdated (2.12.1) and I should use Jena 3?

Thanks,

Enrico


> On 14 Oct 2015, at 18:53, A. Soroka <aj...@virginia.edu> wrote:
> 
> Perhaps you could say more about the representation you want to use? ResultSetFormatter does feature methods that (to my understanding) do stream using Jena serialization:
> 
> https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ResultSetFormatter.html#output-java.io.OutputStream-org.apache.jena.query.ResultSet-org.apache.jena.sparql.resultset.ResultsFormat-
> 
> ---
> A. Soroka
> The University of Virginia Library


Re: Streaming a ResultSet as RDF using a custom vocabulary

Posted by "A. Soroka" <aj...@virginia.edu>.
Perhaps you could say more about the representation you want to use? ResultSetFormatter does feature methods that (to my understanding) do stream using Jena serialization:

https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ResultSetFormatter.html#output-java.io.OutputStream-org.apache.jena.query.ResultSet-org.apache.jena.sparql.resultset.ResultsFormat-

---
A. Soroka
The University of Virginia Library

> On Oct 14, 2015, at 6:39 PM, Enrico Daga (enridaga) <en...@gmail.com> wrote:
> 
> Hi,
> 
> in my use case I need to stream a ResultSet obtained from a query to a remote endpoint converted into an RDF output format.
> I know Jena provides a ResultSetFormatter.toModel facility for that, however I have the following constraints:
> - I want to use a different representation/vocabulary and not the one provided by Jena, and
> - I don't want to load the data in memory. In other words I don't want to create a Model and fill it with the ResultSet, but streaming out the triples while I iterate on it, to control memory consumption.
> - I still want to benefit by the Jena serializers
> 
> I have seen the StreamRDF interface, but I am not very clear about how to use it effectively.
> What could be a correct approach in this scenario?
> 
> Thank you,
> 
> Enrico
> 
> —
> Enrico Daga (enridaga)
> http://www.enridaga.net
> Il budda e’ nel parco.