Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2017/02/16 11:01:41 UTC

[jira] [Commented] (JENA-329) Add streaming CONSTRUCT results to Fuseki

    [ https://issues.apache.org/jira/browse/JENA-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869732#comment-15869732 ] 

Osma Suominen commented on JENA-329:
------------------------------------

I implemented something like this for the hdtsparql command line tool in the hdt-java package. See this PR: https://github.com/rdfhdt/hdt-java/pull/43

In the implementation I used a 1000-slot LRU cache to check for duplicates, effectively a sliding window. In the (very limited) testing I performed, this seemed to do a good job of eliminating duplicates with good performance. Of course it won't guarantee that all duplicates are eliminated, but I agree with Andy above that this is a reasonable trade-off. I considered using DistinctDataNet as well, which in my understanding would eliminate all duplicates, but it would be a lot more costly in terms of resources (disk space and IO) for queries with large result sets.
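
For illustration, the sliding-window idea could be sketched roughly as below. This is not the code from the PR: the class name SlidingWindowDistinct and its constructor parameters are hypothetical, and it assumes the items (triples, in the case discussed here) have sensible equals() and hashCode() implementations. It wraps an iterator and drops any item that is still present in a bounded LRU map built on java.util.LinkedHashMap.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper: passes items through from a source iterator,
 * dropping any item that is still held in a bounded LRU window.
 * Duplicates further apart than the window can still get through.
 */
class SlidingWindowDistinct<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Map<T, Boolean> recent;
    private T nextItem;
    private boolean hasNextItem;

    SlidingWindowDistinct(Iterator<T> source, final int capacity) {
        this.source = source;
        // accessOrder=true turns the map into an LRU structure;
        // removeEldestEntry evicts the least recently used item when full.
        this.recent = new LinkedHashMap<T, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                return size() > capacity;
            }
        };
        advance(); // pre-fetch the first non-duplicate item
    }

    /** Reads from the source until an item outside the window is found. */
    private void advance() {
        hasNextItem = false;
        while (source.hasNext()) {
            T candidate = source.next();
            // put() returns null only when the item was not in the window;
            // for a repeat it refreshes the item's position instead.
            if (recent.put(candidate, Boolean.TRUE) == null) {
                nextItem = candidate;
                hasNextItem = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNextItem;
    }

    @Override
    public T next() {
        if (!hasNextItem) {
            throw new NoSuchElementException();
        }
        T result = nextItem;
        advance();
        return result;
    }
}
```

Wrapping a streamed result iterator as new SlidingWindowDistinct<>(results, 1000) would correspond to the 1000-slot window mentioned above. Memory use stays bounded by the window size regardless of how large the result set is, which is the trade-off against full duplicate elimination.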

I could do the same for tdbquery (and/or the sparql command line tool) if desired. Probably Fuseki as well, though I'm not very familiar with its internals.

> Add streaming CONSTRUCT results to Fuseki
> -----------------------------------------
>
>                 Key: JENA-329
>                 URL: https://issues.apache.org/jira/browse/JENA-329
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Fuseki
>            Reporter: Stephen Allen
>
> As a result of JENA-205, streaming results are now available for CONSTRUCT queries.  However there can be duplicate triples in the iterator.  This task is to allow Fuseki to stream back results, while at the same time performing a distinct operation.
> The fix would be to modify SPARQL_Query to use QueryExecution.execConstructTriples() and filter the results through a DistinctDataNet<Triple> as they are being streamed back to the client.
> This also requires RDFWriter implementations that can accept Iterator<Triple> instead of Model.


