Posted to users@jena.apache.org by Claude Warren <cl...@xenei.com> on 2012/01/11 16:28:53 UTC

Polling iterator?

Greetings,

I am looking at combining multiple remote triple stores into a single graph
using Jena.

Assume that I just create a simple Graph implementation that makes requests
of the remote systems and combine those with a Jena Polyadic graph
implementation.

When Polyadic.find( s, p, o ) is called, each of the sub graphs will
construct and return an iterator, and the Polyadic graph will combine
them so that they appear as a single iterator.

Thinking about this problem, it seems to me that the speed of the
iterator is limited by the speed at which the remote systems can respond,
and that no effort is made to interweave the subgraph iterator results.

To resolve this I am thinking that a "polling iterator" might make sense.
The polling iterator would add a "pollNext()" method that would return a
Boolean: true = there is a next, false = there is not a next, null = no
data yet.

The Polyadic graph would then return an iterator that polls each of the
sub graphs to find one that has a next, so the faster subgraphs would not
be blocked by the slower ones.  I think that overall performance might be
improved.  However, before I spend much time working on this solution I
wanted to know if anyone else has thought about it and perhaps has an
implementation along these lines.
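
A minimal sketch of the idea above, in plain Java with no Jena
dependency.  The names PollingIterator and PollingUnionIterator are
hypothetical and are not part of Jena; only pollNext() and its three
return values come from the description above.  The union polls its
sources round-robin and hands out an element from the first one that
reports it is ready.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical interface (not a Jena type): an iterator that can say "no data yet". */
interface PollingIterator<T> extends Iterator<T> {
    /** Boolean.TRUE = a next element is ready, Boolean.FALSE = exhausted, null = no data yet. */
    Boolean pollNext();
}

/** Hypothetical union: serves elements from whichever source is ready first. */
class PollingUnionIterator<T> implements Iterator<T> {
    private final Deque<PollingIterator<T>> sources = new ArrayDeque<>();
    private PollingIterator<T> ready; // source known to have an element waiting

    PollingUnionIterator(List<? extends PollingIterator<T>> iterators) {
        sources.addAll(iterators);
    }

    @Override
    public boolean hasNext() {
        while (ready == null && !sources.isEmpty()) {
            // One round-robin pass: poll each remaining source once.
            for (int i = sources.size(); i > 0 && ready == null; i--) {
                PollingIterator<T> it = sources.pollFirst();
                Boolean state = it.pollNext();
                if (state == null) {
                    sources.addLast(it);   // no data yet: try again later
                } else if (state) {
                    sources.addLast(it);   // ready: move to the back for fairness
                    ready = it;
                }
                // Boolean.FALSE: the source is exhausted and is dropped.
            }
            if (ready == null && !sources.isEmpty()) {
                Thread.yield();            // crude back-off; every source is still pending
            }
        }
        return ready != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = ready.next();
        ready = null;
        return value;
    }
}
```

Note that hasNext() here still busy-waits while every source is pending,
so this sketch only avoids one slow source blocking the others; it does
not make the caller non-blocking.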

Many thanks,
Claude

-- 
Identity: https://www.identify.nu/user.php?claude@xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Polling iterator?

Posted by Andy Seaborne <an...@apache.org>.

Claude, apologies for the delay in replying,

A polyadic graph should work but it will be sending individual triple
patterns to the remote store.  This may be what you want, it may not.  It
can be expensive.

Another way is to use SPARQL SERVICE.

{
...
    { SERVICE <graph1> { pattern } }
    UNION
    { SERVICE <graph2> { pattern } }
    UNION
    { SERVICE <graph3> { pattern } }
...
}

which executes the pattern at each site (but not across them).

For the polling iterator, have you considered threading?  You could kick
off a thread per operation and have the results put on an
ArrayBlockingQueue (fixed length - stops flooding).
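
A rough sketch of that threading approach, assuming each remote
operation is exposed as an ordinary java.util.Iterator.  The class name
QueueMergingIterator and the DONE marker are made up for illustration;
only the thread-per-operation idea and the fixed-length
ArrayBlockingQueue come from the suggestion above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: drains each source on its own thread into one bounded queue. */
class QueueMergingIterator<T> implements Iterator<T> {
    private static final Object DONE = new Object(); // marker: one producer has finished
    private final BlockingQueue<Object> queue;
    private int running;  // producers that have not sent DONE yet
    private Object next;  // look-ahead element, or null if none is buffered

    QueueMergingIterator(List<? extends Iterator<? extends T>> sources, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity); // fixed length: producers block when full
        running = sources.size();
        for (Iterator<? extends T> source : sources) {
            Thread producer = new Thread(() -> {
                try {
                    // Elements must be non-null; ArrayBlockingQueue rejects null.
                    while (source.hasNext()) {
                        queue.put(source.next());
                    }
                    queue.put(DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Simplification: a source that throws never sends DONE,
                // so real code would need error handling here.
            });
            producer.setDaemon(true);
            producer.start();
        }
    }

    @Override
    public boolean hasNext() {
        while (next == null && running > 0) {
            Object item;
            try {
                item = queue.take(); // waits for whichever producer delivers first
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (item == DONE) {
                running--;
            } else {
                next = item;
            }
        }
        return next != null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }
}
```

Results arrive in whatever order the producers deliver them, so the
order across sources is not deterministic.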

	Andy