Posted to users@jena.apache.org by anand Suvernkar <an...@gmail.com> on 2013/06/10 22:14:20 UTC

Design question on parallelism in Query Execution Engine

Hi
  I am evaluating Apache Jena for SPARQL query execution.
My question is about the ability of this engine to achieve parallelism for
a distributed RDF store.

Consider a SPARQL query that requires a level-order traversal of a graph.  A
serial execution of a query to get friends of friends will sequentially
fetch the friends of each friend of a given person.
Since the store is distributed, this query can easily be parallelized to
fetch the friends of each successive friend in parallel. This could be quite
a good improvement.

However, this does require support for asynchronous graph operations,
both in the engine and at the graph layer.

So is this possible, or is there some way to do it?

Anand

Re: Design question on parallelism in Query Execution Engine

Posted by Andy Seaborne <an...@apache.org>.

On 10/06/13 21:14, anand Suvernkar wrote:
> Hi
>    I am evaluating Apache Jena for SPARQL query execution.
> My question is about the ability of this engine to achieve parallelism for
> a distributed RDF store.
>
> Consider a SPARQL query that requires a level-order traversal of a graph.  A
> serial execution of a query to get friends of friends will sequentially
> fetch the friends of each friend of a given person.
> Since the store is distributed, this query can easily be parallelized to
> fetch the friends of each successive friend in parallel. This could be quite
> a good improvement.
>
> However, this does require support for asynchronous graph operations,
> both in the engine and at the graph layer.
>
> So is this possible, or is there some way to do it?

Query execution is extensible in several ways.  You can introduce new 
algebra operations (ARQ has several beyond the ones defined in SPARQL) 
and modify how execution happens.  At the algebra level, it's not really 
graph access operations - it's much closer to relational algebra, 
and all the parallelism work done there would apply to SPARQL.

As you describe it, you're asking for a join operation - sounds like a 
parallel index join.
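
To make the idea concrete, here is a minimal sketch of a parallel index join
for the friends-of-friends case. It is plain Java and does not use ARQ's API.
The class name, the lookupFriends helper and the in-memory map standing in
for a distributed index are all hypothetical, chosen only for illustration.
Each left-hand binding (?friend) triggers its own asynchronous index probe,
and the probes are joined afterwards in the original binding order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelIndexJoin {

    // Hypothetical stand-in for an index lookup on a remote partition:
    // returns the people that `person` knows (empty if there are none).
    static List<String> lookupFriends(Map<String, List<String>> index, String person) {
        return index.getOrDefault(person, List.of());
    }

    // Joins the left-hand bindings (?friend) against the index, issuing one
    // asynchronous probe per binding, and returns (?friend, ?fof) pairs.
    static List<String[]> friendsOfFriends(Map<String, List<String>> index,
                                           String start,
                                           ExecutorService pool) {
        List<String> friends = lookupFriends(index, start);

        // Launch all probes before waiting on any of them.
        List<CompletableFuture<List<String>>> probes = new ArrayList<>();
        for (String friend : friends) {
            probes.add(CompletableFuture.supplyAsync(
                    () -> lookupFriends(index, friend), pool));
        }

        // Collect the results in the original binding order.
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < friends.size(); i++) {
            for (String fof : probes.get(i).join()) {
                rows.add(new String[] { friends.get(i), fof });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Toy data; a real store would resolve these lookups over the network.
        Map<String, List<String>> index = Map.of(
                "alice", List.of("bob", "carol"),
                "bob", List.of("dave"),
                "carol", List.of("erin", "frank"));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (String[] row : friendsOfFriends(index, "alice", pool)) {
                System.out.println(row[0] + " -> " + row[1]);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the parallelism sits entirely in the join operator, so the
lookup itself can stay synchronous. Whether that is enough for a real
distributed store depends on how the data is partitioned and how lookups are
routed.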

What is the distributed RDF store architecture you had in mind?  How is 
data distributed and how does the system know where to look?

Are you considering moving computation as well?

	Andy

http://jena.apache.org/documentation/query/arq-query-eval.html