Posted to users@jena.apache.org by anand Suvernkar <an...@gmail.com> on 2013/06/10 22:14:20 UTC
Design question on parallelism in Query Execution Engine
Hi
I am evaluating Apache Jena for SPARQL query execution.
My question is about the ability of this engine to achieve parallelism for
a distributed RDF store.
Consider a SPARQL query that requires a level-order traversal of a graph. A
serial execution of a friends-of-friends query will sequentially
fetch the friends of each friend of the given person.
Since the store is distributed, this query can easily be parallelized to fetch
the friends of each successive friend in parallel. This could be quite a good
improvement.
However, this does require support for asynchronous graph operations,
both in the engine and at the graph layer.
So is this possible, or is there some way to do it?
Anand
Re: Design question on parallelism in Query Execution Engine
Posted by Andy Seaborne <an...@apache.org>.
On 10/06/13 21:14, anand Suvernkar wrote:
> Hi
> I am evaluating Apache Jena for SPARQL query execution.
> My question is about the ability of this engine to achieve parallelism for
> a distributed RDF store.
>
> Consider a SPARQL query that requires a level-order traversal of a graph. A
> serial execution of a friends-of-friends query will sequentially
> fetch the friends of each friend of the given person.
> Since the store is distributed, this query can easily be parallelized to fetch
> the friends of each successive friend in parallel. This could be quite a good
> improvement.
>
> However, this does require support for asynchronous graph operations,
> both in the engine and at the graph layer.
>
> So is this possible, or is there some way to do it?
Query execution is extensible in several ways. You can introduce new
algebra operations (ARQ has several beyond the ones defined in SPARQL)
and modify how execution happens. At the algebra level, execution is not
really expressed as graph access operations; it's much closer to relational
algebra, and all the parallelism work done there would apply to SPARQL.
As you describe it, you're asking for a join operation - sounds like a
parallel index join.
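To make the idea concrete, here is a minimal sketch of a parallel index join for the friends-of-friends case. It is plain Java and does not use ARQ's API: the class name ParallelIndexJoinSketch, the friendsOfFriends method, and the lookupFriends function (standing in for a possibly remote index lookup on the store) are all hypothetical names chosen for illustration. The outer side of the join produces the direct friends; the inner lookups for each friend are issued concurrently instead of one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative only: not part of Jena/ARQ. All names here are assumptions.
public class ParallelIndexJoinSketch {

    /**
     * Joins (person knows ?friend) with (?friend knows ?fof).
     * lookupFriends is a stand-in for an index lookup on the store.
     */
    public static List<Map.Entry<String, String>> friendsOfFriends(
            String person,
            Function<String, List<String>> lookupFriends,
            ExecutorService pool) {
        // Outer side of the join: direct friends of the starting person.
        List<String> friends = lookupFriends.apply(person);

        // Inner side: start one lookup per friend without waiting for the others.
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (String friend : friends) {
            pending.add(CompletableFuture.supplyAsync(() -> lookupFriends.apply(friend), pool));
        }

        // Collect results in outer-row order, so the output matches a serial index join.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < friends.size(); i++) {
            for (String fof : pending.get(i).join()) {
                rows.add(Map.entry(friends.get(i), fof));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Toy in-memory data in place of a distributed store.
        Map<String, List<String>> knows = Map.of(
                "alice", List.of("bob", "carol"),
                "bob", List.of("dave"),
                "carol", List.of("erin", "frank"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Map.Entry<String, String>> rows =
                    friendsOfFriends("alice", p -> knows.getOrDefault(p, List.of()), pool);
            rows.forEach(r -> System.out.println(r.getKey() + " -> " + r.getValue()));
        } finally {
            pool.shutdown();
        }
    }
}
```

With an in-memory map the parallelism buys nothing; the benefit would only show up when each lookup carries real latency, such as a network round trip to another node. How much is gained then depends on how the data is partitioned, which is why the questions below matter.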
What is the distributed RDF store architecture you had in mind? How is
data distributed and how does the system know where to look?
Are you considering moving computation as well?
Andy
http://jena.apache.org/documentation/query/arq-query-eval.html