Posted to users@jena.apache.org by Dick Murray <da...@gmail.com> on 2018/06/01 10:28:20 UTC
Re: Multiple Fuseki Servers in Distributed Environment

Apologies for resurrecting this thread...

Yes, it uses Thrift when distributed, i.e. across multiple JVMs.

It was on hold because I changed jobs, yay!

I'm starting to look at making it available as a Jena sidecar, i.e.
jena-mosaic.

DickM

On 27 May 2018 at 12:02, ajs6f <aj...@apache.org> wrote:

> There are several systems that distribute SPARQL using Jena.
>
> Dick Murray has written a system called Mosaic that (I believe) uses
> Apache Thrift to distribute the lower-level (DatasetGraph) primitives that
> ARQ uses to execute SPARQL. An advantage over your plan might be that he
> isn't serializing full results over HTTP to pass them around. I don't
> understand that system to be ready for use outside of Dick's deployment,
> but he could say more.
>
> The SANSA project [1] has provided a system that I understand to use ARQ
> to execute queries over Apache Spark or Apache Flink. This sounds similar
> in some ways to what you are doing, and that system is available today. I
> think Jena committer Lorenz Bühmann is involved with that project; if I am
> correct, he may be able to say more.
>
> There are doubtless others about which I don't know.
>
> ajs6f
>
> [1] http://sansa-stack.net/
>
> > On May 26, 2018, at 5:47 AM, Mirko Kämpf <mi...@gmail.com> wrote:
> >
> > Hello Fuseki experts,
> >
> > I would like to ask for your experience with, and thoughts about, the
> > following approach:
> >
> >
> >
> > In order to enable semantic queries over "transient data" or over data
> > that is persisted in HDFS / HBase, I run a Fuseki server (standalone or
> > embedded) on each cluster node that hosts a Spark executor.
> >
> > Since the data is partitioned I will not have references between the
> > datasets (in this particular case).
> >
> > A simple query broker distributes the query and consolidates the
> > results. The next step would be to add a coordinator that uses graph
> > statistics to optimize dataset dumps and reloading in case of failure.
> >
> > A load balancer is used to balance request and result flows towards
> > clients. Eventually, the query broker will run in Docker.
> >
> > A sketch is available here:
> > https://raw.githubusercontent.com/kamir/fuseki-cloud/master/Fuseki%20Cloud.png
> >
> >
> >
> > My initial prototype works well. Now I want to go deeper. But I wonder
> > whether such an activity has already been started, or whether you know
> > of reasons why this is not a good approach.
> >
> > In any case, if there is no reason not to implement such a
> > "Fuseki-Cloud" approach, I will continue on that route, and
> > I would like to contribute the results to the existing project.
> >
> > Thanks for any hint or recommendation.
> >
> > Best wishes,
> > Mirko
>
>
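The query broker described in the original message (send one query to a Fuseki endpoint on every node, then consolidate the results) might look roughly like the sketch below. It is not taken from the Fuseki-Cloud prototype or from Mosaic. It assumes that each node exposes a standard SPARQL 1.1 Protocol query endpoint returning SPARQL JSON results, and that the partitions are independent, so concatenating the bindings is a valid merge. All function names and endpoint URLs here are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_sparql_json(endpoint, query, timeout=30):
    """POST a query to one SPARQL endpoint and return the parsed JSON results."""
    request = urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def merge_select_results(results):
    """Concatenate the bindings of several SELECT results into one result."""
    variables = []
    bindings = []
    for result in results:
        # Keep the union of variable names, in first-seen order.
        for name in result["head"].get("vars", []):
            if name not in variables:
                variables.append(name)
        bindings.extend(result["results"]["bindings"])
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}


def broker_select(query, endpoints, fetch=fetch_sparql_json):
    """Send the same query to every endpoint in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        # pool.map preserves the order of the endpoints list.
        results = list(pool.map(lambda endpoint: fetch(endpoint, query), endpoints))
    return merge_select_results(results)


if __name__ == "__main__":
    # Hypothetical endpoint URLs; replace with the real per-node Fuseki services.
    nodes = ["http://node1:3030/ds/query", "http://node2:3030/ds/query"]
    merged = broker_select("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", nodes)
    print(len(merged["results"]["bindings"]), "rows")
```

Plain concatenation is only correct for queries whose answer is the union of the per-partition answers. Queries using DISTINCT, ORDER BY, LIMIT, or aggregates across partitions would need an extra consolidation step in the broker, and a failing node currently makes the whole call raise.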