Posted to dev@clerezza.apache.org by Minto van der Sluis <mi...@xup.nl> on 2013/07/19 15:09:19 UTC

Clustering

Hi Folks,

Next issue I have to tackle on my assignment is Clustering. Basically I
have 2 issues right now:

1) Running my assembly clustered
This seems to be the easy one, since I am using Apache Karaf for
building my assembly/distribution. Karaf has a subproject called Cellar
for clustering. Here I just have to dive in and get it done.

2) Having a cluster aware datastore (Jena TDB)
Googling around leaves me clueless. Sometimes it appears TDB can be used
in clustered environments, other times it appears it can't. I certainly
couldn't find any recipe for getting it done. I should probably ask this
on the Jena mailing list, but I am trying this one first since you guys
seem to have quite some experience with TDB.

I am wondering if I can really use TDB in a clustered setup or if I need
to switch to another datastore to accomplish this. SDB can probably be
used for this, but I fear a performance penalty.

Any thoughts?

Regards,

Minto

Re: Clustering

Posted by Andy Seaborne <an...@apache.org>.

On 22/07/13 09:23, Minto van der Sluis wrote:
> Hi Andy,
>
> See below
>
> Regards,
>
> Minto
>
> Op 21-7-2013 19:06, Andy Seaborne schreef:
>>
>> Minto,
>>
>> Do you want a shared database across the cluster (3-tier architecture
>> style)?  That's what SDB would give you.
> Yes, using a separate DB layer, which most probably should be clustered
> itself. The preferred database is PostgreSQL. As far as I understand,
> this would not be a problem for SDB. But somehow I fear a performance
> drop when switching to SDB. Is this justified?

Yes.

Layering over SQL/JDBC while preserving SPARQL semantics (specifically
XSD value semantics) means a lot of trips to the database for a single
query, unless it is a single graph pattern or a graph pattern and
OPTIONAL (no FILTERs).
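
One way to read the XSD value point, as a minimal sketch rather than a
description of how SDB itself works: SPARQL "=" on numeric literals
compares values, so "1" and "01" (both xsd:integer) are equal, while a
plain string comparison on stored lexical forms says they differ. The
class and method names below are made up for illustration.

```java
import java.math.BigDecimal;

// Illustration only: value comparison versus lexical comparison of numeric literals.
class XsdValueEquality {
    // Compare two numeric lexical forms by value, as a SPARQL FILTER (?a = ?b) would.
    static boolean valueEquals(String lexicalA, String lexicalB) {
        return new BigDecimal(lexicalA).compareTo(new BigDecimal(lexicalB)) == 0;
    }

    // Compare the stored strings, as a naive SQL string equality would (assumed storage by lexical form).
    static boolean lexicalEquals(String lexicalA, String lexicalB) {
        return lexicalA.equals(lexicalB);
    }

    public static void main(String[] args) {
        System.out.println("value   1 = 01 : " + valueEquals("1", "01"));
        System.out.println("lexical 1 = 01 : " + lexicalEquals("1", "01"));
        System.out.println("value   1 = 1.0: " + valueEquals("1", "1.0"));
    }
}
```

When a comparison like this cannot be expressed in the generated SQL, the
layer on top has to fetch candidate rows and evaluate the FILTER itself.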

The only SQL-level engines left at scale these days put SPARQL inside 
the SQL engine, not layered on top.

>>
>> Fuseki does this for TDB.  It can be used as a shared DB server.  TDB
>> is just the database engine.
>>
>> We/Epimorphics run replicated Fusekis as well, but updates are not
>> instantly consistent (we don't need that for our usage).
>>
>> If the interface in Clerezza were the SPARQL protocol, then the choice
>> of backend database would be a deployment choice.
> Is there a Clerezza TcProvider that works straight on the SPARQL
> protocol? Most I have seen (TDB and Sesame) use native APIs. The
> Virtuoso TcProvider mentioned by Reto almost seems to work like this,
> but it is dedicated to Virtuoso and uses the JDBC route to send SPARQL
> requests.
>>
>> Is there a SPARQL protocol provider?
> Currently I have no SPARQL protocol provider. I use the Clerezza
> TcProvider, specifically the TDB implementation.
>>
>>      Andy
>>
>

Re: Clustering

Posted by Minto van der Sluis <mi...@xup.nl>.

Hi Andy,

See below

Regards,

Minto

Op 21-7-2013 19:06, Andy Seaborne schreef:
>
> Minto,
>
> Do you want a shared database across the cluster (3-tier architecture
> style)?  That's what SDB would give you.
Yes, using a separate DB layer, which most probably should be clustered
itself. The preferred database is PostgreSQL. As far as I understand,
this would not be a problem for SDB. But somehow I fear a performance
drop when switching to SDB. Is this justified?
>
> Fuseki does this for TDB.  It can be used as a shared DB server.  TDB
> is just the database engine.
>
> We/Epimorphics run replicated Fusekis as well, but updates are not
> instantly consistent (we don't need that for our usage).
>
> If the interface in Clerezza were the SPARQL protocol, then the choice
> of backend database would be a deployment choice.
Is there a Clerezza TcProvider that works straight on the SPARQL
protocol? Most I have seen (TDB and Sesame) use native APIs. The
Virtuoso TcProvider mentioned by Reto almost seems to work like this,
but it is dedicated to Virtuoso and uses the JDBC route to send SPARQL
requests.
>
> Is there a SPARQL protocol provider?
Currently I have no SPARQL protocol provider. I use the Clerezza
TcProvider, specifically the TDB implementation.
>
>     Andy
>

-- 
ir. ing. Minto van der Sluis
Software innovator / renovator
Xup BV

Mobiel: +31 (0) 626 014541

Re: Clustering

Posted by Andy Seaborne <an...@apache.org>.

On 19/07/13 14:09, Minto van der Sluis wrote:
> Hi Folks,
>
> Next issue I have to tackle on my assignment is Clustering. Basically I
> have 2 issues right now:
>
> 1) Running my assembly clustered
> This seems to be the easy one, since I am using Apache Karaf for
> building my assembly/distribution. Karaf has a subproject called Cellar
> for clustering. Here I just have to dive in and get it done.
>
> 2) Having a cluster aware datastore (Jena TDB)
> Googling around leaves me clueless. Sometimes it appears TDB can be used
> in clustered environments, other times it appears it can't. I certainly
> couldn't find any recipe for getting it done. I should probably ask this
> on the Jena mailing list, but I am trying this one first since you guys
> seem to have quite some experience with TDB.
>
> I am wondering if I can really use TDB in a clustered setup or if I need
> to switch to another datastore to accomplish this. SDB can probably be
> used for this, but I fear a performance penalty.
>
> Any thoughts?
>
> Regards,
>
> Minto
>

Minto,

Do you want a shared database across the cluster (3-tier architecture
style)?  That's what SDB would give you.

Fuseki does this for TDB.  It can be used as a shared DB server.  TDB is 
just the database engine.

We/Epimorphics run replicated Fusekis as well, but updates are not
instantly consistent (we don't need that for our usage).

If the interface in Clerezza were the SPARQL protocol, then the choice
of backend database would be a deployment choice.
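
To make that concrete, here is a minimal sketch of a SPARQL protocol
query request built with the JDK's own HTTP client classes (Java 11+).
The endpoint URL and the dataset name "ds" are assumptions for
illustration, as are the class and method names; the point is only that
the client needs nothing but a URL, so the server behind it can be
swapped at deployment time.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (but do not send) a SPARQL protocol query request via HTTP GET.
class SparqlProtocolRequest {
    // endpoint is any SPARQL protocol query endpoint; the store behind it is not visible here.
    static HttpRequest selectRequest(String endpoint, String sparql) {
        // URLEncoder produces form encoding; turn "+" (space) into "%20" for the query string.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint URL; replace with the real query endpoint of the deployment.
        HttpRequest request = selectRequest(
                "http://localhost:3030/ds/query",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient and parsing the result
set is left out on purpose.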

Is there a SPARQL protocol provider?

	Andy

Re: Clustering

Posted by Minto van der Sluis <mi...@xup.nl>.

Hi Reto,

See below.

Regards,

Minto

Op 19-7-2013 20:30, Reto Bachmann-Gmür schreef:
> Hi Minto
>
> Really curious to learn whether things work with Cellar.
I will keep you posted about my progress on this topic. Also, I have
repeatedly asked to open source the stuff I worked on. Hopefully some
day soon I can show you the details.
>
> I didn't know about the possibility of clustering a TDB store. But a
> 2008 paper seems to be doing (or investigating) exactly this.
>
> For web applications with many reads and few writes I would suggest
> adding some clustering at the HTTP level, with GET requests being
> forwarded to any host in a pool and POST/PUT being forwarded to all
> hosts.
>
> As for clustering triple stores, it seems Virtuoso supports this. And
> for Virtuoso there is Enrico Daga's Clerezza binding (to which
> fastlane should probably be added).
This one is very interesting. I will have a closer look at it.
>
> Cheers,
> Reto



Re: Clustering

Posted by Reto Bachmann-Gmür <re...@wymiwyg.com>.

Hi Minto

Really curious to learn whether things work with Cellar.

I didn't know about the possibility of clustering a TDB store. But a
2008 paper seems to be doing (or investigating) exactly this.

For web applications with many reads and few writes I would suggest
adding some clustering at the HTTP level, with GET requests being
forwarded to any host in a pool and POST/PUT being forwarded to all
hosts.
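
A minimal sketch of that routing rule, using only the JDK. The class
name, the round-robin choice for reads and the treatment of every
non-GET method as a write are assumptions made for illustration; it only
decides where a request should go and does not deal with a write that
fails on some of the hosts.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: pick target hosts per HTTP method (reads to one host, writes to all).
class ReadWriteRouter {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = List.copyOf(hosts);
    }

    // GET goes to a single host, chosen round-robin; anything else goes to every host.
    List<String> targetsFor(String method) {
        if ("GET".equalsIgnoreCase(method)) {
            int index = Math.floorMod(next.getAndIncrement(), hosts.size());
            return List.of(hosts.get(index));
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hypothetical host names.
        ReadWriteRouter router = new ReadWriteRouter(List.of("node1:8080", "node2:8080"));
        System.out.println("GET  -> " + router.targetsFor("GET"));
        System.out.println("GET  -> " + router.targetsFor("GET"));
        System.out.println("POST -> " + router.targetsFor("POST"));
    }
}
```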

As for clustering triple stores, it seems Virtuoso supports this. And
for Virtuoso there is Enrico Daga's Clerezza binding (to which
fastlane should probably be added).

Cheers,
Reto
