Posted to users@tomcat.apache.org by Jürgen Jakobitsch <ja...@punkt.at> on 2011/04/06 21:22:18 UTC

TomcatCluster data replication

hi,

i'm in need of data replication in a tomcat cluster.
i set up a tomcat cluster of three tomcats on a single machine, with an apache (mod_jk) front end that does the load balancing.
everything works absolutely charmingly for read requests; my trouble starts with data input.

what i'm trying to achieve is that when i submit data with an html form, the storage on all cluster members gets updated.
i'm using openrdf's sesame triple store, which locks its data directory, so i can't simply use a single shared directory
in my application.

what i have in mind, after some first reading, is some sort of clustervalve that checks whether a request is a POST request and,
if so, sends this request (which updates the repository in the back) to all members of the cluster.

so here are my questions:

1. is there a standard way of doing something like this (with a non-clusterable data backend)?
2. is the clustervalve idea in fact the correct starting point?

any help or pointer in the right direction is greatly appreciated

wkr turnguard.com/turnguard

-- 
punkt. netServices
______________________________
Jürgen Jakobitsch
Codeography

Lerchenfelder Gürtel 43 Top 5/2
A - 1160 Wien
Tel.: 01 / 897 41 22 - 29
Fax: 01 / 897 41 22 - 22

netServices http://www.punkt.at


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: TomcatCluster data replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.
On 4/6/2011 1:22 PM, Jürgen Jakobitsch wrote:
> hi,
>
> i'm in need of data replication in a tomcat cluster.
> i set up a tomcat cluster of three tomcats on a single machine, with an apache (mod_jk) front end that does the load balancing.
> everything works absolutely charmingly for read requests; my trouble starts with data input.
>
> what i'm trying to achieve is that when i submit data with an html form, the storage on all cluster members gets updated.
> i'm using openrdf's sesame triple store, which locks its data directory, so i can't simply use a single shared directory
> in my application.

Sounds like a limitation of Sesame. Use some other NoSQL data store and you won't have this issue.

best
Filip

> what i have in mind, after some first reading, is some sort of clustervalve that checks whether a request is a POST request and,
> if so, sends this request (which updates the repository in the back) to all members of the cluster.
>
> so here are my questions:
>
> 1. is there a standard way of doing something like this (with a non-clusterable data backend)?
> 2. is the clustervalve idea in fact the correct starting point?
>
> any help or pointer in the right direction is greatly appreciated
>
> wkr turnguard.com/turnguard
>




Re: TomcatCluster data replication

Posted by Thomas Strauß <t....@srs-management.de>.
On 06.04.2011 at 22:35, André Warnier wrote:

> Jürgen Jakobitsch wrote:
> ...
>> 
>> imagine you have a simple text file in the WEB-INF directory of a webapp named ClusterApp. this ClusterApp is deployed
>> on three tomcats in a cluster. now comes a POST request that updates the text file (adds one line to it).
>> now of course i need to synchronize the text file on all tomcats in the cluster.
>>
> OK, let's imagine there are initially 3 identical simple text files, one on each of the 3 tomcats.
> And there are 2 clients accessing the load balancer.
> In order to determine if they need to update the text file, the clients first request the
> text file to examine it.  Their requests go to 2 different tomcats via the load balancer.
> But it does not matter: they both get the same text file in response, since the copies are
> identical.
> Now client A decides to update the file by adding a line XXX to it.
> And client B decides to update the file by adding a line YYY to it.
> They both POST their request at about the same time to the front-end, and the front-end
> (or whatever replication mechanism) sends each request to all 3 back-end tomcats.
> When the 2 POST requests have been processed, what is the state of the 3 text files?
> 

I would say this is a classic case for either a centralized datastore or a distributed transaction manager.

To solve the issue with the existing setup, I would probably serialize the write requests into a message queue that has one subscriber per cluster member. Only the subscriber thread is allowed to write to the file.

With a little tuning of the queue setup you get a fail-safe system, where a crashed cluster member recovers and continues with its copies of the write requests.

Overall the files should end up reasonably equal. If you need real-time updates of all cluster members, the datastore you have chosen IMHO sucks  :-) I suppose you would then need to add a kind of distributed transaction. This could be achieved by sending a commit message to all cluster members once you have successfully written your data. For each dataset, you expect a commit message from all the others before you serve it to clients again... sounds a little bit like reinventing the wheel.
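
A minimal sketch of the queue idea above, in plain Java with in-memory stand-ins. All names (WriteLogSketch, Member, WriteLog, publish, deliver) are made up for illustration; none of this is a Tomcat, JMS or Sesame API. It assumes one publish point that fixes the order of the writes, which is what makes the copies identical rather than only similar. Recovery of a crashed member is not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: illustrative names only, no real cluster API involved.
class WriteLogSketch {

    /** One cluster member: a private queue plus a single writer thread. */
    static final class Member {
        // Sentinel compared by identity, so a real line "STOP" is still written.
        private static final String STOP = new String("STOP");
        private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        private final List<String> file = new ArrayList<>(); // stands in for the local data file
        private final Thread writer = new Thread(this::drain);

        Member() { writer.start(); }

        private void drain() {
            try {
                // Only this thread ever appends to 'file'.
                for (String line = inbox.take(); line != STOP; line = inbox.take()) {
                    file.add(line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void deliver(String line) { inbox.add(line); }

        /** Stops the writer thread and returns the final file content. */
        List<String> close() throws InterruptedException {
            inbox.add(STOP);
            writer.join();
            return file;
        }
    }

    /** The single place where the order of writes is decided. */
    static final class WriteLog {
        private final List<Member> members;

        WriteLog(List<Member> members) { this.members = members; }

        // synchronized: a write is handed to every member before the next
        // write starts, so all inboxes receive the same sequence.
        synchronized void publish(String line) {
            for (Member m : members) {
                m.deliver(line);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Member> members = Arrays.asList(new Member(), new Member(), new Member());
        WriteLog log = new WriteLog(members);
        Thread a = new Thread(() -> log.publish("XXX")); // client A
        Thread b = new Thread(() -> log.publish("YYY")); // client B
        a.start(); b.start();
        a.join(); b.join();
        for (Member m : members) {
            System.out.println(m.close()); // same order on every member
        }
    }
}
```

Which of XXX and YYY comes first depends on thread scheduling, but every member ends up with the same order. In a real setup the publish point would be a message broker or similar, which is again a central component.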


Regards,
 Thomas



Re: TomcatCluster data replication

Posted by André Warnier <aw...@ice-sa.com>.
Jürgen Jakobitsch wrote:
...
> 
> imagine you have a simple text file in the WEB-INF directory of a webapp named ClusterApp. this ClusterApp is deployed
> on three tomcats in a cluster. now comes a POST request that updates the text file (adds one line to it).
> now of course i need to synchronize the text file on all tomcats in the cluster.
>
OK, let's imagine there are initially 3 identical simple text files, one on each of the 3 tomcats.
And there are 2 clients accessing the load balancer.
In order to determine if they need to update the text file, the clients first request the
text file to examine it.  Their requests go to 2 different tomcats via the load balancer.
But it does not matter: they both get the same text file in response, since the copies are
identical.
Now client A decides to update the file by adding a line XXX to it.
And client B decides to update the file by adding a line YYY to it.
They both POST their request at about the same time to the front-end, and the front-end
(or whatever replication mechanism) sends each request to all 3 back-end tomcats.
When the 2 POST requests have been processed, what is the state of the 3 text files?
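
One possible outcome, as a small Java sketch. It assumes nothing enforces a common order on the two POSTs, so one tomcat may apply XXX first while another applies YYY first. The class and method names (DivergenceSketch, apply) and the initial content "line 1" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the "files" are in-memory lists.
class DivergenceSketch {

    /** Appends the lines to a copy of the initial file, in arrival order. */
    static List<String> apply(List<String> initial, String... arrivals) {
        List<String> file = new ArrayList<>(initial);
        file.addAll(Arrays.asList(arrivals));
        return file;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("line 1");

        // Tomcat 1 happens to see client A's POST first, tomcat 2 client B's.
        List<String> tomcat1 = apply(initial, "XXX", "YYY");
        List<String> tomcat2 = apply(initial, "YYY", "XXX");

        System.out.println(tomcat1);                 // prints [line 1, XXX, YYY]
        System.out.println(tomcat2);                 // prints [line 1, YYY, XXX]
        System.out.println(tomcat1.equals(tomcat2)); // prints false
    }
}
```

Both files contain both lines, yet they are no longer identical. Whether that matters depends on whether the order of the lines carries meaning for the application.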






Re: TomcatCluster data replication

Posted by Jürgen Jakobitsch <ja...@punkt.at>.
hi, thanks for your input..

1. switching the backend is not an option; otherwise i wouldn't have asked specifically about a non-clusterable data backend.
2. it wouldn't be that two requests update one piece of data; rather, the first cluster member that receives
   a POST request also posts that request to the other members, which then simply handle it. since every
   application has its own data directory, every member would write into its own data directory; that's why the requests
   need to be forwarded to all members of the cluster.
3. these three tomcats on one machine are for testing purposes only; the real-world setup would run on different physical machines.

imagine you have a simple text file in the WEB-INF directory of a webapp named ClusterApp. this ClusterApp is deployed
on three tomcats in a cluster. now comes a POST request that updates the text file (adds one line to it).
now of course i need to synchronize the text file on all tomcats in the cluster.

in my opinion there are only a few options to achieve this:
1. rsync the file, which is kind of hard, since i have a load balancer and don't know exactly which member answers the request; there are
   too many uncertainties
2. check all incoming requests for HTTP POST; if the request is a POST, then simply send it to all members of the cluster.
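
Option 2 could be sketched roughly like this, in plain Java with in-memory stand-ins for the requests and the cluster members. All names (PostFanOutSketch, Request, Member, the forwarded flag) are hypothetical and not part of Tomcat. The forwarded flag plays the role of a marker header: without it, every member would forward the request again and the POST would bounce around the cluster forever.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: no real HTTP or Tomcat classes are used.
class PostFanOutSketch {

    /** Minimal stand-in for an HTTP request. */
    static final class Request {
        final String method;
        final String body;
        final boolean forwarded; // true if a peer sent it, not the load balancer

        Request(String method, String body, boolean forwarded) {
            this.method = method;
            this.body = body;
            this.forwarded = forwarded;
        }

        Request asForwarded() { return new Request(method, body, true); }
    }

    /** Stand-in for one tomcat with its own private data directory. */
    static final class Member {
        final List<String> store = new ArrayList<>();
        private final List<Member> peers = new ArrayList<>();

        void setPeers(List<Member> all) {
            for (Member m : all) {
                if (m != this) peers.add(m);
            }
        }

        void handle(Request r) {
            if (!"POST".equals(r.method)) {
                return; // reads are answered locally, nothing to replicate
            }
            store.add(r.body); // write into the local store
            if (!r.forwarded) {
                // Only the member that got the original request fans it out.
                for (Member p : peers) {
                    p.handle(r.asForwarded());
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Member> cluster = Arrays.asList(new Member(), new Member(), new Member());
        for (Member m : cluster) m.setPeers(cluster);

        // The load balancer happens to pick the second member.
        cluster.get(1).handle(new Request("POST", "new line", false));

        for (Member m : cluster) {
            System.out.println(m.store); // prints [new line] three times
        }
    }
}
```

The sketch leaves out what happens when a peer is down while a POST is forwarded, and it does not order two POSTs that arrive at different members at the same time.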


honestly, i can hardly imagine that i'm the first to come across this use case...


any help really appreciated..
wkr turnguard.com/turnguard


----- Original Message -----
From: "André Warnier" <aw...@ice-sa.com>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Wednesday, April 6, 2011 9:43:02 PM
Subject: Re: TomcatCluster data replication

Jürgen Jakobitsch wrote:
> hi,
>
> i'm in need of data replication in a tomcat cluster.
> i set up a tomcat cluster of three tomcats on a single machine, with an apache (mod_jk) front end that does the load balancing.
> everything works absolutely charmingly for read requests; my trouble starts with data input.
>
> what i'm trying to achieve is that when i submit data with an html form, the storage on all cluster members gets updated.
> i'm using openrdf's sesame triple store, which locks its data directory, so i can't simply use a single shared directory
> in my application.
>
> what i have in mind, after some first reading, is some sort of clustervalve that checks whether a request is a POST request and,
> if so, sends this request (which updates the repository in the back) to all members of the cluster.
>
> so here are my questions:
>
> 1. is there a standard way of doing something like this (with a non-clusterable data backend)?

No.

> 2. is the clustervalve idea in fact the correct starting point?

Probably not.

>
> any help or pointer in the right direction is greatly appreciated
>
I'm not saying that it would not be possible to do this.  And I have no idea what an
"openrdf's sesame triple store" is.
But what you describe sounds more like something that should be handled at the level of
the application which processes the POST.  It is the application which should arrange to
update the n back-end data stores at the same time.  Of course that introduces some
interesting issues of locking and synchronisation, in case two quasi-simultaneous requests
handled by two separate tomcats try to update the same piece of data in each of the
datastores.

Now just out of curiosity, what is the real-world point of this setup, considering that your 3
tomcats are running on the same host?
Why not have a single Tomcat with 3 times the resources to handle all the requests?




