Posted to xindice-users@xml.apache.org by Edward Capriolo <ed...@gmail.com> on 2008/10/16 18:08:16 UTC

xindice session replication questions

Hello list,

I set up Xindice in the past to learn XPath and so on. It is very cool.

I am working on a project, http://www.jointhegrid.com/jtgweb/.
Previously I used java.beans.XMLEncoder and XMLDecoder to persist XML
data to a directory. Now I am moving to a multi-node implementation,
so I have a data replication problem. My goal is to have the data on
all the nodes, and to have it scale out automatically if I move
from two nodes to three.

I do not want to use shared storage or NFS.

This morning I came up with an idea:
1) Use Tomcat with session/application clustering, so that anything in
memory is shared across all the nodes.

I remember from using Xindice that it persists the data to the webapps
directory. Can I run Xindice in memory? Would the data be
serializable?

If that is possible, Xindice could be used as a replicated XML database. Any ideas?

Re: xindice session replication questions

Posted by Edward Capriolo <ed...@gmail.com>.
The idea of building a replication layer on top makes good sense. My
only issue with it is catching up a host that is out of sync with the
other hosts.

As for Tomcat session replication, I believe the entire context is
replicated, so if Xindice were stored in memory in the context I
believe it would be replicated too.

As for concurrent modification: I plan on writing on a single node. I
do not need transactional replication. I just need a reasonably
in-sync copy on all the nodes. Let's say I want optimistic
replication and locking. The application is not write intensive.
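
For the catch-up concern with a single writer, one possible approach is
to number every change on the writer and let a lagging host ask for
whatever it missed. This is only a rough sketch in plain Java: the
names SequencedChange, WriterLog and FollowerNode are made up for
illustration, a map stands in for the database, and no Xindice or
JGroups API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical change record, numbered by the single writer node.
final class SequencedChange {
    final long sequence; // 1, 2, 3, ... in write order
    final String key;    // document key
    final String xml;    // new content, or null for a delete

    SequencedChange(long sequence, String key, String xml) {
        this.sequence = sequence;
        this.key = key;
        this.xml = xml;
    }
}

// Kept on the writer: an ordered log of every change it has made.
final class WriterLog {
    private final List<SequencedChange> log = new ArrayList<>();

    SequencedChange append(String key, String xml) {
        SequencedChange change = new SequencedChange(log.size() + 1L, key, xml);
        log.add(change);
        return change;
    }

    // Returns the changes whose sequence number is greater than the given one.
    List<SequencedChange> changesAfter(long sequence) {
        int from = (int) Math.max(0L, Math.min(sequence, (long) log.size()));
        return new ArrayList<>(log.subList(from, log.size()));
    }
}

// A read-only copy; the map is an assumption standing in for a real store.
final class FollowerNode {
    private final Map<String, String> documents = new TreeMap<>();
    private long lastApplied = 0;

    // Applies the change only if it is the next one in order.
    // Returns false for a duplicate or a gap, so the caller can catch up.
    boolean apply(SequencedChange change) {
        if (change.sequence != lastApplied + 1) {
            return false;
        }
        if (change.xml == null) {
            documents.remove(change.key);
        } else {
            documents.put(change.key, change.xml);
        }
        lastApplied = change.sequence;
        return true;
    }

    // Pulls and replays everything this node has not seen yet.
    void catchUp(WriterLog writer) {
        for (SequencedChange change : writer.changesAfter(lastApplied)) {
            apply(change);
        }
    }

    long lastApplied() { return lastApplied; }

    Map<String, String> snapshot() { return new TreeMap<>(documents); }
}

final class CatchUpDemo {
    public static void main(String[] args) {
        WriterLog writer = new WriterLog();
        FollowerNode follower = new FollowerNode();
        follower.apply(writer.append("doc1", "<a/>"));
        writer.append("doc2", "<b/>"); // follower misses this one
        follower.catchUp(writer);
        System.out.println(follower.snapshot());
    }
}
```

The log in this sketch grows without bound. A real setup would
presumably need to trim it, or fall back to copying a full snapshot
when a host is too far behind.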

So right now JGroups + Xindice in memory looks good. Thanks.

Re: xindice session replication questions

Posted by Vadim Gritsenko <va...@reverycodes.com>.
On Oct 17, 2008, at 11:06 AM, Edward Capriolo wrote:

> Very interesting. If those classes implement serialization

Even if they don't (I don't remember one way or another), it would be  
trivial to implement it.


> they can be used with tomcat session replication?

I'm not sure that I see how *session* replication can be used to  
replicate a *shared* (across multiple users) resource.

Also, consider that the database can be modified on different nodes at
once by different web site visitors. Say one user adds a document
and another user deletes some other document. After these changes
are made, how are you going to synchronize the state of these databases?

That's why I suggested building a replication layer on top. Instead of
trying to replicate the complete database (which will be costly if
the database size is anything but trivial), you'd replicate only
the modifications.
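
To illustrate what "only the modifications" could look like on the
wire, here is a minimal sketch of a serializable change record using
standard Java serialization. DocumentChange and ChangeCodec are
hypothetical names, not Xindice classes; the resulting byte array is
what a group communication layer would carry between nodes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical record describing a single modification.
final class DocumentChange implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Kind { PUT, DELETE }

    final Kind kind;
    final String collection; // collection path, e.g. "/db/hosts" (made up)
    final String key;        // document key within the collection
    final String xml;        // document content, null for DELETE

    DocumentChange(Kind kind, String collection, String key, String xml) {
        this.kind = kind;
        this.collection = collection;
        this.key = key;
        this.xml = xml;
    }
}

// Turns a change into bytes and back, so only the change crosses the network.
final class ChangeCodec {
    static byte[] encode(DocumentChange change) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(change);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static DocumentChange decode(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DocumentChange) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class ChangeCodecDemo {
    public static void main(String[] args) {
        DocumentChange change = new DocumentChange(
            DocumentChange.Kind.PUT, "/db/hosts", "node1", "<host name=\"node1\"/>");
        byte[] wire = ChangeCodec.encode(change);
        DocumentChange copy = ChangeCodec.decode(wire);
        System.out.println(copy.kind + " " + copy.collection + "/" + copy.key);
    }
}
```

The size of each message then depends on the changed document, not on
the size of the whole database.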


> I am also looking into
> http://www.jgroups.org/. My goal is to have some type of database that
> can scale on the fly.

Yes, JGroups can be employed to build the communication for the
replication layer.

Vadim


Re: xindice session replication questions

Posted by Edward Capriolo <ed...@gmail.com>.
Very interesting. If those classes implement serialization, can they
be used with Tomcat session replication? I am also looking into
http://www.jgroups.org/. My goal is to have some type of database that
can scale on the fly.

Maybe MemFiler and JGroups can accomplish this.

Re: xindice session replication questions

Posted by Natalia Shilenkova <ns...@gmail.com>.
On Thu, Oct 16, 2008 at 8:49 PM, Vadim Gritsenko <va...@reverycodes.com> wrote:
> On Oct 16, 2008, at 12:08 PM, Edward Capriolo wrote:

<snip/>

>> Can I run Xindice in memory? Would the data be serializable?
>
> Xindice as-is is capable of using only filesystem-based storage. Having said
> that, it is possible to implement a completely in-memory database; you would
> have to replace the file-based implementation with a memory-based one.

Actually, Xindice can run in-memory collections; it already has
MemFiler to accomplish this. There is also MemValueIndexer, so it is
possible to build an index for an in-memory collection.

<snip/>

Natalia

Re: xindice session replication questions

Posted by Vadim Gritsenko <va...@reverycodes.com>.
On Oct 16, 2008, at 12:08 PM, Edward Capriolo wrote:

> Hello list,
>
> I set up Xindice in the past to learn XPath and so on. It is very cool.
>
> I am working on a project, http://www.jointhegrid.com/jtgweb/.
> Previously I used java.beans.XMLEncoder and XMLDecoder to persist XML
> data to a directory. Now I am moving to a multi-node implementation,
> so I have a data replication problem. My goal is to have the data on
> all the nodes, and to have it scale out automatically if I move
> from two nodes to three.
>
> I do not want to use shared storage or NFS.

Even if you were to use a shared file system, it still would not be
possible to run multiple Xindice database instances against the same
set of shared files. Database files are opened, and afterwards
managed, by a single database instance. Any attempt to run two database
instances against the same database file will result in data corruption.


> This morning I came up with an idea:
> 1) Use Tomcat with session/application clustering, so that anything in
> memory is shared across all the nodes.
>
> I remember from using Xindice that it persists the data to the webapps
> directory.

The location of the database files can be set in the configuration
file. By default, the files are indeed placed under the webapp's
WEB-INF directory.


> Can I run Xindice in memory? Would the data be serializable?

Xindice as-is is capable of using only filesystem-based storage.
Having said that, it is possible to implement a completely in-memory
database; you would have to replace the file-based implementation with
a memory-based one.


> If that is possible, Xindice could be used as a replicated XML
> database. Any ideas?

Client code which talks to the Xindice database can be clustered, and
can talk to a single database instance residing on a single back-end
server. That would be the easiest and most common approach.

Another idea, if you insist on running multiple database instances, is
to build a data replication layer on top. Every change to any of the
database instances would have to be broadcast to the database cluster
and replayed in all database instances.
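
A minimal sketch of that broadcast-and-replay idea, in plain Java. All
names here (Change, Replica, ReplicationGroup) are hypothetical, a map
stands in for each database instance, and a simple loop stands in for
the group communication layer, which is assumed to deliver every
change to every member in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical change record: one insert/update or delete of a document.
final class Change {
    final String key; // document key within a collection
    final String xml; // new document content, or null for a delete

    Change(String key, String xml) {
        this.key = key;
        this.xml = xml;
    }
}

// Stand-in for one database instance; a real node would wrap its own database.
final class Replica {
    private final Map<String, String> documents = new TreeMap<>();

    // Replays one change against the local store.
    void apply(Change change) {
        if (change.xml == null) {
            documents.remove(change.key);
        } else {
            documents.put(change.key, change.xml);
        }
    }

    Map<String, String> snapshot() {
        return new TreeMap<>(documents);
    }
}

// Stand-in for the cluster channel: hands each change to every member, in order.
final class ReplicationGroup {
    private final List<Replica> members = new ArrayList<>();

    void join(Replica replica) {
        members.add(replica);
    }

    void broadcast(Change change) {
        for (Replica member : members) {
            member.apply(change);
        }
    }
}

final class BroadcastReplayDemo {
    public static void main(String[] args) {
        ReplicationGroup group = new ReplicationGroup();
        Replica first = new Replica();
        Replica second = new Replica();
        group.join(first);
        group.join(second);

        group.broadcast(new Change("doc1", "<host name=\"n1\"/>"));
        group.broadcast(new Change("doc2", "<host name=\"n2\"/>"));
        group.broadcast(new Change("doc1", null)); // delete doc1 everywhere

        System.out.println(first.snapshot().equals(second.snapshot()));
    }
}
```

This sketch does not address what happens when changes from different
nodes arrive in different orders, or when a node joins late; both would
need handling in a real replication layer.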


Vadim