Posted to java-user@lucene.apache.org by Rob Outar <ro...@ideorlando.org> on 2002/11/01 15:05:58 UTC

Working with a Distributed System

All,

	I have what I think is an interesting problem.  I am working on a
distributed system where the repositories on all nodes have to be kept in
sync.  I am using Lucene on each node to index the data.  Users are allowed
to associate Fields with files and to set values of existing fields; these
fields then have to be associated with the same document on the other nodes
as well.  I am using broadcast events to update the other nodes.  The
problem is that when a new node joins, I am not sure how to get the changes
already made to the various indexes to that node.  All nodes that are
running together should be in sync, but a new node does not know about any
of the earlier changes.  The basic problem is how to keep the indexes the
same on all of the nodes.

	I thought about maybe setting up a CVS server and storing the index in
it, so that a new node checks out the index when it joins, but I do not
know enough about the internals of Lucene to know if that will work.  I
would be constantly committing files, because the index gets updated a lot
on the various nodes.  Also, would node B's committed files overwrite node
A's files, so that node A's changes to the index are lost?  It is a very
difficult problem; if anyone has any thoughts on this subject I would love
to hear them.

Thanks,

Rob
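
One way to picture the joining-node problem is as a replay of missed
changes. The sketch below is only an illustration under assumptions that are
not in the post: every field change gets an increasing sequence number, some
node keeps the log, and a joining node asks for everything after the last
number it applied. ChangeLog, Change, since and replay are hypothetical
names, and a plain map stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an append-only log of field changes that a joining node can replay. */
public class ChangeLog {

    /** One field assignment on one document; the shape is an assumption for illustration. */
    public static final class Change {
        final long seq;
        final String docId;
        final String field;
        final String value;

        Change(long seq, String docId, String field, String value) {
            this.seq = seq;
            this.docId = docId;
            this.field = field;
            this.value = value;
        }
    }

    private final List<Change> changes = new ArrayList<>();

    /** Records a change and returns the sequence number assigned to it (1, 2, 3, ...). */
    public synchronized long append(String docId, String field, String value) {
        long seq = changes.size() + 1;
        changes.add(new Change(seq, docId, field, value));
        return seq;
    }

    /** Returns every change with a sequence number greater than lastSeen, in order. */
    public synchronized List<Change> since(long lastSeen) {
        int from = (int) Math.max(0, Math.min(lastSeen, changes.size()));
        return new ArrayList<>(changes.subList(from, changes.size()));
    }

    /**
     * Applies the missed changes to a node's state and returns the new "last seen" number.
     * The map ("docId/field" -> value) is a stand-in; a real node would update its index here.
     */
    public static long replay(List<Change> missed, Map<String, String> index, long lastSeen) {
        for (Change c : missed) {
            index.put(c.docId + "/" + c.field, c.value);
            lastSeen = c.seq;
        }
        return lastSeen;
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.append("report.txt", "author", "rob");
        log.append("report.txt", "status", "draft");
        log.append("report.txt", "status", "final");

        // A node that joins late has seen nothing, so it replays from sequence 0.
        Map<String, String> newNode = new HashMap<>();
        long seen = replay(log.since(0), newNode, 0);
        System.out.println(seen + " " + newNode.get("report.txt/status"));
    }
}
```

A log like this grows without bound, so in practice a joining node would
presumably start from a copy of an existing index and replay only the tail
of the log.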


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: Working with a Distributed System

Posted by Rob Outar <ro...@ideorlando.org>.
Thank you all for replying; I will let you know how it goes.

Thanks,

Rob



Re: Working with a Distributed System

Posted by Otis Gospodnetic <ot...@yahoo.com>.
That sounds like a potentially nice piece of software for the Lucene
Sandbox contributions area.  Thanks.

Otis

--- Paul <pa...@waite.net.nz> wrote:
> My initial reaction to the first post was to use rsync too. I was about
> to post that, when I read Ype's post. ;-)
> 
> Another option is to do what we're doing, and write a daemon
> which talks to Lucene on the server it runs on, and also serves
> requests coming in on a specific port. That way many clients
> can have the benefit of one index.
> 
> You are welcome to our source, once we've got it to a stage
> where we can wrap it all up nicely and Open Source it. As
> it stands it is currently working well in a beta form.
> 
> Cheers,
> Paul.




Re: Working with a Distributed System

Posted by Paul <pa...@waite.net.nz>.
My initial reaction to the first post was to use rsync too. I was about
to post that, when I read Ype's post. ;-)

Another option is to do what we're doing, and write a daemon
which talks to Lucene on the server it runs on, and also serves
requests coming in on a specific port. That way many clients
can have the benefit of one index.

You are welcome to our source, once we've got it to a stage
where we can wrap it all up nicely and Open Source it. As
it stands it is currently working well in a beta form.

Cheers,
Paul.
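
For readers who want to picture the daemon approach, here is a minimal
sketch. It is not Paul's code, which is not shown in this thread. It assumes
a one-line text protocol (one query in, one comma-separated line of ids
out), and the search function is a stand-in for whatever Lucene call the
real daemon makes. IndexDaemon, handle and serveOnce are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: one process owns the index and answers queries sent over TCP. */
public class IndexDaemon {

    private final Function<String, List<String>> search; // stand-in for a Lucene search call

    public IndexDaemon(Function<String, List<String>> search) {
        this.search = search;
    }

    /** Turns one request line into one response line: matching ids joined by commas. */
    public String handle(String queryLine) {
        return String.join(",", search.apply(queryLine.trim()));
    }

    /** Accepts a single client, answers a single query line, then closes the connection. */
    public void serveOnce(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String line = in.readLine();
            if (line != null) {
                out.println(handle(line));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        IndexDaemon daemon = new IndexDaemon(q ->
                q.equals("lucene") ? Arrays.asList("doc1", "doc7") : Collections.<String>emptyList());

        try (ServerSocket server = new ServerSocket(0)) { // port 0: let the OS pick a free port
            Thread worker = new Thread(() -> {
                try {
                    daemon.serveOnce(server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            worker.start();

            // A client on any machine would connect the same way; here it is the same process.
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                out.println("lucene");
                System.out.println(in.readLine());
            }
            worker.join();
        }
    }
}
```

With a single index behind the daemon there is nothing to keep in sync, at
the cost of making that one server a shared dependency for every client.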


Otis Gospodnetic wrote:
> That is the approach I took at my previous job, which involved some
> Lucene work.  I used sdist to securely distribute the whole index (the
> whole dir with index files) to a number of remote machines.
>
> This may not work well if indices need to constantly be in sync, and if
> the index can be modified on all index nodes.
>
> How about using JMS and publish/subscribe with maybe time-stamped
> messages, etc.?
>
> Otis

-- 
Morton's Law:
	If rats are experimented upon, they will develop cancer.



Re: Working with a Distributed System

Posted by Clemens Marschner <cm...@lanlab.de>.
> How about using JMS and publish/subscribe with maybe time-stamped
> messages, etc.?

Since Lucene is not transactional, this will eventually get out of sync, I
suppose.

clemens
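
If drift is expected, one common way to at least notice it is to compare a
digest of each node's state. The sketch below is an assumption-laden
illustration, not something Lucene provides: it hashes a map of
"docId/field" -> value that each node would have to maintain or derive
itself. ReplicaDigest and digest are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: detect replica drift by comparing digests of the field values each node holds. */
public class ReplicaDigest {

    /** Hashes the entries in sorted key order so that two equal maps always give the same digest. */
    public static String digest(Map<String, String> fieldValues) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(fieldValues).entrySet()) {
                update(sha, e.getKey());
                update(sha, e.getValue());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", ex);
        }
    }

    /** Length-prefixes each string so that ("ab", "c") and ("a", "bc") hash differently. */
    private static void update(MessageDigest sha, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        sha.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
        sha.update((byte) ':');
        sha.update(bytes);
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = new HashMap<>();
        nodeA.put("report.txt/status", "final");
        nodeA.put("report.txt/author", "rob");

        // Node B missed the last update to the status field.
        Map<String, String> nodeB = new HashMap<>(nodeA);
        nodeB.put("report.txt/status", "draft");

        boolean inSync = digest(nodeA).equals(digest(nodeB));
        System.out.println(inSync ? "in sync" : "drift detected, resync needed");
    }
}
```

A mismatch only says that two nodes differ, not where; repairing the
difference would still need a full copy or a finer-grained comparison.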





Re: Working with a Distributed System

Posted by Otis Gospodnetic <ot...@yahoo.com>.
That is the approach I took at my previous job, which involved some
Lucene work.  I used sdist to securely distribute the whole index (the
whole dir with index files) to a number of remote machines.

This may not work well if indices need to constantly be in sync, and if
the index can be modified on all index nodes.

How about using JMS and publish/subscribe with maybe time-stamped
messages, etc.?

Otis
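
To make the time-stamped message idea concrete, here is a minimal sketch. It
does not use the JMS API; the Update class only models what a published
message might carry, and the rule applied is "last writer wins", which is
one possible policy and not necessarily what was meant above. All names are
hypothetical, and the sketch assumes node clocks are roughly synchronized.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: apply time-stamped field updates so that delivery order does not matter. */
public class TimestampedUpdates {

    /** A published update message; the shape is an assumption, not a JMS type. */
    public static final class Update {
        final String docId;
        final String field;
        final String value;
        final long timestamp;
        final String nodeId;

        public Update(String docId, String field, String value, long timestamp, String nodeId) {
            this.docId = docId;
            this.field = field;
            this.value = value;
            this.timestamp = timestamp;
            this.nodeId = nodeId;
        }
    }

    private final Map<String, Update> latest = new HashMap<>();

    /** Keeps the update only if it is newer than what is stored; returns true if it was applied. */
    public boolean apply(Update u) {
        String key = u.docId + "/" + u.field;
        Update current = latest.get(key);
        if (current != null && !isNewer(u, current)) {
            return false;
        }
        latest.put(key, u);
        return true;
    }

    /** Later timestamp wins; ties are broken by node id so every node picks the same winner. */
    private static boolean isNewer(Update a, Update b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp;
        }
        return a.nodeId.compareTo(b.nodeId) > 0;
    }

    /** Returns the current value of a field, or null if no update has been applied for it. */
    public String valueOf(String docId, String field) {
        Update u = latest.get(docId + "/" + field);
        return u == null ? null : u.value;
    }

    public static void main(String[] args) {
        Update draft = new Update("report.txt", "status", "draft", 1000L, "nodeA");
        Update done = new Update("report.txt", "status", "final", 2000L, "nodeB");

        TimestampedUpdates first = new TimestampedUpdates();
        first.apply(draft);
        first.apply(done);

        TimestampedUpdates second = new TimestampedUpdates();
        second.apply(done);  // the newer message arrives first on this node
        second.apply(draft); // older, so it is ignored

        System.out.println(first.valueOf("report.txt", "status")
                + " " + second.valueOf("report.txt", "status"));
    }
}
```

This makes nodes that receive the same messages converge, but it does not
by itself help a node that never received some of them.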



--- Ype Kingma <yk...@xs4all.nl> wrote:
> Assuming you run Unix, you might try and use rsync.
> It works like cp (copy) but it takes into account what is already on
> the destination.
> See http://rsync.samba.org/
> 
> I'd like to hear how it works for lucene indexes...
> 
> Kind regards,
> Ype




Re: Working with a Distributed System

Posted by Ype Kingma <yk...@xs4all.nl>.
Assuming you run Unix, you might try and use rsync.
It works like cp (copy) but it takes into account what is already on the 
destination.
See http://rsync.samba.org/

I'd like to hear how it works for lucene indexes...

Kind regards,
Ype
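
The "takes into account what is already on the destination" part can be
sketched as follows. This is not rsync: rsync can also transfer only the
changed parts of a file, while this sketch copies whole files and skips
those whose size and modification time (to the second) already match. The
class and method names are hypothetical, and the sketch assumes the index
is a flat directory that nobody is writing to during the copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: copy only the index files that are missing or changed at the destination. */
public class IncrementalCopy {

    /** Copies regular files directly inside source to dest and returns the names that were copied. */
    public static List<String> sync(Path source, Path dest) throws IOException {
        Files.createDirectories(dest);
        List<Path> entries;
        try (Stream<Path> files = Files.list(source)) {
            entries = files.sorted().collect(Collectors.toList());
        }
        List<String> copied = new ArrayList<>();
        for (Path src : entries) {
            if (!Files.isRegularFile(src)) {
                continue; // subdirectories are ignored in this sketch
            }
            Path target = dest.resolve(src.getFileName());
            if (needsCopy(src, target)) {
                // COPY_ATTRIBUTES carries the modification time over, so the next run can skip the file.
                Files.copy(src, target,
                        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
                copied.add(src.getFileName().toString());
            }
        }
        return copied;
    }

    /** Treats a file as unchanged when size and modification time (in seconds) both match. */
    private static boolean needsCopy(Path src, Path target) throws IOException {
        if (!Files.exists(target)) {
            return true;
        }
        long srcSeconds = Files.getLastModifiedTime(src).to(TimeUnit.SECONDS);
        long targetSeconds = Files.getLastModifiedTime(target).to(TimeUnit.SECONDS);
        return Files.size(src) != Files.size(target) || srcSeconds != targetSeconds;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempDirectory("index-src");
        Path dest = Files.createTempDirectory("index-dst");
        Files.write(source.resolve("segments"), new byte[] {1, 2, 3});

        System.out.println(sync(source, dest)); // first run copies the file
        System.out.println(sync(source, dest)); // second run finds nothing to do
    }
}
```

Files are never deleted from the destination here, so files that the source
index no longer contains would have to be cleaned up separately.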
