Posted to java-user@lucene.apache.org by Desteny Child <my...@gmail.com> on 2016/07/03 18:09:07 UTC

Lucene cluster with NFS or synchronization tool such as rsync

I need to set up a cluster for my stateless application, which is based on
Lucene 5.2.1. I'm looking for a way to share a Lucene index between
different Lucene nodes, via either NFS or rsync.

Is it a good idea to use NFS for this purpose, and if so, will it be
possible to read from and write to the same shared index from different
nodes?

I've also read that rsync can be used for this purpose (to synchronize
index files across all nodes), but I can't find any success stories about
using rsync with Lucene. I have a lot of questions right now; one of them
is whether it is safe to run rsync at any time, especially while an
IndexWriter is still open (not closed) and actively indexing documents.

Re: Lucene cluster with NFS or synchronization tool such as rsync

Posted by Michael McCandless <lu...@mikemccandless.com>.
Alas, there are no docs beyond the classes themselves, in the
lucene/replicator module, under the org.apache.lucene.replicator.nrt package.

Essentially, you create a PrimaryNode (the equivalent of IndexWriter) for
indexing documents in a JVM on machine 1, and a ReplicaNode in a JVM on
machine 2, but you must subclass these classes to handle sending files
across the wire.

The test cases give simplistic examples (thread-per-socket-connection) of
how to do this.
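To make the "sending files across the wire" part concrete, here is a minimal sketch of the thread-per-socket-connection idea in plain java.net. It does not use or reproduce the PrimaryNode/ReplicaNode API; the class FileShipSketch, its methods, and the wire format (a file name sent as a UTF string, answered by a length and the raw bytes) are hypothetical and only illustrate the transport a subclass would need.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical file transport, NOT Lucene's replicator API: one thread per connection. */
class FileShipSketch {

    /** Serves files from dir on a free port; each accepted socket gets its own thread. */
    static ServerSocket serve(Path dir) throws IOException {
        ServerSocket server = new ServerSocket(0); // 0 = let the OS pick a free port
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket socket = server.accept();
                    new Thread(() -> handle(dir, socket)).start();
                } catch (IOException e) {
                    return; // server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Request: one file name. Response: the file's length, then its bytes. */
    private static void handle(Path dir, Socket socket) {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(s.getInputStream());
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            String name = in.readUTF();
            // No path validation here; a real server must reject names like "../x".
            byte[] bytes = Files.readAllBytes(dir.resolve(name));
            out.writeLong(bytes.length);
            out.write(bytes);
        } catch (IOException e) {
            // A real replica would report the failure and retry or abort the copy.
        }
    }

    /** Fetches one named file from the server on localhost and stores it in targetDir. */
    static void fetch(int port, String name, Path targetDir) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(s.getOutputStream());
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            out.writeUTF(name);
            out.flush();
            byte[] bytes = new byte[(int) in.readLong()];
            in.readFully(bytes);
            Files.write(targetDir.resolve(name), bytes);
        }
    }
}
```

Thread-per-connection keeps the sketch short; with many replicas, a thread pool or non-blocking I/O would likely scale better.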

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 4, 2016 at 8:10 AM, Desteny Child <my...@gmail.com> wrote:

> Hi Mike,
>
> Thank you very much for your response.
>
> I would be really grateful if you could point me to where I can read about
> the new near-real-time replication (ideally with examples).
>
> Thanks,
> Alex

Re: Lucene cluster with NFS or synchronization tool such as rsync

Posted by Sanne Grinovero <sa...@gmail.com>.
I had a similar need about five years ago, and contributed this Lucene
extension to the Infinispan project:
 - http://infinispan.org/docs/8.2.x/user_guide/user_guide.html#_infinispan_as_a_storage_for_lucene_indexes

It has since matured and is now actively maintained by several other
people who use it.

Infinispan 8.2.1.Final is also licensed under ASL2 and was compiled
against Lucene 5.5.0, which is not compatible with the LockFactory API of
Lucene 5.2, so you should either pick an older release or adapt the
source code.

-- Sanne


On 4 July 2016 at 13:10, Desteny Child <my...@gmail.com> wrote:
> Hi Mike,
>
> Thank you very much for your response.
>
> I would be really grateful if you could point me to where I can read about
> the new near-real-time replication (ideally with examples).
>
> Thanks,
> Alex



Re: Lucene cluster with NFS or synchronization tool such as rsync

Posted by Desteny Child <my...@gmail.com>.
Hi Mike,

Thank you very much for your response.

I would be really grateful if you could point me to where I can read about
the new near-real-time replication (ideally with examples).

Thanks,
Alex

2016-07-04 12:57 GMT+03:00 Michael McCandless <lu...@mikemccandless.com>:

> NFS is dangerous if different nodes may take turns writing to the shared
> index.
>
> Locking sometimes doesn't work correctly, client-side metadata caching
> (e.g. of directory entries) can cause problems, and NFS doesn't support the
> "delete on final close" semantics that Lucene relies on.
>
> rsync-like copying can work with IndexWriter if you use
> SnapshotDeletionPolicy to hold a point-in-time view of the index open while
> you copy it. This is also how you take a live backup of an index that is
> still being written to, and it's how Lucene's replication module works.
>
> You could also try the new near-real-time replication, which copies just
> the newly written segment files without requiring a full commit (fsync) on
> the source index.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Lucene cluster with NFS or synchronization tool such as rsync

Posted by Michael McCandless <lu...@mikemccandless.com>.
NFS is dangerous if different nodes may take turns writing to the shared
index.

Locking sometimes doesn't work correctly, client-side metadata caching
(e.g. of directory entries) can cause problems, and NFS doesn't support the
"delete on final close" semantics that Lucene relies on.
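For context, Lucene stops two IndexWriters from writing the same index with a write lock (a write.lock file in the index directory). The sketch below assumes Lucene 5.x core and analyzers-common jars on the classpath and a local filesystem; it shows the guarantee that locking is meant to provide: a second IndexWriter on the same directory fails with LockObtainFailedException. The warning above is that on NFS this check cannot be relied on, so a passing run locally says nothing about a shared NFS mount. WriteLockSketch and its method name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

/** Hypothetical demo: a second IndexWriter on the same index directory is refused. */
class WriteLockSketch {

    static boolean secondWriterIsRefused(Path indexDir) throws IOException {
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter first = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // "first" now holds write.lock for this directory.
            try (IndexWriter second = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                return false; // both writers opened: the lock did not protect the index
            } catch (LockObtainFailedException expected) {
                return true;  // the lock held by "first" was honored
            }
        }
    }
}
```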

rsync-like copying can work with IndexWriter if you use
SnapshotDeletionPolicy to hold a point-in-time view of the index open while
you copy it. This is also how you take a live backup of an index that is
still being written to, and it's how Lucene's replication module works.
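A minimal sketch of the SnapshotDeletionPolicy approach, assuming Lucene 5.x (core plus analyzers-common) on the classpath and an FSDirectory whose files can be copied with java.nio: snapshot() pins the latest commit so IndexWriter will not delete its files, the commit's files are copied (the segments_N file last), and release() followed by deleteUnusedFiles() un-pins it. The class and method names are hypothetical, and in a real cluster the copy step would be rsync or a network transfer rather than Files.copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;

/** Hypothetical helper: copies a pinned commit while the IndexWriter stays open. */
class SnapshotCopySketch {

    static List<String> copyLatestCommit(IndexWriter writer, SnapshotDeletionPolicy sdp,
                                         Path indexDir, Path targetDir) throws IOException {
        IndexCommit commit = sdp.snapshot(); // pin the commit: its files won't be deleted
        List<String> copied = new ArrayList<>();
        try {
            Files.createDirectories(targetDir);
            String segmentsFile = commit.getSegmentsFileName();
            // Copy segment files first and segments_N last, so the target never has a
            // commit point that refers to files not copied yet.
            for (String name : commit.getFileNames()) {
                if (!name.equals(segmentsFile)) {
                    copied.add(copy(indexDir, targetDir, name));
                }
            }
            copied.add(copy(indexDir, targetDir, segmentsFile));
        } finally {
            sdp.release(commit);        // un-pin the commit...
            writer.deleteUnusedFiles(); // ...and let the writer drop files nobody needs
        }
        return copied;
    }

    private static String copy(Path from, Path to, String name) throws IOException {
        Files.copy(from.resolve(name), to.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        return name;
    }

    /** Indexes one document, copies the snapshot, and returns the doc count of the copy. */
    static int demo(Path indexDir, Path targetDir) throws IOException {
        SnapshotDeletionPolicy sdp =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setIndexDeletionPolicy(sdp);
        try (FSDirectory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, iwc)) {
            Document doc = new Document();
            doc.add(new TextField("body", "hello lucene", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit(); // snapshot() needs at least one commit to exist
            copyLatestCommit(writer, sdp, indexDir, targetDir);
            // The writer is still open here and could keep indexing.
        }
        try (FSDirectory copyDir = FSDirectory.open(targetDir);
             DirectoryReader reader = DirectoryReader.open(copyDir)) {
            return reader.numDocs();
        }
    }
}
```

Only the snapshotted commit is copied, so documents added after the last commit() are not in the copy; the target reflects exactly that commit.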

You could also try the new near-real-time replication, which copies just
the newly written segment files without requiring a full commit (fsync) on
the source index.
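Copying only the newly written files relies on Lucene writing each index file once and never changing it afterwards, so a file name the replica already has can be skipped. The sketch below is plain Java with hypothetical names and no Lucene dependency: it computes which files of a new commit a replica still needs (segments_N last) and which old files the replica could drop once the new commit is in place. It illustrates the file-level bookkeeping only and is not how the NRT replicator is implemented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: file-level diff between a primary's commit and a replica. */
class IncrementalCopySketch {

    /** Files of the new commit the replica lacks, in copy order (segments_N last). */
    static List<String> filesToCopy(Collection<String> commitFiles, Collection<String> replicaFiles) {
        Set<String> have = new HashSet<>(replicaFiles);
        List<String> order = new ArrayList<>();
        String segmentsFile = null;
        for (String name : new TreeSet<>(commitFiles)) {
            if (have.contains(name)) {
                continue; // assumes write-once files: same name means same content
            }
            if (name.startsWith("segments_")) {
                segmentsFile = name; // the commit point is copied last
            } else {
                order.add(name);
            }
        }
        if (segmentsFile != null) {
            order.add(segmentsFile);
        }
        return order;
    }

    /** Replica files the new commit no longer references (delete once no reader uses them). */
    static SortedSet<String> obsoleteOnReplica(Collection<String> commitFiles,
                                               Collection<String> replicaFiles) {
        SortedSet<String> obsolete = new TreeSet<>(replicaFiles);
        obsolete.removeAll(commitFiles);
        return obsolete;
    }
}
```

For a commit with files _0.cfe, _0.cfs, _0.si, _1.cfe, _1.cfs, _1.si and segments_3, and a replica holding _0.cfe, _0.cfs, _0.si and segments_2, filesToCopy returns [_1.cfe, _1.cfs, _1.si, segments_3] and obsoleteOnReplica returns [segments_2].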

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 3, 2016 at 2:09 PM, Desteny Child <my...@gmail.com> wrote:

> I need to set up a cluster for my stateless application, which is based on
> Lucene 5.2.1. I'm looking for a way to share a Lucene index between
> different Lucene nodes, via either NFS or rsync.
>
> Is it a good idea to use NFS for this purpose, and if so, will it be
> possible to read from and write to the same shared index from different
> nodes?
>
> I've also read that rsync can be used for this purpose (to synchronize
> index files across all nodes), but I can't find any success stories about
> using rsync with Lucene. I have a lot of questions right now; one of them
> is whether it is safe to run rsync at any time, especially while an
> IndexWriter is still open (not closed) and actively indexing documents.
>

Re: Lucene cluster with NFS or synchronization tool such as rsync

Posted by Evert Wagenaar <ev...@gmail.com>.
Hello,

I know that Nutch has done it this way for years (since version 1.0).
If you want to know exactly how it's done, you should look at Nutch's
source code. The latest version of Nutch actually uses SolrCloud, so I
would look at earlier versions of Nutch, say 1.x. Don't let MapReduce
scare you off; it's just how Nutch works with the Nutch Distributed
File System.

Good luck,

Evert Wagenaar.

On Sunday, July 3, 2016, Desteny Child <my...@gmail.com> wrote:

> I need to set up a cluster for my stateless application, which is based on
> Lucene 5.2.1. I'm looking for a way to share a Lucene index between
> different Lucene nodes, via either NFS or rsync.
>
> Is it a good idea to use NFS for this purpose, and if so, will it be
> possible to read from and write to the same shared index from different
> nodes?
>
> I've also read that rsync can be used for this purpose (to synchronize
> index files across all nodes), but I can't find any success stories about
> using rsync with Lucene. I have a lot of questions right now; one of them
> is whether it is safe to run rsync at any time, especially while an
> IndexWriter is still open (not closed) and actively indexing documents.
>

