Posted to dev@lucenenet.apache.org by Guilherme Balena Versiani <gu...@take.io> on 2011/07/07 02:22:08 UTC

[Lucene.Net] Lucene Steroids

Hi,

I am working on a .NET project derived from Solr. The goal is to 
provide Lucene index replication similar to what Solr offers, but 
without having to port all of the Solr code.

There is a SnapShooter, a SnapPuller and a SnapInstaller. The SnapShooter 
does the same job as the corresponding Solr script. The SnapPuller uses 
cwRsync to replicate the database between machines, but without storing 
the snapshot.current.MACHINENAME files on the master, as cwRsync does not 
support syncing with the server. The SnapInstaller tries to replace the 
Lucene database files "in-place" -- the Lucene application should use a 
"SteroidsFSDirectory" that creates a special "SteroidsFSIndexInput", which 
allows files to be renamed while in use; after that, SnapInstaller sends a 
"commit" operation through a Windows named pipe to the application so 
that it resets its current IndexSearcher instance.
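
As a rough illustration of the install step described above, here is a 
minimal Python sketch. It is not the Lucene Steroids code: the function 
name install_snapshot, the ".installing" suffix and the notify_commit 
callback (standing in for the "commit" sent over the named pipe) are all 
hypothetical. It also does not attempt the part SteroidsFSIndexInput is 
said to handle, namely allowing a rename while the application still has 
the file open.

```python
import os
import shutil
from typing import Callable, List


def install_snapshot(snapshot_dir: str, index_dir: str,
                     notify_commit: Callable[[], None]) -> List[str]:
    """Move every snapshot file into index_dir, then signal a commit."""
    installed = []
    for name in sorted(os.listdir(snapshot_dir)):
        src = os.path.join(snapshot_dir, name)
        if not os.path.isfile(src):
            continue
        # Stage the copy under a temporary name (hypothetical suffix) so
        # the final name never points at a half-written file.
        tmp = os.path.join(index_dir, name + ".installing")
        shutil.copy2(src, tmp)
        # os.replace renames over an existing file with the same name.
        os.replace(tmp, os.path.join(index_dir, name))
        installed.append(name)
    # Hypothetical stand-in for the "commit" message: the application
    # would react by reopening its searcher.
    notify_commit()
    return installed
```

Passing the notification as a callback keeps the file handling separate 
from the transport, so the same function could be driven by a pipe, a 
socket or a direct call.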

This solution has the "suggestive" name of Lucene Steroids, and is 
hosted on BitBucket.org. What is the best way to continue distributing 
it? Should I keep maintaining it on BitBucket.org, or should I apply to 
the Lucene.NET project (I don't know how) to have it included in the 
Contrib modules?

The current code is available at 
http://bitbucket.org/guibv/lucene.steroids. The work is incomplete; the 
first stable version should be available in the next few days.

Best regards,
Guilherme Balena Versiani.

Re: [Lucene.Net] Lucene Steroids

Posted by Robert Stewart <Ro...@epam.com>.
I have built something similar using NTFS hard links and re-using existing local snapshot files. It has been running in production for 3+ years now with more than 100 million docs, and distributes new snapshots from the master servers every minute.

It does not use rsync; it only relies on the unique file names in Lucene. It copies only the files that do not already exist on the slaves, and uses NTFS hard links to "copy" existing local files into the new snapshot directory. On the masters, it likewise uses NTFS hard links to create a new "snapshot" of the master index, and the slaves just look for new snapshot directories on the master servers. When a new directory shows up, a slave compares it with its existing local snapshot to see which files are new on the master (or have been deleted by the master), and then copies only the new files.

It does not need to send any explicit commit operations, and there is no explicit communication between masters and slaves (slaves just look in a remote directory for new snapshot sub-directories). This has worked great with no problems at all. All of this was built before SOLR was available on Windows. Going forward we are transitioning to Java and SOLR on Linux (it is just too hard to keep up with improvements otherwise, IMO).
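
The hard-link scheme described above can be sketched roughly as follows. This is a minimal Python illustration, not the production code: the function names make_snapshot and pull_snapshot and the directory layout are assumptions. It also assumes, as the message says, that a file name is never reused for different content, so a file already present locally can be linked instead of copied.

```python
import os
import shutil


def make_snapshot(index_dir, snapshot_dir):
    """Master side: snapshot the index by hard-linking every file."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # A hard link costs no extra data space and survives the
            # later deletion of the original name in index_dir.
            os.link(src, os.path.join(snapshot_dir, name))


def pull_snapshot(master_snapshot, previous_local, new_local):
    """Slave side: build new_local, copying only files not held locally.

    previous_local may be None for the very first pull.
    Returns the names that actually had to be copied from the master.
    """
    os.makedirs(new_local)
    have = set(os.listdir(previous_local)) if previous_local else set()
    copied = []
    for name in sorted(os.listdir(master_snapshot)):
        dst = os.path.join(new_local, name)
        if name in have:
            # Already present locally: link it into the new snapshot.
            os.link(os.path.join(previous_local, name), dst)
        else:
            shutil.copy2(os.path.join(master_snapshot, name), dst)
            copied.append(name)
    # Files deleted on the master are simply never linked into new_local.
    return copied
```

Because each snapshot is a complete directory, a slave can open the newest one whenever it is ready, which is consistent with the message's point that no explicit commit message is needed.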


