You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Rajesh parab <ra...@yahoo.com> on 2008/04/04 02:19:50 UTC

Lucene 2.3.0 and NFS

Hi,

We are currently using Lucene 2.0 for full-text
searches within our enterprise application, which can
be deployed in clustered environment. We generate
Lucene index for data stored inside relational
database.

As Lucene 2.0 did not have solid NFS support and as we
wanted Lucene based searches to work properly in
Clustered environment, we had decided on following
approach:
1. The index generation happens on a machine (could be
one of the cluster nodes or a separate machine) and
once the Lucene index is generated, we copy all the
index files to the database.
2. The index search request on each cluster node
retrieves the index files from database (during first
search or after index update), copies to the file
system and use it for searches.
3. Thus, each cluster node has its own copy of the
index and it keeps on picking up latest version if it
is available inside database.

This has worked fine for us till now, though we will
not be able continue with this model in future as we
want to support Lucene based searches across our
application and also want to index large components
inside our application like Wiki, forums, etc. As the
index will grow, storing and retrieving index files
from database will not be an efficient operation.

My questions are:
- Will we be able to use NFS if we move to Lucene
2.3.0?
- Will there be any significant performance impact on
index generation and searches if we move to NFS?
- Is Lucene + NFS combination supported for all
operating systems? (We support Windows, Solaris, AIX,
HP-UX, Red Hat Linux)
- Is there any other alternative available other than
NFS?

I will really appreciate your comments/thoughts on
this topic.

Regards,
Rajesh


      ____________________________________________________________________________________
You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost.  
http://tc.deals.yahoo.com/tc/blockbuster/text5.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene 2.3.0 and NFS

Posted by Cedric Ho <ce...@gmail.com>.
I think cygwin will do the trick.

On Thu, Apr 10, 2008 at 9:10 AM, Rajesh parab <ra...@yahoo.com> wrote:
> Hi All,
>
>  Has anyone used rsync or similar utilities on Windows
>  OS to replicate Lucene index across multiple machines?
>
>  Any pointers on it will be very useful?
>
>  Regards,
>  Rajesh
>
>
>
>  --- Rajesh parab <ra...@yahoo.com> wrote:
>
>  > Hi Michael,
>  >
>  > Thanks a lot for your suggestions.
>  >
>  > I was looking at rsync; as per this link
>  > (http://samba.anu.edu.au/rsync/features.html), rsync
>  > is a file transfer program for UNIX. Is there rsync
>  > support for Windows as well? I found few rsync
>  > programs that works for Windows, but I am not sure
>  > if
>  > they will server the purpose. Has anyone using rsync
>  > on Windows?
>  >
>  > Regards,
>  > Rajesh
>  >
>  > --- Michael McCandless <lu...@mikemccandless.com>
>  > wrote:
>  >
>  > >
>  > > Rajesh parab wrote:
>  > > > Hi,
>  > > >
>  > > > We are currently using Lucene 2.0 for full-text
>  > > > searches within our enterprise application,
>  > which
>  > > can
>  > > > be deployed in clustered environment. We
>  > generate
>  > > > Lucene index for data stored inside relational
>  > > > database.
>  > > >
>  > > > As Lucene 2.0 did not have solid NFS support and
>  > > as we
>  > > > wanted Lucene based searches to work properly in
>  > > > Clustered environment, we had decided on
>  > following
>  > > > approach:
>  > > > 1. The index generation happens on a machine
>  > > (could be
>  > > > one of the cluster nodes or a separate machine)
>  > > and
>  > > > once the Lucene index is generated, we copy all
>  > > the
>  > > > index files to the database.
>  > >
>  > > Note that you can do also incremental replication:
>  > > often, the Lucene
>  > > index changes in minor ways (eg a single new
>  > segment
>  > > is flushed and a
>  > > new segments_N and segments.gen is written) so you
>  > > should only sync
>  > > the files that are new (and remove the ones that
>  > are
>  > > now gone).
>  > > Lucene's write-once approach makes this very
>  > simple
>  > > (you just have to
>  > > compare file names, not the contents of each
>  > file).
>  > >
>  > > It's also possible to replicate without using a
>  > DB.
>  > > EG rsync does a
>  > > great job.
>  > >
>  > > > 2. The index search request on each cluster node
>  > > > retrieves the index files from database (during
>  > > first
>  > > > search or after index update), copies to the
>  > file
>  > > > system and use it for searches.
>  > > > 3. Thus, each cluster node has its own copy of
>  > the
>  > > > index and it keeps on picking up latest version
>  > if
>  > > it
>  > > > is available inside database.
>  > > >
>  > > > This has worked fine for us till now, though we
>  > > will
>  > > > not be able continue with this model in future
>  > as
>  > > we
>  > > > want to support Lucene based searches across our
>  > > > application and also want to index large
>  > > components
>  > > > inside our application like Wiki, forums, etc.
>  > As
>  > > the
>  > > > index will grow, storing and retrieving index
>  > > files
>  > > > from database will not be an efficient
>  > operation.
>  > > >
>  > > > My questions are:
>  > > > - Will we be able to use NFS if we move to
>  > Lucene
>  > > > 2.3.0?
>  > >
>  > > Make sure you update to 2.3.1, not 2.3.0.
>  > >
>  > > > - Will there be any significant performance
>  > impact
>  > > on
>  > > > index generation and searches if we move to NFS?
>  > > > - Is Lucene + NFS combination supported for all
>  > > > operating systems? (We support Windows, Solaris,
>  > > AIX,
>  > > > HP-UX, Red Hat Linux)
>  > >
>  > > NFS *should* work, however:
>  > >
>  > >    * It's not widely used, so, test thoroughly in
>  > > your particular setup.
>  > >
>  > >    * Most likely to work is if you use a single
>  > > machine writing to
>  > > the index, and many readers.
>  > >
>  > >    * Performance is likely not great, especially
>  > on
>  > > searching, but
>  > > you should test in your specific situation.
>  > >
>  > > > - Is there any other alternative available other
>  > > than
>  > > > NFS?
>  > > >
>  > >
>  > > It's also possible to replicate without using a
>  > DB.
>  > > EG rsync does
>  > > agreat job.
>  > >
>  > > You should look at Solr, since it already has all
>  > > the infrastructure
>  > > toaccept updates, replicate index changes to
>  > remote
>  > > machines, etc.
>  > >
>  > > > I will really appreciate your comments/thoughts
>  > on
>  > > > this topic.
>  > > >
>  > > > Regards,
>  > > > Rajesh
>  > > >
>  > > >
>  > > >
>  > > >
>  > >
>  >
>  ______________________________________________________________________
>  > >
>  > > > ______________
>  > > > You rock. That's why Blockbuster's offering you
>  > > one month of
>  > > > Blockbuster Total Access, No Cost.
>  > > >
>  > http://tc.deals.yahoo.com/tc/blockbuster/text5.com
>  > > >
>  > > >
>  > >
>  >
>  ---------------------------------------------------------------------
>  > > > To unsubscribe, e-mail:
>  > > java-user-unsubscribe@lucene.apache.org
>  > > > For additional commands, e-mail:
>  > > java-user-help@lucene.apache.org
>  > > >
>  > >
>  > >
>  > >
>  >
>  ---------------------------------------------------------------------
>  > > To unsubscribe, e-mail:
>  > > java-user-unsubscribe@lucene.apache.org
>  > > For additional commands, e-mail:
>  > > java-user-help@lucene.apache.org
>  > >
>  > >
>  >
>  >
>  >
>  >
>  >
>  ____________________________________________________________________________________
>  > You rock. That's why Blockbuster's offering you one
>  > month of Blockbuster Total Access, No Cost.
>  > http://tc.deals.yahoo.com/tc/blockbuster/text5.com
>  >
>  >
>  ---------------------------------------------------------------------
>  > To unsubscribe, e-mail:
>  > java-user-unsubscribe@lucene.apache.org
>  > For additional commands, e-mail:
>  > java-user-help@lucene.apache.org
>  >
>  >
>
>
>
> __________________________________________________
>  Do You Yahoo!?
>  Tired of spam?  Yahoo! Mail has the best spam protection around
>  http://mail.yahoo.com
>
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene 2.3.0 and NFS

Posted by 仇寅 <qi...@software.nju.edu.cn>.
Hi,

I know there are several options, though none of which I have tried myself.

Rsync win32 port: http://sourceforge.net/projects/rsyncwin32/
rsync python script (should be platform-independent):
http://www.vdesmedt.com/~vds2212/rsync.html
Jarsync (in java): http://jarsync.sourceforge.net/, which is a Java
implementation for rsync
And finally cygwin of course.

Hope that helps.

On Thu, Apr 10, 2008 at 9:10 AM, Rajesh parab <ra...@yahoo.com>
wrote:

> Hi All,
>
> Has anyone used rsync or similar utilities on Windows
> OS to replicate Lucene index across multiple machines?
>
> Any pointers on it will be very useful?
>
> Regards,
> Rajesh
>
> --- Rajesh parab <ra...@yahoo.com> wrote:
>
> > Hi Michael,
> >
> > Thanks a lot for your suggestions.
> >
> > I was looking at rsync; as per this link
> > (http://samba.anu.edu.au/rsync/features.html), rsync
> > is a file transfer program for UNIX. Is there rsync
> > support for Windows as well? I found few rsync
> > programs that works for Windows, but I am not sure
> > if
> > they will server the purpose. Has anyone using rsync
> > on Windows?
> >
> > Regards,
> > Rajesh
> >
> > --- Michael McCandless <lu...@mikemccandless.com>
> > wrote:
> >
> > >
> > > Rajesh parab wrote:
> > > > Hi,
> > > >
> > > > We are currently using Lucene 2.0 for full-text
> > > > searches within our enterprise application,
> > which
> > > can
> > > > be deployed in clustered environment. We
> > generate
> > > > Lucene index for data stored inside relational
> > > > database.
> > > >
> > > > As Lucene 2.0 did not have solid NFS support and
> > > as we
> > > > wanted Lucene based searches to work properly in
> > > > Clustered environment, we had decided on
> > following
> > > > approach:
> > > > 1. The index generation happens on a machine
> > > (could be
> > > > one of the cluster nodes or a separate machine)
> > > and
> > > > once the Lucene index is generated, we copy all
> > > the
> > > > index files to the database.
> > >
> > > Note that you can do also incremental replication:
> > > often, the Lucene
> > > index changes in minor ways (eg a single new
> > segment
> > > is flushed and a
> > > new segments_N and segments.gen is written) so you
> > > should only sync
> > > the files that are new (and remove the ones that
> > are
> > > now gone).
> > > Lucene's write-once approach makes this very
> > simple
> > > (you just have to
> > > compare file names, not the contents of each
> > file).
> > >
> > > It's also possible to replicate without using a
> > DB.
> > > EG rsync does a
> > > great job.
> > >
> > > > 2. The index search request on each cluster node
> > > > retrieves the index files from database (during
> > > first
> > > > search or after index update), copies to the
> > file
> > > > system and use it for searches.
> > > > 3. Thus, each cluster node has its own copy of
> > the
> > > > index and it keeps on picking up latest version
> > if
> > > it
> > > > is available inside database.
> > > >
> > > > This has worked fine for us till now, though we
> > > will
> > > > not be able continue with this model in future
> > as
> > > we
> > > > want to support Lucene based searches across our
> > > > application and also want to index large
> > > components
> > > > inside our application like Wiki, forums, etc.
> > As
> > > the
> > > > index will grow, storing and retrieving index
> > > files
> > > > from database will not be an efficient
> > operation.
> > > >
> > > > My questions are:
> > > > - Will we be able to use NFS if we move to
> > Lucene
> > > > 2.3.0?
> > >
> > > Make sure you update to 2.3.1, not 2.3.0.
> > >
> > > > - Will there be any significant performance
> > impact
> > > on
> > > > index generation and searches if we move to NFS?
> > > > - Is Lucene + NFS combination supported for all
> > > > operating systems? (We support Windows, Solaris,
> > > AIX,
> > > > HP-UX, Red Hat Linux)
> > >
> > > NFS *should* work, however:
> > >
> > >    * It's not widely used, so, test thoroughly in
> > > your particular setup.
> > >
> > >    * Most likely to work is if you use a single
> > > machine writing to
> > > the index, and many readers.
> > >
> > >    * Performance is likely not great, especially
> > on
> > > searching, but
> > > you should test in your specific situation.
> > >
> > > > - Is there any other alternative available other
> > > than
> > > > NFS?
> > > >
> > >
> > > It's also possible to replicate without using a
> > DB.
> > > EG rsync does
> > > agreat job.
> > >
> > > You should look at Solr, since it already has all
> > > the infrastructure
> > > toaccept updates, replicate index changes to
> > remote
> > > machines, etc.
> > >
> > > > I will really appreciate your comments/thoughts
> > on
> > > > this topic.
> > > >
> > > > Regards,
> > > > Rajesh
> > > >
> > > >
> > > >
> > > >
> > >
> >
> ______________________________________________________________________
> > >
> > > > ______________
> > > > You rock. That's why Blockbuster's offering you
> > > one month of
> > > > Blockbuster Total Access, No Cost.
> > > >
> > http://tc.deals.yahoo.com/tc/blockbuster/text5.com
> > > >
> > > >
> > >
> >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> > > >
> > >
> > >
> > >
> >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> >
> >
> >
>
> ____________________________________________________________________________________
> > You rock. That's why Blockbuster's offering you one
> > month of Blockbuster Total Access, No Cost.
> > http://tc.deals.yahoo.com/tc/blockbuster/text5.com
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> >
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Yin Qiu
Nanjing University, China
-------------------------------------------

Re: Lucene 2.3.0 and NFS

Posted by Rajesh parab <ra...@yahoo.com>.
Hi All,

Has anyone used rsync or similar utilities on Windows
OS to replicate Lucene index across multiple machines?

Any pointers on it will be very useful?

Regards,
Rajesh

--- Rajesh parab <ra...@yahoo.com> wrote:

> Hi Michael,
> 
> Thanks a lot for your suggestions.
> 
> I was looking at rsync; as per this link
> (http://samba.anu.edu.au/rsync/features.html), rsync
> is a file transfer program for UNIX. Is there rsync
> support for Windows as well? I found few rsync
> programs that works for Windows, but I am not sure
> if
> they will server the purpose. Has anyone using rsync
> on Windows?
> 
> Regards,
> Rajesh
> 
> --- Michael McCandless <lu...@mikemccandless.com>
> wrote:
> 
> > 
> > Rajesh parab wrote:
> > > Hi,
> > >
> > > We are currently using Lucene 2.0 for full-text
> > > searches within our enterprise application,
> which
> > can
> > > be deployed in clustered environment. We
> generate
> > > Lucene index for data stored inside relational
> > > database.
> > >
> > > As Lucene 2.0 did not have solid NFS support and
> > as we
> > > wanted Lucene based searches to work properly in
> > > Clustered environment, we had decided on
> following
> > > approach:
> > > 1. The index generation happens on a machine
> > (could be
> > > one of the cluster nodes or a separate machine)
> > and
> > > once the Lucene index is generated, we copy all
> > the
> > > index files to the database.
> > 
> > Note that you can do also incremental replication:
> > often, the Lucene  
> > index changes in minor ways (eg a single new
> segment
> > is flushed and a  
> > new segments_N and segments.gen is written) so you
> > should only sync  
> > the files that are new (and remove the ones that
> are
> > now gone).   
> > Lucene's write-once approach makes this very
> simple
> > (you just have to  
> > compare file names, not the contents of each
> file).
> > 
> > It's also possible to replicate without using a
> DB. 
> > EG rsync does a  
> > great job.
> > 
> > > 2. The index search request on each cluster node
> > > retrieves the index files from database (during
> > first
> > > search or after index update), copies to the
> file
> > > system and use it for searches.
> > > 3. Thus, each cluster node has its own copy of
> the
> > > index and it keeps on picking up latest version
> if
> > it
> > > is available inside database.
> > >
> > > This has worked fine for us till now, though we
> > will
> > > not be able continue with this model in future
> as
> > we
> > > want to support Lucene based searches across our
> > > application and also want to index large
> > components
> > > inside our application like Wiki, forums, etc.
> As
> > the
> > > index will grow, storing and retrieving index
> > files
> > > from database will not be an efficient
> operation.
> > >
> > > My questions are:
> > > - Will we be able to use NFS if we move to
> Lucene
> > > 2.3.0?
> > 
> > Make sure you update to 2.3.1, not 2.3.0.
> > 
> > > - Will there be any significant performance
> impact
> > on
> > > index generation and searches if we move to NFS?
> > > - Is Lucene + NFS combination supported for all
> > > operating systems? (We support Windows, Solaris,
> > AIX,
> > > HP-UX, Red Hat Linux)
> > 
> > NFS *should* work, however:
> > 
> >    * It's not widely used, so, test thoroughly in
> > your particular setup.
> > 
> >    * Most likely to work is if you use a single
> > machine writing to  
> > the index, and many readers.
> > 
> >    * Performance is likely not great, especially
> on
> > searching, but  
> > you should test in your specific situation.
> > 
> > > - Is there any other alternative available other
> > than
> > > NFS?
> > >
> > 
> > It's also possible to replicate without using a
> DB. 
> > EG rsync does  
> > agreat job.
> > 
> > You should look at Solr, since it already has all
> > the infrastructure  
> > toaccept updates, replicate index changes to
> remote
> > machines, etc.
> > 
> > > I will really appreciate your comments/thoughts
> on
> > > this topic.
> > >
> > > Regards,
> > > Rajesh
> > >
> > >
> > >        
> > >
> >
>
______________________________________________________________________
> > 
> > > ______________
> > > You rock. That's why Blockbuster's offering you
> > one month of  
> > > Blockbuster Total Access, No Cost.
> > >
> http://tc.deals.yahoo.com/tc/blockbuster/text5.com
> > >
> > >
> >
>
---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > >
> > 
> > 
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > 
> > 
> 
> 
> 
>      
>
____________________________________________________________________________________
> You rock. That's why Blockbuster's offering you one
> month of Blockbuster Total Access, No Cost.  
> http://tc.deals.yahoo.com/tc/blockbuster/text5.com
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> 
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene 2.3.0 and NFS

Posted by Rajesh parab <ra...@yahoo.com>.
Hi Michael,

Thanks a lot for your suggestions.

I was looking at rsync; as per this link
(http://samba.anu.edu.au/rsync/features.html), rsync
is a file transfer program for UNIX. Is there rsync
support for Windows as well? I found few rsync
programs that works for Windows, but I am not sure if
they will server the purpose. Has anyone using rsync
on Windows?

Regards,
Rajesh

--- Michael McCandless <lu...@mikemccandless.com>
wrote:

> 
> Rajesh parab wrote:
> > Hi,
> >
> > We are currently using Lucene 2.0 for full-text
> > searches within our enterprise application, which
> can
> > be deployed in clustered environment. We generate
> > Lucene index for data stored inside relational
> > database.
> >
> > As Lucene 2.0 did not have solid NFS support and
> as we
> > wanted Lucene based searches to work properly in
> > Clustered environment, we had decided on following
> > approach:
> > 1. The index generation happens on a machine
> (could be
> > one of the cluster nodes or a separate machine)
> and
> > once the Lucene index is generated, we copy all
> the
> > index files to the database.
> 
> Note that you can do also incremental replication:
> often, the Lucene  
> index changes in minor ways (eg a single new segment
> is flushed and a  
> new segments_N and segments.gen is written) so you
> should only sync  
> the files that are new (and remove the ones that are
> now gone).   
> Lucene's write-once approach makes this very simple
> (you just have to  
> compare file names, not the contents of each file).
> 
> It's also possible to replicate without using a DB. 
> EG rsync does a  
> great job.
> 
> > 2. The index search request on each cluster node
> > retrieves the index files from database (during
> first
> > search or after index update), copies to the file
> > system and use it for searches.
> > 3. Thus, each cluster node has its own copy of the
> > index and it keeps on picking up latest version if
> it
> > is available inside database.
> >
> > This has worked fine for us till now, though we
> will
> > not be able continue with this model in future as
> we
> > want to support Lucene based searches across our
> > application and also want to index large
> components
> > inside our application like Wiki, forums, etc. As
> the
> > index will grow, storing and retrieving index
> files
> > from database will not be an efficient operation.
> >
> > My questions are:
> > - Will we be able to use NFS if we move to Lucene
> > 2.3.0?
> 
> Make sure you update to 2.3.1, not 2.3.0.
> 
> > - Will there be any significant performance impact
> on
> > index generation and searches if we move to NFS?
> > - Is Lucene + NFS combination supported for all
> > operating systems? (We support Windows, Solaris,
> AIX,
> > HP-UX, Red Hat Linux)
> 
> NFS *should* work, however:
> 
>    * It's not widely used, so, test thoroughly in
> your particular setup.
> 
>    * Most likely to work is if you use a single
> machine writing to  
> the index, and many readers.
> 
>    * Performance is likely not great, especially on
> searching, but  
> you should test in your specific situation.
> 
> > - Is there any other alternative available other
> than
> > NFS?
> >
> 
> It's also possible to replicate without using a DB. 
> EG rsync does  
> agreat job.
> 
> You should look at Solr, since it already has all
> the infrastructure  
> toaccept updates, replicate index changes to remote
> machines, etc.
> 
> > I will really appreciate your comments/thoughts on
> > this topic.
> >
> > Regards,
> > Rajesh
> >
> >
> >        
> >
>
______________________________________________________________________
> 
> > ______________
> > You rock. That's why Blockbuster's offering you
> one month of  
> > Blockbuster Total Access, No Cost.
> > http://tc.deals.yahoo.com/tc/blockbuster/text5.com
> >
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> >
> 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> 
> 



      ____________________________________________________________________________________
You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost.  
http://tc.deals.yahoo.com/tc/blockbuster/text5.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene 2.3.0 and NFS

Posted by Michael McCandless <lu...@mikemccandless.com>.
Rajesh parab wrote:
> Hi,
>
> We are currently using Lucene 2.0 for full-text
> searches within our enterprise application, which can
> be deployed in clustered environment. We generate
> Lucene index for data stored inside relational
> database.
>
> As Lucene 2.0 did not have solid NFS support and as we
> wanted Lucene based searches to work properly in
> Clustered environment, we had decided on following
> approach:
> 1. The index generation happens on a machine (could be
> one of the cluster nodes or a separate machine) and
> once the Lucene index is generated, we copy all the
> index files to the database.

Note that you can do also incremental replication: often, the Lucene  
index changes in minor ways (eg a single new segment is flushed and a  
new segments_N and segments.gen is written) so you should only sync  
the files that are new (and remove the ones that are now gone).   
Lucene's write-once approach makes this very simple (you just have to  
compare file names, not the contents of each file).

It's also possible to replicate without using a DB.  EG rsync does a  
great job.

> 2. The index search request on each cluster node
> retrieves the index files from database (during first
> search or after index update), copies to the file
> system and use it for searches.
> 3. Thus, each cluster node has its own copy of the
> index and it keeps on picking up latest version if it
> is available inside database.
>
> This has worked fine for us till now, though we will
> not be able continue with this model in future as we
> want to support Lucene based searches across our
> application and also want to index large components
> inside our application like Wiki, forums, etc. As the
> index will grow, storing and retrieving index files
> from database will not be an efficient operation.
>
> My questions are:
> - Will we be able to use NFS if we move to Lucene
> 2.3.0?

Make sure you update to 2.3.1, not 2.3.0.

> - Will there be any significant performance impact on
> index generation and searches if we move to NFS?
> - Is Lucene + NFS combination supported for all
> operating systems? (We support Windows, Solaris, AIX,
> HP-UX, Red Hat Linux)

NFS *should* work, however:

   * It's not widely used, so, test thoroughly in your particular setup.

   * Most likely to work is if you use a single machine writing to  
the index, and many readers.

   * Performance is likely not great, especially on searching, but  
you should test in your specific situation.

> - Is there any other alternative available other than
> NFS?
>

It's also possible to replicate without using a DB.  EG rsync does  
agreat job.

You should look at Solr, since it already has all the infrastructure  
toaccept updates, replicate index changes to remote machines, etc.

> I will really appreciate your comments/thoughts on
> this topic.
>
> Regards,
> Rajesh
>
>
>        
> ______________________________________________________________________ 
> ______________
> You rock. That's why Blockbuster's offering you one month of  
> Blockbuster Total Access, No Cost.
> http://tc.deals.yahoo.com/tc/blockbuster/text5.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Lucene 2.3.0 and NFS

Posted by "Duan, Nick" <ND...@mcdonaldbradley.com>.
Have you looked at Nutch or Hadoop?  They are subprojects of Lucene,
developed specifically to support large-scale, distributed indexing.
Nutch is probably more mature whereas Hadoop supports clustering out of
the box...

ND

-----Original Message-----
From: Rajesh parab [mailto:rajesh_parab_1@yahoo.com] 
Sent: Thursday, April 03, 2008 8:20 PM
To: java-user@lucene.apache.org
Subject: Lucene 2.3.0 and NFS

Hi,

We are currently using Lucene 2.0 for full-text
searches within our enterprise application, which can
be deployed in clustered environment. We generate
Lucene index for data stored inside relational
database.

As Lucene 2.0 did not have solid NFS support and as we
wanted Lucene based searches to work properly in
Clustered environment, we had decided on following
approach:
1. The index generation happens on a machine (could be
one of the cluster nodes or a separate machine) and
once the Lucene index is generated, we copy all the
index files to the database.
2. The index search request on each cluster node
retrieves the index files from database (during first
search or after index update), copies to the file
system and use it for searches.
3. Thus, each cluster node has its own copy of the
index and it keeps on picking up latest version if it
is available inside database.

This has worked fine for us till now, though we will
not be able continue with this model in future as we
want to support Lucene based searches across our
application and also want to index large components
inside our application like Wiki, forums, etc. As the
index will grow, storing and retrieving index files
from database will not be an efficient operation.

My questions are:
- Will we be able to use NFS if we move to Lucene
2.3.0?
- Will there be any significant performance impact on
index generation and searches if we move to NFS?
- Is Lucene + NFS combination supported for all
operating systems? (We support Windows, Solaris, AIX,
HP-UX, Red Hat Linux)
- Is there any other alternative available other than
NFS?

I will really appreciate your comments/thoughts on
this topic.

Regards,
Rajesh


 
________________________________________________________________________
____________
You rock. That's why Blockbuster's offering you one month of Blockbuster
Total Access, No Cost.  
http://tc.deals.yahoo.com/tc/blockbuster/text5.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org