Posted to user@lucenenet.apache.org by Brian Sayatovic <bs...@creditinfonet.com> on 2011/10/19 16:36:50 UTC

[Lucene.Net] RE: Clarifying enterprise application use of Lucene.Net

Here's an update on the way I'm currently applying Lucene.NET.

While I'm developing locally on a single server (Visual Studio's Cassini), I deploy to some single-server environments as well as two-server environments.  The two-server environments are the more interesting ones, and they're just a special case of a hypothetical N-server environment.  So one thing I'm dealing with is lock contention between the servers.

Right now, on each call to index an entity (resulting in a Lucene.NET Document), I open an IndexWriter, use it, and then commit and close it.  I am not presently keeping my IndexWriter objects around -- not per thread and not per server.  This means that any thread on any server could be writing to the index at any given moment.  Because my writes are very small, each IndexWriter is short-lived, which reduces contention for the lock.  That said, depending on how close the file system is (local, remote, distributed), my error rate due to lock contention ranges from under 1% to 5%.  This is currently a thorn in my side, and I think it will require me to rethink how I'm using Lucene.NET.
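A common stopgap for intermittent lock failures like these is to retry lock acquisition with backoff instead of failing the request on the first collision. The sketch below is a language-agnostic Python toy. It is not Lucene.Net's lock implementation. It assumes a lock file created with exclusive-create semantics, which is loosely similar in spirit to Lucene's file-based `write.lock`. The names `try_lock`, `acquire_with_backoff`, `release_lock` and `LockUnavailable` are hypothetical. Exclusive-create can also be less reliable on some network file systems, so treat this as an illustration only.

```python
import os
import random
import time


class LockUnavailable(Exception):
    """Hypothetical error: the write lock was still held after every retry."""


def try_lock(lock_path):
    """Make one attempt to create the lock file exclusively."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another writer (thread or server) holds the lock
    os.close(fd)
    return True


def acquire_with_backoff(lock_path, attempts=5, base_delay=0.05):
    """Retry with jittered exponential backoff (hypothetical helper)."""
    for attempt in range(attempts):
        if try_lock(lock_path):
            return
        if attempt < attempts - 1:
            # Jitter keeps competing servers from retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise LockUnavailable(lock_path)


def release_lock(lock_path):
    """Delete the lock file so the next writer can proceed."""
    os.remove(lock_path)
```

The intended call pattern is to acquire the lock, open/add/commit/close the writer, and release the lock in a `finally` block. Backoff only hides contention. It does not remove it, which is why a single-writer design is the longer-term fix.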

One thing that helps is that this is a multi-tenant system, and each tenant gets its own Lucene index.  Thus, contention is limited to users of the same tenant.
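Because each tenant has its own index, writers only need to be serialized per tenant. Below is a minimal in-process sketch of that idea in Python with one lock per tenant. `TenantLocks` is a hypothetical helper. It only coordinates threads inside one server process. Cross-server coordination still depends on the index's own write lock.

```python
import threading


class TenantLocks:
    """Hypothetical registry: one lock per tenant index, so writers for the
    same tenant take turns while writers for different tenants proceed."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, tenant_id):
        # The guard makes lookup-or-create atomic, so two threads can never
        # end up holding different lock objects for the same tenant.
        with self._guard:
            lock = self._locks.get(tenant_id)
            if lock is None:
                lock = threading.Lock()
                self._locks[tenant_id] = lock
            return lock
```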

Regards,
Brian.


-----Original Message-----
From: Brian J. Sayatovic [mailto:bsayatovic@creditinfonet.com]
Sent: Monday, June 07, 2010 8:41 AM
To: lucene-net-user@lucene.apache.org
Subject: Clarifying enterprise application use of Lucene.Net

I'm building a client-server app where the application logic - including Lucene.Net-based search - is implemented in the server behind WCF services.
After reading the Lucene.Net getting started, API documentation, FAQ, blogs and mailing list archives, I'm still not entirely clear on how best to employ IndexWriters and IndexReaders.

At the moment, I've gone with a single, global IndexWriter, as well as transient instances of IndexSearchers constructed from IndexReaders spawned from that global IndexWriter.  That is, every WCF request coming in shares a single IndexWriter that's instantiated at application startup.  The WCF requests could be on any of a number of threads in the WCF thread pool.  The FAQ says that "IndexWriter in general is thread-safe"
(http://wiki.apache.org/lucene-java/LuceneFAQ#Is_the_IndexWriter_class.2C_and_especially_the_method_addIndexes.28Directory.5B.5D.29_thread_safe.3F).
While the FAQ also suggests that it is safe to use a single IndexSearcher, I also saw suggestions that you can get an IndexReader from that IndexWriter and wrap an IndexSearcher around it.  I even saw mention of a LuceneIndexAccessor, but found no source, so I crudely implemented my own.
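The accessor idea can be modeled as one writer shared by every request thread, plus searchers that are point-in-time snapshots. Committed changes become visible to searchers acquired afterward, and a searcher already in use keeps a consistent view. The Python toy below illustrates that behavior. It does not use the Lucene.Net API. `SharedIndexAccessor` and its methods are hypothetical, a list stands in for the writer's buffer, and tuples stand in for readers.

```python
import threading


class SharedIndexAccessor:
    """Hypothetical toy model of a single shared writer with searcher
    snapshots that change only when commit() publishes new documents."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []      # added by request threads, not yet visible
        self._published = ()    # the view handed out to searchers

    def add_document(self, doc):
        with self._lock:
            self._pending.append(doc)

    def commit(self):
        with self._lock:
            # Build a new tuple instead of mutating, so searchers that were
            # handed the old snapshot keep seeing exactly what they saw.
            self._published = self._published + tuple(self._pending)
            self._pending.clear()

    def acquire_searcher(self):
        with self._lock:
            return self._published
```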

But what are the implications of this?  In my final production deployment topology, I'll have two independent legs of production sharing one database as well as sharing the same Lucene.Net index data.  With each leg having a single, global writer, will that be a problem?

And what happens if I "Commit" my IndexWriter after every modification to it, but never actually "Close" it?
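Both questions come down to the writer's lock lifetime. In Lucene, a writer holds the index's write lock from the moment it opens until it is closed. Commit makes changes durable and visible to newly opened readers, but it does not release the lock. So two legs that each keep a long-lived global writer on the same shared index cannot both have one open at the same time. The Python toy below models this. `ToyWriter` is hypothetical, the `write.lock` file only loosely mirrors Lucene's, and a shared list stands in for the committed index.

```python
import os


class ToyWriter:
    """Hypothetical writer lifecycle: the lock is taken on open and released
    only by close(); commit() publishes pending documents but keeps the lock."""

    def __init__(self, index_dir, store):
        self._lock_path = os.path.join(index_dir, "write.lock")
        # O_EXCL makes this raise FileExistsError if another writer holds it.
        fd = os.open(self._lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        self._store = store      # stands in for the committed index
        self._pending = []
        self._open = True

    def add(self, doc):
        self._pending.append(doc)

    def commit(self):
        self._store.extend(self._pending)
        self._pending.clear()
        # The lock file is deliberately left in place here.

    def close(self):
        if self._open:
            self.commit()        # in this toy, close() commits first
            os.remove(self._lock_path)
            self._open = False
```

Committing without ever closing is therefore workable inside one process. However, it shuts every other process out of that index, and a crashed process may leave a stale lock behind.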

Regards,
Brian.

__________ Information from ESET NOD32 Antivirus, version of virus signature database 5178 (20100607) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

________________________________

Learn more about the products, services and technology solutions available from CIN Legal Data Services at: www.cinlegal.com<http://www.cinlegal.com>

This message may contain confidential / proprietary information from CIN Legal Data Service and Credit Infonet, Inc.. If you are not an intended recipient, please refrain from the disclosure, copying, distribution or use of this information. All such unauthorized actions are strictly prohibited. If you have received this transmission in error, please notify the sender by e-mail at bsayatovic@creditinfonet.com and delete all copies of this material from any computer.

RE: [Lucene.Net] RE: Clarifying enterprise application use of Lucene.Net

Posted by Brian Sayatovic <bs...@creditinfonet.com>.
Going the single writer route is definitely in my future.

Right now I'm using NHibernate.  When a change happens to an entity, and I have its data in my hands, I can very easily update the Lucene document.  That's what I'm doing now.  Thus, it happens on whichever of the N servers the business transaction happens on.

My future plan is to have a single server out of the N nominated as the "indexing" server (I actually have a clever, autonomic nomination process so if one server goes down, another can take over).  All servers updating entities will push a message to a queue indicating which entity has changed.  Then the indexing server will be responsible for dequeueing these messages and updating the index.  Because a single server is doing it, it can afford to keep the writer open longer.
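The queue-plus-single-indexer plan can be sketched as a single consumer loop. Application servers only enqueue the changed entity's ID, and the indexer, being the only writer, looks each entity up again and upserts or deletes its document. The Python sketch below is hypothetical throughout. `indexer_loop` and the `STOP` sentinel are invented for illustration, `load_entity` stands in for the NHibernate lookup, a dict stands in for the index, and an in-memory `queue.Queue` stands in for a durable message queue.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling the indexer to shut down


def indexer_loop(changes, load_entity, index):
    """Run on the nominated indexing server only; it is the sole writer of
    `index`, so application servers never contend for the write lock."""
    while True:
        entity_id = changes.get()
        if entity_id is STOP:
            break
        # Re-look-up: the message carries only the ID, not the entity data.
        entity = load_entity(entity_id)
        if entity is None:
            index.pop(entity_id, None)   # deleted since the change was queued
        else:
            index[entity_id] = entity
```

Because the indexer re-reads the entity at dequeue time, an entity changed several times before indexing ends up with its latest state. This is the extra lookup cost mentioned below.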

The downside is that it will be a little slower -- because of the queueing, and because the entity has to be looked up again so that its data is available for placement in the index.

Then I believe I'll follow the strategy others have used, where the index is periodically copied to each server to serve as a read-only copy (perhaps even via a lower-level replication solution).
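One way to make the periodic copy safe for searchers is to copy into a new versioned directory and then atomically repoint a marker. Searchers open whatever the marker names, so they never see a half-copied index. The Python sketch below assumes a hypothetical `CURRENT` pointer file and `index-<version>` directory layout. It also assumes the master copy is taken from a committed point and not mid-write. Java Lucene's `SnapshotDeletionPolicy` exists for that purpose, but check what your Lucene.Net version offers.

```python
import os
import shutil


def publish_snapshot(master_dir, replica_root, version):
    """Copy the master index into replica_root/index-<version>, then
    atomically swap the CURRENT pointer to it (hypothetical layout)."""
    target = os.path.join(replica_root, f"index-{version}")
    shutil.copytree(master_dir, target)
    pointer_tmp = os.path.join(replica_root, "CURRENT.tmp")
    with open(pointer_tmp, "w") as f:
        f.write(os.path.basename(target))
    # os.replace is an atomic rename when both paths are on one file system.
    os.replace(pointer_tmp, os.path.join(replica_root, "CURRENT"))
    return target


def current_index_dir(replica_root):
    """Return the directory that read-only searchers should open."""
    with open(os.path.join(replica_root, "CURRENT")) as f:
        return os.path.join(replica_root, f.read().strip())
```

Older versions stay on disk so in-flight searchers are unaffected. Pruning them once no searcher uses them is left out of the sketch.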

Thoughts?

Regards,
Brian.

-----Original Message-----
From: Troy Howard [mailto:thoward37@gmail.com]
Sent: Wednesday, October 19, 2011 3:07 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] RE: Clarifying enterprise application use of Lucene.Net

Re: [Lucene.Net] RE: Clarifying enterprise application use of Lucene.Net

Posted by Troy Howard <th...@gmail.com>.
It's probably not a good idea to have Lucene.Net running directly
within the server-side code of an ASP.NET web application. Another
approach would be to move your index writing to a service layer with a
message queue in between the web front-end and the indexing service.
This will allow you to buffer and batch writes during high traffic
times, and in the case of batches of write operations against a single
index, let the IndexWriter do more work at once. It also cleans up
your web app code so that it's only dealing with a single concern.
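The buffering-and-batching point can be sketched as a consumer that blocks briefly for the first message and then takes whatever else is already queued. Under heavy traffic the batches grow, so a single writer session handles many updates at once. Under light traffic a lone update is still indexed promptly. This is a hypothetical Python sketch. `next_batch` and its parameters are invented for illustration, and an in-memory `queue.Queue` stands in for the real message queue.

```python
import queue


def next_batch(q, max_batch=500, wait_seconds=1.0):
    """Wait up to wait_seconds for one message, then drain without blocking
    until the queue is empty or max_batch messages have been collected."""
    try:
        batch = [q.get(timeout=wait_seconds)]
    except queue.Empty:
        return []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch
```

Each returned batch would then be applied through one writer session with a single commit at the end. This is where the "let the IndexWriter do more work at once" benefit comes from.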

While it is best practice from a performance perspective to keep an
IndexWriter open for as long as possible, there's also a danger of
running into memory limits if you have large indexes, large individual
documents, or numerous indexes open at the same time. It's a careful
balancing act between obtaining the best performance and staying under
the memory limits for a single .NET process.
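The balancing act can be sketched as a policy around long-lived writers. Each writer stays open for throughput, but it is committed once its buffered work crosses a budget, and the least-recently-used writer is closed when too many tenant indexes are open at once. The Python toy below only records the decisions it would make. `WriterPool`, `max_open` and `flush_bytes` are hypothetical names, and the byte counts are caller-supplied estimates, not measurements of real Lucene memory use.

```python
from collections import OrderedDict


class WriterPool:
    """Hypothetical policy: commit a tenant's writer when its buffered bytes
    reach flush_bytes; close the least-recently-used writer past max_open."""

    def __init__(self, max_open, flush_bytes):
        self.max_open = max_open
        self.flush_bytes = flush_bytes
        self.buffered = OrderedDict()  # tenant_id -> bytes since last commit
        self.events = []               # log of ("commit" | "close", tenant_id)

    def add(self, tenant_id, doc_bytes):
        if tenant_id in self.buffered:
            self.buffered.move_to_end(tenant_id)  # mark as most recently used
        else:
            if len(self.buffered) >= self.max_open:
                oldest, _ = self.buffered.popitem(last=False)
                self.events.append(("close", oldest))  # close also commits
            self.buffered[tenant_id] = 0
        self.buffered[tenant_id] += doc_bytes
        if self.buffered[tenant_id] >= self.flush_bytes:
            self.events.append(("commit", tenant_id))
            self.buffered[tenant_id] = 0
```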

Thanks,
Troy
