Posted to java-user@lucene.apache.org by Lars-Erik Aabech <LE...@markedspartner.no> on 2012/12/10 11:31:58 UTC

Deciding how to use reader

Hi!

I'm using Lucene.NET, but I'm sure this question is not platform-specific. :)
I've created an index for a website which uses a central database server and three front-end servers.
For now I've put the index and the building of the index on a fifth server, which builds the index once an hour.
It uses a singleton with one directory and one writer for the lifetime of the application (currently infinite).
I also created an HTTP service which gets a reader from the writer and does the search.
The front-end servers get the search results via that service over a 1 Gb LAN.
The problem is that this server is quite often at 100% CPU load, and that affects the responsiveness of the service.
So I'm about to rewrite it so that the index is stored on a SAN available to all servers.
The front-end servers will then each open their own singleton directory and do the searches themselves.

My question is as follows:
Given ~17,000 documents and ~120,000 terms, would you open and close a reader for each search,
or would you keep a singleton reader and re-open it, say, every hour?

Best regards,
Lars-Erik Aabech
Technical Lead, Development
MarkedsPartner AS
Mobile: +47 920 30 537


RE: Deciding how to use reader

Posted by Lars-Erik Aabech <LE...@markedspartner.no>.
Thanks again. Lucene.NET is supposed to be equally thread safe, so I'll give it a go.

L-E


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Deciding how to use reader

Posted by Ian Lea <ia...@gmail.com>.
Certainly in Java, IndexReader is thread safe.  From the 4.0.0 javadocs:
"IndexReader instances are completely thread safe, meaning multiple
threads can call any of its methods, concurrently".  I know nothing
about Lucene.NET.
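
To illustrate what sharing one reader across request threads looks like, here is a minimal sketch in plain Java with no Lucene dependency. `FakeReader`, its `docFreq` method, and `searchConcurrently` are hypothetical stand-ins invented for this example; the stand-in is safe to share only because it is immutable, whereas for the real IndexReader the guarantee comes from the javadoc quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedReaderDemo {
    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 3);
        freqs.put("solr", 1);
        FakeReader shared = new FakeReader(freqs); // one instance for all threads

        System.out.println(searchConcurrently(shared, "lucene", 8));
    }

    /** Runs the same lookup from `threads` workers against ONE reader instance. */
    static List<Integer> searchConcurrently(FakeReader reader, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                tasks.add(() -> reader.docFreq(term)); // no per-search open/close
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}

/** Immutable stand-in for an index reader; hypothetical, not a Lucene class. */
final class FakeReader {
    private final Map<String, Integer> docFreqByTerm;

    FakeReader(Map<String, Integer> docFreqByTerm) {
        // Defensive copy so later changes to the caller's map cannot leak in.
        this.docFreqByTerm = Collections.unmodifiableMap(new HashMap<>(docFreqByTerm));
    }

    /** Number of documents containing the term, or 0 if the term is unknown. */
    int docFreq(String term) {
        return docFreqByTerm.getOrDefault(term, 0);
    }
}
```

Every worker sees the same answer because nothing in the shared object changes after construction. Whether Lucene.NET gives the same guarantee for its reader is worth confirming in its own documentation.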

All your reasons for not using Solr sound fine, particularly the more
fun one.  Good luck.


--
Ian.




RE: Deciding how to use reader

Posted by Lars-Erik Aabech <LE...@markedspartner.no>.
OK, thanks. As far as I understand, there's no problem, performance-wise or otherwise, with using a singleton reader for multiple threads/users, right?

I've considered Solr, but my conclusion is that installing and configuring something a bit geeky, running on Java when all our stuff is Windows/MSSQL/.NET, just to save myself from using Lucene.NET within my existing C# code, seems like overkill. :)
We also have some very specific configuration and UI needs for the system we are indexing, and it's all ended up quite nice. The main reason was that our SQL searches (like '%...%') combined with crappy legacy rendering took a minute; the current Lucene searches with security, highlighting, other rules, multiple fields etc. take <1 sec.

Besides, it's more fun getting to know and tinker around with Lucene itself.

Lars-Erik



Re: Deciding how to use reader

Posted by Ian Lea <ia...@gmail.com>.
If the index is only updated once an hour I'd create a new reader once
an hour as well, synchronized with the updates.
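
A minimal sketch of that idea (one long-lived reader, replaced once the hourly build has finished), written in plain Java with no Lucene dependency. `SnapshotHolder`, `get`, and `refresh` are hypothetical names, and the `String` snapshot merely stands in for a reader. In Lucene 4.x for Java, `DirectoryReader.openIfChanged` and `SearcherManager` cover the reopen step; whether a given Lucene.NET version has equivalents is an assumption to verify.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class SnapshotHolderDemo {
    public static void main(String[] args) {
        int[] generation = {0}; // pretend index version, bumped by each build
        SnapshotHolder<String> holder =
                new SnapshotHolder<>(() -> "snapshot-" + generation[0]);

        System.out.println(holder.get());   // prints "snapshot-0"
        generation[0]++;                     // the hourly build completes
        String old = holder.refresh();       // swap in a view of the new index
        System.out.println(old + " -> " + holder.get()); // prints "snapshot-0 -> snapshot-1"
    }
}

/**
 * Holds one shared, read-only snapshot and swaps in a fresh one on demand.
 * T stands in for an index reader; this class is illustrative, not a Lucene type.
 */
final class SnapshotHolder<T> {
    private final Supplier<T> opener;          // how to open a fresh snapshot
    private final AtomicReference<T> current;  // the snapshot all searches use

    SnapshotHolder(Supplier<T> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Every search calls this, so there is no open/close cost per search. */
    T get() {
        return current.get();
    }

    /** Call after the index build has finished. Returns the replaced snapshot. */
    T refresh() {
        return current.getAndSet(opener.get());
    }
}
```

With a real reader, the instance returned by `refresh()` holds file handles and would need to be closed, but only once searches still running against it have finished. That bookkeeping is left out of the sketch.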

Have you considered Solr?  That would probably take care of most of
the complications pretty much out of the box.
http://lucene.apache.org/solr/features.html


--
Ian.

