Posted to general@lucene.apache.org by ezer <an...@adinet.com.uy> on 2008/08/05 16:21:17 UTC

Lucene Performance and usage alternatives

I just wrote a program using the Java API of Lucene. It is working fine for
my current index size, but I am worried about performance with a bigger
index and simultaneous user access.

1) I am worried about having to write the program in Java. I searched for
alternatives such as the C port, but I saw that the version it is based on
is a little old and not many people seem to use it.

2) I am also thinking of compiling the code with gcj to generate native code
and not use the JVM. Has anybody tried it? Could that bring performance
close to that of a C program?

3) I won't use an application server; I will call the program directly from a
PHP page. Is there any suggested architecture for doing that? I am thinking
ahead to many users accessing the program. Won't starting one instance and
opening the index each time someone runs a query degrade performance?
-- 
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Re: Lucene Performance and usage alternatives

Posted by ezer <an...@adinet.com.uy>.
Grant, what other information can I provide in order to clarify my questions?





Re: Lucene Performance and usage alternatives

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 5, 2008, at 2:29 PM, ezer wrote:

>
> Thanks Stefan and Grant.
> Yes, Solr seems very interesting; I tried it once, and I am now looking at
> the PHP client you mentioned.
> What happens if, rather than starting a server that opens a port to listen
> for requests, I call the program from PHP every time I need to search,
> using for example exec(theSearchingProgram...., $arrayResult)?

That won't perform.  The main cost of searching is loading up the  
index and you would have to do that every time.

> For now that is the solution I am testing, but I am not sure whether it is
> an elegant way to do this. I would like to know the pros and cons of each
> solution; at first glance I think that opening a port raises a security
> issue.

What kind of environment are you in that you can't secure the port?
I'm not a security expert, but starting points would be to allow
connections only from a given IP, use SSL, put it behind a firewall, etc.
Treat Solr just as you treat a database in the typical tiered architecture.
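
The "allow only from a given IP" idea can be sketched with a small search process that loads its index once and listens only on the loopback address, so that only programs on the same machine (such as the PHP page) can reach it. This is a minimal sketch under stated assumptions, not Solr and not a Lucene API: the class LocalSearchDaemon, the one-line-in, one-line-out protocol, and the search() stub are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical long-running search process; all names here are illustrative.
final class LocalSearchDaemon implements AutoCloseable {
    private final ServerSocket server;

    // Port 0 asks the operating system for any free port.
    LocalSearchDaemon(int port) throws IOException {
        // Binding to the loopback address keeps the port unreachable from other hosts.
        server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    int port() {
        return server.getLocalPort();
    }

    // Stub standing in for a query against an index that was opened once at startup.
    static String search(String query) {
        return "hits for: " + query;
    }

    // Accepts one connection, reads one query line, writes one reply line.
    void serveOne() throws IOException {
        try (Socket client = server.accept()) {
            // Defensive check in addition to the loopback bind.
            if (!client.getInetAddress().isLoopbackAddress()) {
                return;
            }
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
            String query = in.readLine();
            out.println(search(query == null ? "" : query));
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    // What a local client (for example the web front end) would do.
    static String ask(int port, String query) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            out.println(query);
            return in.readLine();
        }
    }

    // Starts a daemon, sends one query, and returns the reply.
    static String roundTrip(String query) {
        try (LocalSearchDaemon daemon = new LocalSearchDaemon(0)) {
            Thread worker = new Thread(() -> {
                try {
                    daemon.serveOne();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            worker.start();
            String reply = ask(daemon.port(), query);
            worker.join();
            return reply;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("lucene"));
    }
}
```

A real deployment would keep accepting connections in a loop and would more likely rely on an existing server such as Solr; the point of the sketch is only that the process stays up between queries and that the listening address limits who can connect.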

-Grant

Re: Lucene Performance and usage alternatives

Posted by Grant Ingersoll <gs...@apache.org>.
Ezer,

I've never tried it, but I just downloaded the wpSearch WordPress
plugin, which uses Zend Search for Lucene: http://devzone.zend.com/node/view/id/91

So, it seems you could do PHP search that way, too.

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








Re: Lucene Performance and usage alternatives

Posted by ezer <an...@adinet.com.uy>.
Thanks Stefan and Grant.
Yes, Solr seems very interesting; I tried it once, and I am now looking at the
PHP client you mentioned.
What happens if, rather than starting a server that opens a port to listen for
requests, I call the program from PHP every time I need to search, using for
example exec(theSearchingProgram...., $arrayResult)? For now that is the
solution I am testing, but I am not sure whether it is an elegant way to do
this. I would like to know the pros and cons of each solution; at first glance
I think that opening a port raises a security issue.





Re: Lucene Performance and usage alternatives

Posted by Grant Ingersoll <gs...@apache.org>.
My point is more that you don't necessarily need to go looking for
variants. I've seen Lucene Java scale to millions of records with no
problem. I talked w/ a guy using Solr this past week who had ~80 million
records in a single 80 GB index on one machine.

If I had a PHP front end, I would most likely start with Solr and its
PHP client. No sense in reinventing the wheel, IMO.

>







Re: Lucene Performance and usage alternatives

Posted by ezer <an...@adinet.com.uy>.
Yes, I saw that. It talks about performance, but not about the variants I
mentioned before.
So far I have tested indexing a database of about 200,000 records. As I
mentioned, it works fine, with response times of less than a second. But this
database can grow to millions of records, and I am not sure whether I am
choosing the best architecture for that stage to allow simultaneous access.

Thanks for the help




Re: Lucene Performance and usage alternatives

Posted by Grant Ingersoll <gs...@apache.org>.
Before we go solving a problem that isn't necessarily there, can you  
share a bit about what sizes you are at currently?  Num docs, index  
size, query rate?

Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance?

-Grant

On Aug 5, 2008, at 10:21 AM, ezer wrote:

>
> I just wrote a program using the Java API of Lucene. It is working fine for
> my current index size, but I am worried about performance with a bigger
> index and simultaneous user access.
>
> 1) I am worried about having to write the program in Java. I searched for
> alternatives such as the C port, but I saw that the version it is based on
> is a little old and not many people seem to use it.
>
> 2) I am also thinking of compiling the code with gcj to generate native code
> and not use the JVM. Has anybody tried it? Could that bring performance
> close to that of a C program?
>
> 3) I won't use an application server; I will call the program directly from a
> PHP page. Is there any suggested architecture for doing that? I am thinking
> ahead to many users accessing the program. Won't starting one instance and
> opening the index each time someone runs a query degrade performance?

You shouldn't be instantiating a Reader/Searcher for each query.  See  
the link above.
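
The advice above, open the searcher once and reuse it for every query, can be sketched in plain Java as follows. This is only an illustration under stated assumptions: ExpensiveSearcher is a hypothetical stand-in for whatever object opens the index (it is not Lucene's class), and SharedSearcher is a made-up holder name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher that is costly to open and cheap to query.
// It is NOT Lucene's IndexSearcher; it only models the cost pattern.
final class ExpensiveSearcher {
    // Counts how many times a searcher has been opened in this process.
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    ExpensiveSearcher() {
        // In a real program this is where the index files would be opened.
        OPEN_COUNT.incrementAndGet();
    }

    String search(String query) {
        return "results for " + query;
    }
}

// Holds one searcher for the lifetime of the process.
final class SharedSearcher {
    private SharedSearcher() {
    }

    // Initialization-on-demand holder: the JVM creates INSTANCE exactly once,
    // in a thread-safe way, the first time get() is called.
    private static final class Holder {
        static final ExpensiveSearcher INSTANCE = new ExpensiveSearcher();
    }

    static ExpensiveSearcher get() {
        return Holder.INSTANCE;
    }
}

final class SharedSearcherDemo {
    public static void main(String[] args) {
        // A thousand queries, all served by the same searcher instance.
        for (int i = 0; i < 1000; i++) {
            SharedSearcher.get().search("query " + i);
        }
        System.out.println("searcher opened " + ExpensiveSearcher.OPEN_COUNT.get() + " time(s)");
    }
}
```

Sharing works only inside a process that stays alive between queries, which is why launching a fresh program per request cannot benefit from it.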




Re: Lucene Performance and usage alternatives

Posted by Stefan Groschupf <sg...@101tec.com>.
An alternative is always to distribute the index across a set of servers.
If you need to scale, I guess this is the only long-term option.
You can build your own home-grown Lucene distribution layer or look into an
existing one.
I'm currently working on katta (http://katta.wiki.sourceforge.net/);
there is no release yet, but we are in the QA and test cycles.
There are others as well; Solr, for example, also provides distribution.
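
As a rough illustration of what a home-grown distribution layer has to do, the sketch below merges the per-server result lists into one global top-k list. It is not how katta or Solr implement distribution; the Hit and ShardMerger names are hypothetical, and it assumes that scores returned by different servers are directly comparable, which may not hold in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical search hit: a document id plus the score a server assigned to it.
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

final class ShardMerger {
    private ShardMerger() {
    }

    // Combines the hits from every server and keeps the k highest-scoring ones.
    // Assumption: scores from different servers can be compared directly.
    static List<Hit> mergeTopK(List<List<Hit>> perServerHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perServerHits) {
            all.addAll(hits);
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> serverA = Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        List<Hit> serverB = Arrays.asList(new Hit("b1", 0.5));
        for (Hit hit : mergeTopK(Arrays.asList(serverA, serverB), 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

In a real system each list would come from a network call to the server holding one part of the index, usually issued in parallel.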

Stefan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com