Posted to general@lucene.apache.org by ezer <an...@adinet.com.uy> on 2008/08/05 16:21:17 UTC
Lucene Performance and usage alternatives
I just wrote a program using the Java API of Lucene. It is working fine for
my current index size, but I am worried about performance with a bigger
index and simultaneous user access.
1) I am worried about having to write the program in Java. I
looked for alternatives such as the C port, but the Lucene version it is based on
is a little old and not many people seem to use it.
2) I am also thinking of compiling the code with gcj to generate native code
instead of using the JVM. Has anybody tried it? Could that get close
to the performance of a C program?
3) I won't use an application server; I will call the program directly from a
PHP page. Is there a suggested architecture for doing that, given that
I expect many users to access the program? Won't starting a new
instance and opening the index each time someone runs a query
degrade performance?
--
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html
Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Lucene Performance and usage alternatives
Posted by ezer <an...@adinet.com.uy>.
Grant, which other information can I provide to clarify my questions?
Re: Lucene Performance and usage alternatives
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 5, 2008, at 2:29 PM, ezer wrote:
> Thanks Stefan and Grant.
> Yes, Solr seems very interesting. I tried it once, and I am now looking at
> the PHP client you mentioned.
> What happens if, rather than starting a server that opens a port to
> listen for requests, I call the program from PHP every time I need to
> search, using for example
> exec(theSearchingProgram...., $arrayResult)?
That won't perform. The main cost of searching is loading up the
index and you would have to do that every time.
> For now that is the solution I am
> testing, but I am not sure it is an elegant way to do this. I would like
> to know the pros and cons of each solution. At first glance, I think
> that opening a port raises a security issue.
What kind of environment are you in that you can't secure the port?
I'm not a security expert, but starting points would be to allow connections only
from a given IP, use SSL, put it behind a firewall, etc. Treat Solr
just as you would treat a database in a typical tiered architecture.
-Grant
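To make the trade-off above concrete, here is a minimal sketch of the long-running alternative to calling exec() per query: a process that loads its index once at startup and then answers queries over a port bound to the loopback address only, so it is not reachable from other machines. The class name LocalSearchDaemon, the one-line-in, one-line-out protocol, and the "loaded-index" string standing in for a real Lucene index are all assumptions made for illustration; nothing here is Lucene or Solr API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Long-running search process: pays the index-loading cost once, then answers many queries. */
final class LocalSearchDaemon implements AutoCloseable {
    private final ServerSocket server;
    // Hypothetical stand-in for the loaded index; the expensive load would happen once, here.
    private final String index = "loaded-index";

    LocalSearchDaemon(int port) {
        try {
            // Bind to the loopback address only, so other machines cannot reach the port.
            server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() { return server.getLocalPort(); }

    boolean isLoopbackOnly() { return server.getInetAddress().isLoopbackAddress(); }

    /** Placeholder for a real search against the already-loaded index. */
    String answer(String query) { return "hits for '" + query + "' from " + index; }

    /** Handles one connection: reads a single query line, writes a single reply line. */
    void serveOne() {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(answer(in.readLine()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side of the same protocol; a PHP page would do the equivalent with a socket. */
    static String ask(int port, String query) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(query);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (LocalSearchDaemon daemon = new LocalSearchDaemon(0)) {  // port 0 = any free port
            Thread worker = new Thread(daemon::serveOne);
            worker.setDaemon(true);
            worker.start();
            System.out.println(ask(daemon.port(), "lucene"));
        }
    }
}
```

A real service would loop over accept() and hand connections to a thread pool; the single-connection serveOne() keeps the sketch short. If the web server and the search process run on different machines, loopback binding is not an option and the IP restriction, SSL, and firewall measures mentioned above apply instead.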
Re: Lucene Performance and usage alternatives
Posted by Grant Ingersoll <gs...@apache.org>.
Ezer,
I've never tried it, but I just downloaded the wpSearch WordPress
plugin, which uses Zend Search for Lucene: http://devzone.zend.com/node/view/id/91
So it seems you could do PHP search that way, too.
-Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Lucene Performance and usage alternatives
Posted by ezer <an...@adinet.com.uy>.
Thanks Stefan and Grant.
Yes, Solr seems very interesting. I tried it once, and I am now looking at the
PHP client you mentioned.
What happens if, rather than starting a server that opens a port to listen for
requests, I call the program from PHP every time I need to search, using for example
exec(theSearchingProgram...., $arrayResult)? For now that is the solution I am
testing, but I am not sure it is an elegant way to do this. I would like
to know the pros and cons of each solution. At first glance, I think
that opening a port raises a security issue.
Re: Lucene Performance and usage alternatives
Posted by Grant Ingersoll <gs...@apache.org>.
My point is more that you don't necessarily need to go looking for
variants. I've seen Lucene Java scale to millions of documents with no problem. I
talked w/ a guy using Solr this past week who had ~80 million records
in a single 80 GB index on one machine.
If I had a PHP front end, I would most likely start with Solr and its
PHP client. No sense in reinventing the wheel, IMO.
Re: Lucene Performance and usage alternatives
Posted by ezer <an...@adinet.com.uy>.
Yes, I saw that. It talks about performance, but not about the alternatives I
mentioned before.
I have tested indexing a database of about 200,000 records. As I
mentioned, it works fine, with response times under a second. But this
database can grow to millions of records, and I am not sure I am choosing
the best architecture to support simultaneous access at that scale.
Thanks for the help.
Re: Lucene Performance and usage alternatives
Posted by Grant Ingersoll <gs...@apache.org>.
Before we go solving a problem that isn't necessarily there, can you
share a bit about what sizes you are at currently? Num docs, index
size, query rate?
Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance
?
-Grant
On Aug 5, 2008, at 10:21 AM, ezer wrote:
> I just wrote a program using the Java API of Lucene. It is working fine
> for my current index size, but I am worried about performance with a
> bigger index and simultaneous user access.
>
> 1) I am worried about having to write the program in Java. I
> looked for alternatives such as the C port, but the Lucene version it is
> based on is a little old and not many people seem to use it.
>
> 2) I am also thinking of compiling the code with gcj to generate native
> code instead of using the JVM. Has anybody tried it? Could that get close
> to the performance of a C program?
>
> 3) I won't use an application server; I will call the program
> directly from a PHP page. Is there a suggested architecture for doing
> that, given that I expect many users to access the program? Won't
> starting a new instance and opening the index each time someone runs a
> query degrade performance?
You shouldn't be instantiating a Reader/Searcher for each query. See
the link above.
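As a rough illustration of the advice above, the sketch below opens an expensive resource once and hands the same instance to every query instead of creating a new one per request. IndexHandle is a hypothetical stand-in for an opened Lucene reader/searcher, and SharedIndex is an illustrative name, not part of Lucene; the pattern shown is plain lazy initialization with double-checked locking.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an opened index; real code would wrap Lucene's searcher here. */
final class IndexHandle {
    String search(String query) {
        return "results for " + query;
    }
}

/** Opens the index once, on first use, and hands the same handle to every caller. */
final class SharedIndex {
    private final Supplier<IndexHandle> opener;
    private volatile IndexHandle handle;

    SharedIndex(Supplier<IndexHandle> opener) {
        this.opener = opener;
    }

    IndexHandle get() {
        IndexHandle h = handle;
        if (h == null) {
            synchronized (this) {
                h = handle;
                if (h == null) {
                    h = opener.get();   // the expensive step, performed only once
                    handle = h;
                }
            }
        }
        return h;
    }
}

final class SharedIndexDemo {
    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        SharedIndex shared = new SharedIndex(() -> {
            opens.incrementAndGet();    // count how often the index is actually opened
            return new IndexHandle();
        });
        for (int i = 0; i < 3; i++) {
            System.out.println(shared.get().search("query " + i));
        }
        System.out.println("index opened " + opens.get() + " time(s)");
    }
}
```

This only helps inside a process that stays alive between queries; a program started fresh for each request has nothing to share, which is why the per-query launch from PHP pays the opening cost every time.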
Re: Lucene Performance and usage alternatives
Posted by Stefan Groschupf <sg...@101tec.com>.
An alternative is always to distribute the index across a set of servers.
If you need to scale, I guess this is the only long-term option.
You can build your own home-grown Lucene distribution or look into
an existing one.
I'm currently working on Katta (http://katta.wiki.sourceforge.net/);
there is no release yet, but we are in the QA and test cycles.
There are others as well: Solr, for example, provides distribution
too.
Stefan
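For readers wondering what a home-grown distribution involves at its simplest, the sketch below sends one query to several index partitions in parallel and merges their hits by score. The Shard interface, the Hit class, and DistributedSearch are hypothetical names for illustration; this is not how Katta or Solr is implemented, and it assumes scores from different shards are directly comparable, which a real system has to ensure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/** One scored hit; docId and score are illustrative fields. */
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

/** Hypothetical view of one index partition, whether local or on another server. */
interface Shard {
    List<Hit> search(String query, int topN);
}

final class DistributedSearch {
    /** Sends the query to every shard in parallel and keeps the best topN hits overall. */
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        // Start all shard searches before waiting on any of them.
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (Shard shard : shards) {
            pending.add(CompletableFuture.supplyAsync(() -> shard.search(query, topN)));
        }
        // Wait for each shard, then merge: highest score first, truncated to topN.
        return pending.stream()
                .flatMap(future -> future.join().stream())
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }
}
```

Each shard only needs to return its own top N hits, because no hit outside a shard's top N can appear in the global top N. A production setup would also need timeouts and handling for shards that fail to answer.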
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com