Posted to solr-user@lucene.apache.org by Jack L <jl...@yahoo.ca> on 2007/02/21 22:25:29 UTC
Re[4]: solr performance
Thanks to all who replied.
> my number 1000 was per minute, not second!
I can't read! :-p
> couple of times today at around 158 documents / sec.
This is not bad at all. How about search performance?
How many concurrent queries have people been having?
What does the response time look like?
> Thanks to the others that clarified. I run my indexers in
> parallel... but a single instance of Solr (which in turn handles
> requests in parallel as well).
Do you feel that multi-threaded posting is helpful?
I suppose that when Solr is indexing, it's bound more
by the Solr indexer than by the poster?
Jack
Re: solr performance
Posted by Mark Miller <ma...@gmail.com>.
Yonik Seeley wrote:
>
> Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>
That's how I've handled similar situations in the past. You're submitting a
batch of data to be processed, and if you're so inclined to see how it
went, you can inspect some kind of report object. If the batch process
blocks, you could return the report object; if not, you could return
a batch/job id (with reports valid for x amount of time after they are
done?).
It seems like a sound enough method to me, but it would be interesting
to hear if someone has a better idea.
- Mark
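A minimal sketch of the non-blocking variant described above, in Java: submit() hands back a job id right away, and the caller can look up a report object later. Everything here is hypothetical (the BatchJobs and Report names, the pool size of 4, and the Consumer that stands in for whatever actually posts a document); it is not an existing Solr or SolrJ API, and expiring old reports after some time is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical batch runner: submit() returns a job id, report() is polled later. */
class BatchJobs {

    /** Thread-safe report for one batch. */
    static final class Report {
        private final int total;
        private final AtomicInteger succeeded = new AtomicInteger();
        private final List<String> failures = new CopyOnWriteArrayList<>();

        Report(int total) { this.total = total; }

        public int succeeded() { return succeeded.get(); }
        public List<String> failures() { return failures; }

        /** True once every document has either succeeded or failed. */
        public boolean isDone() { return succeeded.get() + failures.size() == total; }
    }

    // Pool size is an arbitrary choice for this sketch.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<Long, Report> reports = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Queues every document for sending and returns immediately with a job id. */
    public long submit(List<String> docs, Consumer<String> sender) {
        long id = nextId.incrementAndGet();
        Report report = new Report(docs.size());
        reports.put(id, report);
        for (String doc : docs) {
            pool.execute(() -> {
                try {
                    sender.accept(doc);
                    report.succeeded.incrementAndGet();
                } catch (RuntimeException e) {
                    // Record the failure instead of aborting the whole batch.
                    report.failures.add(doc + ": " + e.getMessage());
                }
            });
        }
        return id;
    }

    /** Returns the report for a job id, or null if the id is unknown. */
    public Report report(long jobId) { return reports.get(jobId); }

    public void shutdown() { pool.shutdown(); }
}
```

The caller keeps the id from submit() and checks report(id).isDone() or report(id).failures() whenever it cares to. A blocking variant could simply wait for isDone() and return the Report itself.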
Re: solr performance
Posted by Chris Hostetter <ho...@fucit.org>.
:
: Not sure how that would work (unless you didn't want responses), but
: I've thought about it from the SolrJ side - something you could
: quickly add documents to and it would manage a number of threads under
: the covers to maximize throughput. Not sure what would be the best
: for error handling though - perhaps just polling (allow user to ask
: for failed or successful operations).
The java.util.concurrent package simplifies this type of problem a lot ... the Future
interface is probably the most straightforward way to let the caller
poll.
-Hoss
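A rough sketch of the Future approach, assuming a hypothetical DocumentSender interface in place of the real posting code (it is not part of SolrJ). Each document becomes a task on an ExecutorService, and the caller holds one Future per document that it can poll with isDone() or block on with get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class FuturePolling {

    /** Hypothetical stand-in for whatever actually posts one document. */
    interface DocumentSender {
        void send(String doc) throws Exception;
    }

    /** Submits every document to the pool and returns one Future per document. */
    static List<Future<String>> submitAll(ExecutorService pool, List<String> docs, DocumentSender sender) {
        List<Future<String>> futures = new ArrayList<>();
        for (String doc : docs) {
            Callable<String> task = () -> {
                sender.send(doc);
                return doc;
            };
            futures.add(pool.submit(task));
        }
        return futures;
    }

    /** Non-blocking poll: how many operations have finished, successfully or not. */
    static int countDone(List<Future<String>> futures) {
        int done = 0;
        for (Future<String> f : futures) {
            if (f.isDone()) {
                done++;
            }
        }
        return done;
    }

    /** Blocks until every operation has finished and returns how many failed. */
    static int countFailures(List<Future<String>> futures) {
        int failed = 0;
        for (Future<String> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                failed++; // the sender threw; e.getCause() holds the original error
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag, treat as failed
                failed++;
            }
        }
        return failed;
    }
}
```

The caller decides how eagerly to check: countDone() never blocks, while countFailures() waits for the whole batch.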
Re: solr performance
Posted by Ryan McKinley <ry...@gmail.com>.
For a similar idea, check:
https://issues.apache.org/jira/browse/SOLR-906
This opens a single stream and writes all documents to that. It could
easily be extended to have multiple threads draining the same Queue.
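To illustrate the "multiple threads draining the same Queue" idea in general terms (this is not the code attached to the JIRA issue), here is a small Java sketch. The QueueDrainer name, the STOP marker, and the Consumer standing in for the code that writes a document are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class QueueDrainer {

    // Unique marker object telling a worker to stop; compared by identity, not equals().
    private static final String STOP = new String("__STOP__");

    /** Feeds docs through one shared queue drained by `workers` threads; returns the number sent. */
    static int drain(List<String> docs, int workers, Consumer<String> sender) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger sent = new AtomicInteger();
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    // take() blocks until a document (or the stop marker) is available.
                    for (String doc = queue.take(); doc != STOP; doc = queue.take()) {
                        sender.accept(doc);
                        sent.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        // Producer side: enqueue the documents, then one stop marker per worker.
        queue.addAll(docs);
        for (int i = 0; i < workers; i++) {
            queue.add(STOP);
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sent.get();
    }
}
```

Because the queue is shared, a slow document only holds up the one worker that picked it; the others keep draining.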
On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> I guess this is the best idea . Let us have a new BatchHttpSolrServer
> which can help achieve this
> --Noble
>
> On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
>> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com>
>> wrote:
>>> Kick off some indexing more than once - eg, post a folder of docs,
>>> and while
>>> thats working, post another.
>>>
>>> I've been thinking about a multi threaded UpdateProcessor as well
>>> - that
>>> could be interesting.
>>
>> Not sure how that would work (unless you didn't want responses), but
>> I've thought about it from the SolrJ side - something you could
>> quickly add documents to and it would manage a number of threads
>> under
>> the covers to maximize throughput. Not sure what would be the best
>> for error handling though - perhaps just polling (allow user to ask
>> for failed or successful operations).
>>
>> -Yonik
>>
>
>
>
> --
> --Noble Paul
Re: solr performance
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
I guess this is the best idea. Let us have a new BatchHttpSolrServer
which can help achieve this.
--Noble
On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
>> Kick off some indexing more than once - eg, post a folder of docs, and while
>> thats working, post another.
>>
>> I've been thinking about a multi threaded UpdateProcessor as well - that
>> could be interesting.
>
> Not sure how that would work (unless you didn't want responses), but
> I've thought about it from the SolrJ side - something you could
> quickly add documents to and it would manage a number of threads under
> the covers to maximize throughput. Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>
> -Yonik
>
--
--Noble Paul
Re: solr performance
Posted by Yonik Seeley <yo...@apache.org>.
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
> Kick off some indexing more than once - eg, post a folder of docs, and while
> thats working, post another.
>
> I've been thinking about a multi threaded UpdateProcessor as well - that
> could be interesting.
Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to and it would manage a number of threads under
the covers to maximize throughput. Not sure what would be the best
for error handling though - perhaps just polling (allow user to ask
for failed or successful operations).
-Yonik
Re: solr performance
Posted by sunnyfr <jo...@gmail.com>.
OK ...
Actually, my problem is more about multi-threading, which takes a long time ... about 3 sec
at 100 threads/sec.
I thought this could have helped me, but it is not actually related :s
Sorry.
markrmiller wrote:
>
> Kick off some indexing more than once - eg, post a folder of docs, and
> while thats working, post another.
>
> I've been thinking about a multi threaded UpdateProcessor as well - that
> could be interesting.
>
> - Mark
>
> sunnyfr wrote:
>> Hi,
>> I was reading this post and I wondering how can I parallelize document
>> processing???
>> Thanks Erik
>>
>>
>> Erik Hatcher wrote:
>>
>>> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>>
>>>>> couple of times today at around 158 documents / sec.
>>>>>
>>>> This is not bad at all. How about search performance?
>>>> How many concurrent queries have people been having?
>>>> What does the response time look like?
>>>>
>>> I'm the only user :) What I've done is a proof-of-concept for our
>>> library. We have 3.7M records that I've indexed and faceted. Search
>>> performance (in my unrealistic single user scenario) is blazing (50ms
>>> or so) for purely full-text queries. For queries that return facets,
>>> the response times are actually quite good too (~900ms, or less
>>> depending on the request) - provided the filter cache is warmed and
>>> large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
>>> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>>>
>>>
>>>>> Thanks to the others that clarified. I run my indexers in
>>>>> parallel... but a single instance of Solr (which in turn handles
>>>>> requests in parallel as well).
>>>>>
>>>> Do you feel if multi-threaded posting is helpful?
>>>>
>>> It depends. If the data processing can be parallelized and your
>>> hardware supports it, it can certainly make a big difference... it
>>> did in my case. Both CPUs were cooking during my parallel indexing
>>> runs.
>>>
>>> Erik
--
View this message in context: http://www.nabble.com/solr-performance-tp9055437p20833662.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr performance
Posted by Mark Miller <ma...@gmail.com>.
Kick off some indexing more than once, e.g., post a folder of docs, and
while that's working, post another.
I've been thinking about a multi-threaded UpdateProcessor as well; that
could be interesting.
- Mark
sunnyfr wrote:
> Hi,
> I was reading this post and I wondering how can I parallelize document
> processing???
> Thanks Erik
>
>
> Erik Hatcher wrote:
>
>> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>
>>>> couple of times today at around 158 documents / sec.
>>>>
>>> This is not bad at all. How about search performance?
>>> How many concurrent queries have people been having?
>>> What does the response time look like?
>>>
>> I'm the only user :) What I've done is a proof-of-concept for our
>> library. We have 3.7M records that I've indexed and faceted. Search
>> performance (in my unrealistic single user scenario) is blazing (50ms
>> or so) for purely full-text queries. For queries that return facets,
>> the response times are actually quite good too (~900ms, or less
>> depending on the request) - provided the filter cache is warmed and
>> large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
>> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>>
>>
>>>> Thanks to the others that clarified. I run my indexers in
>>>> parallel... but a single instance of Solr (which in turn handles
>>>> requests in parallel as well).
>>>>
>>> Do you feel if multi-threaded posting is helpful?
>>>
>> It depends. If the data processing can be parallelized and your
>> hardware supports it, it can certainly make a big difference... it
>> did in my case. Both CPUs were cooking during my parallel indexing
>> runs.
>>
>> Erik
Re: Re[4]: solr performance
Posted by sunnyfr <jo...@gmail.com>.
Hi,
I was reading this post and I was wondering how I can parallelize document
processing?
Thanks, Erik
Erik Hatcher wrote:
>
>
> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>> couple of times today at around 158 documents / sec.
>>
>> This is not bad at all. How about search performance?
>> How many concurrent queries have people been having?
>> What does the response time look like?
>
> I'm the only user :) What I've done is a proof-of-concept for our
> library. We have 3.7M records that I've indexed and faceted. Search
> performance (in my unrealistic single user scenario) is blazing (50ms
> or so) for purely full-text queries. For queries that return facets,
> the response times are actually quite good too (~900ms, or less
> depending on the request) - provided the filter cache is warmed and
> large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>
>>> Thanks to the others that clarified. I run my indexers in
>>> parallel... but a single instance of Solr (which in turn handles
>>> requests in parallel as well).
>>
>> Do you feel if multi-threaded posting is helpful?
>
> It depends. If the data processing can be parallelized and your
> hardware supports it, it can certainly make a big difference... it
> did in my case. Both CPUs were cooking during my parallel indexing
> runs.
>
> Erik
--
View this message in context: http://www.nabble.com/solr-performance-tp9055437p20833421.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re[4]: solr performance
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>> couple of times today at around 158 documents / sec.
>
> This is not bad at all. How about search performance?
> How many concurrent queries have people been having?
> What does the response time look like?
I'm the only user :) What I've done is a proof-of-concept for our
library. We have 3.7M records that I've indexed and faceted. Search
performance (in my unrealistic single user scenario) is blazing (50ms
or so) for purely full-text queries. For queries that return facets,
the response times are actually quite good too (~900ms, or less
depending on the request) - provided the filter cache is warmed and
large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
1.83GHz) - I'm sure on a beefier box it'll only get better.
>> Thanks to the others that clarified. I run my indexers in
>> parallel... but a single instance of Solr (which in turn handles
>> requests in parallel as well).
>
> Do you feel if multi-threaded posting is helpful?
It depends. If the data processing can be parallelized and your
hardware supports it, it can certainly make a big difference... it
did in my case. Both CPUs were cooking during my parallel indexing
runs.
Erik
Re: Re[4]: solr performance
Posted by Mike Klaas <mi...@gmail.com>.
On 2/21/07, Jack L <jl...@yahoo.ca> wrote:
> > Thanks to the others that clarified. I run my indexers in
> > parallel... but a single instance of Solr (which in turn handles
> > requests in parallel as well).
>
> Do you feel if multi-threaded posting is helpful?
> I suppose when solr does indexing, it's bound more
> on solr indexer than the poster?
It certainly is bound more by Solr than by the poster, but I've found
multithreading beneficial as it removes whatever latency factors might
exist: HTTP connections, XML parsing, I/O, the poster, etc. For us,
concurrent analysis was less of a gain, but then again our analysis is
relatively light.
-Mike