Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2008/12/04 14:39:14 UTC

Re: solr performance

Kick off some indexing more than once - e.g., post a folder of docs, and 
while that's working, post another.

I've been thinking about a multi-threaded UpdateProcessor as well - that 
could be interesting.

- Mark

sunnyfr wrote:
> Hi,
> I was reading this post and I was wondering how I can parallelize document
> processing?
> Thanks Erik
>
>
> Erik Hatcher wrote:
>   
>> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>     
>>>> couple of times today at around 158 documents / sec.
>>>>         
>>> This is not bad at all. How about search performance?
>>> How many concurrent queries have people been having?
>>> What does the response time look like?
>>>       
>> I'm the only user :)   What I've done is a proof-of-concept for our  
>> library.  We have 3.7M records that I've indexed and faceted.  Search  
>> performance (in my unrealistic single user scenario) is blazing (50ms  
>> or so) for purely full-text queries.  For queries that return facets,  
>> the response times are actually quite good too (~900ms, or less  
>> depending on the request) - provided the filter cache is warmed and  
>> large enough.  This is running on my laptop (MacBook Pro, 2GB RAM,  
>> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>>
>>     
>>>> Thanks to the others that clarified.  I run my indexers in
>>>> parallel... but a single instance of Solr (which in turn handles
>>>> requests in parallel as well).
>>>>         
>>> Do you feel that multi-threaded posting is helpful?
>>>       
>> It depends.  If the data processing can be parallelized and your  
>> hardware supports it, it can certainly make a big difference... it  
>> did in my case.  Both CPUs were cooking during my parallel indexing  
>> runs.
>>
>> 	Erik
>>
>>
>>
>>
>>
>>     
>
>   


Re: solr performance

Posted by Mark Miller <ma...@gmail.com>.
Yonik Seeley wrote:
>   
> Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>   
That's how I've handled similar situations in the past. You're submitting a 
batch of data to be processed, and if you're so inclined to see how it 
went, you can inspect some kind of report object. If the batch process 
blocks, you could return the report object; if not, you could return 
a batch/job id (with reports valid for x amount of time after they are 
done?).

It seems like a sound enough method to me, but it would be interesting 
to hear if someone has a better idea.
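
A rough sketch of the non-blocking variant described above, in case it helps the discussion. Everything here (BatchJobs, Report, submit, report) is a made-up name for illustration and not part of Solr or SolrJ, and the indexer callback is only a stand-in for whatever actually sends a document. Expiring old reports after some time is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names exist in Solr or SolrJ.
class BatchJobs {
    /** Per-batch report the caller can inspect while or after the job runs. */
    static final class Report {
        private final List<String> succeeded =
                Collections.synchronizedList(new ArrayList<String>());
        private final List<String> failed =
                Collections.synchronizedList(new ArrayList<String>());
        private volatile boolean done;

        boolean isDone() { return done; }
        List<String> succeeded() { return new ArrayList<>(succeeded); } // snapshot copy
        List<String> failed() { return new ArrayList<>(failed); }       // snapshot copy
    }

    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<Long, Report> reports = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Non-blocking submit: returns a job id that can be used to poll later. */
    long submit(List<String> docs, Consumer<String> indexer) {
        long id = nextId.incrementAndGet();
        Report report = new Report();
        reports.put(id, report);
        pool.submit(() -> {
            for (String doc : docs) {
                try {
                    indexer.accept(doc);       // stand-in for the real indexing call
                    report.succeeded.add(doc);
                } catch (RuntimeException e) {
                    report.failed.add(doc);    // record the failure, keep going
                }
            }
            report.done = true;
        });
        return id;
    }

    /** Returns the report for a job id, or null if the id is unknown. */
    Report report(long id) { return reports.get(id); }

    /** Stops accepting jobs and waits up to 10 seconds for running ones. */
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BatchJobs jobs = new BatchJobs();
        long id = jobs.submit(Arrays.asList("doc1", "doc2"), doc -> { /* send doc here */ });
        jobs.shutdown(); // waits for the batch to finish
        Report r = jobs.report(id);
        System.out.println("done=" + r.isDone() + " failed=" + r.failed());
    }
}
```

A blocking variant could simply wait for the task and hand back the Report directly instead of the id.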

- Mark

Re: solr performance

Posted by Chris Hostetter <ho...@fucit.org>.
: 
: Not sure how that would work (unless you didn't want responses), but
: I've thought about it from the SolrJ side - something you could
: quickly add documents to and it would manage a number of threads under
: the covers to maximize throughput.  Not sure what would be the best
: for error handling though - perhaps just polling (allow user to ask
: for failed or successful operations).

The java.util.concurrent package simplifies this type of problem a lot ... the Future 
interface is probably the most straightforward way to let the caller 
poll.
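
For what it's worth, a minimal sketch of the Future approach could look like the following. ExecutorService, Callable and Future are the real java.util.concurrent types; FuturePolling, indexBatch and submitAll are hypothetical names, and indexBatch is only a placeholder for a real call that sends a batch to Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: FuturePolling, indexBatch and submitAll are not SolrJ APIs.
class FuturePolling {
    /** Placeholder for sending one batch to Solr; returns the number of docs sent. */
    static int indexBatch(List<String> docs) {
        if (docs.isEmpty()) {
            throw new IllegalArgumentException("empty batch");
        }
        return docs.size();
    }

    /** Submits every batch to the pool and returns one Future per batch. */
    static List<Future<Integer>> submitAll(ExecutorService pool, List<List<String>> batches) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<String> batch : batches) {
            Callable<Integer> task = () -> indexBatch(batch);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<List<String>> batches = new ArrayList<>();
        batches.add(Arrays.asList("doc1", "doc2"));
        batches.add(new ArrayList<String>()); // this batch fails on purpose

        List<Future<Integer>> futures = submitAll(pool, batches);
        pool.shutdown(); // already submitted tasks still run

        for (Future<Integer> f : futures) {
            // isDone() is the non-blocking poll; get() blocks until the result is ready.
            System.out.println("finished yet? " + f.isDone());
            try {
                System.out.println("indexed " + f.get() + " docs");
            } catch (ExecutionException e) {
                // The task's own exception is available as the cause.
                System.out.println("batch failed: " + e.getCause().getMessage());
            }
        }
    }
}
```

The caller decides when to look: isDone() never blocks, while get() waits for the result and rethrows a failure wrapped in an ExecutionException.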



-Hoss


Re: solr performance

Posted by Ryan McKinley <ry...@gmail.com>.
For a similar idea, check:
https://issues.apache.org/jira/browse/SOLR-906

This opens a single stream and writes all documents to that.  It could  
easily be extended to have multiple threads draining the same Queue.
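
To illustrate only the "multiple threads draining the same Queue" part, here is a small sketch. It is not the code from SOLR-906: QueueDrainer and its methods are hypothetical names, the sender callback stands in for whatever writes a document to Solr, and error handling is left out (a sender that throws ends its worker thread).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: not the SOLR-906 implementation.
class QueueDrainer {
    // Sentinel compared by identity; one copy is queued per worker on close().
    private static final String POISON = new String("<stop>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final Thread[] workers;

    /** Starts the given number of threads, all taking from the same queue. */
    QueueDrainer(int threads, Consumer<String> sender) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        String doc = queue.take();   // blocks until a doc is available
                        if (doc == POISON) {
                            return;                  // shutdown signal for this worker
                        }
                        sender.accept(doc);          // stand-in for the real send
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
    }

    /** Queues a document; blocks if the queue is full (simple back-pressure). */
    void add(String doc) throws InterruptedException {
        queue.put(doc);
    }

    /** Signals every worker to stop after the queued docs and waits for them. */
    void close() throws InterruptedException {
        for (int i = 0; i < workers.length; i++) {
            queue.put(POISON);
        }
        for (Thread w : workers) {
            w.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        QueueDrainer drainer = new QueueDrainer(4, doc -> System.out.println("sent " + doc));
        for (int i = 0; i < 10; i++) {
            drainer.add("doc" + i);
        }
        drainer.close();
    }
}
```

The bounded queue means a fast producer blocks in add() rather than piling up documents in memory; the order in which documents are sent is not guaranteed once more than one thread is draining.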


On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:

> I guess this is the best idea. Let us have a new BatchHttpSolrServer
> which can help achieve this
> --Noble
>
> On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
>> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
>>> Kick off some indexing more than once - e.g., post a folder of docs, and
>>> while that's working, post another.
>>>
>>> I've been thinking about a multi-threaded UpdateProcessor as well - that
>>> could be interesting.
>>
>> Not sure how that would work (unless you didn't want responses), but
>> I've thought about it from the SolrJ side - something you could
>> quickly add documents to and it would manage a number of threads under
>> the covers to maximize throughput.  Not sure what would be the best
>> for error handling though - perhaps just polling (allow user to ask
>> for failed or successful operations).
>>
>> -Yonik
>>
>
>
>
> -- 
> --Noble Paul


Re: solr performance

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I guess this is the best idea. Let us have a new BatchHttpSolrServer
which can help achieve this
--Noble

On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
>> Kick off some indexing more than once - e.g., post a folder of docs, and while
>> that's working, post another.
>>
>> I've been thinking about a multi-threaded UpdateProcessor as well - that
>> could be interesting.
>
> Not sure how that would work (unless you didn't want responses), but
> I've thought about it from the SolrJ side - something you could
> quickly add documents to and it would manage a number of threads under
> the covers to maximize throughput.  Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>
> -Yonik
>



-- 
--Noble Paul

Re: solr performance

Posted by Yonik Seeley <yo...@apache.org>.
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
> Kick off some indexing more than once - e.g., post a folder of docs, and while
> that's working, post another.
>
> I've been thinking about a multi-threaded UpdateProcessor as well - that
> could be interesting.

Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to and it would manage a number of threads under
the covers to maximize throughput.  Not sure what would be the best
for error handling though - perhaps just polling (allow user to ask
for failed or successful operations).

-Yonik

Re: solr performance

Posted by sunnyfr <jo...@gmail.com>.
OK...
Actually, my problem is more with multi-threading, which takes a long time: around 3 sec
at 100 threads/sec.
I thought this could have helped me, but it is not actually related. :s
Sorry.


