Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2008/12/04 14:39:14 UTC
Re: solr performance
Kick off more than one indexing run at a time - e.g., post a folder of docs,
and while that's working, post another.
I've been thinking about a multi-threaded UpdateProcessor as well - that
could be interesting.
- Mark
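A minimal sketch of the first suggestion, in Java: several folders posted
concurrently from a fixed thread pool. ParallelPoster, postAll, and the
postFolder callback are hypothetical names, not part of Solr or SolrJ; the
callback stands in for whatever actually sends one folder's documents to Solr.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: not a Solr or SolrJ class.
final class ParallelPoster {
    /**
     * Runs one posting task per folder, at most `threads` at a time, and
     * returns how many folders were posted without throwing an exception.
     */
    static int postAll(List<Path> folders, Consumer<Path> postFolder, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            // Submit every folder up front so the pool can overlap the work.
            for (Path folder : folders) {
                pending.add(pool.submit(() -> postFolder.accept(folder)));
            }
            int succeeded = 0;
            for (Future<?> f : pending) {
                try {
                    f.get(); // blocks until this folder's task has finished
                    succeeded++;
                } catch (ExecutionException e) {
                    // The callback threw for this folder; the others still run.
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps depends on the hardware and on how much of the work is
document preparation on the client side, as discussed in the quoted thread.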
sunnyfr wrote:
> Hi,
> I was reading this post and I was wondering how I can parallelize document
> processing?
> Thanks Erik
>
>
> Erik Hatcher wrote:
>
>> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>
>>>> couple of times today at around 158 documents / sec.
>>>>
>>> This is not bad at all. How about search performance?
>>> How many concurrent queries have people been having?
>>> What does the response time look like?
>>>
>> I'm the only user :) What I've done is a proof-of-concept for our
>> library. We have 3.7M records that I've indexed and faceted. Search
>> performance (in my unrealistic single user scenario) is blazing (50ms
>> or so) for purely full-text queries. For queries that return facets,
>> the response times are actually quite good too (~900ms, or less
>> depending on the request) - provided the filter cache is warmed and
>> large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
>> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>>
>>
>>>> Thanks to the others that clarified. I run my indexers in
>>>> parallel... but a single instance of Solr (which in turn handles
>>>> requests in parallel as well).
>>>>
>>> Do you feel if multi-threaded posting is helpful?
>>>
>> It depends. If the data processing can be parallelized and your
>> hardware supports it, it can certainly make a big difference... it
>> did in my case. Both CPUs were cooking during my parallel indexing
>> runs.
>>
>> Erik
>>
>>
>>
>>
>>
>>
>
>
Re: solr performance
Posted by Mark Miller <ma...@gmail.com>.
Yonik Seeley wrote:
>
> Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>
That's how I've handled similar situations in the past. You're submitting a
batch of data to be processed, and if you're so inclined to see how it
went, you can inspect some kind of report object. If the batch process
blocks, you could return the report object, or if not, you could return
a batch/job id (with reports valid for x amount of time after they are
done?).
It seems like a sound enough method to me, but it would be interesting
to hear if someone has a better idea.
- Mark
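A rough Java sketch of the non-blocking variant described above: submit
returns a job id immediately, and the caller can later ask for a report.
BatchJobs, Report, and processItem are hypothetical names used only for
illustration, not Solr APIs, and report expiry ("valid for x amount of time")
is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical batch runner: not a Solr or SolrJ class.
final class BatchJobs<T> {
    /** Report a caller may inspect while a batch runs or after it ends. */
    static final class Report {
        final AtomicInteger succeeded = new AtomicInteger();
        final AtomicInteger failed = new AtomicInteger();
        volatile boolean done;
    }

    private final Map<Long, Report> reports = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<T> processItem;

    BatchJobs(Consumer<T> processItem) {
        this.processItem = processItem;
    }

    /** Non-blocking submit: queues the batch and returns its job id. */
    long submit(List<T> batch) {
        long id = nextId.incrementAndGet();
        Report report = new Report();
        reports.put(id, report);
        worker.execute(() -> {
            for (T item : batch) {
                try {
                    processItem.accept(item);
                    report.succeeded.incrementAndGet();
                } catch (RuntimeException e) {
                    report.failed.incrementAndGet(); // keep going after a bad item
                }
            }
            report.done = true;
        });
        return id;
    }

    /** Polling entry point: returns null for an unknown job id. */
    Report report(long id) {
        return reports.get(id);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

A blocking variant could simply run the loop on the caller's thread and
return the Report directly instead of an id.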
Re: solr performance
Posted by Chris Hostetter <ho...@fucit.org>.
:
: Not sure how that would work (unless you didn't want responses), but
: I've thought about it from the SolrJ side - something you could
: quickly add documents to and it would manage a number of threads under
: the covers to maximize throughput. Not sure what would be the best
: for error handling though - perhaps just polling (allow user to ask
: for failed or successful operations).
The java.util.concurrent package simplifies this type of problem a lot ... the
Future interface is probably the most straightforward way to let the caller
poll.
-Hoss
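As an illustration of polling through Future, here is a small Java sketch
that reports a task's state without blocking on unfinished work. The
FuturePolling class and its status strings are made up for this example; only
the java.util.concurrent calls are standard.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper built only on the standard Future interface.
final class FuturePolling {
    /** Describes the state of a task without waiting for it to finish. */
    static String status(Future<?> f) throws InterruptedException {
        if (!f.isDone()) {
            return "pending";
        }
        if (f.isCancelled()) {
            return "cancelled";
        }
        try {
            f.get(); // returns immediately because the task is already done
            return "ok";
        } catch (ExecutionException e) {
            // The task threw; the original exception is the cause.
            return "failed: " + e.getCause().getMessage();
        }
    }
}
```

A caller would keep the Future returned by ExecutorService.submit for each
add operation and call status on it whenever it wants to check progress.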
Re: solr performance
Posted by Ryan McKinley <ry...@gmail.com>.
For a similar idea, check:
https://issues.apache.org/jira/browse/SOLR-906
This opens a single stream and writes all documents to it. It could
easily be extended to have multiple threads draining the same Queue.
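A sketch of what "multiple threads draining the same Queue" could look like
in plain Java. This is not the SOLR-906 patch; QueueDrainer and the send
callback are hypothetical names, and the callback stands in for whatever
writes a document to Solr. Failures are only counted, so a caller can poll
for them later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: not the SOLR-906 code and not a SolrJ class.
final class QueueDrainer<T> {
    private static final Object STOP = new Object(); // one stop marker per worker
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();
    private final AtomicInteger failed = new AtomicInteger();

    QueueDrainer(int threads, Consumer<T> send) {
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Object next = queue.take(); // waits for the next item
                        if (next == STOP) {
                            return;
                        }
                        @SuppressWarnings("unchecked")
                        T doc = (T) next;
                        try {
                            send.accept(doc);
                        } catch (RuntimeException e) {
                            failed.incrementAndGet(); // record it and keep draining
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }
    }

    /** Returns quickly; a worker thread sends the document later. */
    void add(T doc) {
        queue.add(doc);
    }

    int failedCount() {
        return failed.get();
    }

    /** Lets the workers finish everything queued so far, then stops them. */
    void close() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            queue.add(STOP);
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

The queue here is unbounded; a bounded queue with put instead of add would
make fast producers wait when the workers fall behind.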
On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> I guess this is the best idea. Let us have a new BatchHttpSolrServer
> that can help achieve this.
> --Noble
>
> On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
>> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
>>> Kick off some indexing more than once - eg, post a folder of docs, and while
>>> thats working, post another.
>>>
>>> I've been thinking about a multi threaded UpdateProcessor as well - that
>>> could be interesting.
>>
>> Not sure how that would work (unless you didn't want responses), but
>> I've thought about it from the SolrJ side - something you could
>> quickly add documents to and it would manage a number of threads under
>> the covers to maximize throughput. Not sure what would be the best
>> for error handling though - perhaps just polling (allow user to ask
>> for failed or successful operations).
>>
>> -Yonik
>>
>
>
>
> --
> --Noble Paul
Re: solr performance
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
I guess this is the best idea. Let us have a new BatchHttpSolrServer
that can help achieve this.
--Noble
On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
>> Kick off some indexing more than once - eg, post a folder of docs, and while
>> thats working, post another.
>>
>> I've been thinking about a multi threaded UpdateProcessor as well - that
>> could be interesting.
>
> Not sure how that would work (unless you didn't want responses), but
> I've thought about it from the SolrJ side - something you could
> quickly add documents to and it would manage a number of threads under
> the covers to maximize throughput. Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>
> -Yonik
>
--
--Noble Paul
Re: solr performance
Posted by Yonik Seeley <yo...@apache.org>.
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <ma...@gmail.com> wrote:
> Kick off some indexing more than once - eg, post a folder of docs, and while
> thats working, post another.
>
> I've been thinking about a multi threaded UpdateProcessor as well - that
> could be interesting.
Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to and it would manage a number of threads under
the covers to maximize throughput. Not sure what would be the best
for error handling though - perhaps just polling (allow user to ask
for failed or successful operations).
-Yonik
Re: solr performance
Posted by sunnyfr <jo...@gmail.com>.
OK ...
Actually, my problem is more about multi-threading, which takes a long time ...
around 3 sec at 100 threads/sec.
I thought this could have helped me ... but it is actually unrelated :s
Sorry.
markrmiller wrote:
>
> Kick off some indexing more than once - eg, post a folder of docs, and
> while thats working, post another.
>
> I've been thinking about a multi threaded UpdateProcessor as well - that
> could be interesting.
>
> - Mark
>
> sunnyfr wrote:
>> Hi,
>> I was reading this post and I was wondering how I can parallelize document
>> processing?
>> Thanks Erik
>>
>>
>> Erik Hatcher wrote:
>>
>>> On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>>>
>>>>> couple of times today at around 158 documents / sec.
>>>>>
>>>> This is not bad at all. How about search performance?
>>>> How many concurrent queries have people been having?
>>>> What does the response time look like?
>>>>
>>> I'm the only user :) What I've done is a proof-of-concept for our
>>> library. We have 3.7M records that I've indexed and faceted. Search
>>> performance (in my unrealistic single user scenario) is blazing (50ms
>>> or so) for purely full-text queries. For queries that return facets,
>>> the response times are actually quite good too (~900ms, or less
>>> depending on the request) - provided the filter cache is warmed and
>>> large enough. This is running on my laptop (MacBook Pro, 2GB RAM,
>>> 1.83GHz) - I'm sure on a beefier box it'll only get better.
>>>
>>>
>>>>> Thanks to the others that clarified. I run my indexers in
>>>>> parallel... but a single instance of Solr (which in turn handles
>>>>> requests in parallel as well).
>>>>>
>>>> Do you feel if multi-threaded posting is helpful?
>>>>
>>> It depends. If the data processing can be parallelized and your
>>> hardware supports it, it can certainly make a big difference... it
>>> did in my case. Both CPUs were cooking during my parallel indexing
>>> runs.
>>>
>>> Erik
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>