Posted to solr-dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/08/16 21:46:59 UTC

good performance news

I just profiled a CSV upload, and aside from the CSV parsing, Solr
adds pretty much no overhead!
I was expecting some non-trivial overhead due to Solr's
SolrInputDocument, update processing pipeline, and update handler...
but profiling showed that it amounted to less than 1%.

85% of the time was spent in Lucene's IndexWriter
12% of the time was spent in the CSV parser
2% of the time was spent merging segments in the IndexWriter

-Yonik
http://www.lucidimagination.com
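
A per-stage breakdown like the one above can be approximated with simple wall-clock timing. The following is only a minimal Python sketch and not the profiler setup used in the post: `parse_rows`, `index_rows`, `make_csv`, and `stage_shares` are hypothetical names, and the tiny in-memory inverted index is a stand-in for Lucene's IndexWriter, so the percentages it reports say nothing about Solr itself.

```python
import csv
import io
import time


def make_csv(n_rows):
    """Build sample CSV text (hypothetical data, for illustration only)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "title"])
    for i in range(n_rows):
        writer.writerow([str(i), f"Sample document {i}"])
    return buf.getvalue()


def parse_rows(text):
    """The 'CSV parser' stage: turn CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def index_rows(rows):
    """Stand-in for the indexing stage: a toy inverted index (token -> doc ids)."""
    index = {}
    for doc_id, row in enumerate(rows):
        for value in row.values():
            for token in value.lower().split():
                index.setdefault(token, set()).add(doc_id)
    return index


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def stage_shares(text):
    """Return each stage's share of the total measured time, in percent."""
    rows, t_parse = timed(parse_rows, text)
    _index, t_index = timed(index_rows, rows)
    total = (t_parse + t_index) or 1.0  # guard against a zero total
    return {"parse": 100.0 * t_parse / total, "index": 100.0 * t_index / total}


if __name__ == "__main__":
    for stage, share in stage_shares(make_csv(50_000)).items():
        print(f"{share:5.1f}% of the time was spent in {stage}")
```

Timing each stage separately is cruder than a sampling profiler, but it is enough to see whether parsing or indexing dominates for a given input.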

Re: good performance news

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 18, 2009, at 1:58 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> In our internal testing, the binary request writer gave very good perf
> for a large number of docs.

Yeah, that only makes sense.  I was just curious about the overhead of
XML in typical cases.  I think all the native clients should use the
binary format.

>
> Though we did not benchmark it
>
> On Tue, Aug 18, 2009 at 2:57 AM, Grant Ingersoll<gs...@apache.org> wrote:
>>
>> On Aug 16, 2009, at 3:46 PM, Yonik Seeley wrote:
>>
>>> I just profiled a CSV upload, and aside from the CSV parsing, Solr
>>> adds pretty much no overhead!
>>> I was expecting some non-trivial overhead due to Solr's
>>> SolrInputDocument, update processing pipeline, and update handler...
>>> but profiling showed that it amounted to less than 1%.
>>>
>>> 85% of the time was spent in Lucene's IndexWriter
>>> 12% of the time was spent in the CSV parser
>>
>> I'm curious how much overhead there is in parsing Solr XML.  I will try some
>> tests on that later if I get a chance.  We really should push clients to use
>> the Binary request/response formats in most cases.
>>
>
>
>
> -- 
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com



Re: good performance news

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
In our internal testing, the binary request writer gave very good perf
for a large number of docs.

Though we did not benchmark it

On Tue, Aug 18, 2009 at 2:57 AM, Grant Ingersoll<gs...@apache.org> wrote:
>
> On Aug 16, 2009, at 3:46 PM, Yonik Seeley wrote:
>
>> I just profiled a CSV upload, and aside from the CSV parsing, Solr
>> adds pretty much no overhead!
>> I was expecting some non-trivial overhead due to Solr's
>> SolrInputDocument, update processing pipeline, and update handler...
>> but profiling showed that it amounted to less than 1%.
>>
>> 85% of the time was spent in Lucene's IndexWriter
>> 12% of the time was spent in the CSV parser
>
> I'm curious how much overhead there is in parsing Solr XML.  I will try some
> tests on that later if I get a chance.  We really should push clients to use
> the Binary request/response formats in most cases.
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: good performance news

Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 16, 2009, at 3:46 PM, Yonik Seeley wrote:

> I just profiled a CSV upload, and aside from the CSV parsing, Solr
> adds pretty much no overhead!
> I was expecting some non-trivial overhead due to Solr's
> SolrInputDocument, update processing pipeline, and update handler...
> but profiling showed that it amounted to less than 1%.
>
> 85% of the time was spent in Lucene's IndexWriter
> 12% of the time was spent in the CSV parser

I'm curious how much overhead there is in parsing Solr XML.  I will  
try some tests on that later if I get a chance.  We really should push  
clients to use the Binary request/response formats in most cases.
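
One rough way to get a feel for the XML question is to serialize the same documents as CSV and as a Solr-style XML `<add>` message, then time parsing each. The sketch below is an assumption-laden illustration in Python: it uses Python's standard-library `csv` and `xml.etree.ElementTree` parsers, not Solr's own CSV or XML loaders, and it does not attempt to reproduce the binary format, so its timings cannot be read as Solr numbers. The helper names (`to_csv`, `to_solr_xml`, `parse_csv`, `parse_solr_xml`, `time_parsers`) are hypothetical.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET


def to_csv(docs, fields):
    """Serialize docs (a list of dicts of strings) as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(docs)
    return buf.getvalue()


def to_solr_xml(docs):
    """Serialize docs as <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


def parse_csv(text):
    """Parse CSV text back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_solr_xml(text):
    """Parse an <add> message back into a list of dicts."""
    root = ET.fromstring(text)
    return [
        {f.get("name"): f.text or "" for f in doc.findall("field")}
        for doc in root.findall("doc")
    ]


def time_parsers(docs, fields):
    """Return seconds spent parsing the same docs from CSV and from XML."""
    csv_text, xml_text = to_csv(docs, fields), to_solr_xml(docs)
    start = time.perf_counter()
    parse_csv(csv_text)
    t_csv = time.perf_counter() - start
    start = time.perf_counter()
    parse_solr_xml(xml_text)
    t_xml = time.perf_counter() - start
    return {"csv": t_csv, "xml": t_xml}


if __name__ == "__main__":
    sample = [{"id": str(i), "title": f"Sample document {i}"} for i in range(50_000)]
    print(time_parsers(sample, ["id", "title"]))
```

Both parsers return the same list of dicts for the same input, so any difference in the reported times comes from the wire format rather than from the documents themselves.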