Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2008/12/10 19:14:44 UTC

[jira] Commented: (SOLR-906) Buffered / Streaming SolrServer implementation

    [ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655298#action_12655298 ] 

Ryan McKinley commented on SOLR-906:
------------------------------------

One basic problem with calling add( SolrInputDocument ) on the CommonsHttpSolrServer is that it logs a request for each document.  This can have a substantial impact.  For example, indexing 40K docs on my machine takes ~3 1/2 mins.  If I turn logging off, the time drops to ~2 1/2 mins.  With the streaming approach, the time drops to 20 sec!  Some of that is obviously because it limits the logging to a single line for the whole batch:
{code}
INFO: {add=[id1,id2,id3,id4, ...(38293 more)]} 0 20714
{code}
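
To make the buffering idea concrete, here is a minimal sketch in plain Java. It is not the attached patch and does not use any SolrJ API: the class name BufferedDocWriter, the END sentinel, and the use of a generic OutputStream as a stand-in for the single open HTTP connection are all assumptions made for illustration. Callers queue documents, and one background thread writes them all into a single stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration only: not the SOLR-906 patch and not a SolrJ class. */
class BufferedDocWriter implements AutoCloseable {
    // Sentinel compared by identity, so no real document can collide with it.
    private static final String END = new String("END");

    private final BlockingQueue<String> queue;
    private final Thread drainer;
    private volatile IOException failure;

    /** 'out' stands in for the body of one long-lived HTTP request (assumption). */
    BufferedDocWriter(OutputStream out, int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.drainer = new Thread(() -> drain(out), "doc-drainer");
        this.drainer.start();
    }

    /** Buffers one serialized document; blocks while the buffer is full. */
    void add(String docXml) throws InterruptedException {
        queue.put(docXml);
    }

    /** Writes every queued document into the one stream, then closes it. */
    private void drain(OutputStream out) {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            String next = "<add>";
            while (next != END) {
                if (failure == null) {
                    try {
                        w.write(next);
                    } catch (IOException e) {
                        failure = e; // keep consuming so producers never block forever
                    }
                }
                next = queue.take();
            }
            if (failure == null) {
                w.write("</add>");
            }
        } catch (IOException e) {
            if (failure == null) {
                failure = e;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Signals end of input, waits for the drainer, and reports any write error. */
    @Override
    public void close() throws IOException {
        try {
            queue.put(END);
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while flushing documents");
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedDocWriter writer = new BufferedDocWriter(sink, 100)) {
            writer.add("<doc>1</doc>");
            writer.add("<doc>2</doc>");
        }
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With this shape the receiving side sees one request containing many documents rather than one request per document, which is also why only a single log line would be expected for the whole batch.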

> Buffered / Streaming SolrServer implementation
> ----------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>             Fix For: 1.4
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument ) is less than optimal.  This makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single open HTTP connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680
