Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2008/12/10 19:14:44 UTC
[jira] Commented: (SOLR-906) Buffered / Streaming SolrServer implementation
[ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655298#action_12655298 ]
Ryan McKinley commented on SOLR-906:
------------------------------------
One basic problem with calling add( SolrInputDocument ) on the CommonsHttpSolrServer is that it logs a request for each document. This can have a substantial impact. For example, indexing 40K docs on my machine takes ~3 1/2 mins. If I turn logging off, the time drops to ~2 1/2 mins. With the streaming approach, the time drops to 20 sec! Some of that is obviously because it limits the logging:
{code}
INFO: {add=[id1,id2,id3,id4, ...(38293 more)]} 0 20714
{code}
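To make the request-count difference concrete, here is a minimal sketch of client-side buffering. It is not the code from the patch: BufferedAdder, its batchSize parameter, and the Consumer that stands in for "send one request" are all hypothetical names, and it flushes fixed-size batches rather than streaming over a single open connection. It only illustrates why grouping documents cuts the number of requests (and therefore log lines).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative sketch only. BufferedAdder is a hypothetical class, not part
 * of SolrJ. The sender callback stands in for "issue one update request".
 */
final class BufferedAdder<D> {
    private final int batchSize;                 // assumed tuning knob
    private final Consumer<List<D>> sender;      // one call == one request
    private final List<D> buffer = new ArrayList<>();

    BufferedAdder(int batchSize, Consumer<List<D>> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffer a document; send the whole batch once the buffer is full. */
    void add(D doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Send whatever is buffered as a single request, if anything. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer)); // hand over a copy
        buffer.clear();
    }

    public static void main(String[] args) {
        int[] requests = {0};
        int[] docs = {0};
        // The lambda only counts calls; a real sender would post the batch.
        BufferedAdder<String> adder = new BufferedAdder<>(1000, batch -> {
            requests[0]++;
            docs[0] += batch.size();
        });
        for (int i = 0; i < 40_000; i++) {
            adder.add("id" + i);
        }
        adder.flush(); // send any remainder
        // prints "40 requests for 40000 docs"
        System.out.println(requests[0] + " requests for " + docs[0] + " docs");
    }
}
```

With a batch size of 1 the same loop would issue 40,000 requests, which is the per-document behaviour described above.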
> Buffered / Streaming SolrServer implementation
> ----------------------------------------------
>
> Key: SOLR-906
> URL: https://issues.apache.org/jira/browse/SOLR-906
> Project: Solr
> Issue Type: New Feature
> Components: clients - java
> Reporter: Ryan McKinley
> Fix For: 1.4
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument ) is less than optimal. It makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single open HTTP connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680