Posted to solr-user@lucene.apache.org by gaoagong <th...@gmail.com> on 2013/09/07 01:53:34 UTC

Batch Solr Server

Does anyone know if there is such a thing as a BatchSolrServer object in the
solrj code? I am currently using the ConcurrentUpdateSolrServer, but it
isn't doing quite what I expected. It spreads the work of sending documents
through the http client across several threads and manages the
connections, but it does not package the documents into bundles. Bundling can
be done manually by calling solrServer.add(Collection<SolrInputDocument>
documents), which creates a single UpdateRequest object for the entire
collection. When the ConcurrentUpdateSolrServer gets to this UpdateRequest
it sends all of the documents together in a single http call.

What I want to be able to do is call solrServer.add(SolrInputDocument
document) and have the SolrServer grab the next batch (up to a specified
size) and then create an UpdateRequest. This would reduce the number of
individual requests the Solr servers have to handle, as well as any
per-call http overhead.
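
A minimal sketch of that idea, kept independent of SolrJ so it runs on its own. BatchingBuffer, batchSize and sink are hypothetical names used for illustration, not SolrJ classes or parameters; it also assumes Java 8 or later for java.util.function.Consumer. The buffer collects single items and hands them to the sink in lists of up to batchSize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of SolrJ): buffers items and passes them to a sink in batches. */
class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and sends a batch as soon as batchSize items are waiting. */
    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as one batch; does nothing if the buffer is empty. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending); // copy, so the sink owns its own list
        pending.clear();
        sink.accept(batch);
    }
}
```

With SolrJ, the sink would presumably wrap a call to solrServer.add(batch) and deal with the checked exceptions it declares. The caller still has to call flush() once at the end so the last partial batch is not left in the buffer.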

Would this kind of functionality be worthwhile to anyone else? Should I
create such a SolrServer object?



--
View this message in context: http://lucene.472066.n3.nabble.com/Batch-Solr-Server-tp4088657.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Batch Solr Server

Posted by Erick Erickson <er...@gmail.com>.
It's unclear to me why using server.add(Collection<SolrInputDocument>)
doesn't work for you.

bq:  which will create an UpdateRequest object for the entire
collection

Huh? Just call it with your "batches", something like

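List<SolrInputDocument> list = new ArrayList<SolrInputDocument>();
for (SolrInputDocument doc : docs) {   // docs: wherever your documents come from
   list.add(doc);
   if (list.size() >= batch_size) {    // batch is full: send it as one request
       server.add(list);
       list.clear();
   }
}
if (list.size() > 0) server.add(list); // send the final partial batch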

Best,
Erick

