Posted to solr-user@lucene.apache.org by Jay Hill <ja...@gmail.com> on 2009/09/19 19:22:54 UTC

Batching requests using SolrCell with SolrJ

When working with SolrJ I have typically batched a Collection of
SolrInputDocument objects before sending them to the Solr server. I'm
working with the latest nightly build and using the ExtractingRequestHandler
to index documents, and everything is working fine, except that I haven't
been able to figure out how to batch documents when I also include literals.
Here's what I've got:

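import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Looping over a List of Files (filesToIndex: a List<File>)
for (File fileToIndex : filesToIndex) {
  try {
    // One extract request per file
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    req.addFile(fileToIndex);
    req.setParam("literal.id", fileToIndex.getCanonicalPath());
    getSolrServer().request(req);
  } catch (SolrServerException e) {
    e.printStackTrace();
  } catch (IOException e) {
    // getCanonicalPath() and request(...) can also throw IOException
    e.printStackTrace();
  }
}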

This works great, except that each document processed in the loop is
sent as a separate request. Previously I built a collection of
SolrInputDocuments and had SolrJ send them in batches of 100 or so.
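For comparison, the earlier batching pattern (accumulate documents, send every 100) can be sketched in plain Java. To keep the sketch self-contained it uses String placeholders instead of SolrInputDocument, and the `sendBatch` callback is a hypothetical stand-in for the SolrJ call; in SolrJ that callback would typically wrap `SolrServer.add(Collection<SolrInputDocument>)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchSender {

    /**
     * Hands items to sendBatch in groups of at most batchSize and flushes
     * any final partial group. Returns the number of batches sent, i.e. the
     * number of requests a real client would make.
     * sendBatch is a hypothetical stand-in for e.g. server.add(docs).
     */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {
                sendBatch.accept(new ArrayList<>(buffer)); // send a copy, then reuse the buffer
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            sendBatch.accept(new ArrayList<>(buffer)); // leftover partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc-" + i);
        }
        List<Integer> sizes = new ArrayList<>();
        int n = sendInBatches(docs, 100, batch -> sizes.add(batch.size()));
        // prints "3 requests, sizes [100, 100, 50]"
        System.out.println(n + " requests, sizes " + sizes);
    }
}
```

250 documents become three requests instead of 250; that saving is what per-file extract requests give up.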

It seems like I could batch documents by continuing to add them to the
request (req.addFile(eachFileUpToACount)), but the literals seem to present
a problem. When sending one file at a time, the contents and the literals
all wind up in the same document. But in a batch there will just be an
array of values for literal.id (in this example), with nothing matching
each value to its file's contents.
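To make the pairing problem concrete, here is a toy model in plain Java. It is not SolrJ: `ToyExtractRequest` and its `addParam` method are hypothetical, and the model assumes each `literal.id` value is appended to a multi-valued parameter (the "array of params" situation described above) rather than replacing the previous value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a batched extract request; NOT SolrJ's real classes.
class ToyExtractRequest {
    final List<String> streams = new ArrayList<>();                 // one entry per added file
    final Map<String, List<String>> params = new LinkedHashMap<>(); // multi-valued request params

    void addFile(String name) {
        streams.add(name);
    }

    // Appends one more value for the parameter, like a repeated
    // literal.id=... pair in an HTTP query string.
    void addParam(String key, String value) {
        params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        ToyExtractRequest req = new ToyExtractRequest();
        req.addFile("a.pdf");
        req.addParam("literal.id", "/docs/a.pdf");
        req.addFile("b.pdf");
        req.addParam("literal.id", "/docs/b.pdf");

        // Two streams and a two-valued literal.id, but the params are keyed
        // only by name: nothing records which id belongs to which stream.
        System.out.println(req.streams);                 // prints [a.pdf, b.pdf]
        System.out.println(req.params.get("literal.id")); // prints [/docs/a.pdf, /docs/b.pdf]
    }
}
```

Any pairing by position would be a convention the server side would have to adopt; the parameter data itself carries no link between a value and a stream.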

Can anyone provide a code snippet showing how to do this? Or is there no
approach other than sending a request for each document?

Thanks,
-Jay
http://www.lucidimagination.com

Re: Batching requests using SolrCell with SolrJ

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 19, 2009, at 1:22 PM, Jay Hill wrote:

> It seems like I could batch documents by continuing to add them to the
> request (req.addFile(eachFileUpToACount)), but the literals seem to
> present a problem. By sending one at a time the contents and the
> literals all wind up in the same document. But in a batch there will
> just be an array of params for literal.id (in this example) not matched
> to the contents.
>

It might be nice to be able to specify literals on a per-stream-name
basis, such as literal.site_pdf.id=site_pdf, but that isn't currently
supported. You could then combine that with ContentStreamUpdateRequest
to do what is needed, I believe.
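Since this per-stream syntax is only a proposal, the following is a hypothetical sketch of how a handler might resolve it, reading literal.site_pdf.id=site_pdf as "set field id to site_pdf, but only for the stream named site_pdf". The class, the `literalsFor` method, the `category` field, and the `intro_doc` stream are all invented for illustration, and global literals with dotted field names are deliberately not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: Solr Cell has no per-stream literal support (see above).
class PerStreamLiterals {

    static final String PREFIX = "literal.";

    /**
     * Resolves the literal fields for one content stream:
     *   "literal.<field>"          applies to every stream in the request;
     *   "literal.<stream>.<field>" applies only to the named stream and
     *                              overrides a global literal for the same field.
     */
    static Map<String, String> literalsFor(String streamName, Map<String, String> params) {
        Map<String, String> result = new LinkedHashMap<>();
        String streamPrefix = PREFIX + streamName + ".";
        // First pass: global literals (no further dot after "literal.").
        for (Map.Entry<String, String> e : params.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(PREFIX) && key.indexOf('.', PREFIX.length()) < 0) {
                result.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        // Second pass: literals addressed to this stream win over globals.
        for (Map.Entry<String, String> e : params.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(streamPrefix)) {
                result.put(key.substring(streamPrefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.category", "reports");      // hypothetical field shared by all streams
        params.put("literal.site_pdf.id", "site_pdf");  // the example from the reply
        params.put("literal.intro_doc.id", "intro_doc"); // hypothetical second stream
        System.out.println(literalsFor("site_pdf", params));  // prints {category=reports, id=site_pdf}
        System.out.println(literalsFor("intro_doc", params)); // prints {category=reports, id=intro_doc}
    }
}
```

With literals keyed by stream name like this, a single request carrying several files could still give each resulting document its own id.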

-Grant