Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2015/10/29 09:52:11 UTC

SolrJ stalls/hangs on client.add(); and doesn't return

Hello - we have some processes periodically sending documents to Solr 5.3.0 in local mode using ConcurrentUpdateSolrClient 5.3.0; it has queueSize 10 and threadCount 4, chosen arbitrarily since we had no idea what is right.

Usually it's a few thousand up to some tens of thousands of rather small documents. But when the number of documents is around or near a hundred thousand, client.add(Iterator<SolrInputDocument> docIterator) stalls and never returns. It also doesn't index any of the documents. Upon calling, it quickly eats CPU and a lot of heap, but shortly after it goes idle: no CPU, and memory is released.
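For reference, the setup described above can be sketched roughly as follows, using the SolrJ 5.x constructor that takes queueSize and threadCount. The URL, collection name, and field names are placeholders, not from the original report:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class StalledIndexer {

    // Build n small documents; the field names here are made up for illustration.
    static List<SolrInputDocument> buildDocs(int n) {
        List<SolrInputDocument> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("title_t", "document " + i);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        // queueSize 10 and threadCount 4 as described above; the URL is a placeholder.
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient(
                "http://localhost:8983/solr/collection1", 10, 4);
        try {
            Iterator<SolrInputDocument> docIterator = buildDocs(100_000).iterator();
            client.add(docIterator); // the call that reportedly never returns
            client.commit();
        } finally {
            client.close();
        }
    }
}
```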

I am puzzled, any ideas to share? 
Markus

Re: SolrJ stalls/hangs on client.add(); and doesn't return

Posted by Erick Erickson <er...@gmail.com>.
Glad you can solve it one way or the other. I do wonder, though, what's
really going on; the fact that your original case just hung is kind of
disturbing.

50K is still a lot, and Yonik's comment is well taken. I did some benchmarking
(not ConcurrentUpdateSolrServer, HttpSolrClient as I remember) and got
diminishing returns pretty rapidly after the first few, see:
https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

There was a huge jump going from 1 to 10, a smaller (percentage wise)
jump going from 10 to 100, and not much to talk about between 100 and 1,000
(single threaded, YMMV of course).

Best
Erick


On Fri, Oct 30, 2015 at 6:26 AM, Yonik Seeley <ys...@gmail.com> wrote:
> On Thu, Oct 29, 2015 at 5:28 PM, Erick Erickson <er...@gmail.com> wrote:
>> Try making batches of 1,000 docs and sending them through instead.
>
> The other thing about ConcurrentUpdateSolrClient is that it will
> create batches itself while streaming.
> For example, if you call add a number of times very quickly, those
> will all be put in the same update request as they are being streamed
> (you get the benefits of batching without the latency it would
> normally come with.)
>
> So I guess I'd advise to not batch yourself unless it makes more sense
> for your document processing for other reasons.
>
> -Yonik

Re: SolrJ stalls/hangs on client.add(); and doesn't return

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Oct 29, 2015 at 5:28 PM, Erick Erickson <er...@gmail.com> wrote:
> Try making batches of 1,000 docs and sending them through instead.

The other thing about ConcurrentUpdateSolrClient is that it will
create batches itself while streaming.
For example, if you call add a number of times very quickly, those
will all be put in the same update request as they are being streamed
(you get the benefits of batching without the latency it would
normally come with.)

So I guess I'd advise to not batch yourself unless it makes more sense
for your document processing for other reasons.

-Yonik
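A minimal sketch of what Yonik describes: add documents one at a time and let ConcurrentUpdateSolrClient group them into update requests as it streams. The URL and the document source are assumptions, not from the thread:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class StreamingAdds {

    // Hypothetical stand-in for whatever produces the documents.
    static List<SolrInputDocument> fetchDocs() {
        List<SolrInputDocument> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        // placeholder URL; queueSize/threadCount as in the original post
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient(
                "http://localhost:8983/solr/collection1", 10, 4);
        try {
            for (SolrInputDocument doc : fetchDocs()) {
                // each call only enqueues; the runner threads batch docs while streaming
                client.add(doc);
            }
            client.commit();
        } finally {
            client.close();
        }
    }
}
```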

Re: SolrJ stalls/hangs on client.add(); and doesn't return

Posted by Erick Erickson <er...@gmail.com>.
You're sending 100K docs in a single packet? It's vaguely possible that you're
getting a timeout although that doesn't square with no docs being indexed...

Hmmm, to check you could do a manual commit. Or watch the Solr log to see if
update requests ever go there.

Or you're running out of memory on the client.

Or even exceeding the packet size that the servlet container will accept?

But I think at root you're misunderstanding ConcurrentUpdateSolrClient.
It doesn't partition up a huge array and send the pieces in parallel; it
parallelizes sending the packet each call is given. So it's trying to send
all 100K docs at once. Probably not what you were aiming for.

Try making batches of 1,000 docs and sending them through instead.
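One way to do that batching on the client side, as a sketch. The partition helper is hypothetical, not part of SolrJ:

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {

    // Split a list into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    // With a SolrClient in hand, the batches would then be sent one
    // update request at a time, e.g.:
    //
    //   for (List<SolrInputDocument> batch : partition(allDocs, 1000)) {
    //       client.add(batch);
    //   }
    //   client.commit();
}
```

Each `client.add(batch)` then corresponds to one update request of a manageable size, rather than a single request carrying all 100K documents.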

So the parameters are a bit of magic. You can have up to the number of threads
you specify sending their entire packet to Solr in parallel, and up to queueSize
queued requests. Note this is the _request_, not the docs in the list, if I'm
reading the code correctly.....

Best,
Erick

On Thu, Oct 29, 2015 at 1:52 AM, Markus Jelsma
<ma...@openindex.io> wrote:
> Hello - we have some processes periodically sending documents to Solr 5.3.0 in local mode using ConcurrentUpdateSolrClient 5.3.0; it has queueSize 10 and threadCount 4, chosen arbitrarily since we had no idea what is right.
>
> Usually it's a few thousand up to some tens of thousands of rather small documents. But when the number of documents is around or near a hundred thousand, client.add(Iterator<SolrInputDocument> docIterator) stalls and never returns. It also doesn't index any of the documents. Upon calling, it quickly eats CPU and a lot of heap, but shortly after it goes idle: no CPU, and memory is released.
>
> I am puzzled, any ideas to share?
> Markus