Posted to by Tod <> on 2011/08/18 16:06:17 UTC

Solr read timeout

I'm using Perl to indirectly call the Solr ExtractingRequestHandler to 
stream remote documents into a Solr index.  I commit after every 100 
URLs processed, and I have about 30K documents to index.  I'm using a 
stock, out-of-the-box Solr 1.4.1 with only the schema changes needed 
for the fields I'm indexing.
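
For illustration, here is a minimal sketch of the indexing pattern 
described above. It is written in Python rather than Perl, and it is not 
the actual script. The host, port, and handler path, the use of the 
document URL as the unique key via literal.id, and the choice to commit 
by adding commit=true to every 100th request are all assumptions. It 
also assumes remote streaming (stream.url) is enabled in solrconfig.xml.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the extract handler under Tomcat (hypothetical).
SOLR_EXTRACT = "http://localhost:8080/solr/update/extract"
BATCH_SIZE = 100  # commit after every 100 URLs, as described above


def extract_url(doc_url, doc_id, commit=False):
    """Build one extract request that asks Solr to fetch doc_url itself."""
    params = {"stream.url": doc_url, "literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return SOLR_EXTRACT + "?" + urlencode(params)


def plan_requests(doc_urls, batch_size=BATCH_SIZE):
    """Return one request URL per document, committing on every
    batch_size-th document and on the final one."""
    total = len(doc_urls)
    requests = []
    for n, doc_url in enumerate(doc_urls, start=1):
        commit = (n % batch_size == 0) or (n == total)
        # Assumption: the document URL doubles as the unique key.
        requests.append(extract_url(doc_url, doc_url, commit))
    return requests


def send(request_url, timeout=180):
    """Issue one request; raises on a client-side read timeout."""
    with urlopen(request_url, timeout=timeout) as response:
        return response.status
```

In a real run, each URL from plan_requests() would be passed to send(), 
and recording how long each call takes would show which documents are 
the ones that hit the timeout.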

I start running into performance problems about 40 documents in: I get 
"Failed: 500 read timeout" errors, each lasting about 4 minutes, which 
slows processing to a crawl.  I tried a later version of Tika (0.8), 
but that didn't help, and I'm not sure Tika is the problem anyway.

Given that I'm using a pretty much unaltered version of Solr, could the 
stock setup itself be the problem?  I'm running everything under a 
typical Tomcat install on a Linux VM.  I understand there are 
performance tweaks I can make to the Solr config, but I'd like to focus 
on resolving this problem first rather than blanket-tweaking the entire 
config.

Is there anything in particular I should look at?  Can I provide any 
more information?

Thanks - Tod