Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2011/10/07 20:19:39 UTC

Please help - Solr Cell using 'stream.url'

I'm batching documents into Solr using Solr Cell with the 'stream.url' 
parameter.  Everything works fine until I get about 5k documents in, and 
then it starts issuing 'read timeout 500' errors on every document.

The sysadmin says there's plenty of CPU and memory and no paging, so it 
doesn't look like the OS is the problem.  I can curl the documents that 
Solr is failing to index just fine, so it seems to be a Solr issue.  
There are only about 35K documents total, so Solr shouldn't even blink.

Can anyone help me diagnose this problem?  I'd be happy to provide any 
more detail that is needed.
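[Editor's note: for readers unfamiliar with remote streaming, a request of the kind described above might look like the sketch below. The host, core path, document URL, and literal field are placeholders, not taken from the poster's setup, and remote streaming must be enabled in solrconfig.xml.]

```shell
# Sketch of a Solr Cell extract request using stream.url: Solr fetches
# the document itself from the given URL and runs it through Tika.
SOLR="http://localhost:8983/solr"
DOC_URL="http://content-server/docs/report.pdf"

# Build the request; literal.id supplies the unique key for the doc.
REQ="$SOLR/update/extract?stream.url=$DOC_URL&literal.id=doc-1&commit=false"
echo "$REQ"

# The actual call would then be:  curl "$REQ"
```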


Thanks - Tod

Re: Please help - Solr Cell using 'stream.url'

Posted by Jan Høydahl <ja...@cominvent.com>.
The latest version is 3.4, and it is fairly compatible with 1.4.1, but you will have to reindex.
As a first migration step you can keep using your 1.4 schema on the new solr.war (and SolrJ), but I suggest you spend a few hours upgrading your schema and config as well.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. okt. 2011, at 15:32, Tod wrote:

> On 10/10/2011 3:39 PM, Jan Høydahl wrote:
>> Hi,
>> 
>> If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching, and memory allocation outside the JVM.
>> Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to raise the virtual memory and stack limits.
> 
> 
> I will try this - thanks.
> 
> 
> 
>> And you should also consider upgrading to latest Solr...
> 
> 
> Is there a clearly defined migration path?
> 
> 
> - Tod


Re: Please help - Solr Cell using 'stream.url'

Posted by Tod <li...@gmail.com>.
On 10/10/2011 3:39 PM, Jan Høydahl wrote:
> Hi,
>
> If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching, and memory allocation outside the JVM.
> Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to raise the virtual memory and stack limits.


I will try this - thanks.



> And you should also consider upgrading to latest Solr...


Is there a clearly defined migration path?


- Tod

Re: Please help - Solr Cell using 'stream.url'

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching, and memory allocation outside the JVM.
Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to raise the virtual memory and stack limits.
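[Editor's note: a sketch of how those settings could be applied. The ulimit values are the ones suggested above; the heap numbers are illustrative, and note that stock Tomcat startup scripts read CATALINA_OPTS, so the exact variable name depends on your setup.]

```shell
# Lines to add to /etc/profile (per the suggestion above):
ulimit -v unlimited 2>/dev/null   # remove the virtual-memory cap
ulimit -s 10240 2>/dev/null       # 10240 KB (10 MB) stack per thread

# Roughly 1GB of a 4GB box for the JVM heap, leaving the rest to the
# OS page cache; stock catalina.sh reads CATALINA_OPTS:
CATALINA_OPTS="-Xms512m -Xmx1024m"
export CATALINA_OPTS
echo "$CATALINA_OPTS"
```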

And you should also consider upgrading to latest Solr...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. okt. 2011, at 21:02, Tod wrote:

> On 10/07/2011 6:21 PM, Jan Høydahl wrote:
>> Hi,
>> 
>> What Solr version?
> 
> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42.  It's running on a SUSE Linux VM.
> 
>> How often do you do commits, or do you use autocommit?
> 
> I had been doing commits every 100 documents (the entire set is about 35K docs, so it's relatively small).  Since that wasn't working, and I read that commits are expensive, I decided to experiment and wait until all documents were indexed before committing.  I haven't been able to successfully index all the documents yet to try the manual commit because of this problem.
> 
> 
> 
>> What kind and size of docs?
> 
> Mostly MS Office and PDFs, some straight HTML pages.  I can't give a specific answer on size, but nothing alarmingly large - typical 2-5 page office documents.
> 
> 
>> Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs?
> 
> I'd love to, but I could never get it to work.  I'm using Perl right now: getting rows from an Oracle database and using LWP to make the calls to Solr's REST interface.
> 
> 
>> How much RAM on your server, and how much did you give to the JVM?
> 
> RAM to JVM:
> export CATALINA_OPTIONS="-Xms1024m -Xmx3072m"
> 
> Top output on the VM:
> cpu(s): 64.1%us, 11.4%sy,  0.0%ni, 24.0%id,  0.2%wa,  0.2%hi,  0.2%si, 0.0%st
> mem:   3980384k total,  3803300k used,   177084k free,   393924k buffers
> swap:  4194296k total,      512k used,  4193784k free,  1518156k cached
> 
> pid   user   pr  ni  virt res  shr  s %cpu %mem  time+    command 
> 16243  solr   19   0  642m 322m 6256 s  119  8.3  73:16.49 java 
> 
> 
> Thanks.


Re: Please help - Solr Cell using 'stream.url'

Posted by Tod <li...@gmail.com>.
On 10/07/2011 6:21 PM, Jan Høydahl wrote:
> Hi,
>
> What Solr version?

Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42. 
It's running on a SUSE Linux VM.

> How often do you do commits, or do you use autocommit?

I had been doing commits every 100 documents (the entire set is about 
35K docs, so it's relatively small).  Since that wasn't working, and I 
read that commits are expensive, I decided to experiment and wait until 
all documents were indexed before committing.  I haven't been able to 
successfully index all the documents yet to try the manual commit 
because of this problem.
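[Editor's note: the two commit strategies being compared can be sketched as plain update calls. Host and core path are placeholders; the endpoint shown is the standard update handler, not a detail from the thread.]

```shell
# Hypothetical sketch of the two commit strategies discussed above.
SOLR="http://localhost:8983/solr"

# Strategy A: commit after every batch of 100 documents.
# Strategy B: feed everything with commit=false, then commit once at the end.
COMMIT_URL="$SOLR/update?commit=true"

# Either way, the commit itself is just:
#   curl "$COMMIT_URL"
echo "$COMMIT_URL"
```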



> What kind and size of docs?

Mostly MS Office and PDFs, some straight HTML pages.  I can't give a 
specific answer on size, but nothing alarmingly large - typical 2-5 page 
office documents.


> Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs?

I'd love to, but I could never get it to work.  I'm using Perl right 
now: getting rows from an Oracle database and using LWP to make the 
calls to Solr's REST interface.


> How much RAM on your server, and how much did you give to the JVM?

RAM to JVM:
export CATALINA_OPTIONS="-Xms1024m -Xmx3072m"

Top output on the VM:
cpu(s): 64.1%us, 11.4%sy,  0.0%ni, 24.0%id,  0.2%wa,  0.2%hi,  0.2%si, 
0.0%st
mem:   3980384k total,  3803300k used,   177084k free,   393924k buffers
swap:  4194296k total,      512k used,  4193784k free,  1518156k cached

  pid   user   pr  ni  virt res  shr  s %cpu %mem  time+    command
16243  solr   19   0  642m 322m 6256 s  119  8.3  73:16.49 java


Thanks.

Re: Please help - Solr Cell using 'stream.url'

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

What Solr version?
How often do you do commits, or do you use autocommit?
What kind and size of docs?
Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs? 
How much RAM on your server, and how much did you give to the JVM?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 7. okt. 2011, at 20:19, Tod wrote:

> I'm batching documents into Solr using Solr Cell with the 'stream.url' parameter.  Everything works fine until I get about 5k documents in, and then it starts issuing 'read timeout 500' errors on every document.
> 
> The sysadmin says there's plenty of CPU and memory and no paging, so it doesn't look like the OS is the problem.  I can curl the documents that Solr is failing to index just fine, so it seems to be a Solr issue.  There are only about 35K documents total, so Solr shouldn't even blink.
> 
> Can anyone help me diagnose this problem?  I'd be happy to provide any more detail that is needed.
> 
> 
> Thanks - Tod