Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2011/10/07 20:19:39 UTC
Please help - Solr Cell using 'stream.url'
I'm batching documents into solr using solr cell with the 'stream.url'
parameter. Everything is working fine until I get to about 5k documents
in and then it starts issuing 'read timeout 500' errors on every document.
The sysadmin says there's plenty of CPU, memory, and no paging so it
doesn't look like the OS is the problem. I can curl the documents that
Solr is trying (and failing) to index just fine, so it seems to be a Solr
issue. There's only about 35K documents total, so Solr shouldn't even blink.
Can anyone help me diagnose this problem? I'd be happy to provide any
more detail that is needed.
Thanks - Tod
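[For readers following the thread: a minimal sketch of the kind of request being described, assuming a hypothetical Solr host, core, and document id. The handler path and parameter names (`/update/extract`, `literal.id`, `stream.url`) are the standard Solr Cell / ExtractingRequestHandler ones; remote streaming must be enabled in solrconfig.xml for `stream.url` to work.]

```shell
#!/bin/sh
# Sketch of a Solr Cell extract request using stream.url.
# SOLR_BASE, DOC_URL and DOC_ID are hypothetical placeholders.
SOLR_BASE="http://localhost:8983/solr"
DOC_URL="http://content-host.example.com/docs/report.pdf"
DOC_ID="report-001"

# Build the extract URL; Solr fetches DOC_URL itself. In real use the
# stream.url value should be URL-encoded (see the curl form below).
EXTRACT_URL="${SOLR_BASE}/update/extract?literal.id=${DOC_ID}&stream.url=${DOC_URL}"
echo "$EXTRACT_URL"

# Against a live Solr, letting curl handle the encoding:
# curl -s "${SOLR_BASE}/update/extract" \
#      --data-urlencode "literal.id=${DOC_ID}" \
#      --data-urlencode "stream.url=${DOC_URL}"
```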
Re: Please help - Solr Cell using 'stream.url'
Posted by Jan Høydahl <ja...@cominvent.com>.
The latest version is 3.4, and it is fairly compatible with 1.4.1, but you have to reindex.
A first migration step can be to continue using your 1.4 schema with the new solr.war (and SolrJ), but I suggest you spend a few hours upgrading your schema and config as well.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 12. okt. 2011, at 15:32, Tod wrote:
> On 10/10/2011 3:39 PM, Jan Høydahl wrote:
>> Hi,
>>
>> If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching and memory allocation outside the JVM.
>> Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to increase virtual memory and stack limit.
>
>
> I will try this - thanks.
>
>
>
>> And you should also consider upgrading to latest Solr...
>
>
> Is there a clearly defined migration path?
>
>
> - Tod
Re: Please help - Solr Cell using 'stream.url'
Posted by Tod <li...@gmail.com>.
On 10/10/2011 3:39 PM, Jan Høydahl wrote:
> Hi,
>
> If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching and memory allocation outside the JVM.
> Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to increase virtual memory and stack limit.
I will try this - thanks.
> And you should also consider upgrading to latest Solr...
Is there a clearly defined migration path?
- Tod
Re: Please help - Solr Cell using 'stream.url'
Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,
If you have 4GB on your server in total, try giving about 1GB to Solr, leaving 3GB for the OS, OS caching and memory allocation outside the JVM.
Also, add 'ulimit -v unlimited' and 'ulimit -s 10240' to /etc/profile to increase virtual memory and stack limit.
And you should also consider upgrading to latest Solr...
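[A sketch of the suggested limit settings, safe to run as-is: it prints the two lines Jan proposes for /etc/profile rather than writing the file, and shows how to check the current limits. The limits take effect for new login shells once the file is actually edited.]

```shell
#!/bin/sh
# The two lines suggested for /etc/profile, printed for reference.
PROFILE_LINES='ulimit -v unlimited
ulimit -s 10240'
printf '%s\n' "$PROFILE_LINES"

# -v unlimited: no per-process virtual memory cap
# -s 10240:    stack size of 10240 KB (10 MB)

# Current limits for the running shell can be checked with:
ulimit -v
ulimit -s
```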
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 10. okt. 2011, at 21:02, Tod wrote:
> On 10/07/2011 6:21 PM, Jan Høydahl wrote:
>> Hi,
>>
>> What Solr version?
>
> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42. It's running on a SUSE Linux VM.
>
>> How often do you do commits, or do you use autocommit?
>
> I had been doing commits every 100 documents (the entire set is about 35K docs, so it's relatively small). Since that wasn't working, and I had read that commits are expensive, I decided to experiment and wait until all documents were indexed before committing. I haven't been able to successfully index all the documents yet to try the manual commit because of this problem.
>
>
>
>> What kind and size of docs?
>
> Mostly MS Office and PDFs, some straight HTML pages. I can't give a specific answer on size, but nothing alarmingly large - typical 2-5 page Office documents.
>
>
>> Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs?
>
> I'd love to, but I could never get it to work. I'm using Perl right now, getting rows from an Oracle database and using LWP to perform the calls to Solr's REST interface.
>
>
>> How much RAM on your server, and how much did you give to the JVM?
>
> RAM to JVM:
> export CATALINA_OPTIONS="-Xms1024m -Xmx3072m"
>
> Top output on the VM:
> cpu(s): 64.1%us, 11.4%sy, 0.0%ni, 24.0%id, 0.2%wa, 0.2%hi, 0.2%si, 0.0%st
> mem: 3980384k total, 3803300k used, 177084k free, 393924k buffers
> swap: 4194296k total, 512k used, 4193784k free, 1518156k cached
>
> pid user pr ni virt res shr s %cpu %mem time+ command
> 16243 solr 19 0 642m 322m 6256 s 119 8.3 73:16.49 java
>
>
> Thanks.
Re: Please help - Solr Cell using 'stream.url'
Posted by Tod <li...@gmail.com>.
On 10/07/2011 6:21 PM, Jan Høydahl wrote:
> Hi,
>
> What Solr version?
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42.
It's running on a SUSE Linux VM.
> How often do you do commits, or do you use autocommit?
I had been doing commits every 100 documents (the entire set is about
35K docs, so it's relatively small). Since that wasn't working, and I had
read that commits are expensive, I decided to experiment and wait until
all documents were indexed before committing. I haven't been able to
successfully index all the documents yet to try the manual commit
because of this problem.
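[A sketch of the periodic-commit pattern being described, with hypothetical document ids, a placeholder Solr base URL, and the actual HTTP calls left commented out so the sketch is safe to run. An alternative to committing from the client is letting Solr commit on its own via the autoCommit element in solrconfig.xml.]

```shell
#!/bin/sh
# Sketch: index a batch and commit every COMMIT_EVERY documents.
SOLR_BASE="http://localhost:8983/solr"   # hypothetical
COMMIT_EVERY=2                           # kept tiny for the demo; 100-1000 in practice
count=0

for id in doc-001 doc-002 doc-003; do    # stand-ins for the real ~35K ids
  # curl -s "${SOLR_BASE}/update/extract" \
  #      --data-urlencode "literal.id=${id}" \
  #      --data-urlencode "stream.url=http://content-host.example.com/${id}.pdf"
  count=$((count + 1))
  if [ $((count % COMMIT_EVERY)) -eq 0 ]; then
    echo "commit after ${count} docs"
    # curl -s "${SOLR_BASE}/update?commit=true"
  fi
done

# One final commit picks up any remainder:
echo "final commit at ${count} docs"
# curl -s "${SOLR_BASE}/update?commit=true"
```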
> What kind and size of docs?
Mostly MS Office and PDFs, some straight HTML pages. I can't give a
specific answer on size, but nothing alarmingly large - typical 2-5 page
Office documents.
> Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs?
I'd love to, but I could never get it to work. I'm using Perl right now,
getting rows from an Oracle database and using LWP to perform the calls
to Solr's REST interface.
> How much RAM on your server, and how much did you give to the JVM?
RAM to JVM:
export CATALINA_OPTIONS="-Xms1024m -Xmx3072m"
Top output on the VM:
cpu(s): 64.1%us, 11.4%sy, 0.0%ni, 24.0%id, 0.2%wa, 0.2%hi, 0.2%si, 0.0%st
mem: 3980384k total, 3803300k used, 177084k free, 393924k buffers
swap: 4194296k total, 512k used, 4193784k free, 1518156k cached
pid user pr ni virt res shr s %cpu %mem time+ command
16243 solr 19 0 642m 322m 6256 s 119 8.3 73:16.49 java
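[One thing worth double-checking here, offered as an observation rather than a confirmed diagnosis: stock Tomcat startup scripts read CATALINA_OPTS or JAVA_OPTS, not CATALINA_OPTIONS, so the heap flags above may never reach the JVM; the modest java footprint in the top output would be consistent with that. A sketch using the recognized variable:]

```shell
#!/bin/sh
# Sketch: heap settings via the variable Tomcat's catalina.sh actually
# reads (CATALINA_OPTS). Values mirror those in the message above.
CATALINA_OPTS="-Xms1024m -Xmx3072m"
export CATALINA_OPTS
echo "$CATALINA_OPTS"

# After restarting Tomcat, confirm the flags reached the JVM with
# something like (pid is the java process from top):
# ps -o args= -p <tomcat-pid> | tr ' ' '\n' | grep -- -Xmx
```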
Thanks.
Re: Please help - Solr Cell using 'stream.url'
Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,
What Solr version?
How often do you do commits, or do you use autocommit?
What kind and size of docs?
Do you feed from a Java program? Where is the read timeout occurring? Can you paste in some logs?
How much RAM on your server, and how much did you give to the JVM?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 7. okt. 2011, at 20:19, Tod wrote:
> I'm batching documents into solr using solr cell with the 'stream.url' parameter. Everything is working fine until I get to about 5k documents in and then it starts issuing 'read timeout 500' errors on every document.
>
> The sysadmin says there's plenty of CPU, memory, and no paging so it doesn't look like the OS is the problem. I can curl the documents that Solr is trying (and failing) to index just fine, so it seems to be a Solr issue. There's only about 35K documents total, so Solr shouldn't even blink.
>
> Can anyone help me diagnose this problem? I'd be happy to provide any more detail that is needed.
>
>
> Thanks - Tod