Posted to solr-user@lucene.apache.org by "Husain, Yavar" <yh...@firstam.com> on 2011/11/10 10:33:46 UTC

Solr Indexing Time varying each time I index

Solr 1.4 is doing great with respect to indexing on a dedicated physical server (Windows Server 2008). Indexing around 1 million full-text documents (around 4 GB in total) takes around 20 minutes with a heap size of 512 MB to 1 GB and 4 GB of RAM.
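Converting run times into throughput makes runs easier to compare. This is a minimal sketch of that arithmetic, using only the figures in this post and treating "4 GB" as 4096 MB. The function name is just for illustration.

```python
DOCS = 1_000_000     # "around 1 million full text documents"
SIZE_MB = 4 * 1024   # "around 4 GB", taken here as 4096 MB

def throughput(minutes):
    """Return (documents/second, MB/second) for one full indexing run."""
    seconds = minutes * 60
    return DOCS / seconds, SIZE_MB / seconds

docs_s, mb_s = throughput(20)  # physical server, 20 minutes
print(f"{docs_s:.0f} docs/s, {mb_s:.2f} MB/s")  # prints "833 docs/s, 3.41 MB/s"
```

Applying the same function to the VM run times reported in the next paragraph (50 and 120 minutes) gives roughly 333 and 139 docs/s.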

However, when running Solr on a VM with 4 GB of RAM, the first indexing run took 50 minutes. Note that there are no network delays and no RAM issues. When I then increased the RAM to 8 GB and increased the heap size, the indexing time rose to 2 hours. That was really strange. Apart from SQL Server, no other process is running. I have not checked file I/O, though. Could that be a bottleneck? Does Solr have any issues running in a virtualized environment?
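One rough way to check whether disk I/O on the VM is the bottleneck is to run the same sequential-write test on the physical server and on the VM and compare the results. This is a minimal sketch that assumes Python is available on both machines. The function name and default sizes are hypothetical. It measures only sequential writes followed by an fsync, which is cruder than the mixed read/write pattern of Lucene segment writing and merging, and a hypervisor may still cache writes below the guest OS.

```python
import os
import tempfile
import time

def sequential_write_mb_s(total_mb=256, chunk_kb=1024, directory=None):
    """Write total_mb of random data in chunk_kb blocks, fsync, return MB/s.

    Pass `directory` as a path on the same volume as the Solr index (not given
    in the post, so it is left to the reader); by default the temp dir is used.
    """
    chunk = os.urandom(chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:   # the file object takes ownership of fd
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())         # force data out of the OS write cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)                  # clean up the scratch file
    return (n_chunks * chunk_kb / 1024) / elapsed

if __name__ == "__main__":
    print(f"{sequential_write_mb_s():.1f} MB/s")
```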

I read a paper today by Brian & Harry, "ON THE RESPONSE TIME OF A SOLR SEARCH ENGINE IN A VIRTUALIZED ENVIRONMENT", which claims that performance deteriorates when RAM is increased for Solr running on a VM. However, that finding concerns query times, not indexing times.

I am a bit confused as to why indexing took even longer on the VM when I repeated the same test a second time with a larger heap and more RAM.
****************************************************************************************** 
This message may contain confidential or proprietary information intended only for the use of the 
addressee(s) named above or may contain information that is legally privileged. If you are 
not the intended addressee, or the person responsible for delivering it to the intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying this message is strictly 
prohibited. If you have received this message by mistake, please immediately notify us by 
replying to the message and delete the original message and any copies immediately thereafter. 

Thank you.- 
******************************************************************************************
FAFLD

Re: Solr Indexing Time varying each time I index

Posted by Erick Erickson <er...@gmail.com>.
We've seen around a 10-15% decrease in performance on average in a
virtualized environment as a first approximation, which doesn't explain
your results but might give you a place to start.
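To see how far the reported VM times are from that rule of thumb, you can scale the 20-minute physical-server baseline. This is a minimal sketch that assumes "10-15% decrease in performance" means indexing throughput drops by 10-15%. Reading it instead as run time growing by 10-15% gives 22-23 minutes, which is nearly the same. The function name is hypothetical.

```python
BASELINE_MIN = 20  # physical-server run time from the original post

def expected_vm_minutes(baseline_min, slowdown):
    """Run time if indexing throughput drops by `slowdown` (0.10 means 10%)."""
    return baseline_min / (1 - slowdown)

best = expected_vm_minutes(BASELINE_MIN, 0.10)   # about 22.2 minutes
worst = expected_vm_minutes(BASELINE_MIN, 0.15)  # about 23.5 minutes
for observed in (50, 120):  # the two VM runs reported in the original post
    print(f"{observed} min is {observed / worst:.1f}x the worst-case estimate")
```

Both observed runs are well beyond what a 10-15% virtualization overhead would explain, which supports looking for another cause on the VM.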

I'm pretty sure Solr isn't the issue, but how much RAM is on your
underlying hardware? And how much RAM are you leaving for the
operating system? One common problem is giving all the RAM to
Solr and starving the OS; the OS makes better use of RAM for things
like disk caches.
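The point about starving the OS can be made concrete with a simple memory budget. This is a minimal sketch. The function name and every figure except the 8 GB of VM RAM are hypothetical, because the thread gives neither the new heap size nor SQL Server's memory use. The result is an upper bound, since a JVM process also uses memory outside its heap. If the headroom is small compared with the size of the index on disk, the OS has little room left to cache index files.

```python
def os_headroom_gb(total_ram_gb, solr_heap_gb, other_processes_gb):
    """RAM left for the OS, including its file-system cache, once the Solr
    heap (-Xmx) and other resident processes are subtracted. Optimistic:
    the JVM also uses memory outside the heap."""
    return total_ram_gb - solr_heap_gb - other_processes_gb

# Hypothetical figures: 8 GB VM as in the post; the heap sizes and the
# 2 GB assumed for SQL Server are illustrations, not numbers from the thread.
for heap_gb in (1, 4, 6):
    left = os_headroom_gb(8, heap_gb, other_processes_gb=2)
    print(f"heap {heap_gb} GB -> {left} GB left for the OS and its disk cache")
```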

All of which is to say that as far as I know, you're seeing the effects of
your VM, not Solr.

Best
Erick
