Posted to solr-user@lucene.apache.org by Bernd Fehling <be...@uni-bielefeld.de> on 2015/10/01 08:47:39 UTC

Re: [poll] virtualization platform for SOLR

Hi Shawn,

unfortunately we have to run VMs, otherwise we would waste hardware.
I thought other Solr users were in the same situation, but it seems that
other users have tons of hardware available and we are the only ones
having to use VMs.
Right, bare metal is always better than any VM.
As you mentioned we have the indexer (master) on one physical machine
and two searchers (slaves) on other physical machines, all together with
other little VMs which are not I/O and CPU heavy.

Regards
Bernd

Am 30.09.2015 um 18:48 schrieb Shawn Heisey:
> On 9/30/2015 3:12 AM, Bernd Fehling wrote:
>> while setting up some new servers (virtual machines) using XEN I was
>> thinking about an alternative like KVM. My last tests with KVM were
>> a while ago, and XEN performed much better in the area of I/O and
>> CPU usage.
>> This led me to the idea of starting a poll about virtualization platforms and your experiences.
> 
> I once had a virtualized Solr install with Xen where each VM housed one
> Solr instance with one core.  The index was distributed, so it required
> several VMs for one copy of the index.
> 
> I eliminated the virtualization, used the same hardware as bare metal
> with Linux, still one Solr instance installed on the machine, but with
> multiple Solr cores.  Performance is much better now.
> 
> General advice:  Don't run virtual machines.
> 
> If a virtual environment is the only significant hardware you have
> access to and it's used for more than Solr, then you might need to.  If
> you do run virtual, then minimize the number of VMs, don't put multiple
> replicas of the same index data on the same physical VM host, give each
> Solr VM lots of memory, and don't oversubscribe the memory/cpu on the
> physical VM host.
> 
> Thanks,
> Shawn
> 
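
Shawn's placement advice above (no oversubscribed memory or CPU on the physical host, no two replicas of the same index data on one host) can be checked mechanically. The following is a minimal sketch, not anything from the thread: the `Host` and `VM` records, the `check_plan` function and all host and shard names are hypothetical, and the check ignores the memory the hypervisor itself needs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ram_gb: int
    cores: int


@dataclass
class VM:
    name: str
    host: str        # name of the physical host the VM runs on
    ram_gb: int
    vcpus: int
    shard: str = ""  # index shard served by this VM, empty if it is not a Solr VM


def check_plan(hosts, vms):
    """Return a list of human-readable warnings for a placement plan."""
    warnings = []
    by_host = defaultdict(list)
    for vm in vms:
        by_host[vm.host].append(vm)

    for host in hosts:
        placed = by_host[host.name]
        ram = sum(vm.ram_gb for vm in placed)
        vcpus = sum(vm.vcpus for vm in placed)
        if ram > host.ram_gb:
            warnings.append(f"{host.name}: RAM oversubscribed ({ram} > {host.ram_gb} GB)")
        if vcpus > host.cores:
            warnings.append(f"{host.name}: CPU oversubscribed ({vcpus} > {host.cores} cores)")
        # Replicas of the same shard on one host fail together with that host.
        seen = set()
        for vm in placed:
            if vm.shard and vm.shard in seen:
                warnings.append(f"{host.name}: more than one replica of {vm.shard}")
            if vm.shard:
                seen.add(vm.shard)
    return warnings


if __name__ == "__main__":
    # Hypothetical plan: two replicas of shard1 end up on host-a.
    hosts = [Host("host-a", 256, 32), Host("host-b", 256, 32)]
    vms = [
        VM("solr-1", "host-a", 64, 8, "shard1"),
        VM("solr-2", "host-a", 64, 8, "shard1"),
        VM("solr-3", "host-b", 64, 8, "shard1"),
    ]
    for warning in check_plan(hosts, vms):
        print(warning)
```

Counting vCPUs against physical cores one-to-one is the strictest reading of "don't oversubscribe"; a real plan might allow some ratio, which would be a further assumption.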

Re: [poll] virtualization platform for SOLR

Posted by Upayavira <uv...@odoko.co.uk>.
What are you trying to achieve by using virtualisation?

If it is just code separation, consider using containers and Docker
rather than fully fledged VMs.

CPU is shared, but each container sees its own view of its file system.

Upayavira


Re: [poll] virtualization platform for SOLR

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Bernd Fehling <be...@uni-bielefeld.de> wrote:
> unfortunately we have to run VMs, otherwise we would waste hardware.
> I thought other Solr users were in the same situation, but it seems that
> other users have tons of hardware available and we are the only ones
> having to use VMs.

We have ~5 smaller (< 1M documents) Solr setups that run under VMware (chosen because that is what Operations uses for all their virtualization). We have a single, quite large setup (terabytes of data, billions of documents) that runs alone on dedicated hardware. Then we have the third solution: multiple independent Solr-oriented projects that share the same bare metal. CentOS everywhere, BTW.

We would probably get better hardware utilization by running the hardware sharing setups in a virtualization system, together with some random other projects. But I doubt we would gain much for the cost of rocking the high-performance boat.

We do have some bare-metal setups other than Solr at our organization (State and University Library, Denmark), but the default for most other projects is to use virtualization. Going mostly bare metal with Solr was an explicit and performance-driven decision.

Except for the virtualized instances, we only use local SSDs to hold our index data. That might affect the trade-off, as even slight delays in I/O become visible when storage access times are < 0.1ms instead of > 1ms. I suspect the relative impact of virtualization is less with spinning drives or networked storage.

- Toke Eskildsen
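
The point about access times can be made concrete with a little arithmetic. This is only a sketch under a strong assumption: that virtualization adds a fixed delay per storage access. The 0.02 ms figure is hypothetical and not a measurement from anyone in this thread; the two base latencies are the < 0.1ms and > 1ms bounds mentioned above.

```python
def relative_overhead(base_ms, added_ms):
    """Fraction by which a fixed added delay lengthens one storage access."""
    return added_ms / base_ms


# Hypothetical per-access delay added by the virtualization layer (an assumption).
ADDED_MS = 0.02

for label, base_ms in [("local SSD", 0.1), ("slower storage", 1.0)]:
    percent = 100 * relative_overhead(base_ms, ADDED_MS)
    print(f"{label}: {base_ms} ms base access time, +{percent:.0f}% with the added delay")
```

The same absolute delay is a ten times larger share of each access on the faster storage, which is why the overhead would be more visible on local SSDs.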

RE: [poll] virtualization platform for SOLR

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Shawn,

Same answer as Bernd. We have a big VMware vCenter setup and NetApp. That's what we have to use. Even in a VM world, some advice persists: "local" disk is faster than network disk, even if the "local" disk is virtual. NetApp disk is exported to VMware vCenter over Fibre Channel, and vCenter has its own battery-backed caching. It is still far better to use "local" disk, even on a VM, than to use NFS.

I ran some tests using fio (not scientifically reliable) and then replayed search logs to confirm this...

Hope this helps,

Dan
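
For readers who want a rough comparison of "local" and NFS-backed disk without fio, a random-read timing loop along these lines is one option. It is a minimal sketch and not the test described above: the function name and parameters are hypothetical, it relies on `os.pread` (Unix only), and because the reads go through the OS page cache the test file should be much larger than RAM, or caches dropped first, for the numbers to say anything about the storage itself.

```python
import os
import random
import time


def time_random_reads(path, reads=1000, block_size=4096):
    """Return the median latency in milliseconds of random block reads from path."""
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    blocks = size // block_size
    samples = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(reads):
            # Pick a block-aligned offset somewhere in the file.
            offset = random.randrange(blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    samples.sort()
    return samples[len(samples) // 2]
```

Running it once against a file on the virtual "local" disk and once against a file on the NFS mount gives two medians to compare. Replaying real search logs, as described above, remains the more representative test for Solr.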


Re: [poll] virtualization platform for SOLR

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
Hi Upayavira,

best would be to have 4 dedicated servers, 2 for indexing (masters) and
2 for searching (slaves). In each pair, one is online and one is on standby in
case of hardware failure or an update of the OS, Java or even Solr.

But I only get 256GB RAM machines with many CPUs, which I have to share
with other project partners. Such a machine would be oversized as a
dedicated Solr server for a single-index Solr system.
Currently 64GB RAM machines are sufficient.

Do you think Docker could do this?

Regards
Bernd


-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************

Re: [poll] virtualization platform for SOLR

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
Hi Toke,

I don't get SSDs, only spinning drives.
And as you mentioned, the impact of VMs is smaller if you use spinning drives.
It is more the VM software that matters, and that's why we use XEN and not KVM.
With some sysctl tuning for the VMs it performs well, but bare metal is still better
and should be preferred.

Regards
Bernd

