Posted to solr-user@lucene.apache.org by Jonathan Ariel <io...@gmail.com> on 2009/09/04 01:05:15 UTC

Single Core or Multiple Core?

It seems like it is really hard to decide when the Multiple Core solution is
more appropriate. As far as I could understand from this list and the wiki,
the Multiple Core feature was designed to address the need of handling
different sets of data within the same Solr instance, where the sets of data
don't need to be joined.
In my case each document belongs to a specific site and country. So document
A can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 /
Country 2, and so on.
For the use cases of my application I will never query across countries or
sites. Every query will always include both the country id and the site id.
Would you suggest splitting my data into cores? I have only a few sites
(around 20) and more countries (around 90).
Should I split my data by site (around 20 cores) and filter by country within
each core? Should I split by site and country (around 1800 cores)?
What should I consider when splitting my data into multiple cores?
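To make the three options concrete, here is a minimal sketch of what a request for one site/country pair looks like under each layout, and how many cores each layout needs (20 x 90 = 1800 for the finest split). The base URL, the core naming scheme (`site3`, `site1_country2`) and the `site_id`/`country_id` field names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: the base URL, the core naming scheme
# and the site_id/country_id field names are hypothetical.
BASE = "http://localhost:8983/solr"
NUM_SITES, NUM_COUNTRIES = 20, 90


def single_core_url(site: int, country: int, q: str) -> str:
    """One index: both restrictions are filter queries (fq)."""
    params = [("q", q), ("fq", f"site_id:{site}"), ("fq", f"country_id:{country}")]
    return f"{BASE}/select?{urlencode(params)}"


def core_per_site_url(site: int, country: int, q: str) -> str:
    """One core per site: the country is still a filter inside that core."""
    params = [("q", q), ("fq", f"country_id:{country}")]
    return f"{BASE}/site{site}/select?{urlencode(params)}"


def core_per_site_country_url(site: int, country: int, q: str) -> str:
    """One core per (site, country) pair: no filter queries are needed."""
    return f"{BASE}/site{site}_country{country}/select?{urlencode([('q', q)])}"


# Number of cores each layout requires.
CORE_COUNTS = {
    "single core": 1,
    "core per site": NUM_SITES,
    "core per site and country": NUM_SITES * NUM_COUNTRIES,
}

if __name__ == "__main__":
    for layout, count in CORE_COUNTS.items():
        print(f"{layout}: {count} core(s)")
    print(single_core_url(1, 2, "shoes"))
    print(core_per_site_url(1, 2, "shoes"))
    print(core_per_site_country_url(1, 2, "shoes"))
```

With a single core, each restriction is a separate `fq`, and Solr caches each filter query independently in its filter cache; splitting into cores only moves one or both restrictions from a filter into the URL path.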

Thanks

Jonathan

Re: Single Core or Multiple Core?

Posted by Jonathan Ariel <io...@gmail.com>.
Yes, it seems like I don't need to split. I could use different commit
times instead: in my use case commits happen too often, and I could have a
different commit time on a per-country basis. Your questions made me rethink
the need to split into cores.

Thanks

On Fri, Sep 4, 2009 at 5:38 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> The first question is: why do you want to split at all? Is the schema or
> solrconfig different? Are the different sites or countries updated at
> different times? Is the combined index so big that response times jump
> wildly when all the caches are thrown out whenever documents related to one
> site or country are updated? Does warmup, optimize or replication take too
> much time with one big index?
>
> Each core will have its own configuration files (more maintenance), and you
> need to set up replication separately for each core (which is a pain with
> script-based replication). Also note that by keeping all cores in one Tomcat
> (one JVM), a stop-the-world GC will pause all cores, which is not the case
> when using a separate JVM for each index/core.

Re: Single Core or Multiple Core?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Sep 4, 2009 at 4:35 AM, Jonathan Ariel <io...@gmail.com> wrote:

> It seems like it is really hard to decide when the Multiple Core solution is
> more appropriate. As far as I could understand from this list and the wiki,
> the Multiple Core feature was designed to address the need of handling
> different sets of data within the same Solr instance, where the sets of data
> don't need to be joined.
>

Correct. It is also useful when you don't want to set up multiple boxes or
separate Tomcat instances for each Solr index.


> In my case each document belongs to a specific site and country. So document
> A can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 /
> Country 2, and so on.
> For the use cases of my application I will never query across countries or
> sites. Every query will always include both the country id and the site id.
> Would you suggest splitting my data into cores? I have only a few sites
> (around 20) and more countries (around 90).
> Should I split my data by site (around 20 cores) and filter by country within
> each core? Should I split by site and country (around 1800 cores)?
> What should I consider when splitting my data into multiple cores?
>
>
The first question is: why do you want to split at all? Is the schema or
solrconfig different? Are the different sites or countries updated at
different times? Is the combined index so big that response times jump
wildly when all the caches are thrown out whenever documents related to one
site or country are updated? Does warmup, optimize or replication take too
much time with one big index?

Each core will have its own configuration files (more maintenance), and you
need to set up replication separately for each core (which is a pain with
script-based replication). Also note that by keeping all cores in one Tomcat
(one JVM), a stop-the-world GC will pause all cores, which is not the case
when using a separate JVM for each index/core.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Single Core or Multiple Core?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 12, 2009 at 9:45 PM, Jonathan Ariel <io...@gmail.com> wrote:

> What do you mean by "single-core deployments do not have a way to enable
> CoreAdminHandler"? I'm just trying to understand the feature that you are
> talking about.
>
>
I'm talking about the core-related commands described here:

http://wiki.apache.org/solr/CoreAdmin
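The commands on that page are plain HTTP requests to the CoreAdminHandler. A minimal sketch that builds a few of them, assuming the `adminPath="/admin/cores"` setting in a multicore solr.xml, a Solr instance at `localhost:8983`, and a placeholder core name `core0`:

```python
from urllib.parse import urlencode

# Assumes a multicore solr.xml declaring adminPath="/admin/cores" and a Solr
# instance at localhost:8983; "core0" is a placeholder core name.
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action: str, **params: str) -> str:
    """Build a CoreAdmin request, e.g. ...?action=STATUS&core=core0."""
    return f"{ADMIN_URL}?{urlencode({'action': action, **params})}"


# Status of all cores (no 'core' parameter) and of a single core.
status_all = core_admin_url("STATUS")
status_core0 = core_admin_url("STATUS", core="core0")
# Reload core0 so that it picks up edited config files.
reload_core0 = core_admin_url("RELOAD", core="core0")

if __name__ == "__main__":
    for url in (status_all, status_core0, reload_core0):
        print(url)
```

These requests only work when a solr.xml is present, which is exactly the limitation of single-core deployments being discussed in this thread.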

-- 
Regards,
Shalin Shekhar Mangar.

Re: Single Core or Multiple Core?

Posted by Jonathan Ariel <io...@gmail.com>.
What do you mean by "single-core deployments do not have a way to enable
CoreAdminHandler"? I'm just trying to understand the feature that you are
talking about.

On Sat, Sep 12, 2009 at 6:44 AM, Uri Boness <ub...@gmail.com> wrote:

> +1
> Can you add a JIRA issue for that so we can vote for it?

Re: Single Core or Multiple Core?

Posted by Israel Ekpo <is...@gmail.com>.
I concur with Uri, but I would also add that it might be helpful to specify
a default core somewhere in the configuration file, so that if no core is
specified, the default one is implicitly selected.

I am not sure if this feature is available yet.

What do you think?

On Mon, Sep 14, 2009 at 10:46 AM, Uri Boness <ub...@gmail.com> wrote:

> Is it really a problem? I mean, as I see it, Solr is to cores what an RDBMS
> is to databases. When you connect to a database you also need to specify
> the database name.
>
> Cheers,
> Uri
>
>
> On Sep 14, 2009, at 16:27, Noble Paul നോബിള്‍  नोब्ळ् <
> noble.paul@corp.aol.com> wrote:
>
>> The problem is that if we use multicore, it forces you to use a core
>> name. This is inconvenient. We must get rid of this restriction before
>> we move single-core to multicore.


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.

Re: Single Core or Multiple Core?

Posted by Sumit <ta...@gmail.com>.
Hi,
    I am posting my question here as it is closely related to this one. I have
an application which creates a different dataset on each invocation. By
dataset, I mean different columns/schema/metadata and, of course, different
data. The only field common to all the generated data is the primary key.
I have looked through multiple forums but could not work out the best
solution with respect to single core _vs_ multiple cores. As my application
creates a new set of fields and data every time, the schema is effectively
different on each run. The number of runs can go up to 100. If I use
multiple cores, in the worst case I will be interested in around 20 cores at
a time.

My questions are:

1) What factors should I look at to decide which option to choose?

Pros (of multiple cores):
a) I understand that search performance will be faster, giving a good user
experience.
b) Maintainability will be good, as a change in one core will not affect the
others.
c) I can easily fetch the schema for each run if I have multiple cores. With a
single core holding all the data, this is going to be tricky.

Cons:
a) We may have to move cores in and out of memory in case of contention.
This will result in more CPU utilization.

2) How long does indexing take for a single core if it is updated again and
again? Is the whole core affected when I add more rows? In the multiple-core
model, I only need to index the one core I am updating.

3) How much more memory does the index take when I split the single core
into multiple cores?
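If each run gets its own core and only about 20 need to be searchable at once, the application has to decide which cores to keep loaded. A hypothetical sketch of that bookkeeping as a least-recently-used set; the class and core names are invented, and issuing the actual CoreAdmin load/unload requests is left out:

```python
from collections import OrderedDict
from typing import List

# Hypothetical application-side bookkeeping (not a Solr feature): keep at most
# `max_active` cores loaded and report which least-recently-used cores should
# be unloaded. Sending the actual CoreAdmin requests is not shown.


class ActiveCores:
    def __init__(self, max_active: int = 20) -> None:
        self.max_active = max_active
        self._cores: "OrderedDict[str, None]" = OrderedDict()

    def touch(self, core: str) -> List[str]:
        """Record a use of `core`; return the cores to unload, oldest first."""
        if core in self._cores:
            self._cores.move_to_end(core)  # most recently used goes last
            return []
        self._cores[core] = None
        evicted = []
        while len(self._cores) > self.max_active:
            oldest, _ = self._cores.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def loaded(self) -> List[str]:
        """Cores currently considered loaded, least recently used first."""
        return list(self._cores)


if __name__ == "__main__":
    active = ActiveCores(max_active=2)
    for run in ("run1", "run2", "run1", "run3"):
        print(run, "-> unload", active.touch(run))
    print("loaded:", active.loaded())
```

With a limit of 2, using run1, run2, run1 and then run3 evicts run2, because run1 was used more recently.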

Please help me answer these questions.

Thanks,
Sumit



--
View this message in context: http://lucene.472066.n3.nabble.com/Single-Core-or-Multiple-Core-tp501748p4004254.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Single Core or Multiple Core?

Posted by Chris Hostetter <ho...@fucit.org>.
: A large majority of users use a single core ONLY. It is hard to explain
: to them the need for an extra component in the URL.

A majority use only a single core because that's all they know, because
it's what the default example and the tutorial use.  Even when people
have no use for running multiple cores with different schemas
*concurrently*, the value of swapping out cores on config upgrade is
certainly worth the inconvenience of needing to add "/corename" to the
URLs they connect to from their clients.

: I would say it is a design problem which we should solve instead of
: asking users to change

The pros/cons of default core names were discussed at great length when
multicore support was first added.  Because of core swapping and path-based
requestHandler naming, the confusion introduced by trying to have a default
core winds up being *vastly* worse than the confusion of trying to explain
why they should use "/solr/core/select" instead of "/solr/select".
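One way to see the path-naming half of this argument: once a default core exists, the first path segment can be read either as a core name or as part of a path-based handler name. A hypothetical router sketch (the core named `admin` and the handler set are invented, and real Solr request dispatch is more involved) that lists every reading of a path:

```python
from typing import List, Optional, Tuple

# Hypothetical names for illustration: a core that happens to be called
# "admin" and handlers registered in every core, including the path-based
# "/admin/ping". Real Solr request dispatch is more involved than this.
CORES = {"core0", "admin"}
HANDLERS = {"/select", "/ping", "/admin/ping"}


def readings(path: str, default_core: Optional[str]) -> List[Tuple[str, str]]:
    """Return every (core, handler) pair that a request path could mean."""
    found = []
    head, _, rest = path.lstrip("/").partition("/")
    # Explicit form: /<core>/<handler>
    if head in CORES and "/" + rest in HANDLERS:
        found.append((head, "/" + rest))
    # Implicit form, only possible with a default core: /<handler>
    if default_core is not None and path in HANDLERS:
        found.append((default_core, path))
    return found


if __name__ == "__main__":
    print(readings("/admin/ping", default_core="core0"))  # two readings
    print(readings("/admin/ping", default_core=None))     # one reading
    print(readings("/select", default_core=None))         # none: core name required
```

Without a default core every path has at most one reading, at the cost of `/select` resolving to nothing until a core name is added.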


-Hoss


Re: Single Core or Multiple Core?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
A large majority of users use a single core ONLY. It is hard to explain
to them the need for an extra component in the URL.

I would say it is a design problem which we should solve instead of
asking users to change.

On Tue, Sep 15, 2009 at 3:12 AM, Uri Boness <ub...@gmail.com> wrote:
> IMO, forcing users to make a configuration change in Solr or in their
> application is the same thing - it all boils down to a configuration change
> (I'd be very surprised if someone actually hardcodes the Solr URL in
> their system - most probably it is configurable, and if it's not, forcing
> them to change it is actually a good thing).
>>
>> Besides, if there's only one core, why does it need a name?
>
> Consistency. Having a default core as Israel suggested can probably do the
> trick. At first it might seem that having a default core, and not needing
> to specify the core name, will make things easier for users. But I actually
> disagree - don't underestimate the power of being consistent. I'd rather
> have a manual telling me "this is how it works, and it always works like
> that in all scenarios" than something like "this is how it works, but if
> you have scenario A then it works differently and you have to do this
> instead".



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Single Core or Multiple Core?

Posted by Uri Boness <ub...@gmail.com>.
IMO, forcing users to make a configuration change in Solr or in their
application is the same thing - it all boils down to a configuration
change (I'd be very surprised if someone actually hardcodes the Solr URL
in their system - most probably it is configurable, and if it's not,
forcing them to change it is actually a good thing).
> Besides, if there's only one core, why does it need a name?
Consistency. Having a default core as Israel suggested can probably do
the trick. At first it might seem that having a default core, and not
needing to specify the core name, will make things easier for users. But
I actually disagree - don't underestimate the power of being consistent.
I'd rather have a manual telling me "this is how it works, and it always
works like that in all scenarios" than something like "this is how it
works, but if you have scenario A then it works differently and you have
to do this instead".

Shalin Shekhar Mangar wrote:
> The problem is compatibility. If we make solr.xml compulsory, then we only
> force people to make a configuration change. But if we make a core name
> mandatory, then we force them to change their applications (or the
> applications' configurations). It is better if we can avoid that. Besides,
> if there's only one core, why does it need a name?

Re: Single Core or Multiple Core?

Posted by Jonathan Ariel <io...@gmail.com>.
Yes, I think it is better to be backward compatible, or the impact of moving
to the new Solr version would be big.


On Mon, Sep 14, 2009 at 12:24 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> The problem is compatibility. If we make solr.xml compulsory, then we only
> force people to make a configuration change. But if we make a core name
> mandatory, then we force them to change their applications (or the
> applications' configurations). It is better if we can avoid that. Besides,
> if there's only one core, why does it need a name?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Single Core or Multiple Core?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness <ub...@gmail.com> wrote:

> Is it really a problem? I mean, as I see it, Solr is to cores what an RDBMS
> is to databases. When you connect to a database you also need to specify
> the database name.
>
>
The problem is compatibility. If we make solr.xml compulsory, then we only
force people to make a configuration change. But if we make a core name
mandatory, then we force them to change their applications (or the
applications' configurations). It is better if we can avoid that. Besides,
if there's only one core, why does it need a name?

-- 
Regards,
Shalin Shekhar Mangar.

Re: Single Core or Multiple Core?

Posted by Uri Boness <ub...@gmail.com>.
Is it really a problem? I mean, as I see it, Solr is to cores what an RDBMS
is to databases. When you connect to a database you also need to specify
the database name.

Cheers,
Uri

On Sep 14, 2009, at 16:27, Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com> wrote:

> The problem is that if we use multicore, it forces you to use a core
> name. This is inconvenient. We must get rid of this restriction before
> we move single-core to multicore.

Re: Single Core or Multiple Core?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
The problem is that if we use multicore, it forces you to use a core
name. This is inconvenient. We must get rid of this restriction before
we move single-core to multicore.



On Sat, Sep 12, 2009 at 3:14 PM, Uri Boness <ub...@gmail.com> wrote:
> +1
> Can you add a JIRA issue for that so we can vote for it?



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Single Core or Multiple Core?

Posted by Uri Boness <ub...@gmail.com>.
+1
Can you add a JIRA issue for that so we can vote for it?

Chris Hostetter wrote:
> I think the most straightforward starting point is to switch how we
> structure the examples so that all of the examples use a solr.xml with
> multicore support.
>
> Then we can move forward on deprecating the specification of "Solr Home"
> using JNDI/system properties and switch to having the location of the
> solr.xml be the one master config option, with everything else coming after
> that.
>
>
>
> -Hoss
>
>
>   

Re: Single Core or Multiple Core?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 12, 2009 at 9:43 AM, Chris Hostetter
<ho...@fucit.org>wrote:

> I think the most straightforward starting point is to switch how we
> structure the examples so that all of the examples use a solr.xml with
> multicore support.
>
> Then we can move forward on deprecating the specification of "Solr Home"
> using JNDI/system properties and switch to having the location of the
> solr.xml be the one master config option, with everything else coming after
> that.
>
>
+1

-- 
Regards,
Shalin Shekhar Mangar.

Re: Single Core or Multiple Core?

Posted by Chris Hostetter <ho...@fucit.org>.
: > For the record: even if you're only going to have one SolrCore, using the
: > multicore support (i.e. having a solr.xml file) might prove handy from a
: > maintenance standpoint ... the ability to configure new "on deck cores" with
	...
: Yeah, it is a shame that single-core deployments (no solr.xml) do not have
: a way to enable CoreAdminHandler. This is something we should definitely
: look at in Solr 1.5.

I think the most straightforward starting point is to switch how we
structure the examples so that all of the examples use a solr.xml with
multicore support.

Then we can move forward on deprecating the specification of "Solr Home"
using JNDI/system properties and switch to having the location of the
solr.xml be the one master config option, with everything else coming after
that.
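For readers unfamiliar with the multicore setup, a minimal sketch that builds a solr.xml holding a single core, in the style of the Solr 1.4-era format (`persistent`, `adminPath`, `name`, `instanceDir`), using Python's ElementTree; the core name and directory are placeholders:

```python
import xml.etree.ElementTree as ET
from typing import Mapping

# Sketch of a multicore-style solr.xml holding a single core. The attribute
# names follow the Solr 1.4-era format; "core0" is a placeholder.


def multicore_solr_xml(cores: Mapping[str, str]) -> str:
    """Build solr.xml text from a {core name: instanceDir} mapping."""
    solr = ET.Element("solr", persistent="true")
    cores_el = ET.SubElement(solr, "cores", adminPath="/admin/cores")
    for name, instance_dir in cores.items():
        ET.SubElement(cores_el, "core", name=name, instanceDir=instance_dir)
    return ET.tostring(solr, encoding="unicode")


if __name__ == "__main__":
    print(multicore_solr_xml({"core0": "core0"}))
```

`persistent="true"` asks Solr to write changes made through CoreAdmin (such as a SWAP) back to solr.xml, and `adminPath` is what enables the CoreAdminHandler for even a single core.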



-Hoss


Re: Single Core or Multiple Core?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 12, 2009 at 12:12 AM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> For the record: even if you're only going to have one SOlrCore, using the
> multicore support (ie: having a solr.xml file) might prove handy from a
> maintence standpoint ... the ability to configure new "on deck cores" with
> new configs, populate them with data, and then swap them in place for your
> previous core without any downtime is a really nice feature to take
> advantage of.
>
>
Yeah, it is a shame that single-core deployments (no solr.xml) do not have
a way to enable CoreAdminHandler. This is something we should definitely
look at in Solr 1.5.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Single Core or Multiple Core?

Posted by Chris Hostetter <ho...@fucit.org>.
For the record: even if you're only going to have one SolrCore, using the
multicore support (i.e. having a solr.xml file) might prove handy from a
maintenance standpoint ... the ability to configure new "on deck cores" with
new configs, populate them with data, and then swap them in place of your
previous core without any downtime is a really nice feature to take
advantage of.
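The "on deck core" workflow can be sketched as a sequence of CoreAdmin requests: CREATE a core from the new config, index into it, SWAP it with the live core, then UNLOAD the old one. A sketch assuming `adminPath="/admin/cores"`, a Solr instance at `localhost:8983`, and placeholder core and directory names:

```python
from typing import List
from urllib.parse import urlencode

# Assumes adminPath="/admin/cores" in solr.xml and Solr at localhost:8983.
# Core and directory names passed in below are placeholders.
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action: str, **params: str) -> str:
    return f"{ADMIN_URL}?{urlencode({'action': action, **params})}"


def on_deck_swap(live: str, on_deck: str, instance_dir: str) -> List[str]:
    """CoreAdmin requests, in order, to replace `live` without downtime."""
    return [
        # 1. Create the on-deck core from the directory holding the new config.
        core_admin_url("CREATE", name=on_deck, instanceDir=instance_dir),
        # 2. (Index documents into /solr/<on_deck>/update here; not shown.)
        # 3. Swap names: requests for `live` now reach the freshly built index.
        core_admin_url("SWAP", core=live, other=on_deck),
        # 4. The old index now answers to `on_deck`; unload it when done.
        core_admin_url("UNLOAD", core=on_deck),
    ]


if __name__ == "__main__":
    for url in on_deck_swap("core0", "core0_next", "core0_next"):
        print(url)
```

After the SWAP, clients keep querying the live core's name and transparently get the new index; the old index now answers to the on-deck name, which is why that name is the one unloaded.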


: Date: Thu, 3 Sep 2009 20:05:15 -0300
: From: Jonathan Ariel <io...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Single Core or Multiple Core?



-Hoss