Posted to solr-user@lucene.apache.org by JohnRodey <ti...@yahoo.com> on 2011/06/03 01:57:20 UTC

Better to have lots of smaller cores or one really big core?

I am trying to decide what the right approach would be: to have one big core
or many smaller cores hosted by a Solr instance.

I think there may be trade-offs either way but wanted to see what others do.
And by small I mean about 5-10 million documents; large may be 50 million.

It seems like small cores are better because
- If one server can host say 70 million documents (before memory issues) we
can get really close with a bunch of small indexes, vs only being able to
host one 50 million document index.  And when a software update comes out
that allows us to host 90 million then we could add a few more small
indexes. 
- It takes less time to build ten 5 million document indexes than one 50
million document index.
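
The capacity argument in the first bullet can be worked through as a small sketch. The 70 million and 90 million ceilings are the figures from the post; the function name is made up for illustration, and the model assumes capacity depends only on document count, which is a simplification.

```python
def hosted_documents(server_capacity: int, core_size: int) -> int:
    """Documents hosted when a server is filled with equal-sized cores.

    Assumes (as the post does) that capacity is a plain document count.
    """
    cores_that_fit = server_capacity // core_size
    return cores_that_fit * core_size


# Figures from the post: the server tops out around 70 million documents.
capacity = 70_000_000
print(hosted_documents(capacity, 10_000_000))  # filled with 10M-document cores
print(hosted_documents(capacity, 50_000_000))  # only one 50M-document core fits

# After the hypothetical upgrade to a 90 million document ceiling.
print(hosted_documents(90_000_000, 10_000_000))
print(hosted_documents(90_000_000, 50_000_000))
```

With 10 million document cores the server can be filled to its ceiling, while 50 million document cores leave the remainder unused until the ceiling reaches 100 million.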

It seems like larger cores are better because
- Each core returns its own result set, so if I want 1000 results and there
are 100 cores, the network is transferring 100000 documents for that search,
whereas if I had only 10 much larger cores, only 10000 documents would be sent
over the network.
- It would prolong my time until I hit URI length limits, since there
would be fewer cores in my system.
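
The network-transfer estimate in the first bullet is a simple product, sketched here with the numbers from the post. The function name is made up for illustration. It counts result entries only and says nothing about how large each entry is, which depends on how the search is actually distributed.

```python
def worst_case_entries_transferred(num_cores: int, rows: int) -> int:
    """Upper bound on result entries sent back if every core returns `rows` hits."""
    return num_cores * rows


# Numbers from the post: 1000 requested results.
print(worst_case_entries_transferred(100, 1000))  # 100 small cores
print(worst_case_entries_transferred(10, 1000))   # 10 larger cores
```

This is a worst case: a core holding fewer than 1000 matching documents returns fewer entries.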

Any thoughts???  Other trade-offs???

How do you find what the right size for you is?

--
View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Better to have lots of smaller cores or one really big core?

Posted by Erick Erickson <er...@gmail.com>.
Nope, cores are just a self-contained index, really.

What is the point of breaking them up? If you have some kind
of rolling currency (i.e. you only want to keep the last N days/weeks/months)
then you can always delete-by-query to age-out the relevant docs.
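
As a rough sketch of the delete-by-query idea, the snippet below builds the XML update message that ages out old documents. The field name `indexed_at` and the 30-day window are assumptions for illustration, not something from this thread; the message would be posted to the core's update handler and followed by a commit.

```python
from xml.sax.saxutils import escape


def age_out_payload(date_field: str, keep: str) -> str:
    """Build a Solr XML delete-by-query message for documents older than `keep`.

    `date_field` is a hypothetical date field in the schema; `keep` is a
    Solr date-math span such as "30DAYS" or "6MONTHS".
    """
    query = f"{date_field}:[* TO NOW-{keep}]"
    return f"<delete><query>{escape(query)}</query></delete>"


# Hypothetical field name; substitute whatever date field the schema defines.
print(age_out_payload("indexed_at", "30DAYS"))
```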

You'll be able to fit more on one server if it's in a single core, but what the
ratio is I'm not sure.

My take would be go for the simplest, which would be a single core (index)
for administrative purposes if for no other reason, but that may well just be
personal preference...

Best
Erick

On Fri, Jun 3, 2011 at 10:10 AM, JohnRodey <ti...@yahoo.com> wrote:
> Thanks Erick for the response.
>
> So my data structure is the same, i.e. they all use the same schema.  Though
> I think it makes sense for us to somehow break apart the data, for example
> by the date it was indexed.  I'm just trying to get a feel for how large we
> should aim to keep those (by day, by week, by month, etc...).
>
> So it sounds like we should aim to keep them at a size that one solr server
> can host to avoid serving multiple cores.
>
> One question, there is no real difference (other than configuration) from a
> server hosting its own index vs. it hosting one core, is there?
>
> Thanks!

Re: Better to have lots of smaller cores or one really big core?

Posted by JohnRodey <ti...@yahoo.com>.
Thanks Erick for the response.

So my data structure is the same, i.e. they all use the same schema.  Though
I think it makes sense for us to somehow break apart the data, for example
by the date it was indexed.  I'm just trying to get a feel for how large we
should aim to keep those (by day, by week, by month, etc...).

So it sounds like we should aim to keep them at a size that one solr server
can host to avoid serving multiple cores.

One question, there is no real difference (other than configuration) from a
server hosting its own index vs. it hosting one core, is there?

Thanks!


Re: Better to have lots of smaller cores or one really big core?

Posted by Erick Erickson <er...@gmail.com>.
Take another approach <G>? Cores are often used for isolation
purposes. That is, the data in one core may have nothing to do with
another core, the schemas don't have to match etc. They #may# be
both logically and physically separate.

I don't have measurements for this, so I'm guessing a little. But I expect
that using multiple cores will actually use a few more resources than a
single core (e.g. memory). Each core will be keeping a separate
cache, duplicating terms etc. (I may be wrong on this one!).

But if you have a single schema in a logically single core that just grows
too big to serve queries acceptably, the usual approach is to go to
shards. Each shard is just a core, but Solr manages the query part over
multiple shards via configuration, which is probably easier. So the answer
in this case is to put stuff on a single machine in a single core until it
grows too big, then go to sharding....
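
To make the sharding point concrete, here is a minimal sketch of a distributed query that names its shards through the `shards` request parameter. The host names, core names and helper function are made up for illustration. The same list can instead be set as a default in the request handler configuration, which avoids building it by hand on every request.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url: str, shard_addresses: list[str],
                          query: str, rows: int) -> str:
    """Build a select URL that asks one core to fan the query out to all shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, one per shard.
        "shards": ",".join(shard_addresses),
    }
    return f"{base_url}/select?{urlencode(params)}"


def example_shards(count: int) -> list[str]:
    """Hypothetical shard addresses, for illustration only."""
    return [f"solr{i}.example.com:8983/solr/core{i}" for i in range(1, count + 1)]


base = "http://solr1.example.com:8983/solr/core1"  # hypothetical coordinating core
url = distributed_query_url(base, example_shards(3), "*:*", 10)
print(url)

# The URL grows with every shard added, which is where the URI length
# concern raised earlier in the thread comes from.
for count in (10, 100):
    print(count, len(distributed_query_url(base, example_shards(count), "*:*", 10)))
```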

So the question is really whether you consider the cores sub-parts of a
single index or distinct units (say one core per customer). In the former,
I'd use one core until it gets too big, then shard. In the latter, multiple
cores are a good solution, largely for administrative/security reasons,
but then you aren't manually constructing a huge URL...

Hope that helps
Erick
