Posted to solr-user@lucene.apache.org by Daniel Brügge <da...@googlemail.com> on 2012/11/09 10:03:16 UTC

Collections limit in SolrCloud aka best to use single index, SOLR-1293

Hi,

I am currently having a problem creating cores in SolrCloud 4 (2 leaders, 2
replicas, 4 physical machines). I am using the new Collections API
(e.g. with
http://myhost:8983/solr/admin/collections?action=CREATE&name=testcollection1&numShards=2&replicationFactor=2)
and am trying to create approx. 5000 cores. I am just creating them one after
another without filling them with data, but after roughly the first 1000
cores (which works fine, especially for the first cores), the system no
longer responds and cluster members seem to be down.
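
For anyone who wants to reproduce this, here is a minimal sketch of how such a creation loop might look. It only uses the CREATE parameters shown in the URL above; the helper names (build_create_url, collection_urls, create_all), the base URL, and the collection name prefix are hypothetical and chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_create_url(base_url, name, num_shards=2, replication_factor=2):
    """Build a Collections API CREATE URL with the parameters from the post."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def collection_urls(base_url, prefix, count):
    """Return CREATE URLs for prefix1 .. prefix<count> (naming is an assumption)."""
    return [build_create_url(base_url, f"{prefix}{i}") for i in range(1, count + 1)]


def create_all(urls, timeout=30):
    """Send the requests one after another; return how many succeeded
    before the first failure, which shows where the cluster stopped responding."""
    done = 0
    for url in urls:
        try:
            with urlopen(url, timeout=timeout) as response:
                response.read()
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            break
        done += 1
    return done


if __name__ == "__main__":
    # Only prints the URLs; call create_all(...) against a test cluster to send them.
    for url in collection_urls("http://myhost:8983/solr", "testcollection", 3):
        print(url)
```

Stopping at the first failed request gives a rough count of how many collections the cluster accepted before it stopped answering.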

I have searched and stumbled across
http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-td4018367.html
where Erick Erickson said:

> Keep in mind, though, that all the SolrCloud goodness centers around the
> idea of a single index that may be sharded. I don't think SolrCloud has had
> time to really think about handling the situation in which you have a bunch
> of cores that may or may not be sharded but are running on the same server.


And when I look at SOLR-1293, I really wonder whether it is currently best to
have one huge collection instead of thousands of smaller ones. The data
for the whole system is a couple of hundred million documents. Currently,
in testing, a single-index setup with 100 million documents feels really
slow, especially when it comes to faceting.

Thanks. Daniel

Re: Collections limit in SolrCloud aka best to use single index, SOLR-1293

Posted by Erick Erickson <er...@gmail.com>.
Well, SOLR-1293 is something of a special use-case. Not all of the cores in
this particular use-case are expected to be open at once. Part of that
effort is to be able to limit the number of open cores at once, and have
them closed/opened automatically.

That said, we're still working out what that means in the SolrCloud world,
especially how to make this notion play nice in that world. I'll defer to
Mark for any comments on Zookeeper....

My comments were meant mostly as a placeholder. I was wondering
how having thousands of cores, each of which may or may not have any
replicas and may or may not be sharded, would be handled. Seems complex to
say the least....

As to your question about one huge collection vs. thousands of smaller
ones... it depends (tm). If they're all going to be loaded at the same
time, you don't gain much of anything. In fact, having lots of little cores
will use more resources. The thrust of 1293 is the model where these cores
are rarely, if ever, all open at once and it makes sense to have, say, a
cap of 250 at once (or whatever).
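
To make the "cap on open cores" idea concrete, here is a minimal sketch of a least-recently-used cache that opens cores on demand and closes the oldest one when the cap is exceeded. This is only an illustration of the general idea, not how SOLR-1293 implements it; the class name BoundedCoreCache and the open_core/close_core callbacks are hypothetical.

```python
from collections import OrderedDict


class BoundedCoreCache:
    """Keep at most max_open cores open, evicting the least recently used one."""

    def __init__(self, max_open, open_core, close_core):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self._max_open = max_open
        self._open_core = open_core      # hypothetical callback: name -> core handle
        self._close_core = close_core    # hypothetical callback: (name, core) -> None
        self._open = OrderedDict()       # oldest entry first, newest last

    def get(self, name):
        """Return the core for name, opening it (and evicting another) if needed."""
        if name in self._open:
            self._open.move_to_end(name)  # mark as most recently used
            return self._open[name]
        core = self._open_core(name)
        self._open[name] = core
        if len(self._open) > self._max_open:
            old_name, old_core = self._open.popitem(last=False)
            self._close_core(old_name, old_core)
        return core

    def open_names(self):
        """Names of currently open cores, least recently used first."""
        return list(self._open)


if __name__ == "__main__":
    cache = BoundedCoreCache(
        max_open=250,
        open_core=lambda name: f"<core {name}>",
        close_core=lambda name, core: print("closing", name),
    )
    for i in range(1, 301):
        cache.get(f"core{i}")
    print(len(cache.open_names()))
```

With this model, memory use is bounded by the cap rather than by the total number of cores, which is why it only pays off when the cores are rarely all needed at once.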

But keep tracking 1293 for updates <G>..


On Fri, Nov 9, 2012 at 6:57 PM, Mark Miller <ma...@gmail.com> wrote:

> Have you looked at your logs? I think at around 1000 collections, the
> clusterstate.json node will become too large for zookeeper by default. It
> has a default limit of 1MB per node - you should be able to raise/override
> that limit with a sys prop or something when starting zookeeper. I can't
> remember offhand, but I know a quick google search digs it up.
>
> - Mark

Re: Collections limit in SolrCloud aka best to use single index, SOLR-1293

Posted by Mark Miller <ma...@gmail.com>.
Have you looked at your logs? I think at around 1000 collections, the clusterstate.json node will become too large for zookeeper by default. It has a default limit of 1MB per node - you should be able to raise/override that limit with a sys prop or something when starting zookeeper. I can't remember offhand, but I know a quick google search digs it up.

- Mark
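
A rough way to sanity-check the 1MB explanation is to estimate how large a single state node grows as collections are added. The sketch below builds a made-up per-collection entry and measures its JSON size; the field names and layout are assumptions for illustration and do not reproduce the real clusterstate.json format, so treat the result as an order-of-magnitude guess only. The system property Mark is probably thinking of is ZooKeeper's jute.maxbuffer, which, per the ZooKeeper documentation, defaults to just under 1MB and has to be set consistently on all servers and clients.

```python
import json

ZNODE_LIMIT_BYTES = 1024 * 1024  # the roughly 1MB default limit mentioned above


def fake_collection_state(name, num_shards=2, replication_factor=2):
    """Hypothetical per-collection entry; the real clusterstate.json layout differs."""
    shards = {}
    for s in range(1, num_shards + 1):
        replicas = {}
        for r in range(1, replication_factor + 1):
            core = f"{name}_shard{s}_replica{r}"
            replicas[f"host{r}:8983_solr_{core}"] = {
                "state": "active",
                "core": core,
                "node_name": f"host{r}:8983_solr",
                "base_url": f"http://host{r}:8983/solr",
            }
        shards[f"shard{s}"] = {"replicas": replicas}
    return {name: shards}


def state_size_bytes(num_collections):
    """Serialized size of a state document holding num_collections fake entries."""
    state = {}
    for i in range(1, num_collections + 1):
        state.update(fake_collection_state(f"testcollection{i}"))
    return len(json.dumps(state).encode("utf-8"))


def max_collections_within(limit_bytes=ZNODE_LIMIT_BYTES, step=50):
    """Largest multiple of step whose fake state still fits within limit_bytes."""
    n = 0
    while state_size_bytes(n + step) <= limit_bytes:
        n += step
    return n


if __name__ == "__main__":
    print("bytes for 1000 collections:", state_size_bytes(1000))
    print("collections that fit in the limit:", max_collections_within())
```

If the real per-collection entry is anywhere near a kilobyte, a single node holding all collections would reach the limit at around a thousand collections, which would match the point where the cluster stopped responding.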
