Posted to solr-user@lucene.apache.org by Chris R <co...@gmail.com> on 2013/03/26 03:29:21 UTC

Solrcloud 4.1 Collection with multiple slices only use

I have two issues and I'm unsure if they are related:

Problem:  After setting up a multiple-collection SolrCloud 4.1 instance on
seven servers, when I index documents they aren't distributed across
the index slices.  It feels as though I don't actually have a "cloud"
implementation, yet everything I see in the admin interface and ZooKeeper
implies I do.  I feel as if I'm overlooking something obvious, but have not
been able to figure out what.

Configuration: Seven servers and four collections, each with 12 slices (no
replica shards yet).  ZooKeeper is configured as a three-node ensemble.  When
I send documents to Server1/Collection1 (which holds two slices of
collection1), all the documents show up in a single index shard (core).
Perhaps related: I have found it impossible to get Solr to recognize the
server names with anything but a literal host="servername" parameter in the
solr.xml, even though the hostname parameters, hosts files, network, and DNS
are all configured correctly.

I have a Solr 4.0 single collection set up similarly and it works just
fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
implementation with only the luceneMatchVersion changed to LUCENE_41.

sample solr.xml from server1

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1"
         shareSchema="true" zkClientTimeout="60000">
    <core collection="col201301" shard="col201301s04"
          instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
          dataDir="/solr/col201301/col201301s04sh01/data"/>
    <core collection="col201301" shard="col201301s11"
          instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
          dataDir="/solr/col201301/col201301s11sh01/data"/>
    <core collection="col201302" shard="col201302s06"
          instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
          dataDir="/solr/col201302/col201302s06sh01/data"/>
    <core collection="col201303" shard="col201303s01"
          instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
          dataDir="/solr/col201303/col201303s01sh01/data"/>
    <core collection="col201303" shard="col201303s08"
          instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
          dataDir="/solr/col201303/col201303s08sh01/data"/>
    <core collection="col201304" shard="col201304s03"
          instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
          dataDir="/solr/col201304/col201304s03sh01/data"/>
    <core collection="col201304" shard="col201304s10"
          instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
          dataDir="/solr/col201304/col201304s10sh01/data"/>
  </cores>
</solr>

Thanks
Chris

Re: Solrcloud 4.1 Collection with multiple slices only use

Posted by Chris R <co...@gmail.com>.
Interesting, I saw some comments about numShards, but they were never
specific enough to catch my attention.  I will give it a try tomorrow.
Thanks.
On Mar 25, 2013 11:35 PM, "Mark Miller" <ma...@gmail.com> wrote:

> I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
> don't specify numShards it goes into a mode where it's up to you to
> distribute updates.
>
> - Mark

Re: Solrcloud 4.1 Collection with multiple slices only use

Posted by Erick Erickson <er...@gmail.com>.
First, three documents isn't enough to really test. The formula for
assigning shards is to hash on the unique ID. It _is_ possible that
all three just happened to land on the same shard. If you index all 32
docs in the example dir and they're all on the same shard, we should
talk.
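
To make the hashing point concrete, here is a minimal sketch of hash-range routing and of the odds that a few documents all end up on one shard. It is illustrative only: Solr uses its own hash function and range layout, and zlib.crc32 is just a stand-in here. The names shard_for and prob_all_same_shard are hypothetical and do not come from Solr.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index by splitting the 32-bit hash
    space into num_shards contiguous ranges (stand-in hash, not Solr's)."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return h * num_shards // 2**32


def prob_all_same_shard(num_docs: int, num_shards: int) -> float:
    """Chance that num_docs uniformly hashed IDs all fall on a single shard."""
    return num_shards * (1.0 / num_shards) ** num_docs


if __name__ == "__main__":
    # Three docs on two shards: 2 * (1/2)^3 = 0.25, so one time in four.
    print(prob_all_same_shard(3, 2))
    # Thirty-two docs on two shards: vanishingly unlikely by chance.
    print(prob_all_same_shard(32, 2))
    # Tally where 32 made-up IDs land with the stand-in hash.
    counts = [0, 0]
    for i in range(32):
        counts[shard_for(f"doc-{i}", 2)] += 1
    print(counts)
```

With two shards, three documents sharing a shard happens about a quarter of the time if the IDs hash uniformly, which is why a three-document test says little either way.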

Second, a regular query to the cluster will always search all the
shards. Use &distrib=false on the URL to restrict the search to just
the node you fire the request at.
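
As a rough sketch of that check, the snippet below builds one non-distributed count query per core, so the numFound values can be compared core by core. The host, port, and core names are taken from the solr.xml in the original post; the /solr context path and the helper name local_count_url are assumptions. It only builds the URLs and does not contact a server.

```python
from urllib.parse import urlencode


def local_count_url(host: str, port: int, core: str) -> str:
    """Build a query URL that counts the documents held by one core only."""
    params = urlencode({
        "q": "*:*",          # match every document
        "rows": 0,           # only numFound is needed, not the documents
        "distrib": "false",  # do not fan the query out to other shards
        "wt": "json",
    })
    # Assumption: the Solr webapp is deployed under the /solr context path.
    return f"http://{host}:{port}/solr/{core}/select?{params}"


if __name__ == "__main__":
    for core in ("col201301s04sh01", "col201301s11sh01"):
        print(local_count_url("server1", 8080, core))
```

If one core reports every document and the others report zero, the updates were not routed across the shards.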

Let us know if you index more docs and still see the problem.

Best
Erick

On Wed, Mar 27, 2013 at 9:39 AM, Chris R <co...@gmail.com> wrote:
> So - I must be missing something very basic here and I've gone back to the
> Wiki example.  After setting up the two shard example in the first tutorial
> and indexing the three example documents, I looked at the shards in the
> Admin UI.  The documents are stored in the index where the update was
> directed - they aren't distributed across both shards.
>
> Release notes state that the compositeId router is the default when using
> the numShards parameter.  I want an even distribution of documents based on
> ID across all shards.... Any suggestions on what I'm screwing up?
>
> Chris

Re: Solrcloud 4.1 Collection with multiple slices only use

Posted by Chris R <co...@gmail.com>.
So - I must be missing something very basic here and I've gone back to the
Wiki example.  After setting up the two shard example in the first tutorial
and indexing the three example documents, I looked at the shards in the
Admin UI.  The documents are stored in the index where the update was
directed - they aren't distributed across both shards.

Release notes state that the compositeId router is the default when using
the numShards parameter.  I want an even distribution of documents based on
ID across all shards.... Any suggestions on what I'm screwing up?

Chris

On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller <ma...@gmail.com> wrote:

> I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
> don't specify numShards it goes into a mode where it's up to you to
> distribute updates.
>
> - Mark

Re: Solrcloud 4.1 Collection with multiple slices only use

Posted by Mark Miller <ma...@gmail.com>.
I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates.
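
For illustration, one way to supply numShards is at collection creation time through the Collections API; the 4.x wiki tutorial instead passes -DnumShards as a system property when the first node starts. The sketch below only builds a CREATE URL. The host, port, and collection name come from the solr.xml in the original post, the /solr context path and the helper name create_collection_url are assumptions, and the parameter names should be checked against the documentation for the version in use.

```python
from urllib.parse import urlencode


def create_collection_url(host: str, port: int, name: str,
                          num_shards: int, replication_factor: int = 1) -> str:
    """Build a Collections API CREATE request that states numShards explicitly."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    # Assumption: the Solr webapp is deployed under the /solr context path.
    return f"http://{host}:{port}/solr/admin/collections?{params}"


if __name__ == "__main__":
    # Twelve slices and no replicas yet, matching the setup described.
    print(create_collection_url("server1", 8080, "col201301", 12))
```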

- Mark
