Posted to solr-user@lucene.apache.org by Pulkit Singhal <pu...@gmail.com> on 2011/09/09 18:38:42 UTC

Re: SolrCloud Feedback

Hello Jan,

You've made a very good point in (b). I would be happy to make the
edit to the wiki if I understood your explanation completely.

When you say that it is "looking up what collection that core is part
of" ... I'm curious how a core is put under a particular
collection in the first place, and what that collection is named.
Obviously you've made it clear that collection1 is really the name of
the core itself. And where is this association stored for the
code to look it up?

If not Jan, then perhaps the gurus who wrote Solr Cloud could answer :)

Thanks!
- Pulkit

On Thu, Feb 10, 2011 at 9:10 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> Hi,
>
> I have so far just tested the examples and got an N by M cluster running. My feedback:
>
> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly state what is in which version, what are current improvement plans and get rid of outdated stuff. That said I think there are many good ideas there.
>
> b) The "collection" terminology is too easily confused with "core", and should probably be made more distinct. I just tried to configure two cores on the same Solr instance into the same collection, and that worked fine, both as distinct shards and as the same shard (replica). The wiki examples give the impression that "collection1" in localhost:8983/solr/collection1/select?distrib=true is some magic collection identifier, but what it really does is run the query on the *core* named "collection1", looking up what collection that core is part of and distributing the query to all shards in that collection.
>
> c) ZK is not designed to store large files. While the files in conf are normally well below the 1M limit ZK imposes, we should perhaps consider using a lightweight distributed object or k/v store for holding the /CONFIGS and let ZK store a reference only
>
> d) How are admins supposed to update configs in ZK? Install their favourite ZK editor?
>
> e) We should perhaps not be so afraid to make ZK a requirement for Solr in v4. Ideally you should interact with a 1-node Solr in the same manner as you do with a 100-node Solr. An example is the Admin GUI, where the "schema" and "solrconfig" links assume a local file. This requires decent tool support to make ZK interaction intuitive, such as "import" and "export" commands.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 19. jan. 2011, at 21.07, Mark Miller wrote:
>
>> Hello Users,
>>
>> A little over a year ago, a few of us started working on what we called SolrCloud.
>>
>> This initial bit of work was really a combination of laying some base work - figuring out how to integrate ZooKeeper with Solr in a limited way, dealing with some infrastructure - and picking off some low hanging search side fruit.
>>
>> The next step is the indexing side. And we plan on starting to tackle that sometime soon.
>>
>> But first - could you help with some feedback? Some people are using our SolrCloud start - I have seen evidence of it ;) Some, even in production.
>>
>> I would love to have your help in targeting what we try to improve next. Any suggestions or feedback? If you have sent this before, I/others likely missed it - send it again!
>>
>> I know anyone that has used SolrCloud has some feedback. I know it because I've used it too ;) It's still too complicated to set up. There are still plenty of pain points. We accepted some compromise trying to fit into what Solr was, and not wanting to dig in too far before feeling things out and letting users try things out a bit. Thinking that we might be able to adjust Solr to be more in favor of SolrCloud as we go, what is the ideal state of the work we have currently done?
>>
>> If anyone using SolrCloud helps with the feedback, I'll help with the coding effort.
>>
>> - Mark Miller
>> -- lucidimagination.com
>
>

Re: SolrCloud Feedback

Posted by Mark Miller <ma...@gmail.com>.
On Sep 9, 2011, at 1:09 PM, Pulkit Singhal wrote:

> I think I understand it a bit better now but wouldn't mind some validation.
> 
> 1) solr.xml does not become part of ZooKeeper

Right - currently it does not. Info is put there to tell Solr how to connect to ZooKeeper and register the cores.

> 2) The default looks like this out-of-box:
>  <cores adminPath="/admin/cores" defaultCoreName="collection1">
>    <core name="collection1" instanceDir="." shard="shard1"/>
>  </cores>
> so that may leave one wondering where the core's association to a
> collection name is made?
> 
> It can be made like so:
> a) statically in a file:
> <core name="collection1" instanceDir="." shard="shard1" collection="myconf" />
> b) at start time via java:
> java ... -Dcollection.configName=myconf ... -jar start.jar

These are two different things. First, just to make the bootstrap case simple, if you don't specify a collection name, it defaults to the SolrCore name. That is why we make a default SolrCore name of collection1. In the simple wiki SolrCloud example, you can avoid naming the collection on each shard and simply have things come up under collection1 by default.

a) shows how to override the default of using the SolrCore name as the collection name.

b) shows how to set the configuration set name for the config files that you upload with -Dbootstrap_confdir=. If you specify nothing for collection.configName, it defaults to configuration1.
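
To restate the two defaulting rules above as code: this is only a sketch in Python, not Solr's actual implementation. The function names are made up for illustration. The only facts taken from this thread are that the collection name falls back to the SolrCore name and that collection.configName falls back to configuration1.

```python
from typing import Optional

# Default stated in this thread for -Dcollection.configName
DEFAULT_CONFIG_NAME = "configuration1"


def resolve_collection_name(core_name: str, collection: Optional[str] = None) -> str:
    """Hypothetical helper: the collection name defaults to the SolrCore name."""
    return collection if collection else core_name


def resolve_config_name(config_name: Optional[str] = None) -> str:
    """Hypothetical helper: the config set name defaults to configuration1."""
    return config_name if config_name else DEFAULT_CONFIG_NAME
```

So a core named collection1 with no collection attribute lands in a collection that is also called collection1, and that is independent of whatever name the uploaded configuration set was given.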

> 
> And I'm guessing that since the core's name ("collection1") for shard1
> has already been associated with -Dcollection.configName=myconf in
> http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster
> once already, adding an additional shard2 with the same core name
> ("collection1"), automatically throws it in with the collection name
> ("myconf") without any need to specify anything at startup via -D or
> statically in solr.xml file.

"myconf" is not the collection name - it's the name of a set of configuration files. If only one such set exists, you don't have to specify which to use (which you would do by changing the value at a given node in the ZooKeeper layout). If you wanted multiple named configuration file sets, you would have to explicitly map each collection to the name of its configuration file set.
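
A rough sketch of that selection rule, again in Python with made-up names. This is not how Solr or ZooKeeper actually store or look up the mapping, just the logic described above: with a single configuration set there is nothing to choose, and with several, each collection needs an explicit entry.

```python
from typing import Dict, List


def pick_config_set(collection: str, config_sets: List[str], explicit: Dict[str, str]) -> str:
    """Hypothetical helper: choose the configuration file set for a collection."""
    # An explicit collection -> config set mapping always wins.
    if collection in explicit:
        return explicit[collection]
    # With exactly one uploaded set there is no ambiguity.
    if len(config_sets) == 1:
        return config_sets[0]
    raise ValueError(f"collection {collection!r} needs an explicit configuration set")
```

With only "myconf" uploaded, every collection picks it up without further settings, which is why the second shard in the wiki example needs no extra -D options.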

> 
> Validate away; otherwise I'll just accept any hate mail after making
> edits to the Solr wiki directly.
> 
> - Pulkit

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona

Re: SolrCloud Feedback

Posted by Pulkit Singhal <pu...@gmail.com>.
I think I understand it a bit better now but wouldn't mind some validation.

1) solr.xml does not become part of ZooKeeper
2) The default looks like this out-of-box:
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="." shard="shard1"/>
  </cores>
so that may leave one wondering where the core's association to a
collection name is made?

It can be made like so:
a) statically in a file:
<core name="collection1" instanceDir="." shard="shard1" collection="myconf" />
b) at start time via java:
java ... -Dcollection.configName=myconf ... -jar start.jar

And I'm guessing that since the core's name ("collection1") for shard1
has already been associated with -Dcollection.configName=myconf in
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster
once already, adding an additional shard2 with the same core name
("collection1"), automatically throws it in with the collection name
("myconf") without any need to specify anything at startup via -D or
statically in solr.xml file.

Validate away; otherwise I'll just accept any hate mail after making
edits to the Solr wiki directly.

- Pulkit
