Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2011/01/19 21:07:09 UTC

SolrCloud Feedback

Hello Users,

A little over a year ago, a few of us started working on what we called SolrCloud.

This initial bit of work was really a combination of laying some base work - figuring out how to integrate ZooKeeper with Solr in a limited way, dealing with some infrastructure - and picking off some low-hanging search-side fruit.

The next step is the indexing side. And we plan on starting to tackle that sometime soon.

But first - could you help with some feedback? Some people are using our SolrCloud start - I have seen evidence of it ;) Some, even in production.

I would love to have your help in targeting what we try to improve next. Any suggestions or feedback? If you have sent this before, I/others likely missed it - send it again!

I know anyone that has used SolrCloud has some feedback. I know it because I've used it too ;) It's still too complicated to set up. There are still plenty of pain points. We accepted some compromise trying to fit into what Solr was, and not wanting to dig in too far before feeling things out and letting users try things out a bit. Thinking that we might be able to adjust Solr to be more in favor of SolrCloud as we go, what is the ideal state of the work we have currently done?

If anyone using SolrCloud helps with the feedback, I'll help with the coding effort.

- Mark Miller
-- lucidimagination.com

Re: SolrCloud Feedback

Posted by Mark Miller <ma...@gmail.com>.

On Sep 9, 2011, at 1:09 PM, Pulkit Singhal wrote:

> I think I understand it a bit better now but wouldn't mind some validation.
> 
> 1) solr.xml does not become part of ZooKeeper

Right - currently it does not. Info is put there to tell Solr how to connect to ZooKeeper and register the cores.

> 2) The default looks like this out-of-box:
>  <cores adminPath="/admin/cores" defaultCoreName="collection1">
>    <core name="collection1" instanceDir="." shard="shard1"/>
>  </cores>
> so that may leave one wondering where the core's association to a
> collection name is made?
> 
> It can be made like so:
> a) statically in a file:
> <core name="collection1" instanceDir="." shard="shard1" collection="myconf" />
> b) at start time via java:
> java ... -Dcollection.configName=myconf ... -jar start.jar

These are two different things. First, just to make the bootstrap case simple, if you don't specify a collection name, it defaults to the SolrCore name. That is why we make a default SolrCore name of collection1. In the simple wiki SolrCloud example, you can avoid naming the collection on each shard and simply have things come up under collection1 by default.

a) shows how to override using the SolrCore name for the collection name.

b) shows how to set the configuration set name for the config files that you upload with -Dbootstrap_confdir=. If you specify nothing for collection.configName, it defaults to configuration1.
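
The two defaults described above can be sketched roughly as follows. This is only an illustration, not Solr's actual code: the function names resolve_collection_name and resolve_config_name are hypothetical, and the only facts taken from this thread are the fallback values (the SolrCore name, and configuration1).

```python
from typing import Mapping, Optional

# Default config set name mentioned in this thread.
DEFAULT_CONFIG_NAME = "configuration1"


def resolve_collection_name(core_name: str, collection: Optional[str] = None) -> str:
    """Hypothetical helper: which collection does a core join?

    If the <core> element carries no collection attribute, fall back to the
    SolrCore name, as in the simple bootstrap case.
    """
    return collection if collection else core_name


def resolve_config_name(system_props: Mapping[str, str]) -> str:
    """Hypothetical helper: name of the uploaded configuration set.

    Stands in for -Dcollection.configName=...; falls back to configuration1
    when the property is absent.
    """
    return system_props.get("collection.configName") or DEFAULT_CONFIG_NAME


if __name__ == "__main__":
    # Core named collection1 with no collection attribute.
    print(resolve_collection_name("collection1"))
    # Same core, collection attribute set explicitly (case a).
    print(resolve_collection_name("collection1", "products"))
    # Config set name given at startup (case b), then the default.
    print(resolve_config_name({"collection.configName": "myconf"}))
    print(resolve_config_name({}))
```

The point of keeping the two functions separate is that the collection name and the configuration set name are resolved independently of each other.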

> 
> And I'm guessing that since the core's name ("collection1") for shard1
> has already been associated with -Dcollection.configName=myconf in
> http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster
> once already, adding an additional shard2 with the same core name
> ("collection1"), automatically throws it in with the collection name
> ("myconf") without any need to specify anything at startup via -D or
> statically in solr.xml file.

"myconf" is not the collection name - it's the name of a set of configuration files. If only one such set exists, you don't have to specify which to use (which you would do by changing the value at a given node in the ZooKeeper layout). If you wanted multiple named configuration file sets, you would have to explicitly map each collection to the name of its configuration file set.
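
A rough sketch of that selection rule, purely as an illustration. The function pick_config_set and its arguments are hypothetical, and a plain dict stands in for whatever is stored in ZooKeeper; it is not how Solr implements this.

```python
from typing import Mapping, Optional, Sequence


def pick_config_set(
    collection: str,
    available_sets: Sequence[str],
    explicit: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: choose the configuration file set for a collection.

    explicit maps collection name -> config set name (an assumption standing
    in for the value stored at a node in the ZooKeeper layout).
    """
    explicit = explicit or {}

    # An explicit mapping always wins, but it must name a known set.
    if collection in explicit:
        chosen = explicit[collection]
        if chosen not in available_sets:
            raise LookupError(f"unknown config set {chosen!r} for {collection!r}")
        return chosen

    # With exactly one uploaded set there is nothing to choose.
    if len(available_sets) == 1:
        return available_sets[0]

    # Several sets and no mapping: the caller has to be explicit.
    raise LookupError(f"collection {collection!r} needs an explicit config set")


if __name__ == "__main__":
    print(pick_config_set("collection1", ["myconf"]))
    print(pick_config_set("web", ["myconf", "other"], {"web": "other"}))
```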

> 
> Validate away otherwise I'll just accept any hate mail after making
> edits to the Solr wiki directly.
> 
> - Pulkit
> 
> On Fri, Sep 9, 2011 at 11:38 AM, Pulkit Singhal <pu...@gmail.com> wrote:
>> Hello Jan,
>> 
>> You've made a very good point in (b). I would be happy to make the
>> edit to the wiki if I understood your explanation completely.
>> 
>> When you say that it is "looking up what collection that core is part
>> of" ... I'm curious how a core is being put under a particular
>> collection in the first place? And what that collection is named?
>> Obviously you've made it clear that collection1 is really the name of
>> the core itself. And where this association is being stored for the
>> code to look it up?
>> 
>> If not Jan, then perhaps the gurus who wrote Solr Cloud could answer :)
>> 
>> Thanks!
>> - Pulkit
>> 
>> On Thu, Feb 10, 2011 at 9:10 AM, Jan Høydahl <ja...@cominvent.com> wrote:
>>> Hi,
>>> 
>>> I have so far just tested the examples and got a N by M cluster running. My feedback:
>>> 
>>> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly state what is in which version, what are current improvement plans and get rid of outdated stuff. That said I think there are many good ideas there.
>>> 
>>> b) The "collection" terminology is too easily confused with "core", and should probably be made more distinct. I just tried to configure two cores on the same Solr instance into the same collection, and that worked fine, both as distinct shards and as same shard (replica). The wiki examples give the impression that "collection1" in localhost:8983/solr/collection1/select?distrib=true is some magic collection identifier, but what it really does is run the query on the *core* named "collection1", look up what collection that core is part of and distribute the query to all shards in that collection.
>>> 
>>> c) ZK is not designed to store large files. While the files in conf are normally well below the 1M limit ZK imposes, we should perhaps consider using a lightweight distributed object or k/v store for holding the /CONFIGS and let ZK store a reference only
>>> 
>>> d) How are admins supposed to update configs in ZK? Install their favourite ZK editor?
>>> 
>>> e) We should perhaps not be so afraid to make ZK a requirement for Solr in v4. Ideally you should interact with a 1-node Solr in the same manner as you do with a 100-node Solr. An example is the Admin GUI where the "schema" and "solrconfig" links assume local file. This requires decent tool support to make ZK interaction intuitive, such as "import" and "export" commands.
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona

Re: SolrCloud Feedback

Posted by Pulkit Singhal <pu...@gmail.com>.

I think I understand it a bit better now but wouldn't mind some validation.

1) solr.xml does not become part of ZooKeeper
2) The default looks like this out-of-box:
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="." shard="shard1"/>
  </cores>
so that may leave one wondering where the core's association to a
collection name is made?

It can be made like so:
a) statically in a file:
<core name="collection1" instanceDir="." shard="shard1" collection="myconf" />
b) at start time via java:
java ... -Dcollection.configName=myconf ... -jar start.jar

And I'm guessing that since the core's name ("collection1") for shard1
has already been associated with -Dcollection.configName=myconf in
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster
once already, adding an additional shard2 with the same core name
("collection1"), automatically throws it in with the collection name
("myconf") without any need to specify anything at startup via -D or
statically in solr.xml file.

Validate away otherwise I'll just accept any hate mail after making
edits to the Solr wiki directly.

- Pulkit

Re: SolrCloud Feedback

Posted by Pulkit Singhal <pu...@gmail.com>.

Hello Jan,

You've made a very good point in (b). I would be happy to make the
edit to the wiki if I understood your explanation completely.

When you say that it is "looking up what collection that core is part
of" ... I'm curious how a core is being put under a particular
collection in the first place? And what that collection is named?
Obviously you've made it clear that collection1 is really the name of
the core itself. And where this association is being stored for the
code to look it up?

If not Jan, then perhaps the gurus who wrote Solr Cloud could answer :)

Thanks!
- Pulkit

Re: SolrCloud Feedback

Posted by Lance Norskog <go...@gmail.com>.

Replication's polling technique does not scale to massively multicore
environments. What is the official answer for this problem?  "Use ZK
and cloud?"

On Sat, Jun 11, 2011 at 7:11 AM, Mark Miller <ma...@gmail.com> wrote:
> Jan, I feel terrible for leaving you hanging on this - I missed this email entirely. Seems some of these should be made JIRA issues if they are not already?
>
> bq. j) Question: Is ReplicationHandler ZK-aware yet?
>
> As I think you now know, not yet ;)
>
> - Mark


-- 
Lance Norskog
goksron@gmail.com

Re: SolrCloud Feedback

Posted by Mark Miller <ma...@gmail.com>.

Jan, I feel terrible for leaving you hanging on this - I missed this email entirely. Seems some of these should be made JIRA issues if they are not already?

bq. j) Question: Is ReplicationHandler ZK-aware yet?

As I think you now know, not yet ;)

- Mark

- Mark Miller
lucidimagination.com

Re: SolrCloud Feedback

Posted by Jan Høydahl <ja...@cominvent.com>.

Some more comments:

f) For consistency, the JAVA OPTIONS should all be prefixed with solr.* even if they are related to embedded ZK
   -Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900 -Dsolr.zkBootstrap_confdir=./solr/conf

g) I often share parts of my config between cores, e.g. a common schema.xml or synonyms.xml.
   In the file-based mode I can thus use ../../common_conf/synonyms.xml or similar.
   I have not tried to bootstrap such a config into ZK, but I assume it will not work.
   ZK mode should support such a use case, either by supporting notations like ".."
   or by allowing an explicit zk name space: zk://configs/common-cfg/synonyms.xml

h) Support for dev / test / prod environments
   In real life you want to develop in one environment, test in another and run production in a third
   Thus, the ZK data structure should have a clear separation between logical feature configuration and
   physical deployment config.

   Perhaps a new level above /COLLECTIONS could be used to model this, e.g.
   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
   /ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
   /ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070

   When starting Solr we could specify the environment: -Dsolr.env=TEST (or configure a default).
   The main benefit is that we can maintain and store a single ZK config in our SCM,
   distribute the same configs to all servers and, if we like, point all envs to the same ZK ensemble.

   In the future, we could use this for automatic install of a new node as well:
   by simply adding a ZK entry in the right place, the node can discover "who it is" from ZK.
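
   To make the proposed layout concrete, here is a small sketch that builds and parses such paths. The /ENV/... structure is only the proposal above, and the function names are hypothetical:

```python
def shard_node_path(env, collection, shard, node):
    """Build a node path following the proposed /ENV/<env>/COLLECTIONS/... layout."""
    return "/".join(["", "ENV", env.upper(), "COLLECTIONS", collection, "SHARDS", shard, node])


def nodes_for_env(paths, env):
    """Return (collection, shard, node) tuples for every path in the given environment."""
    prefix = "/ENV/%s/COLLECTIONS/" % env.upper()
    result = []
    for path in paths:
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # Expected remainder: <collection>/SHARDS/<shard>/<node>
        if len(parts) == 4 and parts[1] == "SHARDS":
            result.append((parts[0], parts[2], parts[3]))
    return result
```

   With a single tree like this, a node started with the equivalent of -Dsolr.env=TEST would only ever look below /ENV/TEST.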

i) Ideally, no config inside conf should contain host names.
   My DIH config will most likely include server names, which will differ between TEST and PROD.
   This could be solved as above, by letting the collection in TEST use a different configName than PROD,
   but for some use cases it might be more elegant to swap out a hardcoded string for a reference to a ZK node
   in a generic way, e.g. changing jdbcString="my-hardcoded-string" to jdbcString="${zk://ENV/PROD/jdbcstrA}"
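
   A rough sketch of such a substitution, assuming the ${zk://...} syntax proposed above. The lookup callable stands in for a real ZK read, and all names here are hypothetical:

```python
import re

# Matches ${zk://some/node/path}; the syntax is the proposal above, not an existing Solr feature.
_ZK_REF = re.compile(r"\$\{zk://([^}]+)\}")


def expand_zk_refs(text, lookup):
    """Replace every ${zk://path} in text with lookup("/path").

    lookup is any callable mapping an absolute node path to a string;
    in a real system it would read the node's data from ZooKeeper.
    """
    def replace(match):
        return lookup("/" + match.group(1))

    return _ZK_REF.sub(replace, text)
```

   In a test, lookup can simply be the __getitem__ of a dict keyed by node path, which keeps the config files themselves free of host names.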

j) Question: Is ReplicationHandler ZK-aware yet?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 10. feb. 2011, at 16.10, Jan Høydahl wrote:

> Hi,
> 
> I have so far just tested the examples and got an N-by-M cluster running. My feedback:
> 
> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly state what is in which version and what the current improvement plans are, and to get rid of outdated stuff. That said, I think there are many good ideas there.
> 
> b) The "collection" terminology is too easily confused with "core", and should probably be made more distinct. I just tried to configure two cores on the same Solr instance into the same collection, and that worked fine, both as distinct shards and as the same shard (replicas). The wiki examples give the impression that "collection1" in localhost:8983/solr/collection1/select?distrib=true is some magic collection identifier, but what it really does is run the query on the *core* named "collection1", look up which collection that core is part of, and distribute the query to all shards in that collection.
> 
> c) ZK is not designed to store large files. While the files in conf are normally well below the 1 MB limit ZK imposes, we should perhaps consider using a lightweight distributed object or k/v store for holding the /CONFIGS, and let ZK store only a reference.
> 
> d) How are admins supposed to update configs in ZK? Install their favourite ZK editor?
> 
> e) We should perhaps not be so afraid to make ZK a requirement for Solr in v4. Ideally you should interact with a 1-node Solr in the same manner as you do with a 100-node Solr. An example is the Admin GUI, where the "schema" and "solrconfig" links assume local files. This requires decent tool support to make ZK interaction intuitive, such as "import" and "export" commands.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com

Re: SolrCloud Feedback

Posted by Jan Høydahl <ja...@cominvent.com>.

Hi,

I have so far just tested the examples and got an N-by-M cluster running. My feedback:

a) First of all, a major update of the SolrCloud Wiki is needed, to clearly state what is in which version and what the current improvement plans are, and to get rid of outdated stuff. That said, I think there are many good ideas there.

b) The "collection" terminology is too easily confused with "core", and should probably be made more distinct. I just tried to configure two cores on the same Solr instance into the same collection, and that worked fine, both as distinct shards and as the same shard (replicas). The wiki examples give the impression that "collection1" in localhost:8983/solr/collection1/select?distrib=true is some magic collection identifier, but what it really does is run the query on the *core* named "collection1", look up which collection that core is part of, and distribute the query to all shards in that collection.
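
The lookup described in b) can be pictured roughly like this. The dictionaries stand in for whatever Solr reads from ZK; their names and layout are assumptions for illustration, not Solr's actual data model:

```python
def shards_to_query(core_name, core_to_collection, collection_shards):
    """Return (collection, shard names) for a distributed query sent to core_name.

    The name in the URL identifies a core; the collection is derived from it,
    and the query then fans out to every shard of that collection.
    """
    collection = core_to_collection[core_name]
    return collection, sorted(collection_shards[collection])


# Hypothetical cluster: two cores on one instance, both in the collection "web".
core_to_collection = {"collection1": "web", "core2": "web"}
collection_shards = {
    "web": {
        "shard1": ["localhost:8983/solr/collection1"],
        "shard2": ["localhost:8983/solr/core2"],
    }
}
```

Calling shards_to_query("collection1", core_to_collection, collection_shards) and shards_to_query("core2", ...) both yield the same collection and shard list, which is why the core name in the URL is not a collection identifier.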

c) ZK is not designed to store large files. While the files in conf are normally well below the 1 MB limit ZK imposes, we should perhaps consider using a lightweight distributed object or k/v store for holding the /CONFIGS, and let ZK store only a reference.
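
As a quick way to see whether a conf directory is anywhere near that limit, something like the following could be run before uploading. It assumes ZooKeeper's default jute.maxbuffer of 0xfffff bytes (just under 1 MB) and treats it as an approximate per-file ceiling; the function name is made up for this sketch:

```python
import os

# ZooKeeper's default jute.maxbuffer; the real usable payload is slightly smaller.
ZK_DEFAULT_MAX_BYTES = 0xFFFFF


def oversized_files(conf_dir, limit=ZK_DEFAULT_MAX_BYTES):
    """Return sorted (relative_path, size) pairs for files larger than limit bytes."""
    too_big = []
    for root, _dirs, files in os.walk(conf_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                too_big.append((os.path.relpath(path, conf_dir), size))
    return sorted(too_big)
```

Large synonym or dictionary files are the most likely candidates to show up in the result.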

d) How are admins supposed to update configs in ZK? Install their favourite ZK editor?

e) We should perhaps not be so afraid to make ZK a requirement for Solr in v4. Ideally you should interact with a 1-node Solr in the same manner as you do with a 100-node Solr. An example is the Admin GUI, where the "schema" and "solrconfig" links assume local files. This requires decent tool support to make ZK interaction intuitive, such as "import" and "export" commands.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: SolrCloud Feedback

Posted by Mark Miller <ma...@gmail.com>.

Well yikes - I dropped the ball here...

Apologies...my time to keep up with the user list was out the window for a bit!

!*@#&!

- Mark

On Feb 10, 2011, at 5:45 PM, thorsten wrote:

> 
> Hi Mark, hi all,
> 
> I just got a customer request to conduct an analysis on the state of
> SolrCloud. 
> 
> He wants to see SolrCloud as part of the next Solr 1.5 release and is willing
> to sponsor our dev time to close outstanding bugs and open issues that may
> prevent the inclusion of SolrCloud in the next release. I need to give him a
> list of issues and an estimate of how long it will take us to fix them.
> 
> I did
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+(summary+~+cloud+OR+description+~+cloud+OR+comment+~+cloud)+AND+resolution+%3D+Unresolved
> which returns 8 bugs. Do you consider this a comprehensive list of open
> issues, or are some important ones missing from this list?
> 
> I read http://wiki.apache.org/solr/SolrCloud and it talks about a
> separate branch; however, when I review
> https://issues.apache.org/jira/browse/SOLR-1873 I get the impression that
> the work has already been merged back into trunk, right?
> 
> So which is best to start testing: the branch or trunk?
> 
> TIA for any information
> 
> salu2
> -- 
> Thorsten Scherler <thorsten.at.apache.org>
> codeBusters S.L. - web based systems
> <consulting, training and solutions>
> http://www.codebusters.es/
> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Feedback-tp2290048p2467091.html
> Sent from the Solr - User mailing list archive at Nabble.com.

- Mark Miller
lucidimagination.com

Re: SolrCloud Feedback

Posted by thorsten <th...@apache.org>.

Hi Mark, hi all,

I just got a customer request to conduct an analysis on the state of
SolrCloud. 

He wants to see SolrCloud as part of the next Solr 1.5 release and is willing
to sponsor our dev time to close outstanding bugs and open issues that may
prevent the inclusion of SolrCloud in the next release. I need to give him a
list of issues and an estimate of how long it will take us to fix them.

I did
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+(summary+~+cloud+OR+description+~+cloud+OR+comment+~+cloud)+AND+resolution+%3D+Unresolved
which returns 8 bugs. Do you consider this a comprehensive list of open
issues, or are some important ones missing from this list?

I read http://wiki.apache.org/solr/SolrCloud and it talks about a
separate branch; however, when I review
https://issues.apache.org/jira/browse/SOLR-1873 I get the impression that
the work has already been merged back into trunk, right?

So which is best to start testing: the branch or trunk?

TIA for any information

salu2
-- 
Thorsten Scherler <thorsten.at.apache.org>
codeBusters S.L. - web based systems
<consulting, training and solutions>
http://www.codebusters.es/

Re: SolrCloud Feedback

Posted by Mark Miller <ma...@gmail.com>.

On Jan 20, 2011, at 12:49 AM, Grijesh wrote:

> 
> Hi Mark,
> 
> I was just working on SolrCloud for my R&D and a question came to mind.
> Since in SolrCloud the configuration files are shared across all cloud
> instances, if I have different configuration files for different cores,
> how can I manage that in my ZooKeeper-managed SolrCloud?
> 
> -----
> Thanx:
> Grijesh

You can create as many configuration sets as you want - then you just set on the collection zk node which set of config files should be used.

On the Cloud wiki you will see:

-Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf

That will upload the config files in ./solr/conf to a config set named myconf. When there is only one config set, that is what will be used - but when there is more than one, you can set which config set to use on the collection node.
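
Roughly, the selection rule works like the sketch below. The dictionaries stand in for the config sets and collection nodes in zk, and the names are only for illustration. What happens when there are several config sets and none is named on the collection is an assumption here (the sketch just raises an error):

```python
def config_set_for(collection, config_sets, collection_props):
    """Pick the config set name a collection should use.

    config_sets: names of uploaded config sets (e.g. {"myconf"}).
    collection_props: collection name -> properties stored on its zk node,
    where an optional "configName" entry names the config set to use.
    """
    name = collection_props.get(collection, {}).get("configName")
    if name is not None:
        if name not in config_sets:
            raise KeyError("unknown config set: %s" % name)
        return name
    if len(config_sets) == 1:
        # Only one config set uploaded: that is the one that gets used.
        return next(iter(config_sets))
    # Assumption of this sketch: an ambiguous setup is treated as an error.
    raise ValueError("several config sets exist; set configName for %s" % collection)
```

So different cores can use different configs by uploading each directory under its own name and pointing the relevant collection node at it.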

- Mark Miller
lucidimagination.com

Re: SolrCloud Feedback

Posted by Grijesh <pi...@gmail.com>.

Hi Mark,

I was just working on SolrCloud for my R&D and a question came to mind.
Since in SolrCloud the configuration files are shared across all cloud
instances, if I have different configuration files for different cores,
how can I manage that in my ZooKeeper-managed SolrCloud?

-----
Thanx:
Grijesh
-- 
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Feedback-tp2290048p2292933.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud Feedback

Posted by Sean Bigdatafun <se...@gmail.com>.

Could you please give a pointer to the SolrCloud architecture?

Could you please give a comprehensive comparison between it and Katta?
     * targeted app difference?
     * scalability difference?
     * flexibility difference, and so on

Thanks,
Sean

-- 
--Sean