Posted to solr-user@lucene.apache.org by Philip Durbin <ph...@harvard.edu> on 2014/03/25 15:41:43 UTC

creating shards on the fly in a single Solr instance ("shards" query parameter)

I'm new to Solr and am exploring the idea of creating shards on the
fly. Once the shards have been created and populated, I am hoping to
use the "shards" query parameter to combine results from multiple
shards into a single results set.

By following the "Testing Index Sharding on Two Local Servers"
instructions[1] in the wiki I'm able to target two different shards
individually. It works great, I get three results from one shard and
one result from another shard. When I select them both with the
following query (straight from the wiki) I get all four results:

curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'

My immediate problem is trying to figure out how to convert this
example from two instances of Solr running on different ports (8983
and 7574) that each have a single shard to one instance of Solr that
has two shards.

Various posts[2] suggest this is possible but refer to older versions
of Solr and the instructions don't seem to work for Solr 4.7.0.

When I try the CREATESHARD API call from the wiki[3] I get "Solr
instance is not running in SolrCloud mode" which isn't a huge surprise
because it's documented under the Collections API under SolrCloud.

I don't know anything about SolrCloud. Not yet, anyway. My experience
with Solr so far involves running `java -jar start.jar` from the
"example" directory. I barely know what Zookeeper is.

My goal for now is to use CREATESHARD to create a new shard in a
single Solr instance and then verify it was created with the STATUS[4]
command.

Can anyone please explain how I would accomplish this?

The thought is to have one shard for public data and a shard per user,
which is why I'm asking about creating the shards on the fly. (Logged
in users would see a mix of public and private data.) For now I'd like
to keep using a single Solr instance for simplicity. For more
background on where I'm coming from, please see
http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2014-02-06#l99
and https://trello.com/c/5z5PpR4r/50-design-solr-document-level-security-filter-solution

Thanks,

Phil

1. https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding

2. http://solr.pl/en/2013/01/07/solr-4-1-solrcloud-multiple-shards-on-the-same-solr-node/

3. i.e. curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&shard=shardName&collection=collection1'
from  https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateaShard

4. i.e. http://localhost:8983/solr/admin/cores?action=STATUS via
https://wiki.apache.org/solr/CoreAdmin#STATUS (or whatever the right
command would be to list shards)

-- 
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin

Re: creating shards on the fly in a single Solr instance ("shards" query parameter)

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi Philip,

Comments inline:

On Tue, Mar 25, 2014 at 8:11 PM, Philip Durbin
<ph...@harvard.edu> wrote:
> I'm new to Solr and am exploring the idea of creating shards on the
> fly. Once the shards have been created and populated, I am hoping to
> use the "shards" query parameter to combine results from multiple
> shards into a single results set.
>
> By following the "Testing Index Sharding on Two Local Servers"
> instructions[1] in the wiki I'm able to target two different shards
> individually. It works great, I get three results from one shard and
> one result from another shard. When I select them both with the
> following query (straight from the wiki) I get all four results:
>
> curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
>
> My immediate problem is trying to figure out how to convert this
> example from two instances of Solr running on different ports (8983
> and 7574) that each have a single shard to one instance of Solr that
> has two shards.
>
> Various posts[2] suggest this is possible but refer to older versions
> of Solr and the instructions don't seem to work for Solr 4.7.0.

That post[2] refers to a SolrCloud installation, but I guess you are
running Solr without ZooKeeper. The older approach still works: put
multiple cores on the same instance and pass
shards=localhost:8983/solr/core1,localhost:8983/solr/core2 and so on.
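
As a rough illustration of the multi-core "shards" parameter described
above, here is a minimal Python sketch that only builds the query URL
and sends nothing. The helper name sharded_select_url is hypothetical,
and sending the request to the first core's /select handler is an
assumption about the setup, not something stated in this thread.

```python
from urllib.parse import urlencode


def sharded_select_url(host, cores, query):
    """Build a select URL that fans out to several cores on one Solr host."""
    # Each shard entry has the form host/solr/<core>, without the http:// scheme.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep ':', '/' and ',' unescaped so the URL reads like the wiki example.
    params = urlencode({"shards": shards, "indent": "true", "q": query}, safe=":/,")
    # Assumption: the request is sent to the first core, which merges the results.
    return f"http://{host}/solr/{cores[0]}/select?{params}"


url = sharded_select_url("localhost:8983", ["core1", "core2"], "ipod solr")
print(url)
```

The printed URL can be passed to curl in the same way as the two-server
example from the wiki.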

That being said, I think you should use SolrCloud for your own sanity's sake :)

>
> When I try the CREATESHARD API call from the wiki[3] I get "Solr
> instance is not running in SolrCloud mode" which isn't a huge surprise
> because it's documented under the Collections API under SolrCloud.
>

Yes, CREATESHARD is a Collections API call, which works only with a
SolrCloud installation. It also works only for what we call custom
sharding, i.e. where the user controls which documents go where.

> I don't know anything about SolrCloud. Not yet, anyway. My experience
> with Solr so far involves running `java -jar start.jar` from the
> "example" directory. I barely know what Zookeeper is.

ZooKeeper is a distributed coordination service. Put simply, think of
it as a mediator between Solr nodes that helps them reach consensus on
cluster state. All Solr instances must know the host:port of your
ZooKeeper instances. Solr ships with an embedded ZooKeeper that is
fine for getting started, but for production consider running your own
ZooKeeper instances (at least 3).
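
To make the host:port requirement concrete, the following Python sketch
assembles the zkHost value for a three-node ZooKeeper ensemble and the
matching Solr start command. The hostnames zk1.example.com and so on are
placeholders, the helper name is hypothetical, and 2181 is assumed as
the ZooKeeper client port.

```python
def solr_start_command(zk_hosts, port=2181):
    """Return the argv list for starting Solr against a ZooKeeper ensemble."""
    # zkHost is a comma-separated list of host:port pairs, one per ZooKeeper node.
    zk_host = ",".join(f"{host}:{port}" for host in zk_hosts)
    return ["java", f"-DzkHost={zk_host}", "-jar", "start.jar"]


# Placeholder hostnames; substitute the real ZooKeeper nodes.
cmd = solr_start_command(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
print(" ".join(cmd))
```

The command is meant to be run from the "example" directory, like the
plain `java -jar start.jar` invocation.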

>
> My goal for now is to use CREATESHARD to create a new shard in a
> single Solr instance and then verify it was created with the STATUS[4]
> command.
>
> Can anyone please explain how I would accomplish this?

I'm assuming you have a binary distribution of Solr 4.6.1 or later on
a Linux machine.

Here's the high level: We will create a shard for "public" and shards
for each unique "user". We will do this by adding a field to each doc
called "user" whose value (public, user1, user2 etc) will be used to
route documents to the correct shard.

1. Create a Solr schema which has a field called "user". Its value
will be mapped to a shard.
2. Let's start by installing ZooKeeper:
http://zookeeper.apache.org/doc/r3.4.6/zookeeperStarted.html
3. Upload configuration to ZooKeeper:
cd example; ./cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir solr/collection1/conf -confname conf1
4. Start Solr: java -DzkHost=localhost:2181 -jar start.jar
5. Create a collection with custom sharding enabled:
http://localhost:8983/solr/admin/collections?action=CREATE&name=mydata&collection.configName=conf1&maxShardsPerNode=10&router.name=implicit&router.field=user&shards=public,user1,user2
6. Whenever a new user is added to the system, you can use CREATESHARD:
http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=mydata&shard=user3
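
Steps 5 and 6 can be sketched in Python as below. The code only builds
the two Collections API URLs and does not contact Solr. The function
names are hypothetical; the parameter names and values are the ones
from the URLs in the steps above.

```python
from urllib.parse import urlencode

COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def create_collection_url(name, config_name, shards,
                          router_field="user", max_shards_per_node=10):
    """URL for creating a collection that uses the implicit (custom) router."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config_name,
        "maxShardsPerNode": max_shards_per_node,
        "router.name": "implicit",
        "router.field": router_field,
        "shards": ",".join(shards),  # initial shard names
    }
    # Keep ',' unescaped so the shard list stays readable.
    return f"{COLLECTIONS_API}?{urlencode(params, safe=',')}"


def create_shard_url(collection, shard):
    """URL for adding one more named shard to an existing collection."""
    params = {"action": "CREATESHARD", "collection": collection, "shard": shard}
    return f"{COLLECTIONS_API}?{urlencode(params)}"


create_url = create_collection_url("mydata", "conf1", ["public", "user1", "user2"])
shard_url = create_shard_url("mydata", "user3")
print(create_url)
print(shard_url)
```

Either URL can be opened in a browser or passed to curl; calling
create_shard_url once per new user gives the "on the fly" behaviour.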

While indexing, you can send data to the right shard by adding the
parameter _route_=<shard/user name>, e.g. _route_=public or
_route_=user1.
While searching, you can query both the public shard and user1's
shard by adding _route_=public,user1.
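
A small Python sketch of how the _route_ parameter might be attached to
update and search requests for the "mydata" collection. The helper
names are hypothetical, and the /update and /select paths assume the
default request handlers from the example configuration. Nothing is
sent to Solr; the code only builds URLs.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/mydata"


def update_url(shard):
    """URL for posting documents that should land in one named shard."""
    return f"{BASE}/update?{urlencode({'_route_': shard})}"


def search_url(query, shards):
    """URL for a search restricted to the given shards."""
    params = {"q": query, "_route_": ",".join(shards)}
    # Keep ',', ':' and '*' unescaped so the query stays readable.
    return f"{BASE}/select?{urlencode(params, safe=',:*')}"


index_target = update_url("user1")
search_target = search_url("*:*", ["public", "user1"])
print(index_target)
print(search_target)
```

A logged-in user1 would search with both shards listed, while an
anonymous visitor would get only ["public"].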

You can verify that the shards are added by going to the Collections
page on the UI which will show each shard and the Solr instance on
which it is running.

>
> The thought is to have one shard for public data and a shard per user,
> which is why I'm asking about creating the shards on the fly. (Logged
> in users would see a mix of public and private data.) For now I'd like
> to keep using a single Solr instance for simplicity. For more
> background on where I'm coming from, please see
> http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2014-02-06#l99
> and https://trello.com/c/5z5PpR4r/50-design-solr-document-level-security-filter-solution
>
> Thanks,
>
> Phil
>
> 1. https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
>
> 2. http://solr.pl/en/2013/01/07/solr-4-1-solrcloud-multiple-shards-on-the-same-solr-node/
>
> 3. i.e. curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&shard=shardName&collection=collection1'
> from  https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateaShard
>
> 4. i.e. http://localhost:8983/solr/admin/cores?action=STATUS via
> https://wiki.apache.org/solr/CoreAdmin#STATUS (or whatever the right
> command would be to list shards)
>
> --
> Philip Durbin
> Software Developer for http://thedata.org
> http://www.iq.harvard.edu/people/philip-durbin



-- 
Regards,
Shalin Shekhar Mangar.