Posted to solr-user@lucene.apache.org by ku3ia <de...@gmail.com> on 2012/11/02 09:53:36 UTC

SolrCloud: general questions

Hi all!
We plan to migrate from Solr 3.5 to SolrCloud 4.0. We have run some tests and I
want to confirm the results with you.

My test setup:
Ubuntu 12.04 LTS, Oracle JDK 7u7, Jetty 8, SolrCloud 4.0, 4 shards (4 JVMs
on the same machine on different ports [9080, 9081, 9082, 9083]), no
replicas

My questions are:
1) Is it true that I may send data to any of the shards [9080, 9081, 9082,
9083] and not care how SolrCloud distributes it between the
shards? What algorithm is used: round robin?

2) For example, in SolrCloud there is a document:
<doc><field name="id">1</field><field name="name">this is Solr
3.5</field></doc>
I have no information about which shard this doc is in. I need to update
the "name" field. The new doc is:
<doc><field name="id">1</field><field name="name">this is
SolrCloud</field></doc>
Is it true that I may send this doc to any of the shards [9080, 9081, 9082,
9083] and after commit, when I run the query, I'll have "this is SolrCloud"
instead of "this is Solr 3.5" in the results? As far as I can see, the old data
stays in the index until an optimize is done?

3) Is it true that delete by query works regardless of which shard the
request is sent to?

4) My DnumShards=4. If I need to expand SolrCloud, for example, to 6 shards,
I need to remove the ZooKeeper data directory, set DnumShards to 6 and restart
Jetty. Can I set DnumShards=20 and simply add new shards in the future without
any removal or JVM restart?

5) Currently we have 30 shards with 50M docs. What scheme do you advise: shards
with ~15M docs, or more shards with fewer docs? Which will be faster:
searching shards with ~15M docs, or searching more shards with fewer
docs? The expected number of docs is ~1 500 000 000.

Thanks for your responses.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-general-questions-tp4017769.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud: general questions

Posted by ku3ia <de...@gmail.com>.
Hi Tomás!!!
The first three questions are the most important for me. Many thanks for your response.

As for the number of shards and the number of documents per shard, I'll test it.

Thanks.




Re: SolrCloud: general questions

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
>
> My questions are:
> 1) Is it true that I may send data to any of the shards [9080, 9081, 9082,
> 9083] and not care how SolrCloud distributes it between the
> shards? What algorithm is used: round robin?
>
It is true: the document is forwarded to the correct shard automatically.
It's not round robin; it's a hash function applied to the unique key of the
document.
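
As a rough illustration of hash-based routing (a sketch, not Solr's actual
code), the function below maps a document's unique key to one of N shards.
The CRC32 hash, the modulo step and the name `shard_for` are assumptions
made for this example only; SolrCloud's real hash function and its
assignment of hash ranges to shards differ.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a unique key to a shard index (illustrative only)."""
    # CRC32 is a stand-in hash for this sketch, not what Solr uses.
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h % num_shards


# The four test nodes from the original message.
ports = [9080, 9081, 9082, 9083]

# The result depends only on the id, not on which node received the document,
# so the same id always ends up on the same shard.
first = shard_for("1", len(ports))
second = shard_for("1", len(ports))
print(first == second)
```

The point is only that routing is a pure function of the key, which is why
it does not matter which node receives the document.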


>
> 2) For example, in SolrCloud there is a document:
> <doc><field name="id">1</field><field name="name">this is Solr
> 3.5</field></doc>
> I have no information about which shard this doc is in. I need to update
> the "name" field. The new doc is:
> <doc><field name="id">1</field><field name="name">this is
> SolrCloud</field></doc>
> Is it true that I may send this doc to any of the shards [9080, 9081, 9082,
> 9083] and after commit, when I run the query, I'll have "this is SolrCloud"
> instead of "this is Solr 3.5" in the results? As far as I can see, the old data
> stays in the index until an optimize is done?
>
Yes, you'll only see the updated document. The hash function gives the same
result for the same "id", so the new version goes to the same shard as
before, and there the document is "updated" (the old one is deleted and the
new one is inserted). The old document remains in the index (not visible, as
you said) until the segment that holds it is merged, which can happen through
an optimize or through background segment merging.
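
The delete-then-insert behaviour can be pictured with a toy model. This is
a simplified sketch, not how Lucene stores segments: `ToyShard` and its
methods are hypothetical names, and a single list stands in for all the
segments of one shard.

```python
class ToyShard:
    """Toy model of one shard: versions are appended, deletes are only marks."""

    def __init__(self):
        self.entries = []  # every version ever written, newest last

    def add(self, doc_id, fields):
        # An "update" marks any older version with the same id as deleted,
        # then appends the new version.
        for entry in self.entries:
            if entry["id"] == doc_id:
                entry["live"] = False
        self.entries.append({"id": doc_id, "fields": fields, "live": True})

    def search(self):
        # Queries only see live documents.
        return [e["fields"] for e in self.entries if e["live"]]

    def merge(self):
        # A merge (optimize or background merging) drops deleted entries.
        self.entries = [e for e in self.entries if e["live"]]


shard = ToyShard()
shard.add("1", {"name": "this is Solr 3.5"})
shard.add("1", {"name": "this is SolrCloud"})

visible = shard.search()            # only the new version is visible
stored_before = len(shard.entries)  # the old version still takes up space
shard.merge()
stored_after = len(shard.entries)   # the old version is gone after the merge
print(visible, stored_before, stored_after)
```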


>
> 3) Is it true that delete by query works regardless of which shard the
> request is sent to?
>
Yes.
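
For reference, a delete-by-query request might be built like this. It is a
sketch under assumptions: the host `localhost`, the `/solr/update` path (a
default single-core Jetty setup) and the helper name
`build_delete_by_query` are not from the thread, and the request is only
constructed here, not sent.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_delete_by_query(port: int, query: str) -> urllib.request.Request:
    """Build (but do not send) a delete-by-query update request."""
    # Solr's XML update format wraps the query in <delete><query>...</query></delete>.
    body = "<delete><query>{}</query></delete>".format(escape(query))
    # Assumed URL layout; adjust host, port and core path to your setup.
    url = "http://localhost:{}/solr/update?commit=true".format(port)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Any of the four ports should do, since the request is distributed.
req = build_delete_by_query(9082, 'name:"this is Solr 3.5"')
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually send
it, so try that against a test instance only.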

>
> 4) My DnumShards=4. If I need to expand SolrCloud, for example, to 6 shards,
> I need to remove the ZooKeeper data directory, set DnumShards to 6 and restart
> Jetty. Can I set DnumShards=20 and simply add new shards in the future without
> any removal or JVM restart?
>
I think you could remove the collection and create it again. See the new
Collections API. You need to have at least as many Solr instances (or Solr
cores) as the number of shards in order to be able to do anything with your
collection. You won't be able to index or search if the number of nodes <
number of shards. Any change in the number of shards requires reindexing
everything.
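
A sketch of what the remove-and-recreate calls could look like as URLs. The
collection name `mycollection`, the host and the helper `collections_url`
are hypothetical, and the parameter names are written from memory of the
Collections API, so check them against the documentation for your version.
Nothing is sent; the code only builds the strings.

```python
from urllib.parse import urlencode


def collections_url(port, action, **params):
    """Build a Collections API URL (assumed layout, not sent anywhere)."""
    query = urlencode(dict(action=action, **params))
    return "http://localhost:{}/solr/admin/collections?{}".format(port, query)


# Drop the old 4-shard collection, then create it again with 6 shards.
delete_url = collections_url(9080, "DELETE", name="mycollection")
create_url = collections_url(
    9080, "CREATE", name="mycollection", numShards=6, replicationFactor=1
)
print(delete_url)
print(create_url)
```

After the collection is recreated, all documents have to be indexed again,
as noted above.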


>
> 5) Currently we have 30 shards with 50M docs. What scheme do you advise: shards
> with ~15M docs, or more shards with fewer docs? Which will be faster:
> searching shards with ~15M docs, or searching more shards with fewer
> docs? The expected number of docs is ~1 500 000 000.
>

I think you'll have to test it, as it depends heavily on your context (the
shape of your docs/index, your queries and other use cases). Shards with
15M docs don't sound crazy, but I have never really tested with 100 shards.
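
The figure of 100 shards follows from the numbers in the question. A small
check, where `shards_needed` is a hypothetical helper and the 50M case
reads "30 shards with 50M docs" as 50M per shard, which is an assumption:

```python
def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    """Smallest number of shards that holds total_docs (integer ceiling)."""
    return (total_docs + docs_per_shard - 1) // docs_per_shard


expected_docs = 1_500_000_000  # expected total from the question

for per_shard in (15_000_000, 50_000_000):
    print(per_shard, shards_needed(expected_docs, per_shard))
```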

Tomás

