Posted to solr-user@lucene.apache.org by Jason <hi...@gmail.com> on 2012/12/05 08:21:41 UTC

how to assign dedicated server for indexing and add more shard in SolrCloud

I'm using master and slave servers for scaling.
The master is dedicated to indexing and the slaves to searching.
Now I'm planning to move to SolrCloud.
It has leaders and replicas.
A leader acts like a master and replicas act like slaves. Is that right?
So, I'm wondering about two things.

First,
How can I assign a dedicated server for indexing in SolrCloud?

Second,
Consider that I'm using a two-shard cluster with shard replicas
<http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas>
and I need to add one more shard with replicas.
In this case, the existing two shards and their replicas will already have many docs,
so I want newly indexed docs to go to the new shard only.
How can I do this?

Actually, I don't fully understand SolrCloud yet,
so my questions may sound ridiculous.
Any input is welcome.
Thanks,



--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-assign-dedicated-server-for-indexing-and-add-more-shard-in-SolrCloud-tp4024404.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how to assign dedicated server for indexing and add more shard in SolrCloud

Posted by Erick Erickson <er...@gmail.com>.

Well, you could probably do what you want. Go ahead and index on the "super
cool AWS instance", just don't bring the replicas up yet. All the indexing
is going to this machine. Once your index is constructed, bring up the
replicas. Old-style replication will take place and you should be off to
the races.

But personally, I'd just stay with old-style master/slave replication in the
situation you describe. It's still perfectly possible with SolrCloud/Solr4;
none of that functionality has been taken away.

I guess you're talking about two different use cases here.
SolrCloud/ZooKeeper deals with the NRT issues, which are really difficult
with traditional master/slave setups. But static indexes are a bit of a
different situation.

But you're right, you get a lot of merging going on in the background with
NRT and frequent commits. As in all things, it's a tradeoff.

Best
Erick



Re: how to assign dedicated server for indexing and add more shard in SolrCloud

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

Jason,
Thanks for raising it!

Erick,
This is something I've wanted to discuss for a long time. Frankly speaking, the
question is:

if old-school (master/slave) search deployments don't comply with the vision of
SolrCloud/ElasticSearch, does it mean that they are wrong?

Let me enumerate the traits of 'old-school search':
- the number of docs is not large enough to make sharding pay off from a search
latency POV;
- index updates are not frequent; they are rare nightly bulk loads;
- the search index is not an SOR (system of record) - it's a secondary system that
provides the search service, though still significant for the enterprise;
- there is an SOR - the primary system, which is some kind of CMS or RDBMS, or a CMS
that publishes through an RDBMS, etc.

Does this look like your system? If not, click the Delete button!

// for the few people who are still reading:

Here's what I get with SolrCloud in this case:
- I can decide not to deal with sharding. Good! I put numShards=0 and buy
more (VM) instances to have more replicas and increase throughput;
- I start the nightly reindex - delQ *:* , add(....), commit();
- in this case all my instances spend resources indexing the same docs
instead of handling search requests - BAD#1;
- even if I'm able to supply a long Iterable<SolrInputDocument>,
DistributedUpdateProcessor sends documents one by one, not in huge
chunks, which leads to many small segments - e.g. if I have a 100MB RAM buffer
and 10 servlet container threads, I'll get a sequence of 10MB segments;
- each of these flushes also pushes some part of the current index out of
RAM, which impacts search latency - BAD#2;
- when indexing is over I have many small segments, and then The Merge
starts, which also pushes the current index out of RAM - BAD#3.

In summary: I waste resources indexing the same stuff on the searcher nodes, and as
a side effect I have a longer period of latency impact.
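
The segment arithmetic above (a 100 MB RAM buffer and 10 indexing threads giving
roughly 10 MB segments) can be written out as a back-of-the-envelope sketch. It
assumes the RAM buffer is shared evenly between the indexing threads, which is a
simplification and not a description of Lucene's actual flush policy. The function
names and the 5000 MB index size are hypothetical, chosen only for illustration.

```python
import math


def approx_flush_segment_mb(ram_buffer_mb: float, indexing_threads: int) -> float:
    """Rough size of each flushed segment, assuming the RAM buffer is
    divided evenly between concurrent indexing threads (a simplification)."""
    if indexing_threads < 1:
        raise ValueError("need at least one indexing thread")
    return ram_buffer_mb / indexing_threads


def approx_segment_count(index_size_mb: float, ram_buffer_mb: float,
                         indexing_threads: int) -> int:
    """Approximate number of segments written before any merging happens."""
    per_segment_mb = approx_flush_segment_mb(ram_buffer_mb, indexing_threads)
    return math.ceil(index_size_mb / per_segment_mb)


# Figures from the message: 100 MB buffer, 10 threads.
segment_mb = approx_flush_segment_mb(100, 10)
# Hypothetical 5000 MB index, to show how many small segments pile up.
segments = approx_segment_count(5000, 100, 10)
print(segment_mb, segments)
```

Under these assumptions a full reindex leaves hundreds of small segments on every
node that indexed the docs, which is what sets off the merge described in BAD#3.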

How I want to do it:
 - in the cloud, I add small instances as replicas on demand to adjust to the
workload dynamically;
 - when I need to reindex (full import), I can rent a super cool VM instance
with a many-way CPU and run indexing on it;
 - if it blows up, no problem: I can run the full import from my CMS/DB again
from the beginning - or I can run two imports simultaneously;
 - after indexing has finished, I can push the index to the searchers or start new
ones and mount the index on them.

Please tell me where I'm wrong, whether it's about SolrCloud features, 'cloud'
economics, hardware/VMware architecture or Lucene internals. Can Jason and I
adapt SolrCloud to our 'old-school' pattern?

Thanks for sharing your opinion!






-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: how to assign dedicated server for indexing and add more shard in SolrCloud

Posted by Erick Erickson <er...@gmail.com>.

First, forget about master/slave with SolrCloud! Leaders really exist to
resolve conflicts; the old notion of M/S replication is largely irrelevant.

Updates can go to any node in the cluster: leader, replica, whatever. The
node forwards the doc to the correct leader based on a hash of the
<uniqueKey>, and the leader then forwards the raw document to all replicas. Then all
the replicas index the document separately. Note that this is true of
multi-document packets too. You can't get NRT with the old-style
replication process where the master indexes the doc and then the _index_
is replicated...
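
The update path described above can be sketched roughly as follows. This is a
minimal illustration and not Solr's implementation: zlib.crc32 is only a stand-in
for whatever hash Solr uses internally, each shard is assumed to own an equal
slice of the hash space, and shard_for, route_update and the node names are all
hypothetical.

```python
import zlib


def shard_for(unique_key: str, num_shards: int) -> int:
    """Map a document's uniqueKey to a shard index by hash range.
    crc32 is a stand-in hash that yields a value in 0 .. 2**32 - 1."""
    h = zlib.crc32(unique_key.encode("utf-8"))
    return h * num_shards // 2**32


def route_update(doc: dict, cluster: list) -> dict:
    """Return which shard owns the doc and which nodes index it.
    The update may arrive at any node; it is forwarded to the owning
    shard's leader, which passes the raw doc on to every replica."""
    shard = shard_for(doc["id"], len(cluster))
    leader = cluster[shard]["leader"]
    replicas = cluster[shard]["replicas"]
    # Leader and replicas each index the document separately.
    return {"shard": shard, "indexed_on": [leader] + replicas}


# Hypothetical two-shard cluster with one replica per shard.
cluster = [
    {"leader": "node1", "replicas": ["node3"]},
    {"leader": "node2", "replicas": ["node4"]},
]
print(route_update({"id": "doc-1"}, cluster))
```

The point of the sketch is that every node in the owning shard does its own
indexing work, which is why no single machine ends up as a dedicated indexer.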

As for your second question, it sounds like you want to go from
numShards=2 to, say, numShards=3. You can't do that as it stands. There are
two approaches:
1> "shard splitting", which would redistribute the documents to a new set of
shards
2> pluggable hashing, which allows you to specify the code that does the
shard assignment.
Neither of these is available yet, although <2> is imminent. There is
active work on <1>, but I don't think that will be ready as soon.
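
To see why changing numShards is not a simple config change, here is a small
sketch. It assumes each shard owns an equal, contiguous slice of the hash space
and uses zlib.crc32 as a stand-in hash; Solr's real hashing and range assignment
may differ. shard_for, moved_fraction and the doc-N keys are hypothetical.

```python
import zlib


def shard_for(unique_key: str, num_shards: int) -> int:
    """Map a uniqueKey to a shard index by hash range (crc32 as a stand-in)."""
    h = zlib.crc32(unique_key.encode("utf-8"))
    return h * num_shards // 2**32


def moved_fraction(keys: list, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose owning shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical document ids already indexed in a two-shard cluster.
keys = [f"doc-{i}" for i in range(10000)]
fraction = moved_fraction(keys, 2, 3)
print(f"{fraction:.0%} of existing docs would belong to a different shard")
```

With a uniform hash and equal ranges, about half of the existing docs would be
expected to change owner when going from two shards to three, so the documents
have to be redistributed (shard splitting) or the assignment rule has to be made
pluggable.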

Best
Erick
