Posted to solr-user@lucene.apache.org by Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com> on 2016/01/11 12:28:57 UTC

Pro and cons of using Solr Cloud vs standard Master Slave Replica

Hi guys,

a customer needs a comprehensive list of all the pros and cons of using
standard Master/Slave replication vs. SolrCloud. I'm especially interested
in query performance considerations, because in this specific situation the
rate of new documents is really low, but the amount of data is about 50
million documents, and the index size on disk for a single core is about
30 GB.

Such an amount of data should be easily handled by Master/Slave replication
with a single core replicated to a certain number of slaves, but we also
need to evaluate the SolrCloud option, especially for fault tolerance.

I've googled around but did not find anything really comprehensive, so I'm
looking for real-world experience from you on the mailing list. :)

Thanks in advance.

--
Gian Maria Ricci
Cell: +39 320 0136949


Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Rahul Ramesh <rr...@gmail.com>.

Please have a look at this post

https://support.lucidworks.com/hc/en-us/articles/201298317-What-is-SolrCloud-And-how-does-it-compare-to-master-slave-

We don't use the master/slave architecture; we use SolrCloud and
standalone Solr for our documents.

Indexing is a bit slower in SolrCloud than in standalone Solr. I think
this is because of replication. However, you will get a faster query
response.

SolrCloud also requires a slightly more elaborate setup, with ZooKeeper,
compared to master/slave or standalone.
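
To give a feel for the ZooKeeper part of that setup: a ZooKeeper ensemble
stays available only while a strict majority of its servers is reachable,
which is why ensembles are usually sized with an odd number of nodes. Here
is a minimal Python sketch of that arithmetic (the function names are
illustrative only, they are not part of Solr or ZooKeeper):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few common ensemble sizes.
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Note that going from 3 to 4 servers does not let you lose any more of them,
so 3 or 5 are the usual choices.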

However, once SolrCloud is set up, it runs very smoothly and you don't have
to worry about performance or high availability.

Please check the post; it gives a detailed analysis and comparison of the
two.

-Rahul



Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Jack Krupansky <ja...@gmail.com>.

The "Legacy Scaling and Distribution" section of the Solr Reference Guide
also gives info related to so-called master-slave mode:
https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution

Also, although the old master-slave mode is still technically supported in
the sense that the code and docs are still there, you won't be able to get
the same level of community support here on the mailing list as you can get
for SolrCloud.

Unless you're simply trying to decide whether to leave an old legacy system
as-is with the old distributed mode, nobody should be considering a fresh
new distributed Solr deployment with anything other than SolrCloud.

(Hmmm... have any of the committers considered deprecating the old
non-SolrCloud distributed mode features?)

-- Jack Krupansky

On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta <sd...@hortonworks.com>
wrote:

> - SolrCloud uses ZooKeeper to manage HA
>         - ZooKeeper is a standard for all HA in Apache Hadoop
> - You have collections, which manage your shards across nodes
> - The SolrJ client is now fault tolerant with CloudSolrClient
>
> This is the future direction of the product.
>
>
>
> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
> <al...@nablasoft.com> wrote:
>
> >Thanks.
> >
> >--
> >Gian Maria Ricci
> >Cell: +39 320 0136949
> >
> >
> >
> >-----Original Message-----
> >From: Shawn Heisey [mailto:apache@elyograg.org]
> >Sent: Monday, 11 January 2016 18:28
> >To: solr-user@lucene.apache.org
> >Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> >Replica
> >
> >On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> >> a customer need a comprehensive list of all pro and cons of using
> >> standard Master Slave replica VS using Solr Cloud. I'm interested
> >> especially in query performance consideration, because in this
> >> specific situation the rate of new documents is really slow, but the
> >> amount of data is about 50 millions of document, and the index size on
> >> disk for single core is about 30 GB.
> >
> >The primary advantage to SolrCloud is that SolrCloud handles most of the
> >administrative and operational details for you automatically.
> >
> >SolrCloud is a little more complicated to set up initially, because you
> >must worry about Zookeeper as well as Solr, but once it's properly set
> >up, there is no single point of failure.
> >
> >> Such amount of data should be easily handled by a Master Slave replica
> >> with a  single core replicated on a certain number of slaves, but we
> >> need to evaluate also the option of SolrCloud, especially for fault
> >> tolerance.
> >>
> >
> >Once you're beyond initial setup, fault tolerance with SolrCloud is much
> >easier than master/slave replication.  Switching a slave to a master is
> >possible, but the procedure is somewhat complicated.  SolrCloud does not
> >*have* masters, it is a true cluster.
> >
> >With master/slave replication, the master handles all indexing, and the
> >finished index segments are copied to the slaves via HTTP, and the slaves
> >simply need to open them.  SolrCloud does indexing on all shard replicas,
> >nearly simultaneously.  Usually this is an advantage, not a disadvantage,
> >but in heavy indexing situations master/slave replication
> >*might* show better performance on the slaves.
> >
> >Thanks,
> >Shawn
> >
> >
>
>

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Jack Krupansky <ja...@gmail.com>.

Yeah, and to the original question, there is no master list of features and
how SolrCloud vs. legacy distributed mode compare feature by feature.

And until SolrCloud actually does subsume every single (important) feature
of legacy distributed mode, Solr probably still needs to continue to
support legacy distributed mode, including backup.

The doc does need better coverage of backup and restore at the cluster
level, including configuration files. What's there now is basically the old
single-node replication backup. What exactly is the recommended best
practice for backing up a single shard, let alone all shards? Should
backups be collection-based as well?
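
As a rough sketch of the per-shard approach discussed in this thread (not
an officially supported SolrCloud procedure), the replication handler's
`command=backup` request could be issued against one core per shard. The
Python below only builds the request URLs. The host names, core names, and
backup location are made-up placeholders, and whether the `location` and
`name` parameters are honored depends on your Solr version:

```python
from urllib.parse import urlencode


def backup_url(base_url: str, core: str, location: str, name: str) -> str:
    """Build a replication-handler backup request URL for one core."""
    params = urlencode(
        {"command": "backup", "location": location, "name": name, "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/{core}/replication?{params}"


# Hypothetical layout: one core per shard, keyed by the Solr base URL of
# the node that hosts it. Replace with your own cluster's values.
shard_cores = {
    "http://solr1.example.com:8983/solr": "mycollection_shard1_replica1",
    "http://solr2.example.com:8983/solr": "mycollection_shard2_replica1",
}

urls = [
    backup_url(base, core, "/var/backups/solr", "nightly")
    for base, core in shard_cores.items()
]

if __name__ == "__main__":
    for url in urls:
        print(url)
```

This leaves open exactly the questions above: it does not capture the
configuration files stored in ZooKeeper, and nothing coordinates the
snapshots across shards.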


-- Jack Krupansky

On Fri, Jan 15, 2016 at 3:26 AM, Gian Maria Ricci - aka Alkampfer <
alkampfer@nablasoft.com> wrote:

> Yes, I checked that JIRA issue some weeks ago, and it is the reason why I
> was saying that there is still no clear procedure to back up SolrCloud in
> the current latest version.  I'm glad that the priority is Major, but until
> it is closed in an official release, I have to tell customers that
> there is no easy, supported backup procedure for a SolrCloud
> configuration :(.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, 14 January 2016 16:46
> To: solr-user <so...@lucene.apache.org>
> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> Replica
>
> re: SolrCloud backup/restore:
> https://issues.apache.org/jira/browse/SOLR-5750
>
> not committed yet, but getting attention.
>
>
>
> On Thu, Jan 14, 2016 at 6:19 AM, Gian Maria Ricci - aka Alkampfer <
> alkampfer@nablasoft.com> wrote:
> > There are actually situations where a restore is needed. Suppose that
> > someone makes a mistake and deletes all documents from a collection, or
> > deletes a series of documents, etc. I know that this is not likely to
> > happen, but in a mission-critical enterprise system we always need a
> > detailed procedure for disaster recovery.
> >
> > For such a scenario we need to plan for the worst case, where everything
> > is lost.
> >
> > With Master/Slave it is just a matter of recreating the machines,
> > reconfiguring the core, and restoring a backup, and the job is done. With
> > SolrCloud it is not really clear to me how I can back up / restore data.
> > From what I've found on the internet, I need to back up every shard of
> > the collection and, if we need to restore everything from a backup,
> > recreate the collection and then restore all the individual shards. I do
> > not know if this is a supported scenario / procedure, but theoretically
> > it could work.
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> >
> > -----Original Message-----
> > From: Alessandro Benedetti [mailto:abenedetti@apache.org]
> > Sent: Thursday, 14 January 2016 10:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> > Replica
> >
> > It's true that SolrCloud adds some complexity.
> > But a few observations:
> >
> >> SolrCloud has some disadvantages and can't beat the ease and
> >> simplicity of Master Slave Replica. So I can only encourage keeping
> >> Master Slave Replica in future versions.
> >
> > I agree, there can be situations where you have really simple,
> > non-critical systems.
> > Anyway, old-style replication is still used in SolrCloud, so I think it
> > is going to stay for a while (until it is replaced with something else).
> >
> > To answer Gian:
> >
> >> One of the problems I've found is that there is no simple way to
> >> back up the content of a collection so it can be restored in a disaster
> >> recovery situation. In a simple master / slave scenario we can use the
> >> replication handler to generate backups that can easily be used to
> >> restore the content of a core, while with SolrCloud it is not clear how
> >> we can obtain a full backup
> >
> > To be fair, disaster recovery is where SolrCloud shines.
> > If you lose random nodes across your collection, you simply need to fix
> > them and spin them up again.
> > The system will automatically restore the content to the last version
> > available (the tlog first, and the leader if the tlog is not enough,
> > will help the dead node catch up).
> > If you lose all the replicas for a shard and you lose the on-disk
> > content of all those replicas (index and tlog), SolrCloud can't help you.
> > For these unlikely scenarios a backup is suggested.
> > You could also restore the backup to only one node, and the replicas
> > are going to catch up.
> >
> >> Probably it is just a matter of backing up every shard with the standard
> >> replication handler and then restoring each shard after recreating the
> >> collection
> >
> > Definitely not, SolrCloud is there to avoid this manual stuff.
> >
> > Cheers
> >
> >
> > On 14 January 2016 at 08:58, Gian Maria Ricci - aka Alkampfer <
> > alkampfer@nablasoft.com> wrote:
> >
> >> I agree that SolrCloud does not have only advantages. I understand
> >> that it offers many more features, but it introduces some complexity.
> >>
> >> One of the problems I've found is that there is no simple way to
> >> back up the content of a collection so it can be restored in a disaster
> >> recovery situation. In a simple master / slave scenario we can use the
> >> replication handler to generate backups that can easily be used to
> >> restore the content of a core, while with SolrCloud it is not clear how
> >> we can obtain a full backup.
> >> Probably it is just a matter of backing up every shard with the standard
> >> replication handler and then restoring each shard after recreating the
> >> collection, but I've not found (probably I need to search better)
> >> official documentation on backup / restore procedures for SolrCloud.
> >>
> >> Thanks.
> >>
> >> --
> >> Gian Maria Ricci
> >> Cell: +39 320 0136949
> >>
> >>
> >> -----Original Message-----
> >> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> >> Sent: Thursday, 14 January 2016 08:22
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Pro and cons of using Solr Cloud vs standard Master
> >> Slave Replica
> >>
> >> SolrCloud has some disadvantages and can't beat the ease and
> >> simplicity of Master Slave Replica. So I can only encourage keeping
> >> Master Slave Replica in future versions.
> >>
> >> Bernd
> >>
> >> On 13.01.2016 at 21:57, Jack Krupansky wrote:
> >> > The "Legacy Scaling and Distribution" section of the Solr Reference
> >> > Guide also gives info related to so-called master-slave mode:
> >> > https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> >> >
> >> > Also, although the old master-slave mode is still technically
> >> > supported in the sense that the code and docs are still there, you
> >> > won't be able to get the same level of community support here on the
> >> > mailing list as you can get for SolrCloud.
> >> >
> >> > Unless you're simply trying to decide whether to leave an old
> >> > legacy system as-is with the old distributed mode, nobody should be
> >> > considering a fresh new distributed Solr deployment with anything
> >> > other than SolrCloud.
> >> >
> >> > (Hmmm... have any of the committers considered deprecating the old
> >> > non-SolrCloud distributed mode features?)
> >>
> >> -1
> >>
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

In the multi-tenant model, SolrCloud shines because the configuration directories need not include any details about the cluster. SolrCloud also shines if the number of documents and/or the indexing rate requires sharding.

But master-slave replication is OK if you have just a couple of related cores and their configuration isn't too dynamic. I know that in my very old-school systems environment, getting all the ports/firewalls configured right for SolrCloud and maintaining security is a bit hairy.

Hoping this helps,


RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com>.

Yes, I checked that JIRA issue some weeks ago, and it is the reason why I was saying that there is still no clear procedure to back up SolrCloud in the current latest version.  I'm glad that the priority is Major, but until it is closed in an official release, I have to tell customers that there is no easy, supported backup procedure for a SolrCloud configuration :(.

--
Gian Maria Ricci
Cell: +39 320 0136949
    


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: giovedì 14 gennaio 2016 16:46
To: solr-user <so...@lucene.apache.org>
Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

re: SolrCloud backup/restore: https://issues.apache.org/jira/browse/SOLR-5750

not committed yet, but getting attention.



On Thu, Jan 14, 2016 at 6:19 AM, Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com> wrote:
> Actually there are situation where a restore is needed, suppose that someone does some error and deletes all documents from a collection, or maybe deletes a series of document, etc. I know that this is not likely to happen, but in mission critical enterprise system, we always need a detailed procedure for disaster recovering.
>
> For such scenario we need to plan the worst case, where everything is lost.
>
> With Master Slave is just a matter of recreating machines, reconfigure the core, and restore a backup, and the game is done, with SolrCloud is not really clear for me how can I backup / restore data. From what I've found in the internet I need to backup every shard of the collection, and, if we need to restore everything from a backup, we can recreate the collection and then restore all the individual shards. I do not know if this is a supported scenario / procedure, but theoretically it could work.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -----Original Message-----
> From: Alessandro Benedetti [mailto:abenedetti@apache.org]
> Sent: giovedì 14 gennaio 2016 10:46
> To: solr-user@lucene.apache.org
> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave 
> Replica
>
> It's true that SolrCloud is adding some complexity.
> But few observations :
>
> SolrCloud has some disadvantages and c an't beat the easiness and 
> simpleness
>> of
>> Master Slave Replica. So I can only encourage to keep Master Slave 
>> Replica in future versions.
>
>
> I agree, it can happen situations when you have really simple and not critical systems.
> Anyway old style replication is still used in SolrCloud, so I think it is going to stay for a while ( until is replaced with something else) .
>
> To answer to Gian :
>
>> One of the problem I've found is that I've not found a simple way to
>> backup the content of a collection to restore in situation of disaster
>> recovery. With simple master / slave scenario we can use the replication
>> handler to generate backups that can be easily used to restore
>> content of a core, while with SolrCloud is not clear how can we
>> obtain a full backup
>
>
> To be fair, disaster recovery is where SolrCloud shines.
> If you lose random nodes across your collection, you simply need to fix them and spin them up again.
> The system will automatically restore the content to the last version available: the tlog first, and then the leader (if the tlog is not enough), will help the dead node catch up.
> If you lose all the replicas of a shard, along with the on-disk content of all those replicas (index and tlog), SolrCloud can't help you.
> For such unlikely scenarios a backup is suggested.
> You could then restore the backup to only one node, and the replicas are going to catch up.
>
>> Probably is just a matter of backupping every shard with standard
>> replication handler and then restore each shard after recreating the
>> collection
>
>
> Definitely not, SolrCloud is there to avoid this manual stuff.
>
> Cheers
>
>
> On 14 January 2016 at 08:58, Gian Maria Ricci - aka Alkampfer < alkampfer@nablasoft.com> wrote:
>
>> I agree that SolrCloud has not only advantages, I really understand 
>> that it offers many more features, but it introduces some complexity.
>>
>> One of the problem I've found is that I've not found a simple way to 
>> backup the content of a collection to restore in situation of disaster
>> recovery. With simple master / slave scenario we can use the 
>> replication handler to generate backups that can be easily used to 
>> restore content of a core, while with SolrCloud is not clear how can we obtain a full backup.
>> Probably is just a matter of backupping every shard with standard 
>> replication handler and then restore each shard after recreating the 
>> collection, but I've not found (probably I need to search better) 
>> official documentation on backup / restore procedures for SolrCloud.
>>
>> Thanks.
>>
>> --
>> Gian Maria Ricci
>> Cell: +39 320 0136949
>>
>>
>> -----Original Message-----
>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>> Sent: giovedì 14 gennaio 2016 08:22
>> To: solr-user@lucene.apache.org
>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master 
>> Slave Replica
>>
>> SolrCloud has some disadvantages and can't beat the easiness and 
>> simpleness of Master Slave Replica. So I can only encourage to keep 
>> Master Slave Replica in future versions.
>>
>> Bernd
>>
>> Am 13.01.2016 um 21:57 schrieb Jack Krupansky:
>> > The "Legacy Scaling and Distribution" section of the Solr Reference 
>> > Guide also gives info related to so-called master-slave mode:
>> > https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
>> >
>> > Also, although the old master-slave mode is still technically 
>> > supported in the sense that the code and doc is still there, You 
>> > won't be able to get the level of community support  here on the 
>> > mailing list as you can get for SolrCloud.
>> >
>> > Unless you're simply trying to decide whether to leave an old
>> > legacy system as-is with the old distributed mode, nobody should be
>> > considering a fresh new distributed Solr deployment with anything
>> > other than SolrCloud.
>> >
>> > (Hmmm... have any of the committers considered deprecating the old 
>> > non-SolrCloud distributed mode features?)
>>
>> -1
>>
>> >
>> > -- Jack Krupansky
>> >
>> > On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta
>> > <sd...@hortonworks.com>
>> > wrote:
>> >
>> >> - SolrCloud uses zookeeper to manage HA
>> >>         - Zookeeper is a standard for all HA in Apache Hadoop
>> >> - You have collections which will manage your shards across nodes
>> >> - SolrJ Client is now fault tolerant with CloudSolrClient
>> >>
>> >> This is the way future direction of the product will go.
>> >>
>> >>
>> >>
>> >> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
>> >> <al...@nablasoft.com> wrote:
>> >>
>> >>> Thanks.
>> >>>
>> >>> --
>> >>> Gian Maria Ricci
>> >>> Cell: +39 320 0136949
>> >>>
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>> From: Shawn Heisey [mailto:apache@elyograg.org]
>> >>> Sent: lunedì 11 gennaio 2016 18:28
>> >>> To: solr-user@lucene.apache.org
>> >>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master 
>> >>> Slave Replica
>> >>>
>> >>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> >>>> a customer need a comprehensive list of all pro and cons of 
>> >>>> using
>> >>>> standard Master Slave replica VS using Solr Cloud. I'm
>> >>>> interested especially in query performance consideration, 
>> >>>> because in this specific situation the rate of new documents is 
>> >>>> really slow, but the amount of data is about 50 millions of 
>> >>>> document, and the index size on disk for single core is about 30 GB.
>> >>>
>> >>> The primary advantage to SolrCloud is that SolrCloud handles most 
>> >>> of the administrative and operational details for you automatically.
>> >>>
>> >>> SolrCloud is a little more complicated to set up initially, 
>> >>> because you must worry about Zookeeper as well as Solr, but once 
>> >>> it's properly set up, there is no single point of failure.
>> >>>
>> >>>> Such amount of data should be easily handled by a Master Slave 
>> >>>> replica with a  single core replicated on a certain number of 
>> >>>> slaves, but we need to evaluate also the option of SolrCloud, 
>> >>>> especially for fault tolerance.
>> >>>>
>> >>>
>> >>> Once you're beyond initial setup, fault tolerance with SolrCloud is
>> >>> much easier than master/slave replication.  Switching a slave to 
>> >>> a master is possible, but the procedure is somewhat complicated.
>> >>> SolrCloud does not
>> >>> *have* masters, it is a true cluster.
>> >>>
>> >>> With master/slave replication, the master handles all indexing, 
>> >>> and the finished index segments are copied to the slaves via 
>> >>> HTTP, and the slaves simply need to open them.  SolrCloud does 
>> >>> indexing on all shard replicas, nearly simultaneously.  Usually 
>> >>> this is an advantage, not a disadvantage, but in heavy indexing 
>> >>> situations master/slave replication
>> >>> *might* show better performance on the slaves.
>> >>>
>> >>> Thanks,
>> >>> Shawn
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Erick Erickson <er...@gmail.com>.

re: SolrCloud backup/restore: https://issues.apache.org/jira/browse/SOLR-5750

not committed yet, but getting attention.




Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Alessandro Benedetti <ab...@apache.org>.

The issue linked by Erick is really interesting.
Gian, to answer your further question:

> For such scenario we need to plan the worst case, where everything is lost.
> With Master Slave is just a matter of recreating machines, reconfigure the
> core, and restore a backup, and the game is done, with SolrCloud is not
> really clear for me how can I backup / restore data. From what I've found
> in the internet I need to backup every shard of the collection, and, if we
> need to restore everything from a backup, we can recreate the collection
> and then restore all the individual shards. I do not know if this is a
> supported scenario / procedure, but theoretically it could work.


So we are in the worst case: you lost everything.
But you were backing up each shard periodically.
You create the collection again and restore the backup on each leader.
Then all the replicas are going to catch up automatically,
and old-style replication will happen under the hood.
I don't see this as an additional disadvantage of SolrCloud; I think we
are still talking about the "disadvantage" of a longer and more complicated
initial setup.
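
The restore sequence described above (recreate the collection, then restore
each shard's backup on its leader) can be sketched as a list of HTTP requests.
This is only an illustration under stated assumptions, not an officially
documented SolrCloud procedure: the host names, core names, backup location
and snapshot names are hypothetical, and it assumes a Solr version whose
replication handler accepts command=restore. The code only builds the URLs;
it does not contact any server.

```python
from urllib.parse import urlencode


def restore_plan(solr_url, collection, leader_cores, location,
                 snapshot_prefix, replication_factor=1):
    """Return the ordered request URLs for a worst-case restore.

    leader_cores maps shard name -> URL of that shard's leader core
    (hypothetical values supplied by the caller).
    """
    # Step 1: recreate the empty collection through the Collections API.
    create = urlencode({
        "action": "CREATE",
        "name": collection,
        "numShards": len(leader_cores),
        "replicationFactor": replication_factor,
    })
    steps = [f"{solr_url}/admin/collections?{create}"]

    # Step 2: restore one snapshot per shard on the leader core;
    # the other replicas are then expected to catch up from the leader.
    for shard_name, core_url in sorted(leader_cores.items()):
        restore = urlencode({
            "command": "restore",
            "location": location,
            "name": f"{snapshot_prefix}_{shard_name}",
        })
        steps.append(f"{core_url}/replication?{restore}")
    return steps


# Hypothetical two-shard collection called "docs".
leaders = {
    "shard1": "http://solr1:8983/solr/docs_shard1_replica1",
    "shard2": "http://solr2:8983/solr/docs_shard2_replica1",
}
for url in restore_plan("http://solr1:8983/solr", "docs", leaders,
                        "/backups/solr", "nightly"):
    print(url)
```

Keeping the plan as plain URLs makes it easy to review the procedure before
running it against a real cluster.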

Once the issue Erick mentioned is resolved, that part could potentially
become as easy as possible.

Cheers




-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com>.

Actually, there are situations where a restore is needed: suppose someone makes a mistake and deletes all documents from a collection, or deletes a series of documents. I know this is unlikely, but in a mission-critical enterprise system we always need a detailed disaster recovery procedure.

For such a scenario we need to plan for the worst case, where everything is lost.

With Master Slave it is just a matter of recreating the machines, reconfiguring the core, and restoring a backup, and the job is done. With SolrCloud it is not really clear to me how to back up and restore data. From what I've found on the internet, I need to back up every shard of the collection; then, if we need to restore everything from a backup, we can recreate the collection and restore all the individual shards. I do not know whether this is a supported scenario or procedure, but in theory it could work.
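
A minimal sketch of the "back up every shard" idea, assuming each shard's leader core is asked for a snapshot through the replication handler's backup command. The shard layout below is a hand-written dictionary loosely modelled on what a cluster status query returns; its shape, the host names, core names, backup location and snapshot names are all assumptions made for illustration. The code only builds URLs and sends nothing.

```python
from urllib.parse import urlencode


def shard_backup_urls(shards, location, snapshot_prefix):
    """Build one replication-handler backup URL per shard, aimed at its leader."""
    urls = {}
    for shard_name, shard in sorted(shards.items()):
        for replica in shard["replicas"].values():
            # Only the replica flagged as leader is asked for a snapshot.
            if replica.get("leader") == "true":
                query = urlencode({
                    "command": "backup",
                    "location": location,
                    "name": f"{snapshot_prefix}_{shard_name}",
                })
                urls[shard_name] = (
                    f"{replica['base_url']}/{replica['core']}/replication?{query}"
                )
                break
    return urls


# Hypothetical layout: two shards, the first one with two replicas.
shards = {
    "shard1": {"replicas": {
        "core_node1": {"core": "docs_shard1_replica1",
                       "base_url": "http://solr1:8983/solr", "leader": "true"},
        "core_node2": {"core": "docs_shard1_replica2",
                       "base_url": "http://solr2:8983/solr"},
    }},
    "shard2": {"replicas": {
        "core_node3": {"core": "docs_shard2_replica1",
                       "base_url": "http://solr2:8983/solr", "leader": "true"},
    }},
}
for shard_name, url in shard_backup_urls(shards, "/backups/solr", "nightly").items():
    print(shard_name, url)
```

One snapshot per shard, named after the shard, keeps it obvious which backup belongs to which part of the collection when everything has to be rebuilt.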

--
Gian Maria Ricci
Cell: +39 320 0136949
    


-----Original Message-----
From: Alessandro Benedetti [mailto:abenedetti@apache.org] 
Sent: giovedì 14 gennaio 2016 10:46
To: solr-user@lucene.apache.org
Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

It's true that SolrCloud is adding some complexity.
But few observations :

> SolrCloud has some disadvantages and can't beat the easiness and
> simpleness of Master Slave Replica. So I can only encourage to keep
> Master Slave Replica in future versions.


I agree, there can be situations where you have really simple, non-critical systems.
Anyway, old-style replication is still used in SolrCloud, so I think it is going to stay for a while (until it is replaced with something else).

To answer to Gian :

> One of the problem I've found is that I've not found a simple way to
> backup the content of a collection to restore in situation of disaster
> recovery. With simple master / slave scenario we can use the replication
> handler to generate backups that can be easily used to restore
> content of a core, while with SolrCloud is not clear how can we
> obtain a full backup


To be fair, disaster recovery is where SolrCloud shines.
If you lose random nodes across your collection, you simply need to fix them and spin them up again.
The system will automatically restore the content to the last version available: the tlog first, and then the leader (if the tlog is not enough), will help the dead node catch up.
If you lose all the replicas of a shard, along with the on-disk content of all those replicas (index and tlog), SolrCloud can't help you.
For such unlikely scenarios a backup is suggested.
You could then restore the backup to only one node, and the replicas are going to catch up.

> Probably is just a matter of backupping every shard with standard
> replication handler and then restore each shard after recreating the
> collection


Definitely not, SolrCloud is there to avoid this manual stuff.

Cheers


On 14 January 2016 at 08:58, Gian Maria Ricci - aka Alkampfer < alkampfer@nablasoft.com> wrote:

> I agree that SolrCloud has not only advantages, I really understand 
> that it offers many more features, but it introduces some complexity.
>
> One of the problem I've found is that I've not found a simple way to 
> backup the content of a collection to restore in situation of disaster
> recovery. With simple master / slave scenario we can use the 
> replication handler to generate backups that can be easily used to 
> restore content of a core, while with SolrCloud is not clear how can we obtain a full backup.
> Probably is just a matter of backupping every shard with standard 
> replication handler and then restore each shard after recreating the 
> collection, but I've not found (probably I need to search better) 
> official documentation on backup / restore procedures for SolrCloud.
>
> Thanks.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> Sent: giovedì 14 gennaio 2016 08:22
> To: solr-user@lucene.apache.org
> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave 
> Replica
>
> SolrCloud has some disadvantages and can't beat the easiness and 
> simpleness of Master Slave Replica. So I can only encourage to keep 
> Master Slave Replica in future versions.
>
> Bernd
>
> Am 13.01.2016 um 21:57 schrieb Jack Krupansky:
> > The "Legacy Scaling and Distribution" section of the Solr Reference 
> > Guide also gives info related to so-called master-slave mode:
> > https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> >
> > Also, although the old master-slave mode is still technically 
> > supported in the sense that the code and doc is still there, You 
> > won't be able to get the level of community support  here on the 
> > mailing list as you can get for SolrCloud.
> >
> > Unless you're simply trying to decide whether to leave an old legacy
> > system as-is with the old distributed mode, nobody should be
> > considering a fresh new distributed Solr deployment with anything
> > other than SolrCloud.
> >
> > (Hmmm... have any of the committers considered deprecating the old 
> > non-SolrCloud distributed mode features?)
>
> -1
>
> >
> > -- Jack Krupansky
> >
> > On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta
> > <sd...@hortonworks.com>
> > wrote:
> >
> >> - SolrCloud uses zookeeper to manage HA
> >>         - Zookeeper is a standard for all HA in Apache Hadoop
> >> - You have collections which will manage your shards across nodes
> >> - SolrJ Client is now fault tolerant with CloudSolrClient
> >>
> >> This is the way future direction of the product will go.
> >>
> >>
> >>
> >> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
> >> <al...@nablasoft.com> wrote:
> >>
> >>> Thanks.
> >>>
> >>> --
> >>> Gian Maria Ricci
> >>> Cell: +39 320 0136949
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Shawn Heisey [mailto:apache@elyograg.org]
> >>> Sent: Monday, 11 January 2016 18:28
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master 
> >>> Slave Replica
> >>>
> >>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> >>>> a customer need a comprehensive list of all pro and cons of using
> >>>> standard Master Slave replica VS using Solr Cloud. I'm interested 
> >>>> especially in query performance consideration, because in this 
> >>>> specific situation the rate of new documents is really slow, but 
> >>>> the amount of data is about 50 millions of document, and the 
> >>>> index size on disk for single core is about 30 GB.
> >>>
> >>> The primary advantage to SolrCloud is that SolrCloud handles most 
> >>> of the administrative and operational details for you automatically.
> >>>
> >>> SolrCloud is a little more complicated to set up initially, 
> >>> because you must worry about Zookeeper as well as Solr, but once 
> >>> it's properly set up, there is no single point of failure.
> >>>
> >>>> Such amount of data should be easily handled by a Master Slave 
> >>>> replica with a  single core replicated on a certain number of 
> >>>> slaves, but we need to evaluate also the option of SolrCloud, 
> >>>> especially for fault tolerance.
> >>>>
> >>>
> >>> Once you're beyond initial setup, fault tolerance with SolrCloud is
> >>> much easier than master/slave replication.  Switching a slave to a 
> >>> master is possible, but the procedure is somewhat complicated.
> >>> SolrCloud does not
> >>> *have* masters, it is a true cluster.
> >>>
> >>> With master/slave replication, the master handles all indexing, 
> >>> and the finished index segments are copied to the slaves via HTTP, 
> >>> and the slaves simply need to open them.  SolrCloud does indexing 
> >>> on all shard replicas, nearly simultaneously.  Usually this is an 
> >>> advantage, not a disadvantage, but in heavy indexing situations 
> >>> master/slave replication
> >>> *might* show better performance on the slaves.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
> >>
> >
>
>


--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Alessandro Benedetti <ab...@apache.org>.

It's true that SolrCloud adds some complexity.
But a few observations:

> SolrCloud has some disadvantages and can't beat the easiness and simpleness
> of Master Slave Replica. So I can only encourage to keep Master Slave Replica
> in future versions.


I agree, there can be situations where you have really simple, non-critical
systems.
Anyway, old-style replication is still used inside SolrCloud, so I think it
is going to stay for a while (until it is replaced with something else).

To answer Gian:

> One of the problem I've found is that I've not found a simple way to backup
> the content of a collection to restore in situation of disaster recovery.
> With simple master / slave scenario we can use the replication handler to
> generate backups that can be easily used to restore content of a core,
> while with SolrCloud is not clear how can we obtain a full backup


To be fair, disaster recovery is where SolrCloud shines.
If you lose random nodes across your collection, you simply need to fix
them and spin them up again.
The system will automatically restore the content to the last version
available: the tlog first and then, if the tlog is not enough, the leader
will help the dead node catch up.
If you lose all the replicas of a shard and also lose the on-disk content
of all those replicas (index and tlog), SolrCloud can't help you.
For these unlikely scenarios a backup is suggested.
Even then, you could restore the backup to only one node, and the other
replicas are going to catch up.
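
The recovery cases described above can be sketched as a small decision function. This is only an illustration of the reasoning in this message, not Solr's actual recovery code; the `Replica` class, its fields, and the step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical view of one replica of a shard after a failure."""
    name: str
    has_index: bool                # index files survived on disk
    has_tlog: bool                 # transaction log survived on disk
    tlog_covers_gap: bool = False  # the tlog alone is enough to catch up


def recovery_plan(replicas):
    """Map each replica name to the recovery step described in the text."""
    if not replicas:
        return {}

    # If no replica kept anything on disk, SolrCloud has nothing to sync from:
    # restore a backup on one node and let the others catch up from it.
    survivors = [r for r in replicas if r.has_index or r.has_tlog]
    if not survivors:
        first, *rest = replicas
        plan = {first.name: "restore from backup"}
        plan.update({r.name: "sync from restored node" for r in rest})
        return plan

    plan = {}
    for r in replicas:
        if r.has_tlog and r.tlog_covers_gap:
            plan[r.name] = "replay tlog"       # cheap catch-up
        else:
            plan[r.name] = "sync from leader"  # leader helps the node catch up
    return plan
```

Only the last branch of the first `if` needs a backup at all, which is the point being made: a backup matters when every replica of a shard is gone, not when individual nodes fail.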

> Probably is just a matter of backupping every shard with standard
> replication handler and then restore each shard after recreating the
> collection


Definitely not, SolrCloud is there to avoid this manual stuff.

Cheers


On 14 January 2016 at 08:58, Gian Maria Ricci - aka Alkampfer <
alkampfer@nablasoft.com> wrote:

> I agree that SolrCloud has not only advantages, I really understand that
> it offers many more features, but it introduces some complexity.
>
> One of the problem I've found is that I've not found a simple way to
> backup the content of a collection to restore in situation of disaster
> recovery. With simple master / slave scenario we can use the replication
> handler to generate backups that can be easily used to restore content of a
> core, while with SolrCloud is not clear how can we obtain a full backup.
> Probably is just a matter of backupping every shard with standard
> replication handler and then restore each shard after recreating the
> collection, but I've not found (probably I need to search better) official
> documentation on backup / restore procedures for SolrCloud.
>
> Thanks.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> Sent: Thursday, 14 January 2016 08:22
> To: solr-user@lucene.apache.org
> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
> Replica
>
> SolrCloud has some disadvantages and can't beat the easiness and
> simpleness of Master Slave Replica. So I can only encourage to keep Master
> Slave Replica in future versions.
>
> Bernd
>
> On 13.01.2016 at 21:57, Jack Krupansky wrote:
> > The "Legacy Scaling and Distribution" section of the Solr Reference
> > Guide also gives info related to so-called master-slave mode:
> > https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> >
> > Also, although the old master-slave mode is still technically
> > supported in the sense that the code and doc is still there, You won't
> > be able to get the level of community support  here on the mailing
> > list as you can get for SolrCloud.
> >
> > Unless you're simply trying to decide whether to leave an old legacy
> > system as-is with the old distributed mode, nobody should be
> > considering a fresh new distributed Solr deployment with anything other
> > than SolrCloud.
> >
> > (Hmmm... have any of the committers considered deprecating the old
> > non-SolrCloud distributed mode features?)
>
> -1
>
> >
> > -- Jack Krupansky
> >
> > On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta
> > <sd...@hortonworks.com>
> > wrote:
> >
> >> - SolrCloud uses zookeeper to manage HA
> >>         - Zookeeper is a standard for all HA in Apache Hadoop
> >> - You have collections which will manage your shards across nodes
> >> - SolrJ Client is now fault tolerant with CloudSolrClient
> >>
> >> This is the way future direction of the product will go.
> >>
> >>
> >>
> >> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
> >> <al...@nablasoft.com> wrote:
> >>
> >>> Thanks.
> >>>
> >>> --
> >>> Gian Maria Ricci
> >>> Cell: +39 320 0136949
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Shawn Heisey [mailto:apache@elyograg.org]
> >>> Sent: Monday, 11 January 2016 18:28
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master
> >>> Slave Replica
> >>>
> >>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> >>>> a customer need a comprehensive list of all pro and cons of using
> >>>> standard Master Slave replica VS using Solr Cloud. I'm interested
> >>>> especially in query performance consideration, because in this
> >>>> specific situation the rate of new documents is really slow, but
> >>>> the amount of data is about 50 millions of document, and the index
> >>>> size on disk for single core is about 30 GB.
> >>>
> >>> The primary advantage to SolrCloud is that SolrCloud handles most of
> >>> the administrative and operational details for you automatically.
> >>>
> >>> SolrCloud is a little more complicated to set up initially, because
> >>> you must worry about Zookeeper as well as Solr, but once it's
> >>> properly set up, there is no single point of failure.
> >>>
> >>>> Such amount of data should be easily handled by a Master Slave
> >>>> replica with a  single core replicated on a certain number of
> >>>> slaves, but we need to evaluate also the option of SolrCloud,
> >>>> especially for fault tolerance.
> >>>>
> >>>
> >>> Once you're beyond initial setup, fault tolerance with SolrCloud is
> >>> much easier than master/slave replication.  Switching a slave to a
> >>> master is possible, but the procedure is somewhat complicated.
> >>> SolrCloud does not
> >>> *have* masters, it is a true cluster.
> >>>
> >>> With master/slave replication, the master handles all indexing, and
> >>> the finished index segments are copied to the slaves via HTTP, and
> >>> the slaves simply need to open them.  SolrCloud does indexing on all
> >>> shard replicas, nearly simultaneously.  Usually this is an
> >>> advantage, not a disadvantage, but in heavy indexing situations
> >>> master/slave replication
> >>> *might* show better performance on the slaves.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
> >>
> >
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com>.

I agree that SolrCloud does not have only advantages. I understand that it offers many more features, but it also introduces some complexity.

One of the problems I've found is that there is no simple way to back up the content of a collection so that it can be restored in a disaster recovery situation. With a simple master/slave scenario we can use the replication handler to generate backups that can easily be used to restore the content of a core, while with SolrCloud it is not clear how to obtain a full backup. Probably it is just a matter of backing up every shard with the standard replication handler and then restoring each shard after recreating the collection, but I have not found (probably I need to search better) official documentation on backup/restore procedures for SolrCloud.
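
A minimal sketch of the per-shard idea above, assuming one core per shard is picked by hand. It only builds the replication handler backup URLs (`command=backup` with `location` and `name`) and does not send them; the host names, core names, and backup path are hypothetical, and this is not an official SolrCloud backup procedure.

```python
from urllib.parse import urlencode


def backup_urls(shard_cores, location, name):
    """Build one replication-handler backup URL per shard core.

    shard_cores maps a shard name to the base URL of one core of that shard.
    """
    params = urlencode({"command": "backup", "location": location,
                        "name": name, "wt": "json"})
    return {shard: f"{core_url.rstrip('/')}/replication?{params}"
            for shard, core_url in shard_cores.items()}


# Hypothetical hosts and core names, for illustration only.
cores = {
    "shard1": "http://solr1:8983/solr/mycoll_shard1_replica1",
    "shard2": "http://solr2:8983/solr/mycoll_shard2_replica1",
}
for shard, url in backup_urls(cores, "/var/backups/solr", "nightly").items():
    print(shard, url)
```

Each URL would still have to be requested against a live node, and the snapshots of the different shards would not be taken at exactly the same moment, which is one reason this stays a manual workaround.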

Thanks.

--
Gian Maria Ricci
Cell: +39 320 0136949
    

-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de] 
Sent: Thursday, 14 January 2016 08:22
To: solr-user@lucene.apache.org
Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

SolrCloud has some disadvantages and can't beat the easiness and simpleness of Master Slave Replica. So I can only encourage to keep Master Slave Replica in future versions.

Bernd

On 13.01.2016 at 21:57, Jack Krupansky wrote:
> The "Legacy Scaling and Distribution" section of the Solr Reference 
> Guide also gives info related to so-called master-slave mode:
> https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> 
> Also, although the old master-slave mode is still technically 
> supported in the sense that the code and doc is still there, You won't 
> be able to get the level of community support  here on the mailing 
> list as you can get for SolrCloud.
> 
> Unless you're simply trying to decide whether to leave an old legacy 
> system as-is with the old distributed mode, nobody should be 
> considering a fresh new distributed Solr deployment with anything other than SolrCloud.
> 
> (Hmmm... have any of the committers considered deprecating the old 
> non-SolrCloud distributed mode features?)

-1

> 
> -- Jack Krupansky
> 
> On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta 
> <sd...@hortonworks.com>
> wrote:
> 
>> - SolrCloud uses zookeeper to manage HA
>>         - Zookeeper is a standard for all HA in Apache Hadoop
>> - You have collections which will manage your shards across nodes
>> - SolrJ Client is now fault tolerant with CloudSolrClient
>>
>> This is the way future direction of the product will go.
>>
>>
>>
>> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
>> <al...@nablasoft.com> wrote:
>>
>>> Thanks.
>>>
>>> --
>>> Gian Maria Ricci
>>> Cell: +39 320 0136949
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Shawn Heisey [mailto:apache@elyograg.org]
>>> Sent: Monday, 11 January 2016 18:28
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master 
>>> Slave Replica
>>>
>>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
>>>> a customer need a comprehensive list of all pro and cons of using 
>>>> standard Master Slave replica VS using Solr Cloud. I'm interested 
>>>> especially in query performance consideration, because in this 
>>>> specific situation the rate of new documents is really slow, but 
>>>> the amount of data is about 50 millions of document, and the index 
>>>> size on disk for single core is about 30 GB.
>>>
>>> The primary advantage to SolrCloud is that SolrCloud handles most of 
>>> the administrative and operational details for you automatically.
>>>
>>> SolrCloud is a little more complicated to set up initially, because 
>>> you must worry about Zookeeper as well as Solr, but once it's 
>>> properly set up, there is no single point of failure.
>>>
>>>> Such amount of data should be easily handled by a Master Slave 
>>>> replica with a  single core replicated on a certain number of 
>>>> slaves, but we need to evaluate also the option of SolrCloud, 
>>>> especially for fault tolerance.
>>>>
>>>
>>> Once you're beyond initial setup, fault tolerance with SolrCloud is 
>>> much easier than master/slave replication.  Switching a slave to a 
>>> master is possible, but the procedure is somewhat complicated.  
>>> SolrCloud does not
>>> *have* masters, it is a true cluster.
>>>
>>> With master/slave replication, the master handles all indexing, and 
>>> the finished index segments are copied to the slaves via HTTP, and 
>>> the slaves simply need to open them.  SolrCloud does indexing on all 
>>> shard replicas, nearly simultaneously.  Usually this is an 
>>> advantage, not a disadvantage, but in heavy indexing situations 
>>> master/slave replication
>>> *might* show better performance on the slaves.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>>
>

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Bernd Fehling <be...@uni-bielefeld.de>.

SolrCloud has some disadvantages and can't beat the ease and simplicity of
Master Slave Replica. So I can only encourage keeping Master Slave Replica
in future versions.

Bernd

On 13.01.2016 at 21:57, Jack Krupansky wrote:
> The "Legacy Scaling and Distribution" section of the Solr Reference Guide
> also gives info related to so-called master-slave mode:
> https://cwiki.apache.org/confluence/display/solr/Legacy+Scaling+and+Distribution
> 
> Also, although the old master-slave mode is still technically supported in
> the sense that the code and doc is still there, You won't be able to get
> the level of community support  here on the mailing list as you can get for
> SolrCloud.
> 
> Unless you're simply trying to decide whether to leave an old legacy system
> as-is with the old distributed mode, nobody should be considering a fresh
> new distributed Solr deployment with anything other than SolrCloud.
> 
> (Hmmm... have any of the committers considered deprecating the old
> non-SolrCloud distributed mode features?)

-1

> 
> -- Jack Krupansky
> 
> On Wed, Jan 13, 2016 at 9:02 AM, Shivaji Dutta <sd...@hortonworks.com>
> wrote:
> 
>> - SolrCloud uses zookeeper to manage HA
>>         - Zookeeper is a standard for all HA in Apache Hadoop
>> - You have collections which will manage your shards across nodes
>> - SolrJ Client is now fault tolerant with CloudSolrClient
>>
>> This is the way future direction of the product will go.
>>
>>
>>
>> On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
>> <al...@nablasoft.com> wrote:
>>
>>> Thanks.
>>>
>>> --
>>> Gian Maria Ricci
>>> Cell: +39 320 0136949
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Shawn Heisey [mailto:apache@elyograg.org]
>>> Sent: Monday, 11 January 2016 18:28
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
>>> Replica
>>>
>>> On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
>>>> a customer need a comprehensive list of all pro and cons of using
>>>> standard Master Slave replica VS using Solr Cloud. I'm interested
>>>> especially in query performance consideration, because in this
>>>> specific situation the rate of new documents is really slow, but the
>>>> amount of data is about 50 millions of document, and the index size on
>>>> disk for single core is about 30 GB.
>>>
>>> The primary advantage to SolrCloud is that SolrCloud handles most of the
>>> administrative and operational details for you automatically.
>>>
>>> SolrCloud is a little more complicated to set up initially, because you
>>> must worry about Zookeeper as well as Solr, but once it's properly set
>>> up, there is no single point of failure.
>>>
>>>> Such amount of data should be easily handled by a Master Slave replica
>>>> with a  single core replicated on a certain number of slaves, but we
>>>> need to evaluate also the option of SolrCloud, especially for fault
>>>> tolerance.
>>>>
>>>
>>> Once you're beyond initial setup, fault tolerance with SolrCloud is much
>>> easier than master/slave replication.  Switching a slave to a master is
>>> possible, but the procedure is somewhat complicated.  SolrCloud does not
>>> *have* masters, it is a true cluster.
>>>
>>> With master/slave replication, the master handles all indexing, and the
>>> finished index segments are copied to the slaves via HTTP, and the slaves
>>> simply need to open them.  SolrCloud does indexing on all shard replicas,
>>> nearly simultaneously.  Usually this is an advantage, not a disadvantage,
>>> but in heavy indexing situations master/slave replication
>>> *might* show better performance on the slaves.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>>
>

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Shivaji Dutta <sd...@hortonworks.com>.

- SolrCloud uses ZooKeeper to manage HA
	- ZooKeeper is a standard for all HA in Apache Hadoop
- You have collections, which manage your shards across nodes
- The SolrJ client is now fault tolerant with CloudSolrClient

This is the future direction of the product.
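
To illustrate what a fault-tolerant client means in practice, here is a minimal sketch of client-side failover. It is not how CloudSolrClient is implemented (that client discovers live nodes through ZooKeeper); the function name, the `send` callable, and the node URLs are assumptions made for the example.

```python
def query_with_failover(nodes, send):
    """Try each node in turn and return the first successful response.

    nodes: list of base URLs (hypothetical).
    send:  callable taking a base URL; returns a response or raises OSError.
    """
    errors = []
    for node in nodes:
        try:
            return send(node)
        except OSError as exc:  # connection refused, timeout, and similar
            errors.append(f"{node}: {exc}")
    # Every node failed: report all of the individual errors together.
    raise ConnectionError("all nodes failed: " + "; ".join(errors))
```

With a plain master/slave setup this kind of retry logic, or a load balancer doing the same job, has to be supplied by the application.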



On 1/13/16, 5:58 AM, "Gian Maria Ricci - aka Alkampfer"
<al...@nablasoft.com> wrote:

>Thanks.
>
>--
>Gian Maria Ricci
>Cell: +39 320 0136949
>    
>
>
>-----Original Message-----
>From: Shawn Heisey [mailto:apache@elyograg.org]
>Sent: Monday, 11 January 2016 18:28
>To: solr-user@lucene.apache.org
>Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave
>Replica
>
>On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> a customer need a comprehensive list of all pro and cons of using
>> standard Master Slave replica VS using Solr Cloud. I'm interested
>> especially in query performance consideration, because in this
>> specific situation the rate of new documents is really slow, but the
>> amount of data is about 50 millions of document, and the index size on
>> disk for single core is about 30 GB.
>
>The primary advantage to SolrCloud is that SolrCloud handles most of the
>administrative and operational details for you automatically.
>
>SolrCloud is a little more complicated to set up initially, because you
>must worry about Zookeeper as well as Solr, but once it's properly set
>up, there is no single point of failure.
>
>> Such amount of data should be easily handled by a Master Slave replica
>> with a  single core replicated on a certain number of slaves, but we
>> need to evaluate also the option of SolrCloud, especially for fault
>> tolerance.
>>
>
>Once you're beyond initial setup, fault tolerance with SolrCloud is much
>easier than master/slave replication.  Switching a slave to a master is
>possible, but the procedure is somewhat complicated.  SolrCloud does not
>*have* masters, it is a true cluster.
>
>With master/slave replication, the master handles all indexing, and the
>finished index segments are copied to the slaves via HTTP, and the slaves
>simply need to open them.  SolrCloud does indexing on all shard replicas,
>nearly simultaneously.  Usually this is an advantage, not a disadvantage,
>but in heavy indexing situations master/slave replication
>*might* show better performance on the slaves.
>
>Thanks,
>Shawn
>
>

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Gian Maria Ricci - aka Alkampfer <al...@nablasoft.com>.

Thanks.

--
Gian Maria Ricci
Cell: +39 320 0136949

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Monday, 11 January 2016 18:28
To: solr-user@lucene.apache.org
Subject: Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> a customer need a comprehensive list of all pro and cons of using 
> standard Master Slave replica VS using Solr Cloud. I’m interested 
> especially in query performance consideration, because in this 
> specific situation the rate of new documents is really slow, but the 
> amount of data is about 50 millions of document, and the index size on 
> disk for single core is about 30 GB.

The primary advantage to SolrCloud is that SolrCloud handles most of the administrative and operational details for you automatically.

SolrCloud is a little more complicated to set up initially, because you must worry about Zookeeper as well as Solr, but once it's properly set up, there is no single point of failure.

> Such amount of data should be easily handled by a Master Slave replica 
> with a  single core replicated on a certain number of slaves, but we 
> need to evaluate also the option of SolrCloud, especially for fault 
> tolerance.
>

Once you're beyond initial setup, fault tolerance with SolrCloud is much easier than master/slave replication.  Switching a slave to a master is possible, but the procedure is somewhat complicated.  SolrCloud does not
*have* masters, it is a true cluster.

With master/slave replication, the master handles all indexing, and the finished index segments are copied to the slaves via HTTP, and the slaves simply need to open them.  SolrCloud does indexing on all shard replicas, nearly simultaneously.  Usually this is an advantage, not a disadvantage, but in heavy indexing situations master/slave replication
*might* show better performance on the slaves.

Thanks,
Shawn

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

Posted by Shawn Heisey <ap...@elyograg.org>.

On 1/11/2016 4:28 AM, Gian Maria Ricci - aka Alkampfer wrote:
> a customer need a comprehensive list of all pro and cons of using
> standard Master Slave replica VS using Solr Cloud. I’m interested
> especially in query performance consideration, because in this
> specific situation the rate of new documents is really slow, but the
> amount of data is about 50 millions of document, and the index size on
> disk for single core is about 30 GB.

The primary advantage to SolrCloud is that SolrCloud handles most of the
administrative and operational details for you automatically.

SolrCloud is a little more complicated to set up initially, because you
must worry about Zookeeper as well as Solr, but once it's properly set
up, there is no single point of failure.

> Such amount of data should be easily handled by a Master Slave replica
> with a  single core replicated on a certain number of slaves, but we
> need to evaluate also the option of SolrCloud, especially for fault
> tolerance.
>

Once you're beyond initial setup, fault tolerance with SolrCloud is much
easier than master/slave replication.  Switching a slave to a master is
possible, but the procedure is somewhat complicated.  SolrCloud does not
*have* masters, it is a true cluster.

With master/slave replication, the master handles all indexing, and the
finished index segments are copied to the slaves via HTTP, and the
slaves simply need to open them.  SolrCloud does indexing on all shard
replicas, nearly simultaneously.  Usually this is an advantage, not a
disadvantage, but in heavy indexing situations master/slave replication
*might* show better performance on the slaves.
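
The difference in indexing work can be shown with a toy model. This is a deliberately simplified count under stated assumptions (one shard, one replication cycle, every copy holding the full index), not a benchmark; the function name and mode strings are made up for the example.

```python
def indexing_work(num_docs, num_copies, mode):
    """Count operations in a toy model of the two modes.

    Returns (documents_indexed, index_pulls) summed over all nodes that
    hold a copy of the index.
    """
    if mode == "master_slave":
        # Only the master indexes; each slave pulls the finished segments.
        return num_docs, num_copies - 1
    if mode == "solrcloud":
        # Every replica of the shard indexes every document itself.
        return num_docs * num_copies, 0
    raise ValueError(f"unknown mode: {mode}")


print(indexing_work(1000, 3, "master_slave"))  # (1000, 2)
print(indexing_work(1000, 3, "solrcloud"))     # (3000, 0)
```

With a slow rate of new documents, as in the original question, the extra indexing work on each replica is unlikely to matter.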

Thanks,
Shawn