Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2009/01/03 21:40:53 UTC

collectionDistribution vs SolrReplication

Hey there,

I would like to know the advantages of moving from:
a master-slave system using CollectionDistribution with all its .sh
scripts
http://wiki.apache.org/solr/CollectionDistribution

to:
SolrReplication and its solrconfig.xml configuration.
http://wiki.apache.org/solr/SolrReplication


Is it technically much better, or mainly easier to use?
Does SolrReplication do warming as well?

Judging by the performance numbers on the SolrReplication wiki page, the two
seem similar except for RAM usage. Is that where the advantage lies?

Thanks in advance!!
-- 
View this message in context: http://www.nabble.com/collectionDistribution-vs-SolrReplication-tp21269112p21269112.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: collectionDistribution vs SolrReplication

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
The problem with that approach is that, unlike in databases, a commit is an
expensive operation in Lucene right now. Committing after every document is
not practical, so log-based replication offers very little.

On Tue, Jan 6, 2009 at 12:07 AM, Jacob Singh <ja...@gmail.com> wrote:

> Has there been a discussion anywhere about a "binary log" style
> replication scheme (à la MySQL)? Every write request would go to
> the master, and the slaves would read a queue of those requests and
> update themselves one record at a time instead of wholesale. Or is
> this just not worth the development time?
>
> Best,
> Jacob



-- 
Regards,
Shalin Shekhar Mangar.

Re: collectionDistribution vs SolrReplication

Posted by Jacob Singh <ja...@gmail.com>.
Has there been a discussion anywhere about a "binary log" style
replication scheme (à la MySQL)? Every write request would go to
the master, and the slaves would read a queue of those requests and
update themselves one record at a time instead of wholesale. Or is
this just not worth the development time?

Best,
Jacob
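A minimal Java sketch of the log-shipping scheme described above. Every class and method name here is hypothetical (nothing is a Solr or MySQL API). The master appends each write to an ordered log; a slave remembers the last sequence number it applied and replays everything newer. Because a Lucene commit is expensive, as the reply to this message points out, the slave commits once per batch rather than once per document.

```java
import java.util.ArrayList;
import java.util.List;

public class LogReplicationSketch {
    // One write request recorded by the master (hypothetical type).
    record WriteOp(long seq, String docId, String body) {}

    // Master side: every write request is appended to an ordered log.
    static class MasterLog {
        private final List<WriteOp> ops = new ArrayList<>();

        synchronized void append(String docId, String body) {
            // Sequence numbers start at 1, so seq == list index + 1.
            ops.add(new WriteOp(ops.size() + 1, docId, body));
        }

        // Every operation whose sequence number is greater than afterSeq.
        synchronized List<WriteOp> readAfter(long afterSeq) {
            return new ArrayList<>(ops.subList((int) afterSeq, ops.size()));
        }
    }

    // Slave side: replays new operations and commits once per batch.
    static class Slave {
        long lastAppliedSeq = 0;
        int commits = 0;
        final List<String> applied = new ArrayList<>();

        void poll(MasterLog master, int batchSize) {
            int sinceCommit = 0;
            for (WriteOp op : master.readAfter(lastAppliedSeq)) {
                applied.add(op.docId()); // stand-in for "add document to index"
                lastAppliedSeq = op.seq();
                if (++sinceCommit == batchSize) {
                    commit();
                    sinceCommit = 0;
                }
            }
            if (sinceCommit > 0) commit(); // flush the partial last batch
        }

        void commit() { commits++; } // stand-in for an expensive Lucene commit
    }

    public static void main(String[] args) {
        MasterLog master = new MasterLog();
        for (int i = 1; i <= 10; i++) master.append("doc" + i, "body " + i);
        Slave slave = new Slave();
        slave.poll(master, 4);
        // prints "10 ops, 3 commits"
        System.out.println(slave.applied.size() + " ops, " + slave.commits + " commits");
    }
}
```

Committing per document would mean ten commits here instead of three. That trade-off is why per-record replay buys little when each commit is costly.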




-- 

+1 510 277-0891 (o)
+91 9999 33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsingh@gmail.com

Re: collectionDistribution vs SolrReplication

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The default IndexDeletionPolicy keeps only the last commit
(KeepOnlyLastCommitDeletionPolicy); files belonging to older commits
are removed. If the files are needed longer for replication, they are
leased. The lease is extended 10 seconds at a time. Once all the slaves
have copied the files, the lease is no longer extended and the files are purged.
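The lease idea can be sketched in Java as follows. This is an illustration, not Solr's implementation: the class name, keying leases by commit generation, and passing the current time in as a parameter are all hypothetical choices. Only the "latest commit is always kept" rule and the 10-second extension come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitLeaseSketch {
    static final long LEASE_MILLIS = 10_000; // lease is extended 10 s at a time

    // commit generation -> time (ms) at which its lease expires
    private final Map<Long, Long> leases = new HashMap<>();
    private long latestCommit;

    // Keep-only-last semantics: a new commit makes older ones deletable.
    void onCommit(long generation) { latestCommit = generation; }

    // Called while a slave is still copying the files of this commit.
    void extendLease(long generation, long nowMillis) {
        leases.put(generation, nowMillis + LEASE_MILLIS);
    }

    // Should the files of this commit still be on disk at nowMillis?
    boolean shouldKeep(long generation, long nowMillis) {
        if (generation == latestCommit) return true; // last commit always kept
        Long expiry = leases.get(generation);
        return expiry != null && nowMillis < expiry;
    }

    public static void main(String[] args) {
        CommitLeaseSketch policy = new CommitLeaseSketch();
        policy.onCommit(1);
        policy.extendLease(1, 0);        // a slave starts copying commit 1 at t=0 s
        policy.onCommit(2);              // a newer commit arrives on the master
        System.out.println(policy.shouldKeep(1, 5_000));  // true: lease valid until 10 s
        policy.extendLease(1, 8_000);    // slave still copying, lease renewed to 18 s
        System.out.println(policy.shouldKeep(1, 15_000)); // true
        // The slave has finished, so nobody renews the lease any more.
        System.out.println(policy.shouldKeep(1, 20_000)); // false: files can be purged
    }
}
```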

In the snapshot-based system, the old files continue to live on disk
until the snapshots are deleted from the file system.
--Noble

On Mon, Jan 5, 2009 at 6:59 PM, Mark Miller <ma...@gmail.com> wrote:
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> * SolrReplication does not create snapshots, so you have less cleanup
>> to do. Script-based replication results in more disk space
>> consumption (especially if you commit frequently).
>>
>
> Doesn't SolrReplication effectively take a snapshot by using a custom
> IndexDeletionPolicy to keep the right index files around? Isn't that
> maintaining a snapshot?
>
> Could you elaborate on the difference, Noble?
>
> - Mark
>



-- 
--Noble Paul

Re: collectionDistribution vs SolrReplication

Posted by Mark Miller <ma...@gmail.com>.
Noble Paul നോബിള്‍ नोब्ळ् wrote:
> * SolrReplication does not create snapshots, so you have less cleanup
> to do. Script-based replication results in more disk space
> consumption (especially if you commit frequently).
>
Doesn't SolrReplication effectively take a snapshot by using a custom
IndexDeletionPolicy to keep the right index files around? Isn't that
maintaining a snapshot?

Could you elaborate on the difference, Noble?

- Mark

Re: collectionDistribution vs SolrReplication

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
* SolrReplication does not create snapshots, so you have less cleanup
to do. Script-based replication results in more disk space
consumption (especially if you commit frequently).
* Performance is roughly the same, unless you are replicating between
different LANs, where SolrReplication can zip the files before transferring them.
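Compressing before transfer only pays off when the network link is slower than the CPU cost of compressing. The Java sketch below uses gzip from java.util.zip as a stand-in for whatever compression SolrReplication actually applies. The class and method names are hypothetical, and real index files usually compress much less than this repetitive sample.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressedTransferSketch {
    // Compress a payload with gzip before sending it over a slow link.
    static byte[] gzip(byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(payload);
        } // closing the stream writes the gzip trailer into the buffer
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive sample data; real segment files are far less compressible.
        byte[] segment = "term term term term ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(segment);
        System.out.println(segment.length + " -> " + packed.length + " bytes");
    }
}
```

On a fast LAN the same compression step mostly adds CPU time, which matches the observation that performance is roughly the same there.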




On Sun, Jan 4, 2009 at 11:57 AM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> I think the main reason is ease of use. Warming is done the same way by
> adding a newSearcher listener in solrconfig.xml



-- 
--Noble Paul

Re: collectionDistribution vs SolrReplication

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I think the main reason is ease of use. Warming is done the same way, by
adding a newSearcher listener in solrconfig.xml.
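Conceptually, warming means running some queries against a newly opened searcher before it starts serving requests, whether the new index came from a local commit or from replication. This Java sketch models only that idea, using a hypothetical Searcher interface. It is not Solr's listener API, which is configured in solrconfig.xml.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class WarmingSketch {
    // Hypothetical stand-in for an index searcher.
    interface Searcher {
        int search(String query);
    }

    private final AtomicReference<Searcher> current = new AtomicReference<>();
    private final List<String> warmingQueries;

    WarmingSketch(List<String> warmingQueries) { this.warmingQueries = warmingQueries; }

    // Run the configured queries on the new searcher first (the role a
    // newSearcher listener plays), then make it visible to requests.
    void openNewSearcher(Searcher fresh) {
        for (String q : warmingQueries) {
            fresh.search(q); // results are discarded; the point is to fill caches
        }
        current.set(fresh); // only the warmed searcher is ever published
    }

    Searcher current() { return current.get(); }

    public static void main(String[] args) {
        WarmingSketch core = new WarmingSketch(List.of("solr", "lucene"));
        int[] warmed = {0};
        Searcher fresh = q -> { warmed[0]++; return q.length(); };
        core.openNewSearcher(fresh);
        System.out.println("warming queries run: " + warmed[0]); // prints 2
    }
}
```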

On Sun, Jan 4, 2009 at 2:10 AM, Marc Sturlese <ma...@gmail.com> wrote:

> Is it technically much better, or mainly easier to use?
> Does SolrReplication do warming as well?


-- 
Regards,
Shalin Shekhar Mangar.

Re: collectionDistribution vs SolrReplication

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
http://wiki.apache.org/solr/SolrReplication#head-b77f4610e9b2f38433fdffc7f07cc9789ecabe72



On Sun, Jan 18, 2009 at 10:53 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> On Sun, Jan 18, 2009 at 9:51 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
>>
>> I've not looked into the file replication part much (as opposed to the
>> index replication).
>> Master and Slave solrconfig.xml will most likely need to be different
>> though... is that addressed somehow?
>>
>
> Yes. You can provide an alias for a configuration file. On the master,
> you can add a file named solrconfig-slave.xml, which will be copied to
> the slave as solrconfig.xml.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
--Noble Paul

Re: collectionDistribution vs SolrReplication

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sun, Jan 18, 2009 at 9:51 PM, Yonik Seeley <ys...@gmail.com> wrote:

>
> I've not looked into the file replication part much (as opposed to the
> index replication).
> Master and Slave solrconfig.xml will most likely need to be different
> though... is that addressed somehow?
>

Yes. You can provide an alias for a configuration file. On the master,
you can add a file named solrconfig-slave.xml, which will be copied to
the slave as solrconfig.xml.
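A Java sketch of how such an alias might be interpreted. It assumes that the list of configuration files to replicate is written as comma-separated entries, where `source:target` means "read source on the master, store it as target on the slave" and a bare name keeps the same name on both sides. The class and method names are hypothetical; the SolrReplication wiki page describes the exact syntax a given version accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfFileAliasSketch {
    // Map each master-side file name to the name it gets on the slave.
    static Map<String, String> parse(String confFiles) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : confFiles.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) continue;
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                mapping.put(trimmed, trimmed); // no alias: same name on both sides
            } else {
                mapping.put(trimmed.substring(0, colon), trimmed.substring(colon + 1));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // prints {schema.xml=schema.xml, stopwords.txt=stopwords.txt, solrconfig-slave.xml=solrconfig.xml}
        System.out.println(parse("schema.xml,stopwords.txt,solrconfig-slave.xml:solrconfig.xml"));
    }
}
```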

-- 
Regards,
Shalin Shekhar Mangar.

Re: collectionDistribution vs SolrReplication

Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Jan 16, 2009 at 3:33 AM, Noble Paul നോബിള്‍  नोब्ळ्
<no...@gmail.com> wrote:
> The built-in replication can also replicate schema/conf files, which
> makes a lot of these tricks unnecessary.
> All the enable/disable operations are exposed as HTTP commands.

I've not looked into the file replication part much (as opposed to the
index replication).
Master and Slave solrconfig.xml will most likely need to be different
though... is that addressed somehow?

-Yonik

Re: collectionDistribution vs SolrReplication

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Fri, Jan 16, 2009 at 3:37 AM, Chris Hostetter
<ho...@fucit.org> wrote:
> I still haven't had a chance to really kick the tires on the java based
> replication, so i have no real experience to base either of these claims
> on, but my hunch is that:
>  1) new users will find the java based replication *much* easier to get
> up and running (a lot fewer moving parts and external processes to deal
> with)
>  2) existing users who already have the script based replication working
> for them may find the java based replication less transparent and harder
> to manipulate in tricky ways.
>
> ...that second hunch comes from the fact that since the java replication
> is all self contained in solr, and doesn't use all of the various
> external processes (cron, rsync, snapshooter, snappuller, ssh, etc...)
> there are fewer places for people to manipulate the replication when doing
> 'atypical' operations ... for example: during a phased rollout of some new
> code/schema, you might disable all replication by shutting down the rsyncd
> port; then disabling it for a few slaves by commenting out the snappuller
> cron before turning rsyncd back on ... etc.
The built-in replication can also replicate schema/conf files, which
makes a lot of these tricks unnecessary.
All the enable/disable operations are exposed as HTTP commands.
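As an illustration of driving replication over HTTP, the Java sketch below only builds request URLs for the phased-rollout scenario quoted above. The handler path `/replication`, the host names, and the command names `disablereplication`, `enablereplication` and `disablepoll` are assumptions based on later SolrReplication documentation, so verify them against the version you run. Sending the requests (for example with java.net.http.HttpClient) is left out so the sketch has no network dependency.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReplicationCommandSketch {
    // Build a replication command URL for a core's base URL (path assumed).
    static URI command(String coreBaseUrl, String command) {
        String base = coreBaseUrl.endsWith("/") ? coreBaseUrl : coreBaseUrl + "/";
        return URI.create(base + "replication?command="
                + URLEncoder.encode(command, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical hosts. Stop serving the index from the master,
        // pause polling on selected slaves, then resume on the master.
        System.out.println(command("http://master:8983/solr", "disablereplication"));
        System.out.println(command("http://slave1:8983/solr", "disablepoll"));
        System.out.println(command("http://master:8983/solr", "enablereplication"));
    }
}
```

Compared with stopping rsyncd and editing crontabs, each step here is a single request to the node concerned.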



-- 
--Noble Paul

Re: collectionDistribution vs SolrReplication

Posted by Chris Hostetter <ho...@fucit.org>.
: I would like to know the advantages of moving from:
: a master-slave system using CollectionDistribution with all its .sh
: scripts
: http://wiki.apache.org/solr/CollectionDistribution
: to:
: SolrReplication and its solrconfig.xml configuration.
: http://wiki.apache.org/solr/SolrReplication

in addition to other comments posted, it's important to keep in mind that
one of the original motivations for the new style of replication was to
have a 100% java based solution; as a result, it is the only
replication approach that works on Windows.

(in particular: it has no dependency on being able to delete hardlinks, or
on running rsync, or on using ssh, or on having external crons, etc..)

I still haven't had a chance to really kick the tires on the java based
replication, so i have no real experience to base either of these claims
on, but my hunch is that:
  1) new users will find the java based replication *much* easier to get
up and running (a lot fewer moving parts and external processes to deal
with)
  2) existing users who already have the script based replication working
for them may find the java based replication less transparent and harder
to manipulate in tricky ways.

...that second hunch comes from the fact that since the java replication
is all self contained in solr, and doesn't use all of the various
external processes (cron, rsync, snapshooter, snappuller, ssh, etc...)
there are fewer places for people to manipulate the replication when doing
'atypical' operations ... for example: during a phased rollout of some new
code/schema, you might disable all replication by shutting down the rsyncd
port; then disabling it for a few slaves by commenting out the snappuller
cron before turning rsyncd back on ... etc.

these types of tricks are probably unnecessary in 90% of the use cases,
and people who aren't used to being able to do them probably won't care,
but if you are used to having that level of control, you might miss them.

(but as i said: i haven't had a chance to try out the java replication at 
all, so for all i know it's just as tweakable and i'm just an idiot.)

-Hoss