Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2009/01/03 21:40:53 UTC
collectionDistribution vs SolrReplication
Hey there,
I would like to know the advantages of moving from:
a master-slave system using CollectionDistribution with all its .sh
scripts
http://wiki.apache.org/solr/CollectionDistribution
to:
SolrReplication and its solrconfig.xml configuration.
http://wiki.apache.org/solr/SolrReplication
Is it technically much better, or is it mainly easier to use?
Does SolrReplication do warming as well?
Judging by the performance numbers on the SolrReplication wiki page, things seem to be
similar except for RAM usage. Is that where the advantage lies?
Thanks in advance!!
Re: collectionDistribution vs SolrReplication
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
The problem with that approach is that unlike databases, a commit is an
expensive operation in Lucene right now. It is not very practical to commit
per document, therefore log replication offers very little.
On Tue, Jan 6, 2009 at 12:07 AM, Jacob Singh <ja...@gmail.com> wrote:
> Has there been a discussion anywhere about a "binary log" style
> replication scheme (a la MySQL)? One in which every write request goes to
> the master, and the slaves read in a queue of the requests and
> update themselves one record at a time instead of wholesale? Or is
> this just not worth the development time?
>
> Best,
> Jacob
>
> On Mon, Jan 5, 2009 at 10:26 AM, Noble Paul നോബിള് नोब्ळ्
> <no...@gmail.com> wrote:
> > The default IndexDeletionPolicy keeps only the last commit
> > (KeepOnlyLastCommitDeletionPolicy). Files belonging to older commits
> > are removed. If the files are needed longer for replication, they are
> > leased. The lease is extended 10 seconds at a time. Once all the slaves
> > have copied the files, the lease is no longer extended and the files are purged.
> >
> > In the snapshot-based system, unless the snapshots are deleted from
> > the file system, the old files will continue to live on the disk.
> > --Noble
> >
> > On Mon, Jan 5, 2009 at 6:59 PM, Mark Miller <ma...@gmail.com>
> wrote:
> >> Noble Paul നോബിള് नोब्ळ् wrote:
> >>>
> >>> * SolrReplication does not create snapshots, so you have less cleanup
> >>> to do. Script-based replication results in more disk space
> >>> consumption (especially if you do frequent commits).
> >>>
> >>
> >> Doesn't SolrReplication effectively take a snapshot by using a custom
> >> IndexDeletionPolicy to keep the right index files around? Isn't that
> >> maintaining a snapshot?
> >>
> >> Could you elaborate on the difference Noble?
> >>
> >> - Mark
> >>
> >
> >
> >
> > --
> > --Noble Paul
> >
>
>
>
> --
>
> +1 510 277-0891 (o)
> +91 9999 33 7458 (m)
>
> web: http://pajamadesign.com
>
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: jacobsingh@gmail.com
>
--
Regards,
Shalin Shekhar Mangar.
Re: collectionDistribution vs SolrReplication
Posted by Jacob Singh <ja...@gmail.com>.
Has there been a discussion anywhere about a "binary log" style
replication scheme (a la MySQL)? One in which every write request goes to
the master, and the slaves read in a queue of the requests and
update themselves one record at a time instead of wholesale? Or is
this just not worth the development time?
Best,
Jacob
On Mon, Jan 5, 2009 at 10:26 AM, Noble Paul നോബിള് नोब्ळ्
<no...@gmail.com> wrote:
> The default IndexDeletionPolicy keeps only the last commit
> (KeepOnlyLastCommitDeletionPolicy). Files belonging to older commits
> are removed. If the files are needed longer for replication, they are
> leased. The lease is extended 10 seconds at a time. Once all the slaves
> have copied the files, the lease is no longer extended and the files are purged.
>
> In the snapshot-based system, unless the snapshots are deleted from
> the file system, the old files will continue to live on the disk.
> --Noble
>
> On Mon, Jan 5, 2009 at 6:59 PM, Mark Miller <ma...@gmail.com> wrote:
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>> * SolrReplication does not create snapshots, so you have less cleanup
>>> to do. Script-based replication results in more disk space
>>> consumption (especially if you do frequent commits).
>>>
>>
>> Doesn't SolrReplication effectively take a snapshot by using a custom
>> IndexDeletionPolicy to keep the right index files around? Isn't that
>> maintaining a snapshot?
>>
>> Could you elaborate on the difference Noble?
>>
>> - Mark
>>
>
>
>
> --
> --Noble Paul
>
--
+1 510 277-0891 (o)
+91 9999 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsingh@gmail.com
Re: collectionDistribution vs SolrReplication
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
The default IndexDeletionPolicy keeps only the last commit
(KeepOnlyLastCommitDeletionPolicy). Files belonging to older commits
are removed. If the files are needed longer for replication, they are
leased. The lease is extended 10 seconds at a time. Once all the slaves
have copied the files, the lease is no longer extended and the files are purged.
In the snapshot-based system, unless the snapshots are deleted from
the file system, the old files will continue to live on the disk.
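The lease idea can be sketched as a toy model. This is only an illustration under stated assumptions, not Solr's actual implementation: the class name CommitTracker and its methods are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field

LEASE_SECONDS = 10  # each extension is 10 seconds, as described in the message


@dataclass
class CommitTracker:
    """Toy model of keep-only-last-commit plus leases (hypothetical, not Solr code)."""
    latest: int = 0
    leases: dict = field(default_factory=dict)  # commit id -> lease expiry time

    def commit(self, commit_id: int) -> None:
        # A new commit becomes the only one kept by default.
        self.latest = commit_id

    def extend_lease(self, commit_id: int, now: float) -> None:
        # Called while a slave is still copying the files of this commit.
        self.leases[commit_id] = now + LEASE_SECONDS

    def retained(self, known_commits, now: float):
        # Keep the latest commit, plus any older commit with an unexpired lease.
        return [c for c in known_commits
                if c == self.latest or self.leases.get(c, 0.0) > now]
```

For example, if commit 1 is leased at time 0 and commit 2 is then made, retained([1, 2], now=5.0) still includes commit 1, while at now=10.0 the lease has lapsed and only commit 2 remains. In the snapshot-based scheme there is no such expiry, so old files stay until something deletes the snapshot.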
--Noble
On Mon, Jan 5, 2009 at 6:59 PM, Mark Miller <ma...@gmail.com> wrote:
> Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> * SolrReplication does not create snapshots, so you have less cleanup
>> to do. Script-based replication results in more disk space
>> consumption (especially if you do frequent commits).
>>
>
> Doesn't SolrReplication effectively take a snapshot by using a custom
> IndexDeletionPolicy to keep the right index files around? Isn't that
> maintaining a snapshot?
>
> Could you elaborate on the difference Noble?
>
> - Mark
>
--
--Noble Paul
Re: collectionDistribution vs SolrReplication
Posted by Mark Miller <ma...@gmail.com>.
Noble Paul നോബിള് नोब्ळ् wrote:
> * SolrReplication does not create snapshots, so you have less cleanup
> to do. Script-based replication results in more disk space
> consumption (especially if you do frequent commits).
>
Doesn't SolrReplication effectively take a snapshot by using a custom
IndexDeletionPolicy to keep the right index files around? Isn't that
maintaining a snapshot?
Could you elaborate on the difference Noble?
- Mark
Re: collectionDistribution vs SolrReplication
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
* SolrReplication does not create snapshots, so you have less cleanup
to do. Script-based replication results in more disk space
consumption (especially if you do frequent commits).
* Performance is roughly the same unless you are replicating across
different LANs, where SolrReplication can compress the files before transfer.
On Sun, Jan 4, 2009 at 11:57 AM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> I think the main reason is ease of use. Warming is done the same way by
> adding a newSearcher listener in solrconfig.xml
>
> On Sun, Jan 4, 2009 at 2:10 AM, Marc Sturlese <ma...@gmail.com> wrote:
>
>>
>> Hey there,
>>
>> I would like to know the advantages of moving from:
>> a master-slave system using CollectionDistribution with all its .sh
>> scripts
>> http://wiki.apache.org/solr/CollectionDistribution
>>
>> to:
>> SolrReplication and its solrconfig.xml configuration.
>> http://wiki.apache.org/solr/SolrReplication
>>
>>
>> Is it technically much better, or is it mainly easier to use?
>> Does SolrReplication do warming as well?
>>
>> Judging by the performance numbers on the SolrReplication wiki page, things seem to be
>> similar except for RAM usage. Is that where the advantage lies?
>>
>> Thanks in advance!!
>> --
>> View this message in context:
>> http://www.nabble.com/collectionDistribution-vs-SolrReplication-tp21269112p21269112.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
--
--Noble Paul
Re: collectionDistribution vs SolrReplication
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I think the main reason is ease of use. Warming is done the same way by
adding a newSearcher listener in solrconfig.xml
On Sun, Jan 4, 2009 at 2:10 AM, Marc Sturlese <ma...@gmail.com> wrote:
>
> Hey there,
>
> I would like to know the advantages of moving from:
> a master-slave system using CollectionDistribution with all its .sh
> scripts
> http://wiki.apache.org/solr/CollectionDistribution
>
> to:
> SolrReplication and its solrconfig.xml configuration.
> http://wiki.apache.org/solr/SolrReplication
>
>
> Is it technically much better, or is it mainly easier to use?
> Does SolrReplication do warming as well?
>
> Judging by the performance numbers on the SolrReplication wiki page, things seem to be
> similar except for RAM usage. Is that where the advantage lies?
>
> Thanks in advance!!
> --
> View this message in context:
> http://www.nabble.com/collectionDistribution-vs-SolrReplication-tp21269112p21269112.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
--
Regards,
Shalin Shekhar Mangar.
Re: collectionDistribution vs SolrReplication
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
http://wiki.apache.org/solr/SolrReplication#head-b77f4610e9b2f38433fdffc7f07cc9789ecabe72
On Sun, Jan 18, 2009 at 10:53 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> On Sun, Jan 18, 2009 at 9:51 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
>>
>> I've not looked into the file replication part much (as opposed to the
>> index replication).
>> Master and Slave solrconfig.xml will most likely need to be different
>> though... is that addressed somehow?
>>
>
> Yes. You can provide an alias to a configuration file. So, on the master,
> you can add a file named solrconfig-slave.xml which will be copied as
> solrconfig.xml in the slave.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
--
--Noble Paul
Re: collectionDistribution vs SolrReplication
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sun, Jan 18, 2009 at 9:51 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
> I've not looked into the file replication part much (as opposed to the
> index replication).
> Master and Slave solrconfig.xml will most likely need to be different
> though... is that addressed somehow?
>
Yes. You can provide an alias to a configuration file. So, on the master,
you can add a file named solrconfig-slave.xml which will be copied as
solrconfig.xml in the slave.
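To make the aliasing concrete, here is a small sketch of how a list of configuration files with optional aliases could be interpreted. The "name-on-master:name-on-slave" syntax is what I recall from the SolrReplication wiki page (the confFiles setting) and should be checked there; the function parse_conf_files is a hypothetical helper for illustration, not part of Solr.

```python
def parse_conf_files(spec: str) -> dict:
    """Map each file name on the master to the name it gets on the slave.

    Assumed format: comma-separated entries, each either "name" or
    "master_name:slave_name" (the alias form).
    """
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        master_name, sep, slave_name = entry.partition(":")
        # Without an alias, the file keeps its name on the slave.
        mapping[master_name] = slave_name if sep else master_name
    return mapping
```

With the file names from this message, parse_conf_files("solrconfig-slave.xml:solrconfig.xml,schema.xml") maps solrconfig-slave.xml on the master to solrconfig.xml on the slave, and leaves schema.xml unchanged.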
--
Regards,
Shalin Shekhar Mangar.
Re: collectionDistribution vs SolrReplication
Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Jan 16, 2009 at 3:33 AM, Noble Paul നോബിള് नोब्ळ्
<no...@gmail.com> wrote:
> Built-in replication supports schema/conf replication, which makes a lot
> of these unnecessary.
> All of the enable/disable operations are exposed as HTTP commands.
I've not looked into the file replication part much (as opposed to the
index replication).
Master and Slave solrconfig.xml will most likely need to be different
though... is that addressed somehow?
-Yonik
Re: collectionDistribution vs SolrReplication
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
On Fri, Jan 16, 2009 at 3:37 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : I would like to know the advantages of moving from:
> : a master-slave system using CollectionDistribution with all its .sh
> : scripts
> : http://wiki.apache.org/solr/CollectionDistribution
> : to:
> : SolrReplication and its solrconfig.xml configuration.
> : http://wiki.apache.org/solr/SolrReplication
>
> in addition to other comments posted it's important to keep in mind that
> one of the original motivations for the new style of replication was to
> have a 100% java based solution; as a result, it is the only
> replication approach that works on windows.
>
> (in particular: it has no dependency on being able to delete hardlinks, or
> on running rsync, or on using ssh, or on having external crons, etc..)
>
> I still haven't had a chance to really kick the tires on the java based
> replication, so i have no real experience to base either of these claims
> on, but my hunch is that:
> 1) new users will find the java based replication *much* easier to get
> up and running (a lot fewer moving parts and external processes to deal
> with)
> 2) existing users who already have the script based replication working
> for them may find the java based replication less transparent and harder
> to manipulate in tricky ways.
>
> ...that second hunch comes from the fact that since the java replication
> is all self contained in solr, and doesn't use all of the various
> external processes (cron, rsync, snapshooter, snappuller, ssh, etc...)
> there are fewer places for people to manipulate the replication when doing
> 'atypical' operations ... for example: during a phased rollout of some new
> code/schema, you might disable all replication by shutting down the rsyncd
> port; then disable it for a few slaves by commenting out the snappuller
> cron before turning rsyncd back on ... etc.
Built-in replication supports schema/conf replication, which makes a lot
of these unnecessary.
All of the enable/disable operations are exposed as HTTP commands.
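As a rough sketch of what "exposed as HTTP commands" could look like from a client: the helper below only builds the URL and does not send anything. The /replication path and the command names disablepoll and enablepoll are what I recall from the SolrReplication wiki page and should be verified there; the host name and the function replication_command_url are hypothetical.

```python
from urllib.parse import urlencode


def replication_command_url(base_url: str, command: str) -> str:
    """Build the URL for a replication command (assumed /replication handler path)."""
    query = urlencode({"command": command})
    return f"{base_url.rstrip('/')}/replication?{query}"


# Hypothetical slave host; pause polling on it during a phased rollout,
# then resume afterwards.
pause_url = replication_command_url("http://slave-host:8983/solr", "disablepoll")
resume_url = replication_command_url("http://slave-host:8983/solr", "enablepoll")
```

Fetching pause_url (for instance with urllib.request.urlopen) would be the counterpart of commenting out the snappuller cron on that one slave.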
>
> these types of tricks are probably unnecessary in 90% of the use cases,
> and people who aren't used to being able to do them probably won't care,
> but if you are used to having that level of control, you might miss them.
>
> (but as i said: i haven't had a chance to try out the java replication at
> all, so for all i know it's just as tweakable and i'm just an idiot.)
>
> -Hoss
>
>
--
--Noble Paul
Re: collectionDistribution vs SolrReplication
Posted by Chris Hostetter <ho...@fucit.org>.
: I would like to know the advantages of moving from:
: a master-slave system using CollectionDistribution with all its .sh
: scripts
: http://wiki.apache.org/solr/CollectionDistribution
: to:
: SolrReplication and its solrconfig.xml configuration.
: http://wiki.apache.org/solr/SolrReplication
in addition to other comments posted it's important to keep in mind that
one of the original motivations for the new style of replication was to
have a 100% java based solution; as a result, it is the only
replication approach that works on windows.
(in particular: it has no dependency on being able to delete hardlinks, or
on running rsync, or on using ssh, or on having external crons, etc..)
I still haven't had a chance to really kick the tires on the java based
replication, so i have no real experience to base either of these claims
on, but my hunch is that:
1) new users will find the java based replication *much* easier to get
up and running (a lot fewer moving parts and external processes to deal
with)
2) existing users who already have the script based replication working
for them may find the java based replication less transparent and harder
to manipulate in tricky ways.
...that second hunch comes from the fact that since the java replication
is all self contained in solr, and doesn't use all of the various
external processes (cron, rsync, snapshooter, snappuller, ssh, etc...)
there are fewer places for people to manipulate the replication when doing
'atypical' operations ... for example: during a phased rollout of some new
code/schema, you might disable all replication by shutting down the rsyncd
port; then disable it for a few slaves by commenting out the snappuller
cron before turning rsyncd back on ... etc.
these types of tricks are probably unnecessary in 90% of the use cases,
and people who aren't used to being able to do them probably won't care,
but if you are used to having that level of control, you might miss them.
(but as i said: i haven't had a chance to try out the java replication at
all, so for all i know it's just as tweakable and i'm just an idiot.)
-Hoss