Posted to solr-user@lucene.apache.org by abhishes <ab...@gmail.com> on 2010/02/15 08:50:24 UTC

Question on Index Replication

Hello All,

Upon reading the article 

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

I have a question around index replication. 

If the query load is very high and I want multiple servers to be able to
search the index, can multiple servers share one read-only copy of the
index?

So one server (the master) builds the index and it is stored on a SAN. Then
multiple slave servers point to the same copy of the data and answer user
queries.

In the replication diagram, I see that the index is copied to each of
the slave servers.

This is not desirable because the index is read-only (for the slave servers,
because only the master updates the index), and copying indexes can take a
very long time (depending on index size) and can needlessly waste disk space.
-- 
View this message in context: http://old.nabble.com/Question-on-Index-Replication-tp27590418p27590418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question on Index Replication

Posted by Lance Norskog <go...@gmail.com>.
When you change an index you do not have to copy the entire index
again. The new part of the index is in separate files and the
replication code knows to only pull the differences.
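
As a rough illustration of the "only pull the differences" idea (this is a
sketch, not Solr's actual replication code), a slave could compare the
master's file listing with its own and fetch only what is missing or changed.
The function name, file names and sizes below are made up for the example.

```python
def files_to_pull(master_files, slave_files):
    """Return the index files the slave lacks or holds at a different size.

    Both arguments map file name -> size in bytes. Index segment files are
    written once and not modified, so files already present on the slave
    with a matching size can be skipped.
    """
    return sorted(
        name
        for name, size in master_files.items()
        if slave_files.get(name) != size
    )


# Hypothetical listings: the master has added one new segment since the
# slave last synchronized.
master = {"_0.cfs": 1000, "_1.cfs": 250, "segments_3": 40}
slave = {"_0.cfs": 1000, "segments_2": 38}

print(files_to_pull(master, slave))
```

Here the large `_0.cfs` file is not copied again; only the new segment and the
new segments file are transferred.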

Indexing on a master and copying to slaves works very well - there are
thousands of Solr installations using that technique.

With a SAN, the slaves have to be close enough for the SAN cables to
work. Also there are file system locking problems but those can be
worked around.




-- 
Lance Norskog
goksron@gmail.com

Re: Question on Index Replication

Posted by Erick Erickson <er...@gmail.com>.
On your two caveats:
<1> I don't know either.
<2> I think you can just fire off auto-warming queries at each SOLR
instance. The main caching is on the server machine as far as SOLR search
speed is concerned.
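
A minimal sketch of firing warm-up queries at each instance, assuming every
Solr server answers on a `/select` URL. The host names, port and queries are
placeholders, and `warm_urls`/`warm` are hypothetical helper names. Solr can
also run warming queries itself through the `firstSearcher` and `newSearcher`
listeners in solrconfig.xml, which may be the simpler option.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def warm_urls(base_urls, queries):
    """Build one select URL per (Solr instance, warm-up query) pair."""
    return [
        f"{base}/select?{urlencode({'q': q, 'rows': 10})}"
        for base in base_urls
        for q in queries
    ]


def warm(base_urls, queries, fetch=None):
    """Send every warm-up query to every instance; return the success count.

    `fetch` is injectable so the function can be exercised without a
    running Solr server.
    """
    if fetch is None:
        fetch = lambda url: urlopen(url, timeout=30).read()
    succeeded = 0
    for url in warm_urls(base_urls, queries):
        try:
            fetch(url)
            succeeded += 1
        except OSError as exc:  # URLError is a subclass of OSError
            print(f"warm-up failed for {url}: {exc}")
    return succeeded


# Example (placeholder hosts, not run here):
# warm(["http://slave1:8983/solr", "http://slave2:8983/solr"],
#      ["solr", "replication"])
```

Running this after each index update, before the instance takes user traffic,
means the first real users do not pay for loading the caches.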

But I'd really recommend thinking about just replicating the indexes; disk
space is very cheap. Probably a lot cheaper than that much RAM!
How big are your indexes?

Erick



Re: Question on Index Replication

Posted by abhishes <ab...@gmail.com>.
What you say makes perfect sense.

However, I can offset the risk of disk I/O and latency by having a good
amount of RAM, say 64 GB, and a 64-bit OS.

Two caveats:

1. I have no clue whether J2EE servers can use this much RAM (64-bit OS and
JVM).

2. I have no idea how the cache can be auto-warmed so that users don't
pay the penalty of loading the cache.







Re: Question on Index Replication

Posted by Erick Erickson <er...@gmail.com>.
Sure, you can do that. But you're making a change that kind of defeats
the purpose. The underlying Lucene engine can be very disk intensive,
and any network latency will adversely affect the search speed. That
is the point of replicating the indexes: to get them local to the
SOLR/Lucene instance that's using them so disk access is as fast as
possible.

If you're willing to trade the search speed for saving disk space, you
can set things up like you want. But I'd sure run some performance
tests against a local as opposed to remote instance of my index
before making a decision...
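
One way to run such a comparison is sketched below, assuming two test
instances (one reading a local index, one reading the index over the SAN)
that both answer on a `/select` URL. The host names are placeholders and
`time_queries`/`summarize` are hypothetical helper names.

```python
import statistics
import time
from urllib.request import urlopen


def time_queries(urls, fetch=None):
    """Request each URL once and return the elapsed seconds per request.

    `fetch` is injectable so the timing loop can be tried without a
    running Solr server.
    """
    if fetch is None:
        fetch = lambda url: urlopen(url, timeout=30).read()
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to median and worst-case latency."""
    return {"median": statistics.median(timings), "max": max(timings)}


# Example (placeholder hosts, not run here):
# queries = ["solr", "replication", "index"]
# local = [f"http://local-test:8983/solr/select?q={q}" for q in queries]
# remote = [f"http://san-test:8983/solr/select?q={q}" for q in queries]
# print("local ", summarize(time_queries(local)))
# print("remote", summarize(time_queries(remote)))
```

Use a query set that resembles real traffic, and time both cold (just after
startup) and warmed instances, since caching can hide much of the disk cost.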

HTH
Erick
