Posted to solr-user@lucene.apache.org by zarni aung <za...@gmail.com> on 2011/06/16 17:41:44 UTC

Document Scoring

Hi,

I am designing my indexes to have 1 write-only master core, 2 read-only
slave cores.  That means the read-only cores will only have snapshots pulled
from the master and will not have near real time changes.  I was thinking
about adding a hybrid read and write master core that will have the most
recent changes from my primary data source.  I am thinking of querying the
hybrid master and the read-only slaves and somehow intersecting the
results in order to support near real time full text search.  Is this
feasible?

Thank you,

Zarni

Re: Document Scoring

Posted by zarni aung <za...@gmail.com>.
Thank you, I will give that a shot.

Zarni

Re: Document Scoring

Posted by Erick Erickson <er...@gmail.com>.
I think this is the way to go. When trying to minimize latency, there are two
statistics to pay particular attention to on your searchers.

1> What is the warmup time for your caches?
2> What is your polling interval?

Make sure your polling interval is, say, at least three times longer than
your warmup interval when trying to minimize the latency. Also, set
<maxWarmingSearchers> to no more than two....
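
The rule of thumb above can be checked with a few lines of Python. This is
only a sketch: the function name and the example numbers are hypothetical,
not Solr APIs or measured values, and the factor of three is simply the
"say, at least three times" figure from the paragraph above.

```python
def polling_interval_ok(poll_seconds: float, warmup_seconds: float,
                        factor: float = 3.0) -> bool:
    """Return True if the slave's polling interval is at least
    `factor` times the cache warmup time (both in seconds)."""
    return poll_seconds >= factor * warmup_seconds


# Hypothetical numbers, for illustration only.
print(polling_interval_ok(poll_seconds=60, warmup_seconds=15))  # 60 >= 45
print(polling_interval_ok(poll_seconds=30, warmup_seconds=15))  # 30 < 45
```

If the check fails, either lengthen the polling interval or reduce the
warmup work, so that a new searcher is not requested while the previous one
is still warming.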

So the time between sending a document to the indexer and it being
available for search is at most the sum of

Time to next commit on the master
Polling interval on the slave
Time to replicate the changed part of the index
Warmup interval
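
As a rough illustration of that sum, here is a small Python sketch. The
class name, field names and the sample values are assumptions made up for
the example; real figures would have to come from your own configuration
and logs.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """The four components of indexing-to-search latency, in seconds."""
    commit_interval_s: float   # time to next commit on the master
    poll_interval_s: float     # polling interval on the slave
    replication_s: float       # time to replicate the changed part of the index
    warmup_s: float            # warmup interval on the slave

    def worst_case_s(self) -> float:
        # Upper bound: every stage takes its full time, one after another.
        return (self.commit_interval_s + self.poll_interval_s
                + self.replication_s + self.warmup_s)


# Hypothetical values, for illustration only.
budget = LatencyBudget(commit_interval_s=120, poll_interval_s=60,
                       replication_s=10, warmup_s=15)
print(budget.worst_case_s())  # 120 + 60 + 10 + 15 = 205
```

With these made-up numbers the worst case is a little under three and a
half minutes, which would fit inside a 5 minute target.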

As an aside, I've often found that, while product managers often say they
want "real time searching", explaining to them that "I can set up 5 minute
latency in 1 day, or program 20 second latency in XXX weeks" gives
them the information they need to decide how important "real time" really is!
Especially if you follow up with "and spending XXX weeks doing this
will mean that features A through F will not get into the release"....

Best
Erick

On Fri, Jun 17, 2011 at 11:24 AM, zarni aung <za...@gmail.com> wrote:
> Thank you this is something that I wanted to hear.  I knew the design was
> most likely flawed because I have never done Solr or any kind of full text
> searching, but needed an unbiased opinion.  I think that if I were to tune
> the configs and pay close attention to the logs with lots of performance
> testing I might be able to achieve close to near real time (1-5 mins).  I've
> been reading this mailing list, Hathi Trust, Lucid Imagination and other
> sites for insights.
>
> Again Thank you.
>
> Zarni

Re: Document Scoring

Posted by zarni aung <za...@gmail.com>.
Thank you, this is something that I wanted to hear.  I knew the design was
most likely flawed because I have never done Solr or any kind of full text
searching, but needed an unbiased opinion.  I think that if I were to tune
the configs and pay close attention to the logs with lots of performance
testing I might be able to achieve close to near real time (1-5 mins).  I've
been reading this mailing list, Hathi Trust, Lucid Imagination and other
sites for insights.

Again, thank you.

Zarni

On Thu, Jun 16, 2011 at 9:49 PM, Erick Erickson <er...@gmail.com> wrote:

> I really wouldn't go there, it sounds like there are endless
> opportunities for errors!
>
> How "real-time" is "real-time"? Could you fix this entirely
> by
> 1> adjusting expectations for, say, 5 minutes.
> 2> adjusting your commit (on the master) and poll (on the slave)
> appropriately?
>
> Best
> Erick

Re: Document Scoring

Posted by Erick Erickson <er...@gmail.com>.
I really wouldn't go there, it sounds like there are endless
opportunities for errors!

How "real-time" is "real-time"? Could you fix this entirely
by
1> adjusting expectations to, say, 5 minutes of latency, and
2> adjusting your commit (on the master) and poll (on the slave) intervals
appropriately?

Best
Erick

On Thu, Jun 16, 2011 at 11:41 AM, zarni aung <za...@gmail.com> wrote:
> Hi,
>
> I am designing my indexes to have 1 write-only master core, 2 read-only
> slave cores.  That means the read-only cores will only have snapshots pulled
> from the master and will not have near real time changes.  I was thinking
> about adding a hybrid read and write master core that will have the most
> recent changes from my primary data source.  I am thinking to query the
> hybrid master and the read-only slaves and somehow try to intersect the
> results in order to support near real time full text search.  Is this
> feasible?
>
> Thank you,
>
> Zarni
>