Posted to solr-user@lucene.apache.org by Naveen Gupta <nk...@gmail.com> on 2011/06/03 05:29:42 UTC

Strategy --> Frequent updates in our application

Hi,

We have an application that indexes users' document repositories every 10
minutes. When a new thread is added to a discussion, we need to index that
thread again. (Please note that we are not blindly reindexing each time: we
have various rules to pick out which threads have changed and are therefore
candidates for indexing, plus the new ones that have arrived.)

So we are doing updates for each user's document repository, and so far the
performance is not very good. In the future we expect high volume (1,000 to
10,000 hits per minute), so we are looking for a strategy to tune Solr so
that it can index the data in real time.

What about NRT: is it suitable for this scenario? I have read that Solr's
NRT performance is not very good, but I am reluctant to believe that, since
Solr is one of the best open source projects, and I expect this problem to
be sorted out in the near future. If any benchmarks exist, kindly share them
with me; we would like to analyze them against our requirements.

Is there any way to do incremental indexing, as is generally available in
other search engines such as Endeca? I am a newbie and do not know Solr in
much detail, so can you please tell me whether there are settings that keep
track of incremental indexing?


Thanks
Naveen

Re: Strategy --> Frequent updates in our application

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Naveen,

Solr does support incremental indexing.
Solr currently doesn't make use of Lucene's NRT support, but that is starting to 
change.
If you provide more specifics about issues you are having and your architecture, 
data and query volume, we may be able to help better.

Otis 
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: Strategy --> Frequent updates in our application

Posted by Erick Erickson <er...@gmail.com>.
Do be careful how often you pull down indexes on your slaves. A too-short
polling interval can lead to some problems. Start with, say, 5 minutes and
ensure that your autowarm time (see your logs) is less than your polling
interval.

Best
Erick
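
As a rough illustration of the rule of thumb above, here is a minimal Python
sketch that compares a warm-up time against a polling interval. The function
name and the `headroom` parameter are hypothetical and not part of Solr; the
warm-up time itself would have to be read from your own logs.

```python
def polling_interval_is_safe(autowarm_seconds, poll_interval_seconds, headroom=1.0):
    """Return True if warming finishes before the next poll is due.

    headroom is a hypothetical safety factor: 1.0 applies the plain
    "autowarm time < polling interval" rule, 2.0 demands twice the margin.
    """
    if autowarm_seconds < 0 or poll_interval_seconds <= 0 or headroom <= 0:
        raise ValueError("times must be non-negative and intervals positive")
    return autowarm_seconds * headroom < poll_interval_seconds


# Example values only: 45 s of warming against a 5-minute (300 s) poll.
print(polling_interval_is_safe(45, 300))
```

A factor above 1.0 may be worth considering if warm-up times in your logs vary
a lot from one replication to the next.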


On Fri, Jun 3, 2011 at 8:43 AM, pravesh <su...@yahoo.com> wrote:
> You can go ahead with the Master/Slave setup provided by Solr. It's trivial to
> set up, and you also get Solr's operational scripts for index syncing between
> master and slave(s), or the Java-based replication feature.
>
> There is no need to re-invent other architecture :)

Re: Strategy --> Frequent updates in our application

Posted by pravesh <su...@yahoo.com>.
You can go ahead with the Master/Slave setup provided by Solr. It's trivial to
set up, and you also get Solr's operational scripts for index syncing between
master and slave(s), or the Java-based replication feature.

There is no need to reinvent another architecture :)

--
View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strategy --> Frequent updates in our application

Posted by Naveen Gupta <nk...@gmail.com>.
Hi Pravesh,

We don't have that setup right now, but we are thinking of doing that.

We are going to have one instance for writes and another for reads.

If you have another design in mind, kindly share it.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh <su...@yahoo.com> wrote:

> You can use DataImportHandler for your full and incremental indexing. How
> near to real time the indexing must be can vary with business requirements
> (I mean the delay could be 5, 10, 15, or 30 minutes). It also depends on
> how much volume will be indexed incrementally.
> BTW, do you have a master + slave Solr setup?

Re: Strategy --> Frequent updates in our application

Posted by pravesh <su...@yahoo.com>.
You can use DataImportHandler for your full and incremental indexing. How
near to real time the indexing must be can vary with business requirements
(I mean the delay could be 5, 10, 15, or 30 minutes). It also depends on
how much volume will be indexed incrementally.
BTW, do you have a master + slave Solr setup?
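
To make the full/incremental distinction concrete, here is a small Python
sketch that builds the request URLs for the two DataImportHandler commands,
`full-import` and `delta-import`. It assumes the handler is registered at
`/dataimport` and that Solr is reachable at the example base URL; both depend
on your own solrconfig.xml and deployment, and the helper name is
hypothetical. A delta import typically also needs a delta query defined in
the DIH data configuration, which is not shown here.

```python
from urllib.parse import urlencode


def dataimport_url(base_url, command, clean=False, commit=True):
    """Build a DataImportHandler request URL (hypothetical helper).

    base_url is an assumption, e.g. "http://localhost:8983/solr".
    """
    if command not in ("full-import", "delta-import"):
        raise ValueError("command must be 'full-import' or 'delta-import'")
    params = {"command": command, "commit": str(commit).lower()}
    if command == "full-import":
        # Only send "clean" with a full import, where it controls whether
        # the existing index is wiped before importing.
        params["clean"] = str(clean).lower()
    return f"{base_url.rstrip('/')}/dataimport?{urlencode(params)}"


# Example only: the incremental import that a scheduler could call
# every 10 minutes.
print(dataimport_url("http://localhost:8983/solr", "delta-import"))
```

How often to request the delta import is the business decision mentioned
above; the sketch only shows what the request would look like.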

--
View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strategy --> Frequent updates in our application

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Yes, when people talk about NRT search they refer to 'add to view lag'.  In a 
typical Solr master-slave setup this is dominated by waiting for replication, 
doing the replication, and then warming up.
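
The three components named above can be added up for a back-of-the-envelope
estimate. This is only a sketch: the function name is hypothetical, the
numbers are made-up examples, and it assumes the worst case where a document
is committed on the master just after a slave has polled.

```python
def worst_case_add_to_view_lag(poll_interval_s, transfer_s, warmup_s):
    """Sum the waiting, replication and warming times, all in seconds."""
    for value in (poll_interval_s, transfer_s, warmup_s):
        if value < 0:
            raise ValueError("durations must be non-negative")
    # Worst case: the document waits a full polling interval before the
    # slave notices it, then the index is copied, then the searcher warms.
    return poll_interval_s + transfer_s + warmup_s


# Example values only: 5-minute poll, 20 s transfer, 45 s warm-up.
lag = worst_case_add_to_view_lag(poll_interval_s=300, transfer_s=20, warmup_s=45)
print(f"worst-case add-to-view lag: {lag} s")
```

With numbers like these the polling interval dominates, which is why
shortening it is tempting and why the warm-up time sets a floor on how far it
can be shortened.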

If your problem is indexing speed then that's a separate story that I think 
you'll find answers to on http://search-lucene.com/ or if you can't find them we 
can repeat :)

Otis



----- Original Message ----
> From: Jack Repenning <jr...@collab.net>
> To: solr-user@lucene.apache.org
> Sent: Fri, June 3, 2011 2:10:27 PM
> Subject: Re: Strategy --> Frequent updates in our application

Re: Strategy --> Frequent updates in our application

Posted by Jack Repenning <jr...@collab.net>.
On Jun 2, 2011, at 8:29 PM, Naveen Gupta wrote:

> and what about NRT, is it fine to apply in this case of scenario

Is NRT really what's wanted here? I'm asking the experts, as I have a situation not too different from the original post.

It appears to me (from the docs) that NRT makes a difference in the lag between a document being added and it being available in searches. But the original post really sounds to me like a concern over documents added per second. Does the RankingAlgorithm form of NRT improve the docs-added-per-second performance?

My add-to-view limits aren't really threatened by Solr performance today; something like 30 seconds is just fine. But I am feeling close enough to the documents-per-second boundary that I'm pondering measures like master/slave. If NRT only improves add-to-view lag, I'm not overly interested, but if it can improve add throughput, I'm all over it ;-)

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep










Re: Strategy --> Frequent updates in our application

Posted by Nagendra Nagarajayya <nn...@transaxtions.com>.
Hi Naveen:

Solr with RankingAlgorithm supports NRT. The performance is about 262 
docs / sec. You can get more information about the performance and NRT 
from here:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search

You can download Solr with RankingAlgorithm from here:
http://solr-ra.tgels.com

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.com
