Posted to solr-user@lucene.apache.org by "vijay.sampath" <vi...@baml.com> on 2011/11/02 03:58:38 UTC

Solr real-time update taking time

Hi All, 

  I recently started working with SOLR 3.3 and would appreciate your expertise
in finding a solution. I'm working on a POC, in which I've imported 3.5 million
document records using DIH. We have a source system which publishes change
data capture (CDC) messages in an XML format. The requirement is to integrate
SOLR with these real-time CDC updates. I've written a utility program which
receives each XML message, transforms it, and updates SOLR using SolrJ. The
source system publishes at least 3-4 messages per second, and the requirement
is to have the changes reflected in search within 1-2 seconds. Right now it
takes almost 15-25 seconds to get the changes committed in SOLR. I know that
committing on every record or every second would hamper both search and
indexing.

I thought of having a master for writes and a slave for reads, but I'm not
sure how fast the replication would be, since the requirement is to have the
changes searchable within 1-2 seconds.

Any thoughts or suggestions are appreciated. Thanks again.



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-tp3472709p3472709.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr real-time update taking time

Posted by Erick Erickson <er...@gmail.com>.
I think that 1-2 second requirement is unreasonable. The first thing I'd
do is push back and understand whether this is actually a requirement or
just somebody picking numbers out of thin air.

Committing often enough for this to work is just *asking* for trouble
with 3.3. I'd take a look at the Near Real Time (NRT) stuff happening on
trunk if this turns out to be a hard requirement.

Best
Erick

On Wed, Nov 2, 2011 at 11:30 PM, Vijay Sampath
<vi...@baml.com> wrote:
> Hi Jan,
>
>  Thanks very much for the suggestion. I used CommitWithin(5000) and the
> response came down to less than a second.  But I see an inconsistent
> behaviour on the response times. Sometimes it's taking more than 20-25
> seconds. Maybe I'll open up a separate thread.

Re: Solr real-time update taking time

Posted by Vijay Sampath <vi...@baml.com>.
Hi Jan, 

  Thanks very much for the suggestion. I used CommitWithin(5000) and the
response came down to less than a second.  But I see an inconsistent
behaviour on the response times. Sometimes it's taking more than 20-25
seconds. Maybe I'll open up a separate thread.


Re: Solr real-time update taking time

Posted by Vijay Sampath <vi...@baml.com>.
I'll try to use CommitWithin. Just to confirm, if I set the value to 2
seconds, will it affect my search performance?

To answer your questions:
1. Spellcheck is not used with buildOnCommit.
2. The index size is 16.1 GB and the RAM allocated to the JVM is 1 GB.


Re: Solr real-time update taking time

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

You probably want to use CommitWithin: http://wiki.apache.org/solr/CommitWithin to limit the number of commits to a minimum.
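
As a rough illustration of the CommitWithin suggestion above, here is a minimal sketch that builds an XML update message carrying a commitWithin attribute on the <add> element and posts it over HTTP. It deliberately uses only the JDK so it runs without SolrJ on the classpath. The class name, the field names (id, status), the sample values, and the update URL are all assumptions made for illustration; they are not taken from the poster's setup. In SolrJ the same setting should be available through UpdateRequest.setCommitWithin, though that is worth checking against the version in use.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CommitWithinExample {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an <add> message asking Solr to commit within the given number of
    // milliseconds, instead of the client issuing an explicit commit per update.
    static String buildAddMessage(Map<String, String> fields, int commitWithinMs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<add commitWithin=\"").append(commitWithinMs).append("\"><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue())).append("</field>");
        }
        sb.append("</doc></add>");
        return sb.toString();
    }

    // POST the message to an update handler and return the HTTP status code.
    static int post(String updateUrl, String xml) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; real field names depend on the schema.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "CDC-1001");
        doc.put("status", "UPDATED");

        String xml = buildAddMessage(doc, 5000);
        System.out.println(xml);

        // Only contact a server when a URL is passed on the command line,
        // e.g. http://localhost:8983/solr/update (an assumed location).
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], xml));
        }
    }
}
```

The point of the design is that the client never sends an explicit commit. Each update only states how stale it is allowed to become, and Solr can fold many updates arriving at 3-4 per second into a single commit.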

Some other questions:
* Are you using spellcheck with buildOnCommit? That totally kills commit performance...
* What's your index size, total RAM and allocated RAM to JVM?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 2. nov. 2011, at 03:58, vijay.sampath wrote:

> Hi All, 
> 
>  I recently started working with SOLR 3.3 and would appreciate your expertise
> in finding a solution. I'm working on a POC, in which I've imported 3.5 million
> document records using DIH. We have a source system which publishes change
> data capture (CDC) messages in an XML format. The requirement is to integrate
> SOLR with these real-time CDC updates. I've written a utility program which
> receives each XML message, transforms it, and updates SOLR using SolrJ. The
> source system publishes at least 3-4 messages per second, and the requirement
> is to have the changes reflected in search within 1-2 seconds. Right now it
> takes almost 15-25 seconds to get the changes committed in SOLR. I know that
> committing on every record or every second would hamper both search and
> indexing.
> 
> I thought of having a master for writes and a slave for reads, but I'm not
> sure how fast the replication would be, since the requirement is to have the
> changes searchable within 1-2 seconds.
> 
> Any thoughts or suggestions are appreciated. Thanks again.