Posted to solr-user@lucene.apache.org by v_shan <va...@gmail.com> on 2012/02/16 12:18:44 UTC

Realtime search with multi clients updating index simultaneously.

I have a helpdesk application developed in PHP/MySQL. I want to implement
real-time full-text search and I have shortlisted Solr. The MySQL database will
store all the tickets and their updates, and that data will be imported to
build the Solr index. All search requests will be handled by Solr.

What I want is real-time search. The moment someone updates a ticket, the
change should be available for search.

As per my understanding of Solr, this is how I think the system will work. 
A user updates a ticket -> database record is modified -> a request is sent
to Solr server to modify corresponding document in index.

I have read a book on Solr, and the questions below are troubling me.
1. The book mentions that "commits are slow in Solr. Depending on the index
size, Solr's auto-warming configuration, and Solr's cache state prior to
committing, a commit can take a non-trivial amount of time. Typically, it takes
a few seconds, but it can take some number of minutes in extreme cases". If
this is true, then how will I know when the data will be available for search,
and how can I implement real-time search? Also, I don't want the ticket update
operation to be slowed down (by adding the extra step of updating the Solr
index).

2. It is also mentioned that "there is no transaction isolation. This means
that if more than one Solr client were to submit modifications and commit them
at overlapping times, it is possible for part of one client's set of changes to
be committed before that client told Solr to commit. This applies to rollback
as well. If this is a problem for your architecture then consider using one
client process responsible for updating Solr."

Does it mean that due to lack of transactional commits, Solr can mess up the
updates when multiple people update the ticket simultaneously?

Now the question before me is: Is Solr a fit for my case? If yes, how?

--
View this message in context: http://lucene.472066.n3.nabble.com/Realtime-search-with-multi-clients-updating-index-simultaneously-tp3749881p3749881.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Realtime search with multi clients updating index simultaneously.

Posted by Erick Erickson <er...@gmail.com>.
See below....

On Thu, Feb 16, 2012 at 6:18 AM, v_shan <va...@gmail.com> wrote:
> I have a helpdesk application developed in PHP/MySQL. I want to implement
> real-time full-text search and I have shortlisted Solr. The MySQL database will
> store all the tickets and their updates, and that data will be imported to
> build the Solr index. All search requests will be handled by Solr.
>
> What I want is real-time search. The moment someone updates a ticket, the
> change should be available for search.
>
> As per my understanding of Solr, this is how I think the system will work.
> A user updates a ticket -> database record is modified -> a request is sent
> to Solr server to modify corresponding document in index.

The first thing to understand: Solr does not update a document in place. It
deletes the old one and adds a new one, matched on <uniqueKey>.
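
A toy sketch of that delete-then-add behavior, in Python. This is not Solr code:
TinyIndex and its fields are hypothetical names, and a plain dict stands in for
the index. It only illustrates that re-adding a document with the same unique
key replaces the whole document, so any field you do not resend is gone.

```python
class TinyIndex:
    """Hypothetical in-memory stand-in for an index keyed on a unique field."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        key = doc[self.unique_key]
        # Delete any existing document with this key, then add the new one whole.
        self.docs.pop(key, None)
        self.docs[key] = dict(doc)


index = TinyIndex()
index.add({"id": "ticket-42", "status": "open", "assignee": "alice"})
# The "update" resends the document; it is not merged with the old one.
index.add({"id": "ticket-42", "status": "closed"})
```

After the second add there is still one document for ticket-42, and it no longer
has an assignee field. The practical consequence is that the indexing process
should always send the complete ticket, not just the changed fields.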

>
> I have read a book on Solr, and the questions below are troubling me.
> 1. The book mentions that "commits are slow in Solr. Depending on the index
> size, Solr's auto-warming configuration, and Solr's cache state prior to
> committing, a commit can take a non-trivial amount of time. Typically, it takes
> a few seconds, but it can take some number of minutes in extreme cases". If
> this is true, then how will I know when the data will be available for search,
> and how can I implement real-time search? Also, I don't want the ticket update
> operation to be slowed down (by adding the extra step of updating the Solr
> index).

Well, Solr trunk is in the midst of getting NRT (Near Real Time) searching,
so that may be of interest. Otherwise, there is some latency defined by
"time until commit" + "replication time" + "autowarming time". You haven't
indicated how big your data set is, so it is hard to even guess what those
numbers really are. Even if you do know how many records there will be, the
answer is still "try it and see". Replication may not be necessary if you
have a small enough system: it is possible to index and search on the same
machine. On larger installations, a latency of a few minutes is common.
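
To make the latency sum concrete, here is a small Python sketch. The function
name and every number in it are invented for illustration; they are not
measurements, and real values have to come from testing your own setup.

```python
def worst_case_visibility_seconds(time_until_commit,
                                  replication_time=0.0,
                                  autowarming_time=0.0):
    """Upper bound on how long an update may take to become searchable."""
    return time_until_commit + replication_time + autowarming_time


# Assumed numbers: index and search on one machine, so no replication step.
single_box = worst_case_visibility_seconds(time_until_commit=10,
                                           autowarming_time=5)

# Assumed numbers: a larger master/slave setup with periodic replication.
replicated = worst_case_visibility_seconds(time_until_commit=60,
                                           replication_time=60,
                                           autowarming_time=30)
```

With these made-up inputs the single machine comes out at 15 seconds and the
replicated setup at 150 seconds, which is the "few minutes" range mentioned
above.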


>
> 2. It is also mentioned that "there is no transaction isolation. This means
> that if more than one Solr client were to submit modifications and commit them
> at overlapping times, it is possible for part of one client's set of changes to
> be committed before that client told Solr to commit. This applies to rollback
> as well. If this is a problem for your architecture then consider using one
> client process responsible for updating Solr."
>
> Does it mean that due to lack of transactional commits, Solr can mess up the
> updates when multiple people update the ticket simultaneously?
>
As above, Solr deletes and replaces complete documents. So in this case
your update process would simply honor the last update received.

But I think you're missing a bit here. Users won't update your Solr index.
Somewhere, you'll have a process that queries your MySQL database
and updates Solr with any changed records. The MySQL database is your
system of record and where your transactional integrity is maintained.
The process that queries the database and sends the results to Solr will
just see the results of the aggregate changes to the underlying database
as single records, so I don't think this is an issue.
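
A minimal sketch of such a process, in Python, under several assumptions: the
tickets table and its updated_at column are hypothetical, sqlite3 stands in for
MySQL so the example is self-contained, and send is whatever function posts
documents to Solr (here it just collects them in a list).

```python
import sqlite3


def fetch_changed(conn, since):
    """Return ticket rows modified after the given high-water mark."""
    cur = conn.execute(
        "SELECT id, subject, body, updated_at FROM tickets "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def sync_once(conn, since, send):
    """Send changed tickets as whole documents; return the new mark."""
    rows = fetch_changed(conn, since)
    if not rows:
        return since
    docs = [{"id": str(r[0]), "subject": r[1], "body": r[2]} for r in rows]
    send(docs)  # assumption: in a real system this posts to Solr
    # Note: rows sharing the last timestamp but written later would be missed;
    # a real indexer needs a stricter change marker.
    return rows[-1][3]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT, "
             "body TEXT, updated_at INTEGER)")
conn.execute("INSERT INTO tickets VALUES (1, 'Printer down', 'No power', 100)")
conn.execute("INSERT INTO tickets VALUES (2, 'VPN slow', 'Since Monday', 105)")
# Two users edit ticket 1 in quick succession; the database keeps the final state.
conn.execute("UPDATE tickets SET body = 'Fuse replaced', updated_at = 110 WHERE id = 1")
conn.execute("UPDATE tickets SET body = 'Fixed, closing', updated_at = 111 WHERE id = 1")

sent = []
mark = sync_once(conn, 0, sent.append)
```

The indexer sees ticket 1 once, with the body from the last edit, no matter how
many users touched it in between. Because this runs outside the ticket-update
request, saving a ticket in the application is not slowed down by indexing.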

Best
Erick

> Now the question before me is: Is Solr a fit for my case? If yes, how?
>
Can't answer this for you.
