Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2011/09/05 17:18:31 UTC

Solr 1.4.1: problems with replication and index operation both at the same time.

Hello.

I suspect that while replication is in progress from a batch machine to N
slave machines I have performance problems: read timed out exceptions, etc.
I have deployed a real-time environment where the batch machine receives
requests, processes them and then indexes them. At the same time the N slave
machines are listening (I have an autocommit warm every ten minutes) and
replicate the new indexes. I did some tests and noticed that when some
thousands of index requests are being processed while replication is also
active, Solr performance decreases.

The questions are:

- Do you know if Solr 1.4.1 has that kind of bug?

- Is it possible that a newer version would solve it?

- Any suggestions that would help me solve the problem? :-(


Thank you very much!

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, why all those tomcats? Are they all running Solr?
I'm pretty sure you'd be a lot better off simply having
one indexer and one searcher on this box. Give the searcher
the most resources I'd guess. In fact, you'd
be even better off offloading the indexing process to a different
(perhaps less powerful) machine and running your searcher
on this box, giving it a bunch of resources.

Here's what I'd do: Monitor the box (top on a *nix machine is
a good first approximation) and see what your resource contention
is all about. I suspect you're simply I/O bound since, if I'm reading
this all correctly, this is all on one box and they're all Solr
instances. It's all sitting on one poor little disk, which is probably
being beaten to death when it replicates to the 5 tomcats on the same
box that are slaves at the same time.
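
As a rough complement to watching top, the I/O-bound suspicion can be
checked by sampling the iowait counter. This is only a sketch, assuming a
Linux machine that exposes /proc/stat; the function names are made up for
illustration and the 5-second sampling window is arbitrary.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent waiting on I/O between two samples.

    Column order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already counted in
    user/nice, so only the first eight columns are summed.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total


def read_cpu_ticks(path="/proc/stat"):
    """Read the aggregate CPU counters (first line of /proc/stat)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    first = read_cpu_ticks()
    time.sleep(5)  # arbitrary sampling window
    second = read_cpu_ticks()
    print("iowait: {:.1%}".format(iowait_fraction(first, second)))
```

Running it once while the slaves are replicating and once while they are
idle gives a crude comparison; a much higher iowait share during
replication would be consistent with the disk being the bottleneck.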

Try it with one slave rather than 5 as a quick test of whether this is
indeed an issue.

Of course I may be misreading your setup entirely.

Best
Erick

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Excuse me, I mean an Apache Tomcat 6.

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Hello, Erick.

Thank you for answering again. I'm using Java JDK 1.5 and an Apache Tomcat
1.6, configuring its memory parameters from 1G to 2G maximum for each Tomcat
server. The machine has a RAID5 HDD, 32G RAM and eight cores, and I have six
Tomcat instances running at the same time. However, I think that the computer
is powerful enough to handle the workload. Well, at least I thought so...
Finally, I'm optimizing the index once every two nights, but no indexing is
in progress at that time, because I've got an ActiveMQ queue that holds
requests while optimization is in progress and indexes them afterwards. The
alternatives that I think could solve the problem are:

1. Updating the Solr version and checking the behaviour. I think that the
latest versions have replication improvements.
2. Controlling (right now I don't know how) the replication and index
operations, so that I activate replication on the slaves while queueing new
requests (so no indexing will be in progress) and then resume indexing after
replication. It's a bit complicated, but possible.
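
One possible way to get this kind of control is the HTTP interface of
Solr's ReplicationHandler, which accepts commands such as disablepoll and
enablepoll on a slave. The sketch below pauses polling on every slave while
a batch is indexed, so the two operations do not overlap. The slave URLs
and helper names are hypothetical, and it assumes the handler is registered
at /replication on each slave.

```python
from contextlib import contextmanager
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical slave base URLs; replace with the real ones.
SLAVES = ["http://slave1:8080/solr", "http://slave2:8080/solr"]


def replication_url(base_url, command):
    """Build the ReplicationHandler URL for a command such as 'disablepoll'."""
    query = urlencode({"command": command})
    return "{}/replication?{}".format(base_url.rstrip("/"), query)


def send_command(base_url, command, timeout=30):
    """Send one replication command and return the HTTP status code."""
    with urlopen(replication_url(base_url, command), timeout=timeout) as resp:
        return resp.status


@contextmanager
def polling_paused(slaves, send=send_command):
    """Disable polling on all slaves for the duration of the with-block."""
    try:
        for slave in slaves:
            send(slave, "disablepoll")
        yield
    finally:
        # Re-enable everywhere, even if disabling or the indexing work failed.
        for slave in slaves:
            send(slave, "enablepoll")
```

Usage would look like `with polling_paused(SLAVES): index_pending_requests()`,
where index_pending_requests stands for whatever drains the queue. If a
slave is unreachable the sketch simply raises, so a real deployment would
need its own error handling.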

Any alternative would be welcome, :-)

Thank you for your help.

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Erick Erickson <er...@gmail.com>.
Well, if the documents do get indexed, then all you have to do
is lengthen the timeout for your connection. What is it set to now?
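
How the timeout is lengthened depends on the indexing client, which this
thread doesn't specify. As an illustration only, here is a sketch of a
plain HTTP client that posts to Solr's update handler with a generous read
timeout and retries on timeouts. The function names, the 60-second default
and the backoff values are assumptions, and retrying is only safe if
re-sending the same documents is harmless (for example, when the schema
defines a unique key).

```python
import socket
import time
import urllib.error
import urllib.request


def is_timeout(exc):
    """True for a read timeout, or a connect timeout wrapped in URLError."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.timeout))


def post_with_retry(url, body, timeout=60.0, retries=3, backoff=2.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """POST an XML update message, retrying only when the request times out."""
    request = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"})
    for attempt in range(retries + 1):
        try:
            with opener(request, timeout=timeout) as resp:
                return resp.read()
        except (socket.timeout, urllib.error.URLError) as exc:
            # Give up on non-timeout errors or when retries are exhausted.
            if not is_timeout(exc) or attempt == retries:
                raise
            sleep(backoff * (2 ** attempt))  # waits 2s, 4s, 8s, ...
```

For example, `post_with_retry("http://localhost:8080/solr/update",
b"<commit/>", timeout=120.0)` would wait up to two minutes per attempt; the
host and port here are placeholders.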

But this isn't expected. The first place I'd look is whether your
indexing machine is allowing the operating system enough memory
to manage its disk caches well. The second question I'd ask
is whether you're optimizing your index each time you
add documents. This isn't necessary.

What are your JVM settings? What physical machine are
you running on? Have you looked at any other processes
that might be running on that machine? (If it's a *nix machine,
top can be a starting point.)

At a guess I'd wonder about being I/O bound, so that's the first
place I'd start looking.

Best
Erick

On Tue, Sep 6, 2011 at 8:04 AM, Luis Cappa Banda <lu...@gmail.com> wrote:
> Hello, Erick.
>
> Thank you for answering. The performance decreases during indexing: while
> replication is in progress the batch machine cannot receive and process
> the indexing requests quickly, and some "read timed out" exceptions appear.
> Luckily I just load some hundreds of documents every day because it isn't a
> batch operation itself (I just index daily the new documents received), but
> in those minutes the batch machine seems unable to operate correctly. The
> machines are in the same LAN, so I think that it's not a connection
> performance problem. Is it possible that the batch machine has disk I/O
> problems while reading from and writing to disk at the same time?
>
> Thank you very much.
>

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Hello, Erick.

Thank you for answering. The performance decreases during indexing: while
replication is in progress the batch machine cannot receive and process
the indexing requests quickly, and some "read timed out" exceptions appear.
Luckily I just load some hundreds of documents every day because it isn't a
batch operation itself (I just index daily the new documents received), but
in those minutes the batch machine seems unable to operate correctly. The
machines are in the same LAN, so I think that it's not a connection
performance problem. Is it possible that the batch machine has disk I/O
problems while reading from and writing to disk at the same time?

Thank you very much.

Re: Solr 1.4.1: problems with replication and index operation both at the same time.

Posted by Erick Erickson <er...@gmail.com>.
Luis:

First, I managed to "invite you to chat" by mistake, and I don't see a way
to cancel it... Sorry.

Anyway, what exactly slows down? Indexing? Search performance on the slaves?

We need some more details to answer your questions. It might help to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
