Posted to solr-user@lucene.apache.org by Victor Ruiz <bi...@gmail.com> on 2013/04/12 14:45:38 UTC
SolrCloud vs Solr master-slave replication
Hi,
Earlier this week I posted about an issue with our Solr index:
http://lucene.472066.n3.nabble.com/corrupted-index-in-slave-td4054769.html
Today, that error started to happen constantly for almost every request, and
I created a JIRA issue because I thought it was a bug:
https://issues.apache.org/jira/browse/SOLR-4707
As you can read there, in the end it was due to a failure in the Solr
master-slave replication, and now I don't know whether we should think about
migrating to SolrCloud, since Solr master-slave replication does not seem to
fit our requirements:
* index size: ~20 million documents, ~9GB
* ~1200 updates/min
* ~10000 queries/min (distributed over 2 slaves), using MoreLikeThis,
RealTimeGet, TermVectorComponent, and SearchHandler
I would be grateful if anyone could help me answer these questions:
* Would it be advisable to migrate to SolrCloud? Would it have an impact on
replication performance?
* In that case, which would perform better: maintaining a copy of the index
on every server, or using shard servers?
* How many shards and replicas would you advise to ensure high
availability?
Kind Regards,
Victor
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud vs Solr master-slave replication
Posted by Lance Norskog <go...@gmail.com>.
Run checksums on all files in both master and slave, and verify that
they are the same.
TCP/IP has a checksum algorithm that was state-of-the-art in 1969.
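A minimal sketch of that comparison in Python, assuming you can run the same script against the index directory on each machine and bring the two results together. The function names are hypothetical and SHA-256 is just one reasonable choice of checksum. Note that the two directories are only expected to match right after a completed replication with indexing paused, and that a lock file may exist on one side only.

```python
import hashlib
import os


def checksum_directory(index_dir, chunk_size=1024 * 1024):
    """Return {relative_path: sha256 hex digest} for every file under index_dir."""
    digests = {}
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            sha = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large segment files need not fit in memory.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    sha.update(chunk)
            digests[os.path.relpath(path, index_dir)] = sha.hexdigest()
    return digests


def compare_checksums(master, slave):
    """Return sorted file names that are missing on one side or differ in content."""
    problems = []
    for name in sorted(set(master) | set(slave)):
        if master.get(name) != slave.get(name):
            problems.append(name)
    return problems
```

An empty list from compare_checksums means every file is present on both sides with identical content; any name it returns is a file worth inspecting.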
On 04/18/2013 02:10 AM, Victor Ruiz wrote:
> Also, I forgot to say... the same error started to happen again.. the index
> is again corrupted :(
Re: SolrCloud vs Solr master-slave replication
Posted by Victor Ruiz <bi...@gmail.com>.
Also, I forgot to say... the same error started to happen again.. the index
is again corrupted :(
Re: SolrCloud vs Solr master-slave replication
Posted by Victor Ruiz <bi...@gmail.com>.
Thank you again for your answer Shawn.
The network card seems to work fine, but we've found segmentation faults, so
now our hosting provider is going to run a full hardware check. Hopefully
they'll replace the server and the problem will be solved.
Regards,
Victor
Re: SolrCloud vs Solr master-slave replication
Posted by Shawn Heisey <so...@elyograg.org>.
On 4/15/2013 3:38 AM, Victor Ruiz wrote:
> About SolrCloud, I know it doesn't use master-slave replication, but
> incremental updates, item by item. That's why I thought it could work for
> us, since our bottleneck appears to be the replication cycles. But another
> point is: if the indexing occurs on all servers, couldn't 1200 updates/min
> also overload the servers, and therefore perform worse than master-slave
> replication?
One version (4.1, I think) has a problem that results in the entire
index being replicated every time. The I/O required for that makes
everything slow down on both master and slave.
There are reports of new master/slave replication problems with 4.2 and
4.2.1, but I'm not entirely clear on whether those are just cosmetic
problems with index version reporting or whether some people are having
actual real problems.
In 3.x and older, replication was generally the best option for multiple
copies of your index, because there was no NRT indexing capability.
Updating the index was a resource-intensive process with a high impact
on searching, so loading a replicated index was better.
Version 4.x adds NRT capabilities, so indexing impacts searches far less
than it used to. SolrCloud with NRT features (frequent soft commits,
less frequent hard commits) is the recommended configuration path now.
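As a rough illustration of how a client can ask for that kind of visibility, here is a small Python sketch that only builds update URLs. It assumes the standard commitWithin, commit, and softCommit request parameters of the update handler; the helper name, host, port, and core name are hypothetical. In practice the commit schedule is more commonly set server-side with autoCommit and autoSoftCommit in solrconfig.xml.

```python
from urllib.parse import urlencode


def update_url(base_url, core, soft_commit=False, commit_within_ms=None):
    """Build a Solr update URL carrying commit-related request parameters."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the update visible within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    if soft_commit:
        # commit=true together with softCommit=true requests a soft commit,
        # which opens a new searcher without a full flush to stable storage.
        params["commit"] = "true"
        params["softCommit"] = "true"
    url = "{}/{}/update".format(base_url.rstrip("/"), core)
    return url + ("?" + urlencode(params) if params else "")


# Hypothetical host and core name, for illustration only.
print(update_url("http://localhost:8983/solr", "collection1", commit_within_ms=1000))
```

Letting the server decide when to commit, rather than committing explicitly after every update, is usually the gentler choice at around 1200 updates/min.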
Thanks,
Shawn
Re: SolrCloud vs Solr master-slave replication
Posted by Victor Ruiz <bi...@gmail.com>.
Hi Shawn,
thank you for your reply.
I'll check whether the network card drivers are OK. About the RAM, the JVM
max heap size is currently 6GB, but it never reaches the maximum; typically
the used RAM is no more than 5GB. Should I assign more RAM? I've read that
assigning too much RAM can also hurt performance. Apart from the RAM used by
the JVM, the server has more than 10GB of unused RAM, which should be enough
to cache the index.
About SolrCloud, I know it doesn't use master-slave replication, but
incremental updates, item by item. That's why I thought it could work for
us, since our bottleneck appears to be the replication cycles. But another
point is: if the indexing occurs on all servers, couldn't 1200 updates/min
also overload the servers, and therefore perform worse than master-slave
replication?
Regards,
Victor
Re: SolrCloud vs Solr master-slave replication
Posted by Shawn Heisey <so...@elyograg.org>.
On 4/12/2013 6:45 AM, Victor Ruiz wrote:
> As you can read there, in the end it was due to a failure in the Solr
> master-slave replication, and now I don't know whether we should think about
> migrating to SolrCloud, since Solr master-slave replication does not seem to
> fit our requirements:
>
> * index size: ~20 million documents, ~9GB
> * ~1200 updates/min
> * ~10000 queries/min (distributed over 2 slaves), using MoreLikeThis,
> RealTimeGet, TermVectorComponent, and SearchHandler
>
> I would be grateful if anyone could help me answer these questions:
>
> * Would it be advisable to migrate to SolrCloud? Would it have an impact on
> replication performance?
> * In that case, which would perform better: maintaining a copy of the index
> on every server, or using shard servers?
> * How many shards and replicas would you advise to ensure high
> availability?
The fact that your replication is producing a corrupt index suggests
that your network, your server hardware, or your software install is
unreliable. The TCP protocol used for all Solr communication (as well
as the Internet in general) has error detection and retransmissions.
I'm not saying that replication can't have bugs, but usually those bugs
result in replication not working; they don't typically cause index
corruption.
I see a previous message where you say everything is on the same LAN
with gigabit ethernet. There are a lot of things that can go wrong with
gigabit. At the physical layer: Using cat5 cable instead of cat5e or
cat6 can lead to problems. You could have a bad cable, or the RJ45
connectors could be badly crimped. If you are using patch panels, they
may be bad or only rated for cat5. At layer 2, you can have duplex
mismatches, common when one side is hard-set to full duplex and the
other side is left at auto or is a dumb switch that can't be changed.
Even if you have these problems, it still won't usually cause data
corruption unless the hardware or OS is also faulty.
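One quick way to look for a duplex or speed problem on a Linux host is to read what the kernel reports for the interface. The sketch below assumes the Linux sysfs files speed and duplex under /sys/class/net/<interface>; the function names and the interface name eth0 are hypothetical, and the check says nothing about cabling quality.

```python
import os


def read_link_settings(interface, sysfs_root="/sys/class/net"):
    """Return (speed_mbps, duplex) for an interface; None marks an unreadable value."""
    def read(attribute):
        try:
            with open(os.path.join(sysfs_root, interface, attribute)) as handle:
                return handle.read().strip()
        except OSError:
            # The read can fail when the link is down, and the files do not
            # exist at all on non-Linux systems.
            return None

    speed = read("speed")
    duplex = read("duplex")
    return (int(speed) if speed is not None else None, duplex)


def link_is_suspect(speed_mbps, duplex):
    """Flag a link that is not running at full duplex and at least 1000 Mbit/s."""
    return duplex != "full" or speed_mbps is None or speed_mbps < 1000


if __name__ == "__main__":
    # "eth0" is only an example interface name.
    print(link_is_suspect(*read_link_settings("eth0")))
```

A link that reports half duplex or 100 Mbit/s on hardware that should be gigabit is a hint to check the switch port settings and the cable first.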
One somewhat common example of a problem that can cause data corruption
in network communication is buggy firmware on the network card,
especially with Broadcom chips. Upgrading to the latest firmware will
usually fix these problems.
Now for your questions: SolrCloud doesn't use replication during normal
operation. When you index, the indexing happens on all replicas in
parallel.
Replication does sometimes get used by SolrCloud, but only if a replica
goes down and there's not enough information in the transaction log to
reconstruct recent updates when it comes back up.
As for whether or not to use shards: that's really up to you. Solr
should have no trouble with a single-shard 9GB index that has 20 million
documents, as long as you give enough memory to the java heap and have
8GB or so left over for the OS to cache the index. That means you want
to have 12-16GB of RAM in each server. If Solr is not the only thing
running on the hardware, then you'd want more RAM.
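The sizing rule above is simple addition, sketched here in Python. The 6GB heap is the figure Victor reported elsewhere in this thread and the 8GB cache figure comes from the paragraph above; the function name and the other_gb parameter are hypothetical, and the result is a rough estimate, not a guarantee.

```python
def estimate_ram_gb(heap_gb, os_cache_gb, other_gb=0):
    """Estimate server RAM as JVM heap + OS disk cache for the index + other software."""
    return heap_gb + os_cache_gb + other_gb


# 6GB heap plus about 8GB left for the OS to cache the 9GB index.
print(estimate_ram_gb(heap_gb=6, os_cache_gb=8))  # 14
```

The result of 14GB falls inside the 12-16GB range; anything else running on the same server goes into other_gb and pushes the total up.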
For the update and query volume you have described, having plenty of RAM
and lots of CPU cores will be critical.
Thanks,
Shawn