
Posted to solr-user@lucene.apache.org by Victor Ruiz <bi...@gmail.com> on 2013/04/12 14:45:38 UTC

SolrCloud vs Solr master-slave replication

Hi,

Earlier this week I posted about an issue with our Solr index:
http://lucene.472066.n3.nabble.com/corrupted-index-in-slave-td4054769.html

Today, that error started to happen constantly for almost every request, and
I created a JIRA issue because I thought it was a bug:
https://issues.apache.org/jira/browse/SOLR-4707

As you can read there, in the end it was due to a failure in the Solr
master-slave replication, and now I don't know whether we should think about
migrating to SolrCloud, since Solr master-slave replication does not seem to
fit our requirements:

* index size:  ~20 million documents, ~9GB
* ~1200 updates/min
* ~10000 queries/min (distributed over 2 slaves): MoreLikeThis, RealTimeGet,
  TermVectorComponent, SearchHandler

I would be grateful if anyone could help me answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have an impact on
replication performance?
* In that case, which would perform better: maintaining a copy of the index
on every server, or using shard servers?
* How many shards and replicas would you advise to ensure high
availability?

Kind Regards,

Victor



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud vs Solr master-slave replication

Posted by Lance Norskog <go...@gmail.com>.

Run checksums on all files in both master and slave, and verify that 
they are the same.
TCP/IP has a checksum algorithm that was state-of-the-art in 1969.
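
One way to run that comparison is sketched below. It is only an illustration, not a tool that ships with Solr: the function names and the choice of SHA-256 are assumptions, and it expects to be pointed at the index directory on each machine (or at a copy of it) after replication has finished and while no indexing is running.

```python
import hashlib
from pathlib import Path


def checksum_dir(index_dir, algorithm="sha256", chunk_size=1 << 20):
    """Return {relative file path: hex digest} for every file under index_dir."""
    root = Path(index_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        h = hashlib.new(algorithm)
        with path.open("rb") as f:
            # Read in chunks so multi-gigabyte segment files do not fill memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_checksums(master, slave):
    """Return names of files that are missing on one side or differ in content."""
    names = set(master) | set(slave)
    return sorted(n for n in names if master.get(n) != slave.get(n))
```

Run checksum_dir on the master and on the slave, bring the two results together (for example by saving them as JSON), and pass them to diff_checksums. Lock files and segments written after the last replication can legitimately differ; a mismatch in a segment file that both sides are supposed to share is what would point to corruption in transit or on disk.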

On 04/18/2013 02:10 AM, Victor Ruiz wrote:
> Also, I forgot to say... the same error started to happen again.. the index
> is again corrupted :(

Re: SolrCloud vs Solr master-slave replication

Posted by Victor Ruiz <bi...@gmail.com>.

Also, I forgot to say... the same error started to happen again.. the index
is again corrupted :(

Re: SolrCloud vs Solr master-slave replication

Posted by Victor Ruiz <bi...@gmail.com>.

Thank you again for your answer Shawn. 

The network card seems to work fine, but we've found segmentation faults, so
now our hosting provider is going to run a full hardware check. Hopefully
they'll replace the server and the problem will be solved.

Regards,
Victor

Re: SolrCloud vs Solr master-slave replication

Posted by Shawn Heisey <so...@elyograg.org>.

On 4/15/2013 3:38 AM, Victor Ruiz wrote:
> About SolrCloud, I know it doesn't use master-slave replication, but
> incremental updates, item by item. That's why I thought it could work for
> us, since our bottleneck appears to be the replication cycles. But another
> point is: if the indexing occurs on all servers, couldn't 1200 updates/min
> also overload the servers, and therefore give worse performance than
> master-slave replication?

One version (4.1, I think) has a problem that results in the entire 
index being replicated every time.  The I/O required for that makes 
everything slow down on both master and slave.

There are reports of new master/slave replication problems with 4.2 and 
4.2.1, but I'm not entirely clear on whether those are just cosmetic 
problems with index version reporting or whether some people are having 
actual real problems.

In 3.x and older, replication was generally the best option for multiple 
copies of your index, because there was no NRT indexing capability. 
Updating the index was a resource-intensive process with a high impact 
on searching, so loading a replicated index was better.

Version 4.x adds NRT capabilities, so indexing impacts searches far less 
than it used to.  SolrCloud with NRT features (frequent soft commits, 
less frequent hard commits) is the recommended configuration path now.
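
As a rough illustration of what "soft" versus "hard" looks like from the indexing client, the sketch below builds update URLs using Solr's commitWithin, commit and softCommit request parameters. The helper name, host and core name are made up for the example, and the same policy is more commonly set on the server with autoCommit and autoSoftCommit in solrconfig.xml.

```python
from urllib.parse import urlencode


def update_url(base_url, commit_within_ms=None, soft_commit=False):
    """Build a Solr /update URL carrying the chosen commit parameters."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to make the posted documents searchable within this many ms.
        params["commitWithin"] = commit_within_ms
    if soft_commit:
        # Explicit soft commit: new documents become visible to searches
        # without waiting for a full hard commit.
        params["commit"] = "true"
        params["softCommit"] = "true"
    query = urlencode(params)
    return f"{base_url}/update" + (f"?{query}" if query else "")


if __name__ == "__main__":
    base = "http://localhost:8983/solr/collection1"  # hypothetical host and core
    print(update_url(base, commit_within_ms=10000))
    print(update_url(base, soft_commit=True))
```

The intervals themselves (how often to soft commit, how often to hard commit) depend on how quickly updates need to become visible and are not something this sketch can decide.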

Thanks,
Shawn

Re: SolrCloud vs Solr master-slave replication

Posted by Victor Ruiz <bi...@gmail.com>.

Hi Shawn,

Thank you for your reply.

I'll check whether the network card drivers are OK. About the RAM: the JVM max
heap size is currently 6GB, but it never reaches the maximum; typically the
used RAM is no more than 5GB. Should I assign more RAM? I've read that
assigning too much RAM can also hurt performance. Apart from the RAM used by
the JVM, the server has more than 10GB of unused RAM, which should be enough
to cache the index.

About SolrCloud, I know it doesn't use master-slave replication, but
incremental updates, item by item. That's why I thought it could work for
us, since our bottleneck appears to be the replication cycles. But another
point is: if the indexing occurs on all servers, couldn't 1200 updates/min
also overload the servers, and therefore give worse performance than
master-slave replication?

Regards,
Victor

Re: SolrCloud vs Solr master-slave replication

Posted by Shawn Heisey <so...@elyograg.org>.

On 4/12/2013 6:45 AM, Victor Ruiz wrote:
> As you can read there, in the end it was due to a failure in the Solr
> master-slave replication, and now I don't know whether we should think about
> migrating to SolrCloud, since Solr master-slave replication does not seem to
> fit our requirements:
>
> * index size:  ~20 million documents, ~9GB
> * ~1200 updates/min
> * ~10000 queries/min (distributed over 2 slaves)  MoreLikeThis, RealTimeGet,
> TermVectorComponent, SearchHandler
>
> I would be grateful if anyone could help me answer these questions:
>
> * Would it be advisable to migrate to SolrCloud? Would it have an impact on
> replication performance?
> * In that case, which would perform better: maintaining a copy of the index
> on every server, or using shard servers?
> * How many shards and replicas would you advise to ensure high
> availability?

The fact that your replication is producing a corrupt index suggests 
that your network, your server hardware, or your software install is 
unreliable.  The TCP protocol used for all Solr communication (as well 
as the Internet in general) has error detection and retransmissions. 
I'm not saying that replication can't have bugs, but usually those bugs 
result in replication not working; they don't typically cause index 
corruption.

I see a previous message where you say everything is on the same LAN 
with gigabit ethernet.  There are a lot of things that can go wrong with 
gigabit.  At the physical layer, using cat5 cable instead of cat5e or 
cat6 can lead to problems.  You could have a bad cable, or the RJ45 
connectors could be badly crimped.  If you are using patch panels, they 
may be bad or only rated for cat5.  At layer 2, you can have duplex 
mismatches, common when one side is hard-set to full duplex and the 
other side is left at auto or is a dumb switch that can't be changed. 
Even if you have these problems, it still won't usually cause data 
corruption unless the hardware or OS is also faulty.

One somewhat common example of a problem that can cause data corruption 
in network communication is buggy firmware on the network card, 
especially with Broadcom chips.  Upgrading to the latest firmware will 
usually fix these problems.

Now for your questions: SolrCloud doesn't use replication during normal 
operation.  When you index, the indexing happens on all replicas in 
parallel.

Replication does sometimes get used by SolrCloud, but only if a replica 
goes down and there's not enough information in the transaction log to 
reconstruct recent updates when it comes back up.

As for whether or not to use shards: that's really up to you.  Solr 
should have no trouble with a single-shard 9GB index that has 20 million 
documents, as long as you give enough memory to the Java heap and have 
8GB or so left over for the OS to cache the index.  That means you want 
to have 12-16GB of RAM in each server.  If Solr is not the only thing 
running on the hardware, then you'd want more RAM.

For the update and query volume you have described, having plenty of RAM 
and lots of CPU cores will be critical.
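
To make the arithmetic concrete, here is a small sketch using the figures from this thread. The 6GB heap is the value Victor reported and the 8GB OS cache is the rough allowance above; the function names are made up for the example, and the result is a back-of-the-envelope estimate rather than a sizing rule.

```python
def total_ram_gb(heap_gb, os_cache_gb=8.0, other_processes_gb=0.0):
    """RAM to provision per server: JVM heap + OS disk cache + anything else."""
    return heap_gb + os_cache_gb + other_processes_gb


def per_second(per_minute, servers=1):
    """Convert a per-minute rate into a per-second rate for each server."""
    return per_minute / 60.0 / servers


if __name__ == "__main__":
    # 6GB heap plus about 8GB for the OS to cache the 9GB index.
    print("RAM per server (GB):", total_ram_gb(6))
    # In SolrCloud every replica indexes every update, so the update
    # rate is not divided across servers.
    print("updates/s per replica:", per_second(1200))
    # Queries are spread across the servers that hold a copy of the index.
    print("queries/s per server:", round(per_second(10000, servers=2), 1))
```

With those inputs the sketch gives 14GB of RAM, 20 updates per second on each replica, and roughly 83 queries per second on each of two servers, which falls inside the 12-16GB range above.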

Thanks,
Shawn