Posted to solr-user@lucene.apache.org by Mahmoud Almokadem <pr...@gmail.com> on 2016/07/18 11:05:07 UTC

Cold replication

Hi, 

We have SolrCloud 6.0 installed on 4 i2.2xlarge instances with 4 shards. We store the indices on EBS volumes attached to these instances. Fortunately, these instances are equipped with TEMPORARY SSDs. We need to store the indices on the SSDs, but they are not safe.

The index is updated every five minutes. 

Could we use the SSDs to store the indices and create an incremental backup or cold replication on the EBS? That way we would use EBS only for storing a copy of the indices, not for serving data to Solr.

In case we lose the data on the SSDs, we could restore a backup from the EBS. Is this possible?

Thanks, 
Mahmoud 



Re: Cold replication

Posted by Erick Erickson <er...@gmail.com>.
The fact that your index is 200GB is meaningless,
assuming you're talking about disk size. Please
just measure before you make assumptions about
what will work; it'll save you a world of hurt. I'm not
claiming that just using EBS will satisfy your
need, but if you're swapping, your search speed will
suffer a lot whether you're swapping
to SSD or EBS. Well, it does matter, but either way I'm
90% sure you won't be satisfied with performance. It's
just that you'll be _less_ unhappy with SSD.
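
A minimal sketch of what "just measure" could look like, assuming you
already have some function that sends one query to Solr and waits for the
response. The names measure_latencies, summarize and run_query are
hypothetical and not part of Solr; the idea is only to run the same query
set against each storage setup and compare the numbers.

```python
import statistics
import time


def measure_latencies(run_query, queries, repeats=3):
    """Time run_query(q) for every query and return latencies in milliseconds.

    run_query is a caller-supplied (hypothetical) function that sends one
    query to Solr and blocks until the response arrives.
    """
    latencies_ms = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return the median, 95th percentile (nearest rank) and max latency."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile using integer arithmetic: ceil(0.95 * n).
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }
```

Running this once with the index on EBS and once with it on SSD, using a
query set taken from real traffic, gives figures to compare instead of
assumptions. The tail values (p95, max) are usually where swapping shows up.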

If your index is changing rapidly, SSDs can be really
useful as they make loading parts of the index
into memory faster.

I say that the disk size of your index is meaningless, and
by that I mean that, for instance, if you've set stored="true"
for a field, a verbatim copy of that field is stored on disk.
That copy is largely irrelevant in terms of the memory required, as
it's only read when assembling the final doc to return to
the user, not when finding matches. The stored data is held in
*.fdt files. The *.fdt files may be a very small percentage of the
disk space or a very large percentage; there's no way to know
from the total size alone. Other options also have disk vs. memory
implications.
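
One way to see how much of the on-disk size is stored data is to sum the
file sizes in the index directory by extension. This is a rough sketch; the
function names are hypothetical and the path you pass in depends on where
your cores keep their data.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(index_dir):
    """Sum file sizes in bytes under index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for path in Path(index_dir).rglob("*"):
        if path.is_file():
            # Files without an extension (e.g. segments_N) are grouped together.
            totals[path.suffix or "(none)"] += path.stat().st_size
    return dict(totals)


def share_of(totals, extension):
    """Return the fraction of the total size taken by one extension."""
    total = sum(totals.values())
    return totals.get(extension, 0) / total if total else 0.0
```

For example, share_of(size_by_extension("/path/to/core/data/index"), ".fdt")
returns the fraction of the index size held in *.fdt files (the path here
is a placeholder).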

As for autowarming, that's simply a way to replay some
of the recent queries and filter queries when a new searcher
is opened (e.g. after you commit new documents). It's intended
to smooth over spikes by moving relevant parts of the index
from disk to memory.

Best,
Erick


On Tue, Jul 19, 2016 at 1:16 AM, Emir Arnautovic
<em...@sematext.com> wrote:
> Hi Mahmoud,
> What you can do is use the local SSD as a cache for EBS. You can try lvmcache
> or bcache. It will boost your performance while the data remains on EBS.
>
> Thanks,
> Emir

Re: Cold replication

Posted by Emir Arnautovic <em...@sematext.com>.
Hi Mahmoud,
What you can do is use the local SSD as a cache for EBS. You can try
lvmcache or bcache. It will boost your performance while the data
remains on EBS.

Thanks,
Emir

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 18.07.2016 19:34, Mahmoud Almokadem wrote:
> Thanks Erick,
>
> I'll take a look at the replication on Solr. But I don't know if it will
> support incremental backup or not.
>
> And I want to use SSD because my index cannot be held in memory. The index
> is about 200GB on each instance and the RAM is 61GB and the update
> frequency is high. So, I want to use the SSDs that come with the servers
> instead of EBS.
>
> Would you explain what you mean by proper warming?
>
> Thanks,
> Mahmoud

Re: Cold replication

Posted by Mahmoud Almokadem <pr...@gmail.com>.
Thanks Erick,

I'll take a look at the replication on Solr. But I don't know if it will
support incremental backup or not.

And I want to use SSD because my index cannot be held in memory. The index
is about 200GB on each instance and the RAM is 61GB and the update
frequency is high. So, I want to use the SSDs that come with the servers
instead of EBS.

Would you explain what you mean by proper warming?

Thanks,
Mahmoud


On Mon, Jul 18, 2016 at 5:46 PM, Erick Erickson <er...@gmail.com>
wrote:

> Have you tried the replication API backup command here?
>
> https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>
> Warning, I haven't worked with this personally in this
> situation so test.
>
> I do have to ask why you think SSDs are required here and
> if you've measured. With proper warming, most of the
> index is held in memory anyway and the source of
> the data (SSD or spinning) is not a huge issue. SSDs
> certainly are better/faster, but have you measured whether
> they are _enough_ faster to be worth the added
> complexity?
>
> Best,
> Erick

Re: Cold replication

Posted by Erick Erickson <er...@gmail.com>.
Have you tried the replication API backup command here?
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

Warning, I haven't worked with this personally in this
situation so test.
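
A hedged sketch of how calls to that handler could be assembled. It only
builds the request URLs and does not contact Solr. The "backup" command and
its "location" and "name" parameters are taken from the replication handler
documentation linked above as I understand it; the host, core name and EBS
mount path are placeholders, and replication_url is a hypothetical helper.
Verify the parameters against your Solr version, and note that this does
not show whether the backup is incremental.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a URL for a core's /replication handler with the given command.

    base_url and core are placeholders for your own deployment; extra
    keyword arguments are passed through as request parameters.
    """
    query = urlencode({"command": command, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


# Hypothetical example: snapshot one core to a directory on the EBS volume.
backup_url = replication_url(
    "http://localhost:8983/solr",
    "collection1_shard1_replica1",  # placeholder core name
    "backup",
    location="/mnt/ebs/backups",    # placeholder EBS mount point
    name="snap1",
)
```

With 4 shards, a request like this would have to be sent to one core per
shard, for example from a cron job, and the resulting snapshots checked
before relying on them for a restore.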

I do have to ask why you think SSDs are required here and
if you've measured. With proper warming, most of the
index is held in memory anyway and the source of
the data (SSD or spinning) is not a huge issue. SSDs
certainly are better/faster, but have you measured whether
they are _enough_ faster to be worth the added
complexity?

Best,
Erick

On Mon, Jul 18, 2016 at 4:05 AM, Mahmoud Almokadem
<pr...@gmail.com> wrote:
> Hi,
>
> We have SolrCloud 6.0 installed on 4 i2.2xlarge instances with 4 shards. We store the indices on EBS volumes attached to these instances. Fortunately, these instances are equipped with TEMPORARY SSDs. We need to store the indices on the SSDs, but they are not safe.
>
> The index is updated every five minutes.
>
> Could we use the SSDs to store the indices and create an incremental backup or cold replication on the EBS? That way we would use EBS only for storing a copy of the indices, not for serving data to Solr.
>
> In case we lose the data on the SSDs, we could restore a backup from the EBS. Is this possible?
>
> Thanks,
> Mahmoud
>
>