Posted to solr-user@lucene.apache.org by Darko Todoric <to...@mdpi.com> on 2018/05/18 12:28:52 UTC

Solr import doubling space on disk

Hi guys,

We have about 250 GB of Solr data on one server, and when we start a full 
import, Solr doubles the space it uses on disk. This is a problem for us 
because the server has a 500 GB SSD and we hit almost 100% disk usage 
while the full import is running.
Since we don't use the "clean" option, is there a way to tell the 
full/delta import to update the data immediately, rather than waiting 
until it has finished and then updating everything at once? That way, the 
full import would not need to create this temporary folder of 250 GB.

Kind regards,
Darko Todoric

Re: Solr import doubling space on disk

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
And (as an additional comment),

You may want to index into a completely separate collection and then
do alias switching to point to it when done. That indexing could even
be on a separate machine.
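
A minimal sketch of what the alias switch could look like, assuming the
cluster runs in SolrCloud mode (collection aliases are a SolrCloud feature)
and using the Collections API CREATEALIAS action. The base URL, the alias
name "articles" and the collection name "articles_20180518" are made-up
placeholders, not values from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build a Collections API CREATEALIAS request URL.

    Issuing CREATEALIAS for an alias name that already exists re-points
    the alias at the given collection.
    """
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    })
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"


def switch_alias(solr_base: str, alias: str, collection: str) -> int:
    """Send the request to a live Solr node and return the HTTP status."""
    url = build_alias_url(solr_base, alias, collection)
    with urlopen(url, timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical names: clients query "articles", while the freshly
    # built index lives in "articles_20180518".
    print(build_alias_url("http://localhost:8983/solr",
                          "articles", "articles_20180518"))
```

Clients keep querying the alias the whole time, so the old collection can
be deleted (and its disk space reclaimed) once the alias points at the new
one.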

Regards,
   Alex.


Re: Solr import doubling space on disk

Posted by Emir Arnautović <em...@sematext.com>.
Hi Darko,
There is no in-place updating of data in Solr. New data is always written into a new segment, and if an existing document has the same ID it is flagged as deleted but is not removed until its segment is merged. During a merge, Solr keeps the old segments until the new one is complete and the searcher has been updated. So in any case there is a chance that Solr might need more space than the size of the index. In some extreme cases it can be as much as three times the size of the index.
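
To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from this thread (250 GB index, 500 GB disk). The multipliers are only the cases mentioned above (index size, double, and the extreme three times); the real peak depends on the merge policy and on how many deleted documents are waiting to be merged away, so treat this as an illustration rather than a prediction.

```python
def headroom_gb(disk_gb: float, index_gb: float, factor: float) -> float:
    """Free space left if the index temporarily grows to index_gb * factor."""
    return disk_gb - index_gb * factor


if __name__ == "__main__":
    DISK_GB = 500.0   # SSD size from the original post
    INDEX_GB = 250.0  # index size from the original post
    for factor in (1.0, 2.0, 3.0):
        left = headroom_gb(DISK_GB, INDEX_GB, factor)
        status = "fits" if left >= 0 else "does not fit"
        print(f"{factor:.0f}x index -> {left:+.0f} GB headroom ({status})")
```

At two times the index size the disk is exactly full, which matches the "almost 100%" usage reported during the full import, and the three times case would not fit at all.
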
I am a bit rusty on DIH, but based on your comment it seems that full-import builds a temporary index and then switches to it. Delta import should update the existing index, and if you can use delta import you should be safe. With a 250GB index and a max segment size of 5GB, you should not reach 500GB even if you delta-import all documents.
Please note that for a full import it is advisable to create a new index, so I would suggest that you start asking for bigger disks.
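
For reference, a minimal sketch of how a delta import could be triggered over HTTP, assuming the DataImportHandler is registered at the conventional /dataimport path in solrconfig.xml and that delta queries are defined in the DIH configuration. The base URL and the core name "articles" are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_dih_url(solr_base: str, core: str, command: str,
                  clean: bool = False, commit: bool = True) -> str:
    """Build a DataImportHandler request URL.

    command is a DIH command such as "delta-import" or "full-import".
    clean=False asks DIH not to wipe the existing index before importing.
    """
    params = urlencode({
        "command": command,
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
    })
    return f"{solr_base.rstrip('/')}/{core}/dataimport?{params}"


def run_import(solr_base: str, core: str, command: str) -> int:
    """Send the request to a live Solr node and return the HTTP status."""
    with urlopen(build_dih_url(solr_base, core, command), timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical core name; prints the URL without contacting Solr.
    print(build_dih_url("http://localhost:8983/solr", "articles",
                        "delta-import"))
```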

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


