Posted to solr-user@lucene.apache.org by "Hoggarth, Gil" <Gi...@bl.uk> on 2013/11/11 16:44:22 UTC

How to cancel a collection 'optimize'?

We have an internal Solr collection with ~1 billion documents. It's
split across 24 shards and uses ~3.2TB of disk space. Unfortunately
we've triggered an 'optimize' on the collection (via a restarted browser
tab), which has raised the disk usage to 4.6TB, with 130GB left on the
disk volume.

I fully expect Solr to use up all of the disk space, as the collection
is more than 50% of the disk volume, so how can I cancel this optimize?
Separately, if I were to reissue it with maxSegments set to a high number
(e.g. 40), should I still expect the same disk usage? (I presume so:
doesn't it need to read the whole index to determine which docs should
go into which segments?)

Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard.

(Great conference last week btw - so much to learn!)

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

Tel: 01937 546163
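The disk-space worry in the message above can be put into numbers. The following is only a rough sketch: it takes the figures quoted in the message and assumes (this is not stated in the thread) that a full optimize rewrites every segment, so that old and new copies coexist at the peak, and that nothing else occupies the volume. The function name is made up for illustration.

```python
# Figures quoted in the message, in decimal TB.
INDEX_TB = 3.2       # index size before the optimize started
USED_NOW_TB = 4.6    # disk usage while the optimize is running
FREE_NOW_TB = 0.13   # 130GB still free


def optimize_headroom(index_tb, used_now_tb, free_now_tb):
    """Return (volume size, assumed peak usage, shortfall), all in TB.

    Assumption: a full optimize rewrites every segment, so at its peak the
    old and new copies coexist and usage approaches 2 x the index size.
    """
    volume_tb = used_now_tb + free_now_tb
    peak_tb = 2 * index_tb
    shortfall_tb = max(0.0, peak_tb - volume_tb)
    return volume_tb, peak_tb, shortfall_tb


volume, peak, shortfall = optimize_headroom(INDEX_TB, USED_NOW_TB, FREE_NOW_TB)
print(f"volume ~{volume:.2f}TB, assumed peak ~{peak:.2f}TB, "
      f"shortfall ~{shortfall:.2f}TB")
```

Under those assumptions the volume is about 4.73TB while the peak would approach 6.4TB, which is why the optimize cannot be expected to finish on this disk.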

Re: How to cancel a collection 'optimize'?

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Mon, Nov 11, 2013 at 11:28 AM, Hoggarth, Gil <Gi...@bl.uk> wrote:
> I could stop the whole Solr service, since as yet there's no audience
> access to it, but might it be left in an incomplete state and thus try
> to complete the optimisation when the service is restarted?

Should be fine.

Lucene has a write-once architecture: existing segment files are never
changed, and they are deleted only after a merge (which produces a new
segment containing the old segments' documents) has completed.  So if
you stop things in the middle of a commit/optimize, the index should
always open correctly on the last completed commit/optimize.

-Yonik
http://heliosearch.com -- making solr shine
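The write-once behaviour described in this reply can be illustrated with a toy model. This is not Lucene code and does not use any Lucene API; the class, its methods, and the segment names are all hypothetical. It only shows the ordering that makes an interrupted merge harmless: write the new segment, publish a new commit point, and delete the old segments last.

```python
class ToyIndex:
    """Toy model of a write-once index; not Lucene code.

    `files` holds immutable segment files on disk; `commit` is the list of
    segment names that the last completed commit points at.
    """

    def __init__(self, segments):
        self.files = dict(segments)        # name -> tuple of doc ids
        self.commit = sorted(self.files)   # last completed commit point

    def merge(self, new_name, crash_before_commit=False):
        old_names = list(self.commit)
        # 1. Write the merged segment as a new file; old files are untouched.
        merged = tuple(doc for name in old_names for doc in self.files[name])
        self.files[new_name] = merged
        if crash_before_commit:
            return  # simulated shutdown: the commit point was never updated
        # 2. Publish a new commit point that references only the new segment.
        self.commit = [new_name]
        # 3. Only now delete the segments that the merge replaced.
        for name in old_names:
            del self.files[name]

    def open(self):
        """Read the documents reachable from the last completed commit."""
        return sorted(doc for name in self.commit for doc in self.files[name])


index = ToyIndex({"seg_a": (1, 2), "seg_b": (3,)})
before = index.open()
index.merge("seg_c", crash_before_commit=True)   # stop mid-optimize
after_crash = index.open()                        # same docs as before
index.merge("seg_d")                              # a merge that completes
after_merge = index.open()
print(before, after_crash, after_merge)
```

In this toy, the half-written `seg_c` is simply left behind as an unreferenced file; the point is that the commit still resolves to the original segments after the simulated shutdown.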

RE: How to cancel a collection 'optimize'?

Posted by "Hoggarth, Gil" <Gi...@bl.uk>.
Hi Otis, thanks for the response. I could stop the whole Solr service,
since as yet there's no audience access to it, but might it be left in
an incomplete state and thus try to complete the optimisation when the
service is restarted?

[Yes, we did speak in Dublin - you can see we need that monitoring
service! Must set up the demo version, asap!]


Re: How to cancel a collection 'optimize'?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Gil,
(we spoke in Dublin, didn't we?)

Short of stopping Solr I have a feeling there isn't much you can
do.... hm..... or, I wonder if you could somehow get a thread dump,
get the PID of the thread (since I believe threads in Linux are run as
processes), and then kill that thread... Feels scary and I'm not sure
what this might do to the index, but maybe somebody else can jump in
and comment on this approach or suggest a better one.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
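The first half of the thread-dump idea above (finding the native id of the merge thread) might look roughly like the sketch below. It assumes a HotSpot-style dump such as `jstack <pid>` prints, where each thread line carries a hexadecimal `nid` that corresponds to the Linux thread id, and it assumes the merge threads are named "Lucene Merge Thread #N". The sample dump text and the helper name are made up for illustration. The sketch only identifies the thread; it does not attempt to stop it.

```python
import re

# Hypothetical excerpt in the format that `jstack <pid>` prints; the thread
# names and ids below are made up for illustration.
SAMPLE_DUMP = '''\
"Lucene Merge Thread #0" daemon prio=10 tid=0x00007f3c2c0a1000 nid=0x1a2b runnable [0x00007f3c1d5f4000]
"qtp12345-17" prio=10 tid=0x00007f3c2c0b2000 nid=0x1a30 waiting on condition [0x00007f3c1d4f3000]
'''

# Capture the quoted thread name and the hex native id on each thread line.
THREAD_LINE = re.compile(
    r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)', re.M
)


def native_thread_ids(dump, name_prefix):
    """Map thread name -> decimal native thread id for matching threads."""
    return {
        m.group("name"): int(m.group("nid"), 16)
        for m in THREAD_LINE.finditer(dump)
        if m.group("name").startswith(name_prefix)
    }


merge_threads = native_thread_ids(SAMPLE_DUMP, "Lucene Merge Thread")
print(merge_threads)
```

The decimal value is what tools such as `top -H` or `ps -L` show per thread, so it can be used to confirm which thread is busy before deciding anything further.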

