Posted to solr-user@lucene.apache.org by Alexey Kozhemiakin <Al...@epam.com> on 2014/12/03 12:35:34 UTC

SegmentInfos exposed to /admin/luke

Dear All,

We have a high percentage of deleted docs which do not go away because there are several huge ancient segments that do not merge with anything else naturally. Our use case is constant reindexing of the same data: ~100 GB, 12 000 000 real records, 20 000 000 total records in the index, which is ~80% deletes.

We plan to deal with the situation by tuning the mergeFactor, reclaimDeletesWeight and maxSegmentSizeMB settings to suit our re-indexing rate and data size.
To do this with our eyes open, we want to see a picture similar to http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html with columns showing segment size and % of deletes.
The plan is to expose SegmentInfos via the /admin/luke handler and draw column bars in the Solr admin UI.
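
As a rough illustration of the kind of picture described above, here is a minimal Python sketch that draws one text bar per segment, scaled by segment size, with the deleted share marked. The Segment type, the render_segments function and all numbers are hypothetical and made up for illustration; nothing here reads a real index or calls a Solr API.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    size_mb: float  # on-disk size of the segment
    max_doc: int    # all docs in the segment, including deleted ones
    deleted: int    # docs marked as deleted


def render_segments(segments: List[Segment], width: int = 40) -> str:
    """Draw one bar per segment: '#' = live share, 'x' = deleted share."""
    if not segments:
        return ""
    largest = max(s.size_mb for s in segments)
    lines = []
    for s in segments:
        # Bar length is proportional to segment size (at least one character).
        bar_len = max(1, round(width * s.size_mb / largest)) if largest > 0 else 1
        pct_deleted = 100.0 * s.deleted / s.max_doc if s.max_doc else 0.0
        dead = round(bar_len * pct_deleted / 100.0)
        bar = "#" * (bar_len - dead) + "x" * dead
        lines.append(f"{s.name:>6} {s.size_mb:>9.1f} MB {pct_deleted:5.1f}% del |{bar}")
    return "\n".join(lines)


# Made-up numbers, for illustration only.
example = [
    Segment("_a", 40000.0, 8_000_000, 4_000_000),
    Segment("_b", 5000.0, 1_000_000, 100_000),
    Segment("_c", 500.0, 100_000, 0),
]
print(render_segments(example))
```

Large old segments with a high share of 'x' are the ones that would be worth targeting when tuning the merge settings.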

Is there an easier way to achieve that? Even in raw Luke we didn't find this data.

We'd be happy to push the changes to Solr afterwards.


Thank you,
Alexey Kozhemiakin


Re: SegmentInfos exposed to /admin/luke

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/3/2014 4:35 AM, Alexey Kozhemiakin wrote:
> We have a high percentage of deleted docs which do not go away because there are several huge ancient segments that do not merge with anything else naturally. Our use case is constant reindexing of the same data: ~100 GB, 12 000 000 real records, 20 000 000 total records in the index, which is ~80% deletes.

The "normal" way to deal with this is to simply optimize the index,
which you can do with the click of a button in the admin UI on 4.x.  It
is likely to take an hour or so with 100GB of data unless your disk
subsystem is *extremely* fast, but I believe with version 4.x you can
even continue to update the index while it's optimizing.  It will also
cause a lot of I/O, which might hurt performance, so you'd want to do it
during a non-peak time.

The list archives include a lot of talk about optimizes being
unnecessary in newer versions ... but wiping out deleted documents is
still a major use case for the feature.

Thanks,
Shawn
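
For anyone who wants to script this rather than click the button, a minimal Python sketch of the update-handler URLs follows. It assumes the standard /update handler with its optimize, commit and expungeDeletes request parameters; the core name "products" and the update_url helper are hypothetical, and the sketch only builds the URLs without sending any request.

```python
from urllib.parse import urlencode


def update_url(base_url: str, core: str, **params: str) -> str:
    """Build a URL for the core's /update handler with the given query parameters."""
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


# "products" is a hypothetical core name; adjust host and port to your install.
BASE = "http://localhost:8983/solr"

# Full optimize: rewrites the index and drops every deleted document.
optimize = update_url(BASE, "products", optimize="true")

# Lighter alternative: commit and ask the merge policy to expunge deletes.
expunge = update_url(BASE, "products", commit="true", expungeDeletes="true")

print(optimize)
print(expunge)
```

Either URL can then be requested with any HTTP client during a non-peak window, since both variants trigger heavy merge I/O.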


Re: SegmentInfos exposed to /admin/luke

Posted by Erick Erickson <er...@gmail.com>.
Not sure how it plays with segment merging and optimizing, but have
you considered DocValues for your price fields? On the horizon there's
work being done to allow them to be independently updated (although
that won't help you now of course). It's not clear at this point how that
will play when lots and lots and lots of updates happen though.

Of course an optimize will purge your deletes, but I'm sure you already
know that.


On Wed, Dec 3, 2014 at 5:45 AM, Alexey Kozhemiakin
<Al...@epam.com> wrote:
> Hi Alexandre, our rebuilds are not really 'full rebuilds'; it's a constant, massive flow of price updates in an e-commerce marketplace. Unfortunately, the "substitution" option does not work for us :(

RE: SegmentInfos exposed to /admin/luke

Posted by Alexey Kozhemiakin <Al...@epam.com>.
Hi Alexandre, our rebuilds are not really 'full rebuilds'; it's a constant, massive flow of price updates in an e-commerce marketplace. Unfortunately, the "substitution" option does not work for us :(

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Wednesday, December 3, 2014 16:39
To: solr-user
Subject: Re: SegmentInfos exposed to /admin/luke

You can't use grouping aliases and do full rebuilds on a separate core
+ substitutions? Might be a better strategy for nearly complete
replacement.


Re: SegmentInfos exposed to /admin/luke

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Couldn't you use grouping aliases and do full rebuilds on a separate core,
followed by substitution? That might be a better strategy for nearly
complete replacement.

Regards,
   Alex.
P.s. But I like your proposal anyway.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853



Re: SegmentInfos exposed to /admin/luke

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Alexey,

As I understand it, you need the number of deleted docs in the index:
http://localhost:8983/solr/admin/mbeans?stats=true&cat=CORE

Here is the relevant part of the output:
<int name="numDocs">27</int>
<int name="maxDoc">30</int>
<int name="deletedDocs">3</int>
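
A minimal Python sketch of turning those counters into a delete ratio follows. The read_counts and deleted_ratio helpers are hypothetical names, and the sample wraps the three lines above in a made-up <lst> root so that it parses; a real mbeans response has more nesting and may repeat the same counter names, so this is a simplification.

```python
import xml.etree.ElementTree as ET

# The three <int> lines from the output above, wrapped in an assumed root element.
SAMPLE = """
<lst name="stats">
  <int name="numDocs">27</int>
  <int name="maxDoc">30</int>
  <int name="deletedDocs">3</int>
</lst>
"""


def read_counts(xml_text: str) -> dict:
    """Collect every <int name="...">N</int> element into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): int(el.text) for el in root.iter("int") if el.text}


def deleted_ratio(num_docs: int, max_doc: int) -> float:
    """Fraction of the index, by document count, taken up by deleted docs."""
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


counts = read_counts(SAMPLE)
print(f"{deleted_ratio(counts['numDocs'], counts['maxDoc']):.1%}")
```

With the figures from the first message (12 000 000 live docs out of 20 000 000 total), the same formula gives 40% by document count, so the ~80% quoted there may be measured on a different basis.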

If you need per-segment detail, parse:
<str name="reader">StandardDirectoryReader(segments_b:19:nrt _6(4.10.2):C9
_7(4.10.2):C9 _8(4.10.2):C9)</str>

Note that the number after C is the number of docs in the segment; the
number of deletes is also exposed in that toString().
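
A minimal Python sketch of parsing that reader string is shown here. The name(version):C<docs> part matches the sample above; the optional /<deletes> suffix is an assumption about how the toString() reports deletes (the sample in this thread has none), so verify it against the output of your own index. The parse_reader helper and the second test string are hypothetical.

```python
import re
from typing import Dict, List

# name(version):C<docs>, optionally followed by /<deletes> (assumed suffix).
SEGMENT_RE = re.compile(r"(_\w+)\(([^)]+)\):[cC](\d+)(?:/(\d+))?")


def parse_reader(reader: str) -> List[Dict[str, object]]:
    """Extract per-segment name, version, doc count and delete count."""
    segments = []
    for name, version, docs, deletes in SEGMENT_RE.findall(reader):
        segments.append({
            "name": name,
            "version": version,
            "docs": int(docs),
            "deleted": int(deletes or 0),  # no suffix means no deletes reported
        })
    return segments


reader = ("StandardDirectoryReader(segments_b:19:nrt _6(4.10.2):C9 "
          "_7(4.10.2):C9 _8(4.10.2):C9)")
for seg in parse_reader(reader):
    print(seg["name"], seg["docs"], seg["deleted"])
```

The segments_b generation token is skipped because it is not followed by a parenthesised version, so only real segment entries are returned.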

Did I get what you need right?

PS: see Mr. McCandless's recent G+ post, where he charted the deleted-docs
ratio under constant reindexing.





On Mon, Dec 8, 2014 at 2:23 PM, Dmitry Kan <so...@gmail.com> wrote:

> Hi Alexey,
>
> In the Luke GUI there is an option to "Just expunge deleted docs without
> re-merging", in case you want to give it a try.
>
> Dmitry



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: SegmentInfos exposed to /admin/luke

Posted by Dmitry Kan <so...@gmail.com>.
Hi Alexey,

In the Luke GUI there is an option to "Just expunge deleted docs without
re-merging", in case you want to give it a try.

Dmitry



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info