Posted to dev@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2019/10/29 22:18:58 UTC

forceMerge and unused metadata

A question came across the #solr IRC channel: the user's /admin/luke 
endpoint was still listing a bunch of fields that they used to use but 
that are no longer in any current documents.  That URL endpoint 
provides information about the fields in the index, getting most of that 
info directly from Lucene.

I asked them to run an optimize (forceMerge in Lucene) and see what that 
did.  It did not remove those fields.
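
For anyone who wants to repeat that check, here is a minimal sketch in 
Python.  It is an illustration only, not something taken from the IRC 
discussion.  The base URL and the core name "mycore" are hypothetical, 
and it assumes the JSON response from /admin/luke has a top-level 
"fields" object keyed by field name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base, core):
    # The update handler runs an optimize (forceMerge) when optimize=true.
    return f"{base}/{core}/update?" + urlencode({"optimize": "true"})


def luke_url(base, core):
    # numTerms=0 skips the per-field top-terms listing; wt=json asks for JSON.
    return f"{base}/{core}/admin/luke?" + urlencode({"numTerms": 0, "wt": "json"})


def field_names(luke_response):
    # Assumption: "fields" is an object keyed by field name.
    return sorted(luke_response.get("fields", {}))


def fetch_field_names(base, core):
    # Needs a reachable Solr instance; the helpers above do not.
    with urlopen(luke_url(base, core)) as resp:
        return field_names(json.load(resp))


if __name__ == "__main__":
    base, core = "http://localhost:8983/solr", "mycore"  # hypothetical values
    before = fetch_field_names(base, core)
    urlopen(optimize_url(base, core)).close()
    after = fetch_field_names(base, core)
    print("fields dropped by optimize:", sorted(set(before) - set(after)))
```

Given the behavior described in this thread, the last line would be 
expected to print an empty list.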

Discussing it with other Solr committers on the lucene-solr Slack 
channel, this is apparently known -- a forceMerge does not eliminate any 
field metadata, even if the field is not referenced by any non-deleted 
document.

What I'm wondering is whether it would be possible to adjust merging so 
that it can determine what pieces of metadata (like field information) 
are unused in the index and remove them.  It would be fine if this were 
only an option on forceMerge, but nice if it were something that could 
happen on any merge.  That discussion on Slack indicated that it might 
be prohibitively expensive to do this.  Can one of our experts on Lucene 
merging respond?

This particular user has no option that I am aware of other than to 
rebuild their index.  They're running version 4.2.1.

Thanks,
Shawn

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: forceMerge and unused metadata

Posted by David Smiley <da...@gmail.com>.
To follow up in a more official channel than Slack: the JIRA issue I
suggested for this request is:
https://issues.apache.org/jira/browse/LUCENE-8551

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

