Posted to dev@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2019/10/29 22:18:58 UTC
forceMerge and unused metadata
A question came across the #solr IRC channel, where the user was seeing
entries in their /admin/luke endpoint for a bunch of fields they used to
use, but which are no longer in any current documents. That URL endpoint
provides information about the fields in the index, getting most of that
info directly from Lucene.
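For anyone wanting to spot such leftover fields, here is a minimal sketch
in Python. It assumes the Luke handler's JSON output has a top-level
"fields" map whose entries carry a "docs" count for fields that still have
indexed terms. The function names and the base URL in the comment are
hypothetical, and a field with no "docs" entry may simply be unindexed
(stored only) rather than stale, so treat the result as a list of
candidates, not a verdict.

```python
import json
from urllib.request import urlopen


def fields_without_docs(luke_response):
    """Return the sorted names of fields that report no documents.

    Assumes luke_response is the parsed JSON from /admin/luke, with a
    "fields" map. A missing or zero "docs" value marks a candidate.
    """
    fields = luke_response.get("fields", {})
    return sorted(
        name for name, info in fields.items()
        if not info.get("docs")  # key absent, or count is 0
    )


def fetch_luke(base_url):
    """Fetch the Luke report as parsed JSON.

    base_url is hypothetical, e.g. "http://localhost:8983/solr/collection1".
    numTerms=0 skips the per-field top-terms listing to keep the call cheap.
    """
    with urlopen(base_url + "/admin/luke?wt=json&numTerms=0") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fields_without_docs(fetch_luke(base_url)) against a running core
would list the candidates. It only reports them and removes nothing.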
I asked them to run an optimize (forceMerge in Lucene) and see what that
did. It did not remove those fields.
From a discussion with other Solr committers on the lucene-solr Slack
channel, this is apparently known -- a forceMerge does not eliminate any
field metadata, even if the field is not referenced by any non-deleted
document.
What I'm wondering is whether it would be possible to adjust merging so
that it can determine what pieces of metadata (like field information)
are unused in the index and remove them. It would be fine if this were
only an option on forceMerge, but nice if it were something that could
happen on any merge. That discussion on Slack indicated that it might
be prohibitively expensive to do this. Can one of our experts on Lucene
merging respond?
This particular user has no option that I am aware of other than to
rebuild their index. They're running version 4.2.1.
Thanks,
Shawn
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: forceMerge and unused metadata
Posted by David Smiley <da...@gmail.com>.
To follow up in a more official channel than Slack: I suggested there that
the JIRA issue for this request is:
https://issues.apache.org/jira/browse/LUCENE-8551
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley