Posted to solr-user@lucene.apache.org by Bridger Dyson-Smith <bd...@gmail.com> on 2019/10/29 20:25:19 UTC

Solr v4.2.1: fields without associated documents

Hi all -

I'm working with an application that uses Solr v4.2.1 and I'm seeing a
strange issue with our index: I have many fields (tens of thousands) that
don't seem to have any associated documents. I was wondering if
there was any way of getting them out of our index (or out of our
admin/luke endpoint, maybe more specifically).

A very helpful person on IRC suggested that the only way to get rid of
these might be a clean rebuild of the index, and that's not out of the
question for us; I hoped to get a bit more information here.

The fields appear in /solr/admin/luke:
<lst name="fedora_datastream_latest_hesler_200_0006_MIMETYPE_ms">
  <str name="type">string</str>
  <str name="schema">I-S-M---OF-----l</str>
  <str name="dynamicBase">*_ms</str>
</lst>

but querying for them, using something like
`fq=fedora_datastream_latest_hesler_200_0006_MIMETYPE_ms:[* TO *]` doesn't
return any documents, and when using the admin UI's Schema Browser there
isn't any corresponding 'Index' section (only 'Schema').
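
As a rough sketch of how such fields could be listed in bulk rather than
checked one at a time, the Luke XML output can be parsed and every field
without a document count reported. This assumes that fields which still
have documents carry an `<int name="docs">` entry, and that the field
entries sit under `<lst name="fields">`; both should be verified against
the actual response. The `title_ms` field and the `fields_without_docs`
helper are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample Luke response fragment. The first field is copied from this post;
# the second ("title_ms") is a hypothetical field that still has documents.
# The <response>/<lst name="fields"> wrapper is an assumption about the layout.
LUKE_XML = """
<response>
  <lst name="fields">
    <lst name="fedora_datastream_latest_hesler_200_0006_MIMETYPE_ms">
      <str name="type">string</str>
      <str name="schema">I-S-M---OF-----l</str>
      <str name="dynamicBase">*_ms</str>
    </lst>
    <lst name="title_ms">
      <str name="type">string</str>
      <str name="schema">I-S-M---OF-----l</str>
      <str name="dynamicBase">*_ms</str>
      <int name="docs">42</int>
    </lst>
  </lst>
</response>
"""


def fields_without_docs(luke_xml):
    """Return names of fields that have no 'docs' entry or a count of zero."""
    root = ET.fromstring(luke_xml.strip())
    fields = root.find("./lst[@name='fields']")
    if fields is None:
        return []
    orphans = []
    for field in fields.findall("lst"):
        docs = field.find("int[@name='docs']")
        # Assumption: a missing or zero 'docs' count means no live documents.
        if docs is None or int(docs.text) == 0:
            orphans.append(field.get("name"))
    return orphans


print(fields_without_docs(LUKE_XML))
```

Against a live server, the same function could be fed the body of the
admin/luke response instead of the inline sample.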

We don't have these fields statically assigned in our schema.

Other than a clean reindexing of our data, is there anything we can do to
clean these up?
Thanks in advance for your help!

Best,
Bridger

Re: Solr v4.2.1: fields without associated documents

Posted by Bridger Dyson-Smith <bd...@gmail.com>.

Hi Shawn -

Thanks again for your help on IRC -- I took your suggestions and info,
talked it over with my colleagues, and we decided that we'll rebuild our
index -- all before I had finished composing my original email to the list.

On Tue, Oct 29, 2019 at 6:17 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/29/2019 4:05 PM, Shawn Heisey wrote:
> > I can
> > ask on our dev list to see what I can learn.
>
> I should add something important to this.  Even if we can implement an
> enhancement, it would only be added to an 8.x version at the earliest.
> It is not possible to take an index from 4.2.1 and use it in Solr 8.x,
> so you'd have to rebuild your index anyway even if you upgraded to get
> the new feature.
>

That makes complete sense. We're hopeful that we'll be able to move to a
new version of Solr in the next year, but we don't have any expectations
for moving the index over.

> Thanks,
> Shawn
>

Thank you!
Best,
Bridger

Re: Solr v4.2.1: fields without associated documents

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/29/2019 4:05 PM, Shawn Heisey wrote:
> I can 
> ask on our dev list to see what I can learn.

I should add something important to this.  Even if we can implement an 
enhancement, it would only be added to an 8.x version at the earliest. 
It is not possible to take an index from 4.2.1 and use it in Solr 8.x, 
so you'd have to rebuild your index anyway even if you upgraded to get 
the new feature.

Thanks,
Shawn

Re: Solr v4.2.1: fields without associated documents

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/29/2019 2:25 PM, Bridger Dyson-Smith wrote:
> A very helpful person on IRC suggested that the only way to get rid of
> these might be a clean rebuild of the index, and that's not out of the
> question for us; I hoped to get a bit more information here.

I'm the one who you talked to on IRC.

> Other than a clean reindexing of our data, is there anything we can do to
> clean these up?
> Thanks in advance for your help!

You should wait for confirmation, but I am not aware of any other way to 
fix this.  The optimize operation (that I was hopeful would take care of 
it) is a purely Lucene operation that knows nothing at all about Solr. 
I learned that the optimize operation preserves all field metadata built 
into the index, even if the field was only referenced by deleted 
documents.  Discussing the issue with other committers in our Slack
channel has revealed that it might be extremely difficult or impossible 
to improve the optimize operation so it purges unused metadata.  I can 
ask on our dev list to see what I can learn.

I personally feel that Solr users should always be prepared to 
completely rebuild indexes from scratch.  As painful as that prospect 
might be, it is the only solution to a number of problems, and is also 
frequently required by many configuration changes.

Thanks,
Shawn