Posted to dev@lucene.apache.org by Alexey Timofeev <a....@gmail.com> on 2016/11/01 23:33:54 UTC

Merge indexes with CoreAdminHandler - possible issue

Hello!

I am stuck on CoreAdminHandler's merge indexes functionality. Merge indexes
seems to behave strangely when it is used with the "srcCore" parameter.
(With the "indexDir" parameter it works fine.) Could you tell me whether
this is a real bug in Solr or whether I am using it wrong? If the community
agrees that it is a bug, let's create a ticket for it, and I will be happy
to suggest a patch.
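
For reference, the two request forms I am comparing look roughly like this.
This is only a sketch: the host, port, core names and index path are made up
for illustration; only the parameter names (action, core, srcCore, indexDir)
are the CoreAdmin ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class MergeIndexesUrls {
    // Hypothetical Solr location; adjust to your installation.
    static final String BASE = "http://localhost:8983/solr/admin/cores";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Merge by naming source cores (the variant that misbehaves for me).
    public static String mergeBySrcCore(String targetCore, String... srcCores) {
        StringBuilder sb = new StringBuilder(BASE)
                .append("?action=MERGEINDEXES&core=").append(enc(targetCore));
        for (String src : srcCores) {
            sb.append("&srcCore=").append(enc(src));
        }
        return sb.toString();
    }

    // Merge by naming index directories (the variant that works fine).
    public static String mergeByIndexDir(String targetCore, String... indexDirs) {
        StringBuilder sb = new StringBuilder(BASE)
                .append("?action=MERGEINDEXES&core=").append(enc(targetCore));
        for (String dir : indexDirs) {
            sb.append("&indexDir=").append(enc(dir));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Core names and path below are placeholders.
        System.out.println(mergeBySrcCore("target", "source1", "source2"));
        System.out.println(mergeByIndexDir("target", "/var/solr/source1/data/index"));
    }
}
```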

Let me explain what I think is wrong with merging indexes using the
"srcCore" parameter. When the merge code starts merging doc values fields,
it mistakenly treats every field that can be uninverted as a doc values
field. As a result, every uninvertible field is uninverted and the result
is written to the merged index. Memory consumption is therefore huge, the
chance of an OOM is high, and the resulting index is bloated.

Why does the merge code treat all uninvertible fields as doc values fields?
Because it treats every field where (FieldInfo.docValuesType !=
DocValuesType.NONE) as a doc values field. The FieldInfo objects are
provided by the IndexReader, and since there is always an UninvertingReader
in the chain of readers, FieldInfo.docValuesType always reports the doc
values type to which the field can be converted. So we almost always have
(FieldInfo.docValuesType != DocValuesType.NONE) and uninvert almost all
fields.
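
To make the effect concrete, here is a toy model of what I am describing. It
is not Lucene or Solr code: the FieldInfo and DocValuesType classes below are
simplified stand-ins that only borrow the real names, and uninvertedView is a
hypothetical helper that imitates how field metadata looks once an
uninverting wrapper sits in the reader chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UninvertMergeModel {
    // Simplified stand-ins, NOT the real Lucene classes.
    enum DocValuesType { NONE, NUMERIC, SORTED }

    static final class FieldInfo {
        final String name;
        final DocValuesType docValuesType;

        FieldInfo(String name, DocValuesType docValuesType) {
            this.name = name;
            this.docValuesType = docValuesType;
        }
    }

    // Imitates the metadata seen through an uninverting wrapper: a field with
    // no doc values on disk is reported with the type it could be uninverted to.
    public static List<FieldInfo> uninvertedView(List<FieldInfo> onDisk,
                                                 Map<String, DocValuesType> mapping) {
        List<FieldInfo> out = new ArrayList<>();
        for (FieldInfo fi : onDisk) {
            DocValuesType mapped = mapping.get(fi.name);
            if (fi.docValuesType == DocValuesType.NONE && mapped != null) {
                out.add(new FieldInfo(fi.name, mapped));
            } else {
                out.add(fi);
            }
        }
        return out;
    }

    // The check described above: anything that is not NONE counts as doc values.
    public static List<String> fieldsMergedAsDocValues(List<FieldInfo> infos) {
        List<String> names = new ArrayList<>();
        for (FieldInfo fi : infos) {
            if (fi.docValuesType != DocValuesType.NONE) {
                names.add(fi.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<FieldInfo> onDisk = Arrays.asList(
                new FieldInfo("id", DocValuesType.NONE),
                new FieldInfo("title", DocValuesType.NONE),
                new FieldInfo("price", DocValuesType.NUMERIC));
        Map<String, DocValuesType> mapping = new HashMap<>();
        mapping.put("id", DocValuesType.SORTED);
        mapping.put("title", DocValuesType.SORTED);

        // Reading the raw index: only the real doc values field is selected.
        System.out.println(fieldsMergedAsDocValues(onDisk)); // [price]
        // Reading through the wrapper: every uninvertible field is selected too.
        System.out.println(fieldsMergedAsDocValues(uninvertedView(onDisk, mapping))); // [id, title, price]
    }
}
```

In this model only "price" has doc values on disk, yet all three fields pass
the check once the wrapper is in place, which matches what I see with
"srcCore".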

Is there a way to create a core without an UninvertingReader in the chain
of readers? If so, is that the expected usage or just a workaround?

If loading a core without an UninvertingReader is not what I am meant to
do, then I would suggest consulting the schema to find out which fields
have doc values instead of relying on FieldInfo.docValuesType.
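
A rough sketch of that idea, again as a toy model rather than a patch: the
schema is represented by a plain map from field name to its docValues flag
(a hypothetical stand-in; in Solr this information would come from the
schema's field definitions), and only fields the schema declares as doc
values are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SchemaDocValuesFilter {
    // Keeps only the fields that the (modelled) schema declares with docValues,
    // regardless of what the reader chain reports for them.
    public static List<String> docValuesFields(List<String> readerReported,
                                               Map<String, Boolean> schemaDocValues) {
        List<String> out = new ArrayList<>();
        for (String name : readerReported) {
            if (Boolean.TRUE.equals(schemaDocValues.get(name))) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field names and flags are made up for illustration.
        Map<String, Boolean> schema = new HashMap<>();
        schema.put("id", false);
        schema.put("title", false);
        schema.put("price", true);

        // The reader chain reports all three, the schema narrows it down.
        System.out.println(docValuesFields(Arrays.asList("id", "title", "price"), schema)); // [price]
    }
}
```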

Thank you in advance. Looking forward to your replies!

-- 
Regards.

Re: Merge indexes with CoreAdminHandler - possible issue

Posted by Erick Erickson <er...@gmail.com>.

Alexey:

I don't know that code intimately, but we've had sporadic reports of
memory spikes during segment merging. I'd be _really_ interested in
the perspective from some of the Lucene guys, going on the assumption
that eventually the code path is the same.

Best,
Erick

