Posted to issues@lucene.apache.org by "Mayya Sharipova (Jira)" <ji...@apache.org> on 2021/07/02 12:47:00 UTC

[jira] [Comment Edited] (LUCENE-9334) Require consistency between data-structures on a per-field basis

    [ https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373494#comment-17373494 ] 

Mayya Sharipova edited comment on LUCENE-9334 at 7/2/21, 12:46 PM:
-------------------------------------------------------------------

[~jpountz] For old indices with inconsistent data structures, the current behaviour is as follows:
 # We can read them.
 # We can't write new docs that introduce inconsistencies (e.g. a doc introduces a field indexed with points where previously this field was indexed only with doc values).
 # We can't do any writes if segments of the index have inconsistent schemas (e.g. in one segment a field is indexed with points, and in another segment with points and doc values).  This behaviour is similar to [LUCENE-8134|https://issues.apache.org/jira/browse/LUCENE-8134] for old indices where different segments have different index options for a field.
 # We can do writes (new docs and merges) if the segments are consistent with each other but individual docs are inconsistent (e.g. within a segment, one doc has a field indexed with points, and another doc has the same field indexed with points and doc values).
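
To make #2 and #3 concrete, here is a minimal sketch of the kind of per-field check involved. It is a simplified model and not Lucene's actual API: the names DataStructure, FieldSchemaTracker, openForWrite and addField are hypothetical, and a field's "schema" is reduced to the set of data structures it is indexed with.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model (not Lucene's API): the data structures a field can be indexed with.
enum DataStructure { POINTS, DOC_VALUES }

// Hypothetical tracker that remembers one schema per field name for the whole index.
class FieldSchemaTracker {
    private final Map<String, Set<DataStructure>> schemas = new HashMap<>();

    // Behaviour #3: refuse to open for writing if segments disagree on a field's schema.
    static FieldSchemaTracker openForWrite(List<Map<String, Set<DataStructure>>> segments) {
        FieldSchemaTracker tracker = new FieldSchemaTracker();
        for (Map<String, Set<DataStructure>> segment : segments) {
            segment.forEach(tracker::verifyOrRecord);
        }
        return tracker;
    }

    // Behaviour #2: a new doc must index a field the same way earlier docs did.
    void addField(String name, Set<DataStructure> schema) {
        verifyOrRecord(name, schema);
    }

    // Records the schema the first time a field is seen, rejects any later mismatch.
    private void verifyOrRecord(String name, Set<DataStructure> schema) {
        Set<DataStructure> previous = schemas.putIfAbsent(name, EnumSet.copyOf(schema));
        if (previous != null && !previous.equals(schema)) {
            throw new IllegalArgumentException(
                "Inconsistent schema for field \"" + name + "\": expected "
                    + previous + " but got " + schema);
        }
    }
}
```

In this model, adding "price" with only DOC_VALUES and then with POINTS and DOC_VALUES throws, and so does calling openForWrite on two segments that disagree about "price". Note that the check only compares the schema recorded per segment, which is why it cannot detect case #4.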

I think #2 and #3 are desirable behaviours and we should keep them. 

#4 is not ideal. We could check for consistency and refuse writes if there are any inconsistencies between docs, but this could be an expensive operation.  If we are OK with inconsistencies between docs, then there is nothing left to do for this issue.
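
To illustrate why a check for #4 could be expensive, here is a rough sketch of a per-doc scan. Again this is a simplified model and not how Lucene stores or exposes documents: IndexedAs, DocConsistencyScan and firstInconsistentField are hypothetical names, and each doc is modelled as a map from field name to the data structures used for that field.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model (not Lucene's API): how a single doc indexes a field.
enum IndexedAs { POINTS, DOC_VALUES }

class DocConsistencyScan {
    // Returns the first field whose docs disagree on data structures,
    // or null if every doc in the segment is consistent.
    // The scan visits every field of every doc, so its cost grows with segment size.
    static String firstInconsistentField(List<Map<String, Set<IndexedAs>>> docs) {
        Map<String, Set<IndexedAs>> seen = new HashMap<>();
        for (Map<String, Set<IndexedAs>> doc : docs) {
            for (Map.Entry<String, Set<IndexedAs>> field : doc.entrySet()) {
                Set<IndexedAs> previous = seen.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    return field.getKey();
                }
            }
        }
        return null;
    }
}
```

Unlike a per-segment schema comparison, which looks at one entry per field per segment, this has to touch every doc before a write can be accepted.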

What do you think?



> Require consistency between data-structures on a per-field basis
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9334
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9334
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Blocker
>             Fix For: main (9.0)
>
>          Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Follow-up of https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E.
> We would like to start requiring consistency across data-structures on a per-field basis in order to make it easier to do the right thing by default: range queries can run faster if doc values are enabled, sorted queries can run faster if points are indexed, etc.
> This would be a big change, so it should be rolled out in a major.
> Strict validation is tricky to implement, but we should still implement best-effort validation:
>  - Documents all use the same data-structures, e.g. it is illegal for a document to only enable points and another document to only enable doc values,
>  - When possible, check whether values are consistent too.


