
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2017/01/11 10:14:58 UTC

[jira] [Updated] (LUCENE-7626) IndexWriter shouldn't accept broken offsets

     [ https://issues.apache.org/jira/browse/LUCENE-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7626:
---------------------------------------
    Attachment: LUCENE-7626.patch

Patch, just for 7.0.0.  I also added a (deprecated)
{{FixBrokenOffsetsFilter}} for apps whose token filters produce broken
offsets and can't easily be fixed.  Users can insert it into their
analysis chain and it will "fix" the offsets.
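
As a rough illustration of what "fixing" offsets can mean, here is a
minimal stand-alone sketch. It is not the code from the patch and does not
use any Lucene classes: the class name OffsetFixSketch, the method fix, and
the {start, end} int-pair representation are all hypothetical, and clamping
is only an assumption about one reasonable repair strategy.

```java
/** Hypothetical sketch; NOT the FixBrokenOffsetsFilter code from the patch. */
final class OffsetFixSketch {

    /**
     * Takes token offsets as {start, end} pairs in token order and returns
     * repaired copies: a start offset is never allowed to go backwards
     * relative to the previous token's (repaired) start offset, and an end
     * offset is never allowed to be smaller than its start offset.
     * Clamping is an assumed strategy chosen for illustration.
     */
    static int[][] fix(int[][] offsets) {
        int[][] fixed = new int[offsets.length][];
        int lastStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            // Pull a backwards start offset up to the previous start offset.
            int start = Math.max(offsets[i][0], lastStart);
            // Make sure the token does not end before it starts.
            int end = Math.max(offsets[i][1], start);
            fixed[i] = new int[] {start, end};
            lastStart = start;
        }
        return fixed;
    }
}
```

Note that a repair like this only makes the offsets acceptable to the
index; the clamped values may no longer point at the exact text the token
came from, which is why fixing the offending token filter is preferable.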


> IndexWriter shouldn't accept broken offsets
> -------------------------------------------
>
>                 Key: LUCENE-7626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7626
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7626.patch
>
>
> I think we should do this in 7.0 (not 6.x).
> Long ago we stopped accepting broken offsets (where the start offset
> for a token is before the start offset of the previous token) in
> postings (LUCENE-4127), but we are still lenient with term vectors.
> I think we should also check for term vectors: this would let users
> know that their analysis chain is producing offsets that cannot be
> used properly at search time.
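
To make the condition above concrete, here is a minimal stand-alone sketch
of such a check. It is not the IndexWriter code: the class name
OffsetCheckSketch, the method firstBrokenToken, and the {start, end}
int-pair representation are hypothetical. The backwards-start test follows
the description above; the negative-start and end-before-start tests are
extra assumptions added for illustration.

```java
/** Hypothetical sketch of an offset sanity check; NOT the IndexWriter code. */
final class OffsetCheckSketch {

    /**
     * Takes token offsets as {start, end} pairs in token order and returns
     * the index of the first token with broken offsets, or -1 if all
     * offsets are acceptable.
     */
    static int firstBrokenToken(int[][] offsets) {
        int lastStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            // Broken as described in the issue: start offset goes backwards.
            if (start < lastStart) {
                return i;
            }
            // Assumed extra checks: negative start, or end before start.
            if (start < 0 || end < start) {
                return i;
            }
            lastStart = start;
        }
        return -1;
    }
}
```

Tokens that share a start offset (for example synonyms stacked at the same
position) pass this check, since only a start offset that moves backwards
is rejected.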


