Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2010/10/02 06:28:34 UTC

[jira] Updated: (LUCENE-2529) always apply position increment gap between values

     [ https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-2529:
---------------------------------

    Attachment: LUCENE-2529_skip_posIncr_for_1st_token.patch

Always applying the position increment gap is good but insufficient to solve my problem.

A new patch addresses the follow-up situation that I inadvertently reported in LUCENE-2668 and should have reported here.  The gist is that DocInverterPerField _conditionally_ decrements the position and then always increments it.  This is problematic when trying to keep positions aligned across several multi-valued fields (using an analyzer setting posIncr to 0) when the first value generates no tokens (because it is blank or contains only stop words).  Mike McCandless pointed out that the unfortunate existing logic is there to prevent the position from becoming -1, which doesn't work with payloads (see LUCENE-1542).
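
To make the misalignment concrete, here is a minimal sketch of the decrement/increment behavior described above. It is a simplified model, not the DocInverterPerField source: the class and method names are hypothetical, every token is assumed to carry a position increment of 1, and the gap is assumed to be applied between all values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the existing logic; NOT Lucene code.
// Assumptions: each token has posIncr 1; the gap is applied whenever i > 0.
class OldPositionLogicSketch {
    static List<Integer> positions(int[] tokensPerValue, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < tokensPerValue.length; i++) {
            if (i > 0) position += gap;       // gap between values
            for (int t = 0; t < tokensPerValue[i]; t++) {
                position += 1;                // the token's position increment
                if (position > 0) position--; // conditional pre-decrement
                out.add(position);            // position recorded for this token
                position++;                   // unconditional post-increment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field A: the first value yields one token, the second value one token.
        System.out.println(positions(new int[] {1, 1}, 0)); // [0, 1]
        // Field B: the first value yields no tokens (blank or all stop words).
        System.out.println(positions(new int[] {0, 1}, 0)); // [0]
    }
}
```

In this model the second value lands at position 1 in field A but at position 0 in field B, so the two fields no longer line up.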

My new patch here has neither a pre-decrement nor a post-increment, and thus I find the code easier to follow.  It ignores the provided position increment of the first token (typically 1), avoiding the need to shift positions back and forth.  There is one oddity included here: I always add 1 to the position increment _gap_ (i.e. between values).  With this oddity included, all the tests pass (except for the test for this very issue, which I correct in this patch)  --yay!  Without this oddity, a handful of tests fail that depend on the first token adding one to the position.  My +1 up at the value loop can be seen as actually enforcing that the first token's position increment is 1, and also as adding a +1 when there is no token for a value (critical for aligning multiple fields).  Perhaps this +1 should happen at a different line to be less confusing, but the end result should be the same.
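
The approach described above can be sketched roughly as follows. This is a reconstruction from the prose only, not the contents of the attached patch: the class and method names are hypothetical, every token is assumed to carry a position increment of 1, and "first token" is assumed to mean the first token of each value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the described approach; NOT the actual patch.
// Assumptions: each token has posIncr 1; the first token of every value
// has its position increment ignored; the gap between values gets an extra +1.
class SkipFirstPosIncrSketch {
    static List<Integer> positions(int[] tokensPerValue, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < tokensPerValue.length; i++) {
            if (i > 0) position += gap + 1;   // gap plus the "oddity" +1
            for (int t = 0; t < tokensPerValue[i]; t++) {
                if (t > 0) position += 1;     // first token's posIncr is skipped
                out.add(position);            // no pre-decrement, no post-increment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First value has one token: the second value lands at position 1.
        System.out.println(positions(new int[] {1, 1}, 0)); // [0, 1]
        // First value has no tokens: the second value still lands at position 1.
        System.out.println(positions(new int[] {0, 1}, 0)); // [1]
    }
}
```

Under these assumptions an empty value still advances the position by one, which is what keeps parallel multi-valued fields aligned.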

I expect that many people will find this very confusing, especially if they're not knee-deep in this subject as I am presently.  Mike, hopefully you understand what I'm up to here.  The tests pass, remember.

> always apply position increment gap between values
> --------------------------------------------------
>
>                 Key: LUCENE-2529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2529
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>         Environment: (I don't know which version to say this affects since it's some quasi trunk release and the new versioning scheme confuses me.)
>            Reporter: David Smiley
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2529_always_apply_position_increment_gap_between_values.patch, LUCENE-2529_skip_posIncr_for_1st_token.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term positions.  I discovered that the position increment gap on indexing is only applied between values when there are existing terms indexed for the document.  I suspect this logic wasn't deliberate; it's just how it's always been, for no particular reason.  I think it should always apply the gap between fields.  Reference DocInverterPerField.java line 82:
> if (fieldState.length > 0)
>           fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> This is checking fieldState.length.  I think the condition should simply be:  if (i > 0).
> I don't think this change will affect anyone at all, but it will certainly help me.  Presently, I can either change this line in Lucene, or I can put in a hack so that the first value for the document is some dummy value, which is wasteful.
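
The difference between the two conditions quoted in the description can be illustrated with a small standalone sketch. It does not use Lucene: the names are hypothetical, each token is assumed to advance the position by exactly 1, and `length` stands in for fieldState.length (the number of tokens indexed so far).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; NOT Lucene code.
// Assumption: every token advances the position by exactly 1.
class GapConditionSketch {
    static List<Integer> positions(String[][] values, int gap, boolean useValueIndex) {
        List<Integer> out = new ArrayList<>();
        int position = 0; // position the next token will receive
        int length = 0;   // tokens indexed so far (stand-in for fieldState.length)
        for (int i = 0; i < values.length; i++) {
            // Existing condition: length > 0.  Proposed condition: i > 0.
            boolean applyGap = useValueIndex ? i > 0 : length > 0;
            if (applyGap) position += gap;
            for (String token : values[i]) {
                out.add(position);
                position++;
                length++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The first value is blank, so it produces no tokens.
        String[][] values = { {}, {"quick", "fox"} };
        System.out.println(positions(values, 100, false)); // [0, 1]
        System.out.println(positions(values, 100, true));  // [100, 101]
    }
}
```

With the existing condition the gap is silently skipped after the empty first value; with `i > 0` it is applied regardless of whether earlier values produced tokens.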

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

