Posted to dev@lucene.apache.org by "Grant Glouser (JIRA)" <ji...@apache.org> on 2008/06/20 05:49:45 UTC

[jira] Created: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Phrase query with term repeated 3 times requires more slop than expected
------------------------------------------------------------------------

                 Key: LUCENE-1310
                 URL: https://issues.apache.org/jira/browse/LUCENE-1310
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 2.3.1, 2.3.2
            Reporter: Grant Glouser


Consider a document with the text "A A A".
The phrase query "A A A" (exact match) succeeds.
The query "A A A"~1 (same document and query, just increasing the slop value by one) fails.
"A A A"~2 succeeds again.

If the exact match succeeds, I wouldn't expect the same query but with more slop to fail.  The fault seems to require some term to be repeated at least three times in the query, but the three occurrences do not need to be adjacent.  I will attach a file that contains a set of JUnit tests that demonstrate what I mean.
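
The expectation above (a match at some slop should remain a match at any larger slop) can be sketched with a small brute-force model. This is not Lucene's scorer: it assumes a simplified definition in which each query term is assigned a distinct document position, and the "spread" is the difference between the largest and smallest value of (document position minus query offset). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class SloppyPhraseModel {

    /**
     * Smallest spread over all ways of assigning distinct document positions
     * to the query terms, or -1 if no assignment exists.
     */
    static int minSpread(List<String> doc, List<String> query) {
        if (query.isEmpty()) {
            return -1; // the model is only defined for non-empty queries
        }
        return search(doc, query, 0, new boolean[doc.size()],
                Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static int search(List<String> doc, List<String> query, int i,
                              boolean[] used, int min, int max) {
        if (i == query.size()) {
            return max - min; // every query term has been placed
        }
        int best = -1;
        for (int p = 0; p < doc.size(); p++) {
            // A document position may serve only one query term.
            if (used[p] || !doc.get(p).equals(query.get(i))) {
                continue;
            }
            used[p] = true;
            int shifted = p - i; // document position minus query offset
            int r = search(doc, query, i + 1, used,
                    Math.min(min, shifted), Math.max(max, shifted));
            used[p] = false;
            if (r >= 0 && (best < 0 || r < best)) {
                best = r;
            }
        }
        return best;
    }

    /** In this model a phrase matches when the smallest spread fits within the slop. */
    static boolean matches(List<String> doc, List<String> query, int slop) {
        int spread = minSpread(doc, query);
        return spread >= 0 && spread <= slop;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("A", "A", "A");
        for (int slop = 0; slop <= 2; slop++) {
            System.out.println("slop " + slop + ": " + matches(text, text, slop));
        }
    }
}
```

Because the model compares a single minimum spread against the slop, a match can never disappear when the slop grows, which is the property the report expects of "A A A"~1.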

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Grant Glouser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609406#action_12609406 ] 

Grant Glouser commented on LUCENE-1310:
---------------------------------------

termPositionsDiffer can return the PhrasePositions that was passed in (pp), which means you could be passing the same PhrasePositions to flip in both arguments.  But flip assumes that the arguments are different.  This can result in an ArrayIndexOutOfBoundsException in flip.  Perhaps just checking that pp2 != pp on line 76 would be sufficient to avoid this.
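
The guard suggested above can be illustrated with a simplified stand-in. The sketch below is not the code from the patch: Pos, flip and flipIfDifferent are hypothetical names, java.util.PriorityQueue stands in for Lucene's own queue, and the failure in this model is an IllegalStateException rather than the ArrayIndexOutOfBoundsException described above. It only shows why a swap routine that searches the queue for its second argument must not be handed the element that was already taken out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class FlipGuardSketch {

    /** Hypothetical stand-in for a PhrasePositions entry. */
    static final class Pos {
        final String name;
        final int position;

        Pos(String name, int position) {
            this.name = name;
            this.position = position;
        }
    }

    static PriorityQueue<Pos> newQueue() {
        return new PriorityQueue<>(Comparator.comparingInt((Pos p) -> p.position));
    }

    /**
     * Swaps 'held' (already outside the queue) with 'target' (expected inside it).
     * Assumes held != target; if they are the same object, the target is never
     * found and the queue is drained.
     */
    static Pos flip(PriorityQueue<Pos> pq, Pos held, Pos target) {
        List<Pos> popped = new ArrayList<>();
        Pos p;
        while ((p = pq.poll()) != target) {
            if (p == null) {
                throw new IllegalStateException("target is not in the queue");
            }
            popped.add(p);
        }
        pq.addAll(popped); // restore everything that was popped on the way
        pq.add(held);      // the held entry takes the target's place
        return target;
    }

    /** The guard: only flip when the candidate is a different object. */
    static Pos flipIfDifferent(PriorityQueue<Pos> pq, Pos held, Pos candidate) {
        if (candidate == null || candidate == held) {
            return held;
        }
        return flip(pq, held, candidate);
    }
}
```

With the identity check in place, passing the same object twice becomes a no-op instead of an exhaustive search of the queue.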

I have not been able to come up with a simple test case that triggers this.  I have a complex one, but it uses a custom Analyzer.



[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-1310.patch

Updated patch with a fix for the NPE and with a test that fails with the previous fix for the NPE.
The point is to switch to the pp with the higher (query) offset when two pps are at the same (doc) position.




[jira] Commented: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608484#action_12608484 ] 

Doron Cohen commented on LUCENE-1310:
-------------------------------------

I am also getting the error with the test. It is indeed a bug.
I think I see where the problem is, and I am working on it.




[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-1310.patch

Updated patch with required check (pp!=pp2) before flipping as Grant suggested.




[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Grant Glouser (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Glouser updated LUCENE-1310:
----------------------------------

    Attachment: TestSloppyPhraseQuery.java



[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-1310.patch

The previous patch might restore the queue wrongly: it pops pp but puts pp2.
This patch fixes that by returning the correct pp into the pq.
However, it is not yet perfect, since the one pp returned to pq might not be the last one advanced.
This means pq could be sorted incorrectly with regard to repeating terms.
I didn't manage to create a test case that fails due to this; testDoc4_Query3_All_Slops_Should_match in the test was my last attempt to catch it.
The only perfect solution I see is to re-populate the queue when this happens, but this is costly and I am inclined not to do it.
Open for suggestions...
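
To make the trade-off above concrete, here is a minimal sketch of what re-populating a queue involves. It uses java.util.PriorityQueue rather than Lucene's own queue class, and Pos and repopulate are hypothetical names. The point it illustrates is general: changing an element's sort key while it sits inside a heap does not reorder the heap, and rebuilding costs one insertion per element.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class RepopulateSketch {

    /** Hypothetical entry whose sort key can change while it is queued. */
    static final class Pos {
        int position; // mutable on purpose, to model an entry that was advanced

        Pos(int position) {
            this.position = position;
        }
    }

    static PriorityQueue<Pos> newQueue() {
        return new PriorityQueue<>(Comparator.comparingInt((Pos p) -> p.position));
    }

    /** Empties the queue and re-inserts every entry so the ordering is valid again. */
    static void repopulate(PriorityQueue<Pos> pq) {
        List<Pos> all = new ArrayList<>(pq);
        pq.clear();
        pq.addAll(all);
    }
}
```

An entry whose position was advanced while queued stays at the head until repopulate is called, which is the kind of stale ordering the comment above is concerned about.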





[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-1310.patch

Patch with a fix.

The problem was in the logic for advancing PhrasePositions that were pointing to exactly the same term in the document.
Note that this advancing is required to avoid false matches (as was fixed in LUCENE-736).
However, the PhrasePositions whose offset (in the query) is the highest must be advanced first.
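
The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: Pos and chooseToAdvance are hypothetical names, docPosition is taken to be the raw position of the term occurrence in the document, and when several entries collide the sketch simply picks the highest query offset among all of them.

```java
import java.util.List;

final class AdvanceChoiceSketch {

    /** Hypothetical stand-in for a PhrasePositions entry of a repeated term. */
    static final class Pos {
        final int docPosition;  // position of the term occurrence in the document
        final int queryOffset;  // position of the term within the phrase query

        Pos(int docPosition, int queryOffset) {
            this.docPosition = docPosition;
            this.queryOffset = queryOffset;
        }
    }

    /**
     * Among entries that share a document position with another entry, returns
     * the one with the highest query offset, or null if no two entries collide.
     */
    static Pos chooseToAdvance(List<Pos> repeats) {
        Pos chosen = null;
        for (int i = 0; i < repeats.size(); i++) {
            for (int j = i + 1; j < repeats.size(); j++) {
                Pos a = repeats.get(i);
                Pos b = repeats.get(j);
                if (a.docPosition != b.docPosition) {
                    continue; // different occurrences, nothing to resolve
                }
                Pos higher = a.queryOffset > b.queryOffset ? a : b;
                if (chosen == null || higher.queryOffset > chosen.queryOffset) {
                    chosen = higher;
                }
            }
        }
        return chosen;
    }
}
```

For a query such as "A A A" whose three entries all start on the first A of the document, the sketch selects the entry with query offset 2 to be advanced first.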

As a side effect of this fix, sorting of the "repeats" array (at scorer initialization) is no longer required.

Grant's test is also in the patch, slightly modified.

Grant, can you give it a try and report here?



[jira] Commented: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619339#action_12619339 ] 

Doron Cohen commented on LUCENE-1310:
-------------------------------------

Committed to trunk.

I was wondering if this should be backported to 2.3.1 and/or 2.3.2.
It is not a major bug, so I think not, though it might be critical to some applications.
I didn't find a guideline for this in the wiki, but I may be missing it.




[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Grant Glouser (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Glouser updated LUCENE-1310:
----------------------------------

    Attachment: LUCENE-1310.1.patch

There is a potential NPE.  I am adding a patch (which should apply on top of the previous patch) that adds a test case and possible fix.

I will continue testing this.  Thanks for looking at it, Doron.



[jira] Commented: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609436#action_12609436 ] 

Doron Cohen commented on LUCENE-1310:
-------------------------------------

You're right, Grant, good catch, thanks!
(I'm sure I had this check but lost it when refactoring the flipping into a method.)
Updated patch to follow.




[jira] Commented: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609022#action_12609022 ] 

Doron Cohen commented on LUCENE-1310:
-------------------------------------

You're right! And thanks for catching this!
NPE is possible and I see it too with the new test. 
I think the suggested new fix is likely to miss additional matches in the same doc. 
I'll later post a test that shows this.

(As a side comment, it's useful to post patches named "LUCENE-NNN.patch", where NNN is the issue number.
This way JIRA shows which fix is the most recent very clearly, and also, being a complete patch, everyone can easily apply the entire patch.
More on this in the Wiki under HowToContribute.)



[jira] Resolved: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-1310.
---------------------------------

    Resolution: Fixed

Fixed, thanks for the review Grant!



[jira] Assigned: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-1310:
-----------------------------------

    Assignee: Doron Cohen



[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-1310.patch

Updated patch fixes this issue. 
In case of repeating terms in the query, this might be slower than previous patch, but it is supposedly correct in all cases while the previous one was not guaranteed to be always correct. There are no performance implications for the more common case of no repeating terms in the query.
I plan to commit this in a day or two.




[jira] Updated: (LUCENE-1310) Phrase query with term repeated 3 times requires more slop than expected

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1310:
--------------------------------

    Attachment: LUCENE-2.3.1-1310.patch

Same patch, but for 2.3.1.
