Posted to dev@lucene.apache.org by "Martin Schoenmakers (JIRA)" <ji...@apache.org> on 2014/05/23 15:46:01 UTC

[jira] [Updated] (LUCENE-5697) Preview issue

     [ https://issues.apache.org/jira/browse/LUCENE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Schoenmakers updated LUCENE-5697:
----------------------------------------

    Description: 
In DocFetcher, which uses Lucene v3.5.0, we stumbled on a bug. The lead of DocFetcher has investigated and found that the problem seems to be in Lucene. I do not know whether this bug has been fixed in a later Lucene version.

Issue: 
We use "proximity search": searching for multiple words in a directory with about 300 PDF files.
E.g. a search for "wordA wordB wordC"~50, i.e. three words within a distance of 50 words of each other. The resulting documents are correct, but the highlighted text in the document is often missing.

If the words are in the SAME order as in the search AND on the SAME page, then the highlighting works correctly. But if the order of the words differs from the search (like "wordA wordC wordB") OR the words are not on the same page, then that text is not highlighted.

As we often use proximity search on multiple words, this severely degrades usability.

  was:
In DocFetcher, which uses Lucene, we stumbled on a bug. The lead of DocFetcher has investigated and found that the problem seems to be in Lucene.

Issue: we use "proximity search": searching for multiple words in a directory with about 300 PDF files.
E.g. a search for "wordA wordB wordC"~50, so three words within a distance of 50 words of each other. The resulting documents are correct, but the highlighted text in the document is often missing.

If the words are in the SAME order as in the search AND on the SAME page, then the highlighting works correctly. But if the order of the words differs from the search (like "wordA wordC wordB") OR the words are not on the same page, then that text is not highlighted.

As we often use proximity search on multiple words, this severely degrades usability.


> Preview issue
> -------------
>
>                 Key: LUCENE-5697
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5697
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>         Environment: DocFetcher 1.1.11 on Win 7(64) pro
>            Reporter: Martin Schoenmakers
>
> In DocFetcher, which uses Lucene v3.5.0, we stumbled on a bug. The lead of DocFetcher has investigated and found that the problem seems to be in Lucene. I do not know whether this bug has been fixed in a later Lucene version.
> Issue: 
> We use "proximity search": searching for multiple words in a directory with about 300 PDF files.
> E.g. a search for "wordA wordB wordC"~50, i.e. three words within a distance of 50 words of each other. The resulting documents are correct, but the highlighted text in the document is often missing.
> If the words are in the SAME order as in the search AND on the SAME page, then the highlighting works correctly. But if the order of the words differs from the search (like "wordA wordC wordB") OR the words are not on the same page, then that text is not highlighted.
> As we often use proximity search on multiple words, this severely degrades usability.
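
To make the reported symptom concrete, here is a small, purely illustrative sketch in plain Java. It does not use Lucene and is not Lucene's highlighter code or its actual slop computation; the class name ProximitySketch and both methods are hypothetical, and the "window" parameter is a simplified stand-in for the ~50 distance. It only shows how a matcher that requires query order would miss "wordA wordC wordB", while an order-independent matcher would find it. Whether this is the real cause of the missing highlights is not established by this report.

```java
// Hypothetical illustration only: a simplified model of ordered vs. unordered
// proximity matching over a token array. Not Lucene code.
final class ProximitySketch {

    // True if some window of tokens [start, start + window] contains every
    // query term, in any order.
    static boolean unorderedWithin(String[] tokens, String[] terms, int window) {
        for (int start = 0; start < tokens.length; start++) {
            int end = Math.min(tokens.length - 1, start + window);
            boolean all = true;
            for (String term : terms) {
                boolean found = false;
                for (int i = start; i <= end; i++) {
                    if (tokens[i].equals(term)) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // True only if the terms occur in query order, with the last term at most
    // `window` positions after the first one.
    static boolean orderedWithin(String[] tokens, String[] terms, int window) {
        if (terms.length == 0) {
            return true;
        }
        for (int start = 0; start < tokens.length; start++) {
            if (!tokens[start].equals(terms[0])) {
                continue;
            }
            int end = Math.min(tokens.length - 1, start + window);
            int next = 1; // index of the next query term still to be matched
            for (int i = start + 1; i <= end && next < terms.length; i++) {
                if (tokens[i].equals(terms[next])) {
                    next++;
                }
            }
            if (next == terms.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] query = {"wordA", "wordB", "wordC"};
        String[] sameOrder = "wordA x wordB y wordC".split(" ");
        String[] otherOrder = "wordA x wordC y wordB".split(" ");

        System.out.println(orderedWithin(sameOrder, query, 50));    // true
        System.out.println(unorderedWithin(sameOrder, query, 50));  // true
        System.out.println(orderedWithin(otherOrder, query, 50));   // false
        System.out.println(unorderedWithin(otherOrder, query, 50)); // true
    }
}
```

In this model the document with the reordered words still counts as a hit under the unordered check, which mirrors the report: the search results are correct, but a highlighter that only recognised the ordered case would leave such passages unmarked.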


