You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Koji Sekiguchi <ko...@r.email.ne.jp> on 2009/07/17 12:44:02 UTC

SpanScorer problem?

Hello,

This problem was reported by my customer. They are using Solr 1.3
and uni-gram, but it can be reproduced with Lucene 2.9 and
WhitespaceAnalyzer.
The program for reproducing is at the end of this mail.

Query:
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")

The snippet we expected is:
x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>

but we got:
x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>

Thank you,

Koji
---------

public class TestHighlighter {

static final String CONTENT = "x y z a b c d e f g b c g";
static final String PH1 = "\"a b c d\"";
static final String PH2 = "\"b c g\"";
static final String F1 = "f1";
static final String F2 = "f2";
static final String F1C = F1 + ":";
static final String F2C = F2 + ":";
static final String QUERY_STRING =
"(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
+ F1C + PH2 + " OR " + F2C + PH2 + ")";
static Analyzer analyzer = new WhitespaceAnalyzer();

public static void main(String[] args) throws Exception {
QueryParser qp = new QueryParser( F1, analyzer );
Query query = qp.parse( QUERY_STRING );
CachingTokenFilter stream = new CachingTokenFilter(
analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
Scorer scorer = new SpanScorer( query, F1, stream, false );
Highlighter h = new Highlighter( scorer );
System.out.println( "query : " + QUERY_STRING );
System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
}
}


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: SpanScorer problem?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Mark,

I just opened:
https://issues.apache.org/jira/browse/LUCENE-1752

Thank you very much for looking into this!

Koji

Mark Miller wrote:
> Thanks Koji - I just made a patch for the fix if you want to pop open a
> JIRA issue.
>
> Two query types were making their own terms map and passing them to
> extract, rather than using the top level term map - but extract would
> use the term map to see if it saw the term before. The result was, for
> the two query types that did that, boolean and disjunctmax: the last
> term sighting was the only valid position being stored. Thanks a lot for
> the test case - made this one fun.
>
> - Mark
>
> Koji Sekiguchi wrote:
>   
>> Hello,
>>
>> This problem was reported by my customer. They are using Solr 1.3
>> and uni-gram, but it can be reproduced with Lucene 2.9 and
>> WhitespaceAnalyzer.
>> The program for reproducing is at the end of this mail.
>>
>> Query:
>> (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
>>
>> The snippet we expected is:
>> x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
>>
>> but we got:
>> x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
>>
>> Thank you,
>>
>> Koji
>> ---------
>>
>> public class TestHighlighter {
>>
>> static final String CONTENT = "x y z a b c d e f g b c g";
>> static final String PH1 = "\"a b c d\"";
>> static final String PH2 = "\"b c g\"";
>> static final String F1 = "f1";
>> static final String F2 = "f2";
>> static final String F1C = F1 + ":";
>> static final String F2C = F2 + ":";
>> static final String QUERY_STRING =
>> "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
>> + F1C + PH2 + " OR " + F2C + PH2 + ")";
>> static Analyzer analyzer = new WhitespaceAnalyzer();
>>
>> public static void main(String[] args) throws Exception {
>> QueryParser qp = new QueryParser( F1, analyzer );
>> Query query = qp.parse( QUERY_STRING );
>> CachingTokenFilter stream = new CachingTokenFilter(
>> analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
>> Scorer scorer = new SpanScorer( query, F1, stream, false );
>> Highlighter h = new Highlighter( scorer );
>> System.out.println( "query : " + QUERY_STRING );
>> System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
>> }
>> }
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>   
>>     
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: SpanScorer problem?

Posted by Mark Miller <ma...@gmail.com>.
Thanks Koji - I just made a patch for the fix if you want to pop open a
JIRA issue.

Two query types were making their own terms map and passing them to
extract, rather than using the top level term map - but extract would
use the term map to see if it saw the term before. The result was, for
the two query types that did that, boolean and disjunctmax: the last
term sighting was the only valid position being stored. Thanks a lot for
the test case - made this one fun.

- Mark

Koji Sekiguchi wrote:
> Hello,
>
> This problem was reported by my customer. They are using Solr 1.3
> and uni-gram, but it can be reproduced with Lucene 2.9 and
> WhitespaceAnalyzer.
> The program for reproducing is at the end of this mail.
>
> Query:
> (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
>
> The snippet we expected is:
> x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
>
> but we got:
> x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
>
> Thank you,
>
> Koji
> ---------
>
> public class TestHighlighter {
>
> static final String CONTENT = "x y z a b c d e f g b c g";
> static final String PH1 = "\"a b c d\"";
> static final String PH2 = "\"b c g\"";
> static final String F1 = "f1";
> static final String F2 = "f2";
> static final String F1C = F1 + ":";
> static final String F2C = F2 + ":";
> static final String QUERY_STRING =
> "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND ("
> + F1C + PH2 + " OR " + F2C + PH2 + ")";
> static Analyzer analyzer = new WhitespaceAnalyzer();
>
> public static void main(String[] args) throws Exception {
> QueryParser qp = new QueryParser( F1, analyzer );
> Query query = qp.parse( QUERY_STRING );
> CachingTokenFilter stream = new CachingTokenFilter(
> analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
> Scorer scorer = new SpanScorer( query, F1, stream, false );
> Highlighter h = new Highlighter( scorer );
> System.out.println( "query : " + QUERY_STRING );
> System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
> }
> }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org