Posted to solr-user@lucene.apache.org by Britske <gb...@gmail.com> on 2009/06/12 12:15:36 UTC

highlighting on edgeGramTokenized field --> hightlighting incorrect bc. position not incremented..

Hi, 

I'm trying to highlight on a (multivalued) field (prefix2) that has, among
other things, an EdgeNGramFilterFactory defined.
Highlighting doesn't advance the start position of the highlighted portion;
in other words, the highlighted portion is always at the beginning of the
field.




for example: 
for prefix2: "Orlando Verenigde Staten"
the query:
http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2

returns: 
<em>Orl</em>ando Verenigde Staten
while it should be: 
Orlando <em>Ver</em>enigde Staten

the field def: 

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I checked that removing the EdgeNGramFilterFactory results in correctly
positioned highlighting. (But then I can't search for ngrams...)
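
To make the expected behaviour concrete, here is a rough Python sketch of
what I would expect the index-time chain plus highlighter to do. This is not
Solr or Lucene code; the function names edge_ngrams and highlight are made
up for illustration, and it assumes each gram should carry character offsets
into the original field value.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Whitespace-tokenize, lowercase, and emit (gram, start, end) tuples.

    start and end are character offsets into the original field value.
    """
    grams = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)  # where this token begins in the value
        pos = start + len(word)
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append((word[:size].lower(), start, start + size))
    return grams


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap the first gram equal to the lowercased query in pre/post markers."""
    for gram, start, end in edge_ngrams(text):
        if gram == query.lower():
            return text[:start] + pre + text[start:end] + post + text[end:]
    return text


print(highlight("Orlando Verenigde Staten", "ver"))
```

With offsets that include the start of the token, the marked-up span falls
on "Ver" in the second word, which is the output I expected from Solr.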

What am I missing? 
Thanks in advance, 
Britske



-- 
View this message in context: http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: highlighting on edgeGramTokenized field --> hightlighting incorrect bc. position not incremented..

Posted by Britske <gb...@gmail.com>.

Thanks, I'll check it out. 


Otis Gospodnetic wrote:
> 
> 
> Britske,
> 
> I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA
> (the actual ngram code is part of Lucene) that have to do with ngram
> positions.  I have a feeling that may be the problem.
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 


Re: highlighting on edgeGramTokenized field --> hightlighting incorrect bc. position not incremented..

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Britske,

I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA (the actual ngram code is part of Lucene) that have to do with ngram positions.  I have a feeling that may be the problem.
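
To illustrate the kind of problem I mean, here is a hypothetical Python sketch (not the Lucene code, and I have not confirmed this is what the filter actually does). It assumes the grams are given offsets that restart at zero for every token instead of being shifted by the token's start offset; the names grams_with_offsets and add_token_start are invented for the example.

```python
def grams_with_offsets(text, add_token_start):
    """Return (gram, start, end) for the edge n-grams of each whitespace token.

    If add_token_start is False, offsets are relative to the token itself
    (the suspected faulty behaviour); if True, they are offsets into the
    whole field value.
    """
    out = []
    pos = 0
    for word in text.split():
        token_start = text.index(word, pos)
        pos = token_start + len(word)
        base = token_start if add_token_start else 0
        for size in range(1, len(word) + 1):
            out.append((word[:size].lower(), base, base + size))
    return out


def highlight(text, query, add_token_start):
    """Mark up the span of the first gram that equals the lowercased query."""
    for gram, start, end in grams_with_offsets(text, add_token_start):
        if gram == query.lower():
            return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]
    return text


value = "Orlando Verenigde Staten"
print(highlight(value, "ver", add_token_start=False))
print(highlight(value, "ver", add_token_start=True))
```

Under that assumption, the first call marks up "Orl" at the start of the field, which is the symptom you describe, and the second call marks up "Ver" as expected.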

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


