Posted to solr-user@lucene.apache.org by Britske <gb...@gmail.com> on 2009/06/12 12:15:36 UTC
highlighting on edgeGramTokenized field --> highlighting incorrect bc. position not incremented..
Hi,
I'm trying to highlight on a multivalued field (prefix2) whose analyzer
includes, among other things, an EdgeNGramFilterFactory.
The highlighter doesn't advance the start position of the highlighted
portion; in other words, the highlighted portion is always the beginning
of the field value, regardless of which word actually matched.
For example,
for prefix2: "Orlando Verenigde Staten"
the query:
http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&&hl.fl=prefix2
returns:
<em>Orl</em>ando Verenigde Staten
while it should be:
Orlando <em>Ver</em>enigde Staten
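For anyone who wants to reproduce the request from a script, here is a minimal sketch that builds the same select URL. The host, core name, field name and parameters come from the query above; the helper name `build_highlight_query` is made up for illustration.

```python
from urllib.parse import urlencode

# Host and core name are the ones used in the query above.
BASE = "http://localhost:8983/solr/autocompleteCore/select"


def build_highlight_query(term, field="prefix2"):
    """Hypothetical helper: search `field` for `term` with highlighting on."""
    params = [
        ("fl", f"{field},id"),       # fields to return
        ("q", f'{field}:"{term}"'),  # quoted term query on the field
        ("wt", "xml"),               # XML response writer
        ("hl", "true"),              # switch highlighting on
        ("hl.fl", field),            # field to highlight
    ]
    # urlencode percent-encodes the quotes, colon and comma in the values.
    return BASE + "?" + urlencode(params)


url = build_highlight_query("ver")
print(url)
```

The highlighted snippets should then show up in the highlighting section of the response, keyed by document id.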
the field def:
<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
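To make clear what this field type is meant to do, here is a rough Python imitation of the two analyzer chains. It is only a sketch of the intended behaviour (whitespace split, lowercase, leading-edge grams of 1 to 20 characters), not Lucene's code, and the function names are invented for the example.

```python
def index_terms(text, min_gram=1, max_gram=20):
    """Imitate the index analyzer: split on whitespace, lowercase,
    then emit every leading-edge gram of each token."""
    terms = []
    for token in text.split():
        lowered = token.lower()
        longest = min(max_gram, len(lowered))
        for size in range(min_gram, longest + 1):
            terms.append(lowered[:size])
    return terms


def query_terms(text):
    """Imitate the query analyzer: split on whitespace and lowercase only."""
    return [token.lower() for token in text.split()]


indexed = index_terms("Orlando Verenigde Staten")
wanted = query_terms("Ver")

# A query term matches when it equals one of the indexed grams.
print(all(term in indexed for term in wanted))
```

Because the query side does not produce grams, the query "ver" is looked up as a single term and matches the gram generated from "Verenigde" at index time.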
I checked that removing the EdgeNGramFilterFactory results in correct
positioning of highlighting. (But then I can't search for ngrams...)
What am I missing?
Thanks in advance,
Britske
--
View this message in context: http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: highlighting on edgeGramTokenized field --> highlighting incorrect bc. position not incremented..
Posted by Britske <gb...@gmail.com>.
Thanks, I'll check it out.
Otis Gospodnetic wrote:
>
>
> Britske,
>
> I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA
> (the actual ngram code is part of Lucene) that have to do with ngram
> positions. I have a feeling that may be the problem.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: highlighting on edgeGramTokenized field --> highlighting incorrect bc. position not incremented..
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Britske,
I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA (the actual ngram code is part of Lucene) that have to do with ngram positions. I have a feeling that may be the problem.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
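As an illustration of how a position or offset problem in the ngram code could produce exactly the output reported, here is a toy model in Python. It rests on an unconfirmed assumption: that each gram carries character offsets counted from the start of its own token rather than from the start of the field value. Nothing in this thread confirms that this is what Lucene does; the functions and the `field_relative` switch are invented for the example.

```python
def edge_grams_with_offsets(text, min_gram=1, max_gram=20, field_relative=True):
    """Return (gram, start, end) triples for each whitespace token.

    field_relative=True  -> offsets point into the whole field value
    field_relative=False -> offsets restart at 0 for every token
                            (the assumed faulty behaviour)
    """
    grams = []
    cursor = 0
    for token in text.split():
        token_start = text.index(token, cursor)
        cursor = token_start + len(token)
        base = token_start if field_relative else 0
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append((token[:size].lower(), base, base + size))
    return grams


def highlight(text, grams, term):
    """Wrap the span of the first gram equal to `term` in <em> tags."""
    for gram, start, end in grams:
        if gram == term:
            return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]
    return text


value = "Orlando Verenigde Staten"
good = highlight(value, edge_grams_with_offsets(value, field_relative=True), "ver")
bad = highlight(value, edge_grams_with_offsets(value, field_relative=False), "ver")
print(good)
print(bad)
```

In this model the match itself is correct in both cases; only the span handed to the highlighter differs, which would fit the observation that removing the EdgeNGramFilterFactory restores correct highlighting.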