Posted to solr-user@lucene.apache.org by harsh kapoor <ha...@gmail.com> on 2013/11/28 09:57:08 UTC

Inconsistent highlighting in Solr

I have indexed data using Solr and want to highlight the matched keyword in
search results, but the highlighting is inconsistent.
For example, when the search keyword is 'alonso':

Highlighted instances: *Alonso*, fernando_*alonso*, **#Alonso**MeetVettel

Non-highlighted instances: @fernandoalonso, www.alonsodriver.com

Can anyone tell me why that is?

I am using this configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

-- 
Harsh Kapoor
Developer
Serendio Softwares Pvt ltd.
Contact: 7401551935,9571702158

Re: Inconsistent highlighting in Solr

Posted by harsh kapoor <ha...@gmail.com>.
Hi Ahmet,

Now things are making sense. Thank you for your reply.


On Thu, Nov 28, 2013 at 3:26 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> Hi Harsh,
>
> The highlighted samples match because WordDelimiterFilterFactory splits
> them into separate tokens. You can see and test the behaviour of your
> fieldType name="text" on the analysis page.

Re: Inconsistent highlighting in Solr

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Harsh,

The highlighted samples match because WordDelimiterFilterFactory splits them into separate tokens. You can see and test the behaviour of your fieldType name="text" on the analysis page.
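
To illustrate the splitting described above, here is a rough Python sketch of what the index analyzer does to each sample. It is only an approximation and not Solr's actual code: it models whitespace tokenization, splitting on non-alphanumeric delimiters and on lower-to-upper case changes, and lowercasing. It ignores stop words, stemming, numeric splitting and the catenate options. The function names index_tokens and matches are made up for this example.

```python
import re

def index_tokens(text):
    """Approximate the index-time tokens for the 'text' fieldType (simplified)."""
    tokens = []
    for raw in text.split():  # WhitespaceTokenizerFactory: split on whitespace only
        # WordDelimiterFilterFactory: split on delimiters such as '_', '#', '@', '.'
        for chunk in re.split(r"[^A-Za-z0-9]+", raw):
            if not chunk:
                continue
            # splitOnCaseChange="1": break where a lowercase letter meets an uppercase one
            spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", chunk)
            # LowerCaseFilterFactory: lowercase every part
            tokens.extend(part.lower() for part in spaced.split())
    return tokens

def matches(query, text):
    """A term query matches only when it equals one whole indexed token."""
    return query.lower() in index_tokens(text)

samples = ["Alonso", "fernando_alonso", "#AlonsoMeetVettel",
           "FernandoAlonso", "@fernandoalonso", "www.alonsodriver.com"]
for sample in samples:
    print(sample, index_tokens(sample), matches("alonso", sample))
```

In this sketch 'fernando_alonso' and 'FernandoAlonso' both yield a separate 'alonso' token, while '@fernandoalonso' stays a single token 'fernandoalonso' and 'www.alonsodriver.com' yields 'alonsodriver', so neither contains 'alonso' as a whole token.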



On Thursday, November 28, 2013 11:51 AM, harsh kapoor <ha...@gmail.com> wrote:
 
Hi Ahmet,

Thanks for your reply, but I am still not clear on this. Why does highlighting
occur in text like fernando_*alonso* and Fernando*Alonso* (CamelCase)? These are
also words, and Solr is highlighting inside them.

But no highlighting takes place in lowercase 'fernandoalonso'. Why is this?






Re: Inconsistent highlighting in Solr

Posted by harsh kapoor <ha...@gmail.com>.
Hi Ahmet,

Thanks for your reply, but I am still not clear on this. Why does highlighting
occur in text like fernando_*alonso* and Fernando*Alonso* (CamelCase)? These are
also words, and Solr is highlighting inside them.

But no highlighting takes place in lowercase 'fernandoalonso'. Why is this?





On Thu, Nov 28, 2013 at 2:58 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> Hi Harsh,
>
> Your query 'alonso' does not match the text in your non-highlighted
> examples. That's why they are not highlighted. It seems that you want to
> be able to search inside words too. You can use the wildcard operator
> for this. Please see this similar discussion:
> http://search-lucene.com/m/HiKY02e1KgI1

Re: Inconsistent highlighting in Solr

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Harsh,

Your query 'alonso' does not match the text in your non-highlighted examples. That's why they are not highlighted. It seems that you want to be able to search inside words too. You can use the wildcard operator for this. Please see this similar discussion: http://search-lucene.com/m/HiKY02e1KgI1
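
As a sketch of the wildcard approach, the Python snippet below builds a select URL with a wildcard query and highlighting turned on. The base URL, the field name "text" and the helper name build_highlight_url are assumptions for illustration and would need to match your own setup. The keyword is lowercased on the assumption that wildcard terms may not go through the full analysis chain, while the indexed tokens are lowercased.

```python
from urllib.parse import urlencode

def build_highlight_url(base_url, field, keyword):
    """Build a Solr select URL that searches inside tokens and asks for highlighting."""
    params = {
        # Leading and trailing wildcard: match any token containing the keyword
        "q": f"{field}:*{keyword.lower()}*",
        "hl": "true",                     # enable highlighting
        "hl.fl": field,                   # field to highlight
        "hl.highlightMultiTerm": "true",  # ask for highlighting of wildcard matches
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical host and handler path; adjust to your installation.
print(build_highlight_url("http://localhost:8983/solr/select", "text", "alonso"))
```

Leading wildcards can be slow on a large index, and because highlighting works on whole tokens, a wildcard match would be expected to mark the entire token (for example 'fernandoalonso') rather than only the 'alonso' part.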


