Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/04/03 01:54:57 UTC

Re: Spellchecking Escaped Queries

: I'm having an issue performing a spellcheck on some information and
: search of the archive isn't helping.

For this type of question, there's not much feedback anyone can offer w/o 
knowing exactly what analyzers you have configured for the various 
fieldtypes (both the field you index/search and the fieldtype used for 
spellchecking).

it's also fairly critical to know how you have the spellcheck component 
configured.

off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a 
wonky way given your usecase -- but like i said: would need to see the 
configs to make a guess.


-Hoss

Re: Spellchecking Escaped Queries

Posted by Colin Vipurs <co...@shazamteam.com>.
Thanks Chris, 

The field used for indexing and spellcheck is the same and is configured
like this:


<fieldType name="title" stored="true" indexed="true" multiValued="false" class="solr.TextField">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="^([^!]+)\!([^!]+)$"
              replacement="$1i$2"
              replace="all"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
</fieldType>


I use the pattern replace filter to swap all instances of "!" within a
word to "i".  I know this part is working, as searches against the field
behave correctly.
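
As a rough check of what that pattern does to individual tokens, here is a
minimal Python sketch. It is not Solr code: `replace_bang` is a hypothetical
helper, and Python's `re` module stands in for the Java regex engine that
Solr uses (the two should agree for a pattern this simple). It suggests the
pattern rewrites a token only when it contains exactly one "!" with other
characters on both sides, so a token with two or more "!" would pass through
unchanged.

```python
import re

# Python equivalent of the filter's pattern="^([^!]+)\!([^!]+)$" and
# replacement="$1i$2" (Python writes group references as \1 and \2).
PATTERN = re.compile(r"^([^!]+)!([^!]+)$")


def replace_bang(token: str) -> str:
    """Apply the pattern to one token, roughly as the filter would per token."""
    return PATTERN.sub(r"\1i\2", token)


# Sample tokens are made up for illustration.
for token in ["p!nk", "p!n!k", "!nk", "pink!"]:
    print(token, "->", replace_bang(token))
```

Only the first sample token is rewritten; the others do not match the
anchored pattern and are left as they are.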

The spellcheck is initialized like this:


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">title</str>
   <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">searchfield</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">false</str>
   </lst>
</searchComponent>


This is attached as a component to my search handler and spellchecking
is done inline with the queries.

Thanks,

Colin





-- 


Colin Vipurs
Server Team Lead

Shazam Entertainment Ltd   
26-28 Hammersmith Grove, London W6 7HA
m:   +44 (0) 0000 000 000   t: +44 (0) 20 8742 6820
w:    www.shazam.com

Please consider the environment before printing this document

This e-mail and its contents are strictly private and confidential. It
must not be disclosed, distributed or copied without our prior consent.
If you have received this transmission in error, please notify Shazam
Entertainment immediately on: +44 (0) 020 8742 6820 and then delete it
from your system. Please note that the information contained herein
shall additionally constitute Confidential Information for the purposes
of any NDA between the recipient/s and Shazam Entertainment. Shazam
Entertainment Limited is incorporated in England and Wales under company
number 3998831 and its registered office is at 26-28 Hammersmith Grove,
London W6 7HA. 







Re: Spellchecking Escaped Queries

Posted by Colin Vipurs <co...@shazamteam.com>.
Apologies for the duplicate post.  I'm having Evolution problems






