Posted to solr-user@lucene.apache.org by Marc Ghorayeb <de...@hotmail.com> on 2010/06/28 14:48:15 UTC

Strange query behavior

Hello,
I have a title that says "3DVIA Studio & Virtools Maya and 3dsMax Exporters". The analysis tool for this field gives me these tokens: 3dvia, dvia, studio, &;, virtool, maya, 3dsmax, ds, systèm, max, export


However, when I search for "3dsmax", I get no results :( Furthermore, if I search for "dsmax", the spellchecker suggests "3dsmax", even though that suggestion doesn't return any results either. If I search for any other token ("3dvia" or "max", for example), the document is found. "3dsmax" is the only token that doesn't seem to work!! :(
Here is my schema for this field:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				
		<filter class="solr.WordDelimiterFilterFactory"
			generateWordParts="1"
			generateNumberParts="1"
			catenateWords="0"
			catenateNumbers="0"
			catenateAll="0"
			splitOnCaseChange="1"
			preserveOriginal="1"
		/>
		
		<filter class="solr.TrimFilterFactory" updateOffsets="true"/>
		<filter class="solr.LengthFilterFactory" min="2" max="15"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
				
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
		<filter class="solr.SnowballPorterFilterFactory" language="${Language}" protected="protwords.txt"/>
	</analyzer>
	
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		
		<filter class="solr.WordDelimiterFilterFactory"
			generateWordParts="1"
			generateNumberParts="1"
			catenateWords="1"
			catenateNumbers="1"
			catenateAll="0"
			splitOnCaseChange="1"
			preserveOriginal="1"
		/>
		
		<filter class="solr.TrimFilterFactory" updateOffsets="true"/>
		<filter class="solr.LengthFilterFactory" min="2" max="15"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="${Language}" protected="protwords.txt" />
	</analyzer>
</fieldType>
Can anyone help me out please? :(
PS: ${Language} is set to "en" (for English) in this case.
 		 	   		  

Re: Strange query behavior

Posted by Joe Calderon <ca...@gmail.com>.
splitOnCaseChange is creating multiple tokens from 3dsMax. Disable it
or enable catenateAll. Use the analysis page in the admin tool to see
exactly how your text will be indexed by the analyzers without having to
reindex your documents. Once you have it right, you can do a full
reindex.
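
For illustration, here is a rough sketch of what the first option could look like on the index-time WordDelimiterFilterFactory from the schema in the original post. It only flips splitOnCaseChange and leaves the other attributes as posted. It has not been tested against this index, and whether the query analyzer needs the same change is an assumption to check on the analysis page.

```xml
<!-- Sketch only: index-time filter from the posted schema with case splitting turned off. -->
<!-- Alternative from the reply: keep splitOnCaseChange="1" and set catenateAll="1" instead. -->
<filter class="solr.WordDelimiterFilterFactory"
	generateWordParts="1"
	generateNumberParts="1"
	catenateWords="0"
	catenateNumbers="0"
	catenateAll="0"
	splitOnCaseChange="0"
	preserveOriginal="1"
/>
```

Either variant changes what gets written to the index, so existing documents would need to be reindexed before a search for "3dsmax" can reflect it.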
