Posted to solr-user@lucene.apache.org by Johann Höchtl <h....@ic-drei.de> on 2010/08/24 20:41:31 UTC
Solr creates whitespace in dismax query
I have a fieldtype with the following definition:
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="openthesaurus.txt" ignoreCase="true" expand="true"/>
<filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
</analyzer>
</fieldType>
I have a value "blume2000.de" in a field with the fieldtype above. If I issue a query with select?q=blume2000&qt=dismax (yes, the field in question is searched by the dismax handler),
the result is empty. Only if I enter the query select?q=blume+2000&qt=dismax do I get the result I want.
So I used debugQuery=true to find out what's wrong. Interestingly, the rawquerystring is still correct, but the
parsedquery is:
+DisjunctionMaxQuery((name:"blume 2000" | teaser:"blume 2000")) DisjunctionMaxQuery((teaser:"blume 2000"~3 | name:"blume 2000"~3))
Now I gotta ask, where does the whitespace come from and why isn't the document matched?
If I analyze the query using the admin backend: Field(name): name Fieldvalue(Index): blume2000.de and Fieldvalue(Query): blume2000.de it works...
Anybody already had that problem?
Re: Solr creates whitespace in dismax query
Posted by Erick Erickson <er...@gmail.com>.
KeywordTokenizerFactory interprets the entire input as a single token, so
this could be a problem for you. For instance, the text:
bloom2000.de is some text
will get indexed as a single token. Searches on "some" or "text" won't match.
This may be what you're looking for, but....
I really think Mitch pointed you in the right direction.
WordDelimiterFilterFactory was probably part of your problem. The stemmer
might have done interesting things too.
Also, if you didn't re-index after changing your schema, you might have had
trouble too.
The admin/analysis page can help you a lot in these situations.
Best
Erick
On Tue, Aug 31, 2010 at 6:34 AM, Johann Höchtl <h....@ic-drei.de> wrote:
> No, it didn't solve the problem, but I got a different solution. I made a
> second field in schema.xml and copy the content into it. This field gets
> analyzed by the KeywordTokenizerFactory.
>
> Thanks,
> Johann
>
> On 24.08.2010 at 21:53, MitchK wrote:
>
>> Johann,
>>
>> try to remove the wordDelimiterFilter from the query-analyzer of your
>> fieldType.
>> If your index-analyzer-wordDelimiterFilter is well configured, it will find
>> everything you want.
>>
>> Does this solve the problem?
>>
>> Kind regards,
>> - Mitch
>>
>>
>
Re: Solr creates whitespace in dismax query
Posted by Johann Höchtl <h....@ic-drei.de>.
No, it didn't solve the problem, but I got a different solution. I made
a second field in schema.xml and copy the content into it. This field gets
analyzed by the KeywordTokenizerFactory.
Thanks,
Johann
On 24.08.2010 at 21:53, MitchK wrote:
> Johann,
>
> try to remove the wordDelimiterFilter from the query-analyzer of your
> fieldType.
> If your index-analyzer-wordDelimiterFilter is well configured, it will find
> everything you want.
>
> Does this solve the problem?
>
> Kind regards,
> - Mitch
>
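For readers following along, the copy-field workaround Johann describes might look roughly like the fragment below. This is only a sketch: the names "text_exact" and "name_exact" are hypothetical, and the thread does not show Johann's actual definitions. The source field "name" is taken from the parsedquery in the original post.

```xml
<!-- Sketch only. "text_exact" and "name_exact" are hypothetical names,
     not taken from the thread. -->

<!-- Goes inside <types>: a field type whose analyzer keeps the whole
     field value as one token. -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Goes inside <fields>: the second field that receives the copy. -->
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>

<!-- Goes at the top level of <schema>: copy the raw value of "name". -->
<copyField source="name" dest="name_exact"/>
```

Presumably the new field also has to be listed in the dismax handler's qf parameter before it is searched, and the documents have to be re-indexed after the schema change.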
Re: Solr creates whitespace in dismax query
Posted by MitchK <mi...@web.de>.
Johann,
try to remove the wordDelimiterFilter from the query-analyzer of your
fieldType.
If your index-analyzer-wordDelimiterFilter is well configured, it will find
everything you want.
Does this solve the problem?
Kind regards,
- Mitch
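
As an illustration of Mitch's suggestion, the query analyzer from the original post with the WordDelimiterFilterFactory line removed would look something like the sketch below. Everything else is copied unchanged from the posted fieldType. Note that Johann reports in his follow-up that this change did not solve his problem.

```xml
<!-- Sketch: query analyzer of "text_kstem" without the word delimiter
     filter, so a query term such as blume2000 is no longer split. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
</analyzer>
```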