Posted to solr-user@lucene.apache.org by Tarun Jain <tj...@yahoo.com> on 2009/09/28 21:54:25 UTC

alphanumeric queries using LuceneQParser

Hi,
I have created an index where the fields have been indexed with 
omitNorms="true" omitTermFreqAndPositions="true" 
to improve indexing performance. One side effect is that some searches for alphanumeric words do not work correctly.
For example, below is the debugQuery part of a query response:
===============================================
<lst name="debug">
  <str name="rawquerystring">text_ar:1SAM550000R1009</str> 
  <str name="querystring">text_ar:1SAM550000R1009</str> 
  <str name="parsedquery">PhraseQuery(text_ar:"1 sam 550000 r 1009")</str> 
  <str name="parsedquery_toString">text_ar:"1 sam 550000 r 1009"</str> 
  <lst name="explain" /> 
  <str name="QParser">LuceneQParser</str> 
</lst>
===============================================

I have also changed the definition of the text fieldType in schema.xml to the following (removed the WordDelimiterFilterFactory):
===============================================================
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
</fieldType>
=============================================================

I would like the query parser not to break up alphanumeric query terms.
How do I do this?

Tarun
-=-
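
Note that the query in the debug output targets the field text_ar, while the definition above is for the fieldType named text. The message does not show how text_ar is declared, so the following is only a sketch, assuming text_ar uses the text type together with the two attributes mentioned at the top of the message:

```xml
<!-- Hypothetical declaration; the real one is not shown in the message.
     type="text" is an assumption: if text_ar actually refers to another
     fieldType that still contains WordDelimiterFilterFactory, editing
     the text fieldType has no effect on this field.
     indexed/stored values are likewise assumed for illustration. -->
<field name="text_ar" type="text" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true" />
```

Because omitTermFreqAndPositions="true" leaves positions out of the index, queries that depend on positions, such as the PhraseQuery in the debug output, cannot match on a field declared this way.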

Re: alphanumeric queries using LuceneQParser

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, Sep 28, 2009 at 3:54 PM, Tarun Jain <tj...@yahoo.com> wrote:
> Hi,
> I have created an index where the fields have been indexed with
> omitNorms="true" omitTermFreqAndPositions="true"
> to improve indexing performance. One side effect is that some searches for alphanumeric words do not work correctly.
> For example, below is the debugQuery part of a query response:
> ===============================================
> <lst name="debug">
>  <str name="rawquerystring">text_ar:1SAM550000R1009</str>
>  <str name="querystring">text_ar:1SAM550000R1009</str>
>  <str name="parsedquery">PhraseQuery(text_ar:"1 sam 550000 r 1009")</str>
>  <str name="parsedquery_toString">text_ar:"1 sam 550000 r 1009"</str>
>  <lst name="explain" />
>  <str name="QParser">LuceneQParser</str>
> </lst>
> ===============================================
>
> I have also changed the definition of the text fieldType in schema.xml to the following (removed the WordDelimiterFilterFactory):

Removing the WordDelimiterFilterFactory should have done it.
Make sure you restarted the server and reindexed all of the documents.

-Yonik
http://www.lucidimagination.com
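
Reindexing matters because an analyzer change applies only to documents indexed after it; terms already in the index keep the tokens produced by the old analyzer. One way to start from a clean index is sketched below, on the assumption that deleting every document is acceptable and that the documents can be posted again afterwards. These are XML update messages sent to Solr's update handler, each as a separate request:

```xml
<!-- Request 1: delete every document in the index.
     Assumes a full rebuild is acceptable for this index. -->
<delete><query>*:*</query></delete>

<!-- Request 2 (sent separately): commit so the deletion takes effect. -->
<commit/>
```

After the commit, post the documents again so they are analyzed with the updated fieldType, then rerun the query with debugQuery enabled to check the parsed query.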


