Posted to solr-user@lucene.apache.org by con <co...@gmail.com> on 2009/12/03 17:40:28 UTC

Issues with alphanumeric search terms

Hi,

My Solr deployment gives correct results for normal search terms like
"john".
But when I search for "john55" or "55", it returns all the search terms,
including those that contain neither john nor 55.
Below is the fieldType defined for this field.

<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
</fieldType>

Are there any other tokenizers or filters that need to be set for
alphanumeric/numeric search?


-- 
View this message in context: http://old.nabble.com/Issues-with-alphanumeric-search-terms-tp26629048p26629048.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Issues with alphanumeric search terms

Posted by Erick Erickson <er...@gmail.com>.

As Ahmet says, you need to re-index.

Nothing about WordDelimiterFilterFactory alters case, as far as I can tell
from
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Are you applying this in addition to the LowerCaseTokenizerFactory? In
which case it's too late: the tokenizer has already stripped the numbers
before the filter ever sees them.
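
To illustrate the ordering point, here is a minimal, untested sketch in which
the tokenizer keeps digits, so WordDelimiterFilterFactory still has them to
work with. The "mytype" name and catenateAll="1" come from this thread;
WhitespaceTokenizerFactory, LowerCaseFilterFactory, and the generateWordParts
and generateNumberParts settings are assumptions about what is wanted here:

```xml
<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <!-- Assumed tokenizer: splits on whitespace only, so digits survive -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Runs after tokenizing; splits and re-joins word/number parts -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
        <!-- Lowercasing done as a filter, not by the tokenizer -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With a chain like this, "john55" should be indexed as john, 55 and john55, but
that is worth confirming in Luke after a re-index.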

Please get a copy of Luke and examine your index to see what actually
gets indexed. It'll give you a *much* better idea of what the various
analyzers actually put in your index.

Best
Erick

On Fri, Dec 4, 2009 at 6:57 AM, AHMET ARSLAN <io...@yahoo.com> wrote:

> Did you re-start tomcat and re-index? Why not use StandardTokenizerFactory?

Re: Issues with alphanumeric search terms

Posted by AHMET ARSLAN <io...@yahoo.com>.

> I have added 
>     <filter
> class="solr.WordDelimiterFilterFactory" catenateAll="1"
> />
> to both index and query but still getting same behaviour.
> 
> Is there any other that i am missing?
> 

Did you restart Tomcat and re-index? Why not use StandardTokenizerFactory?
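
For reference, a minimal, untested sketch of the StandardTokenizerFactory
option. The "mytype" name is taken from the original post; pairing the
tokenizer with LowerCaseFilterFactory is an assumption, made to keep the
case-insensitive matching that the current LowerCaseTokenizerFactory setup
provides:

```xml
<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <!-- StandardTokenizer keeps digits instead of dropping them -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Assumed addition: lowercase tokens after tokenizing -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Any change to the index-time analyzer only takes effect for documents indexed
after the change, hence the need to re-index.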

Re: Issues with alphanumeric search terms

Posted by con <co...@gmail.com>.

I have added
	<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" />
to both the index and query analyzers, but I am still getting the same
behaviour.

Is there anything else that I am missing?





-- 
View this message in context: http://old.nabble.com/Issues-with-alphanumeric-search-terms-tp26629048p26635781.html

Re: Issues with alphanumeric search terms

Posted by Erick Erickson <er...@gmail.com>.

Hmmmm, I don't think you want LowerCaseTokenizerFactory.

From:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory

Creates org.apache.lucene.analysis.LowerCaseTokenizer.

Creates tokens by lowercasing all letters and dropping non-letters.
Example: "I can't" ==> "i", "can", "t"


Also see:
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/LowerCaseTokenizer.html

This seems consistent with this part of your debug query:
<str name="rawquerystring">(phone:650 AND rowtype:contacts)</str>
<str name="querystring">(phone:650 AND rowtype:contacts)</str>
<str name="parsedquery">+rowtype:contacts</str>
<str name="parsedquery_toString">+rowtype:contacts</str>

Note that the number portion of your original query is
completely missing from the parsed query...

How do you want your input tokenized? Maybe you
want a WhitespaceTokenizer and a LowerCase *filter*?
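
A minimal, untested sketch of that combination, reusing the "mytype" name
from the original post. Whether whitespace-only splitting is the right
tokenization for this field is an assumption; it keeps "john55" and "650"
as whole tokens rather than splitting or dropping them:

```xml
<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <!-- Split on whitespace only; letters and digits are both kept -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Lowercase as a filter, so nothing is discarded -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

After re-indexing, running the same query with debugQuery=on should show a
phone:650 clause in parsedquery instead of only +rowtype:contacts.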

HTH
Erick



On Thu, Dec 3, 2009 at 2:05 PM, con <co...@gmail.com> wrote:

>
> Yes. I meant all the indexed documents.

Re: Issues with alphanumeric search terms

Posted by con <co...@gmail.com>.

Yes. I meant all the indexed documents.

With debugQuery=on, I got the following result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">(phone:650 AND rowtype:contacts)</str>
      <str name="wt">xml</str>
      <str name="rows">1</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="104" start="0">
    <doc>
      <str name="ADDRESS">  </str>
      <str name="CITY">  </str>
      <str name="COUNTRY">  </str>
      <date name="CREATEDTIME">2009-09-22T06:50:36.943Z</date>
      <str name="NAME">Adam</str>
      <str name="email">adam@abc.com</str>
      <str name="firstname">Adam</str>
      <str name="lastname">smith</str>
      <str name="locale">en_US</str>
      <str name="phone">  </str>
      <str name="rowtype">contacts</str>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">(phone:650 AND rowtype:contacts)</str>
    <str name="querystring">(phone:650 AND rowtype:contacts)</str>
    <str name="parsedquery">+rowtype:contacts</str>
    <str name="parsedquery_toString">+rowtype:contacts</str>
    <lst name="explain">
      <str name="1030422en_US">

0.99043053 = (MATCH) fieldWeight(rowtype:contacts in 0), product of:
  1.0 = tf(termFreq(rowtype:contacts)=1)
  0.99043053 = idf(docFreq=104, maxDocs=104)
  1.0 = fieldNorm(field=rowtype, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">1.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">1.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>




-- 
View this message in context: http://old.nabble.com/Issues-with-alphanumeric-search-terms-tp26629048p26631343.html

Re: Issues with alphanumeric search terms

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, what does debugQuery=on show?

And did you mean documents here?
<< it will return all the search terms>>

Best
Erick
