Posted to solr-user@lucene.apache.org by prerna07 <pk...@sapient.com> on 2008/10/17 06:36:40 UTC

Synonym format not working

Hi,

I am facing an issue with synonym search in Solr. The synonyms.txt file
contains the following entries:

ccc => cccc1,cccc2,ccc
ccc => cccc3

I am not getting any search results for ccc. I have created the index with
string values.

Do I need to change anything in schema.xml?

String tag from schema.xml:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any pointers to solve the issue?

Thanks,
Prerna


-- 
View this message in context: http://www.nabble.com/Synonym--format-not-working-tp20026988p20026988.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Synonym format not working

Posted by Norberto Meijome <nu...@gmail.com>.
On Mon, 20 Oct 2008 00:08:07 -0700 (PDT)
prerna07 <pk...@sapient.com> wrote:

> 
> 
> The issue with synonyms arises when I have a number in the synonym definition:
> 
> ccc => cccc1,cccc2 gives the following result with debugQuery=true:
>   <str name="parsedquery">MultiPhraseQuery(all:"cccc (1 cccc) (2 ccc cccc) 3")</str>
>   <str name="parsedquery_toString">all:"cccc (1 cccc) (2 ccc cccc) 3"</str>
> 
> However, fooaaa => fooaaa,baraaa,bazaaa gives correct synonym results:
> 
>   <str name="parsedquery">all:fooaaa all:baraaa all:bazaaa</str>
>   <str name="parsedquery_toString">all:fooaaa all:baraaa all:bazaaa</str>
> 
> Any pointers to solve the issue with numbers in synonyms?

Prerna,
in your first email you show that your field type has:

[...]
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
[...]

generateNumberParts="1" will, AFAIK, generate a separate token for a number, so
ccc1 will be indexed as "ccc", "1". If you use admin/analysis.jsp you can see
the step-by-step processing done by the tokenizer and filters for your field
type; you can then tweak it as necessary until you are happy with the results.
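
One possible tweak along these lines, offered as a sketch rather than a
confirmed fix: if splitting on digits is not needed for this field, the
query-side chain could simply leave out WordDelimiterFilterFactory, so that
synonym targets such as cccc1 stay single tokens. This assumes the field type
is a solr.TextField and that the index-side analyzer also keeps cccc1 whole;
both are assumptions, and the result should be checked in admin/analysis.jsp.

```xml
<!-- Sketch only: the query analyzer from the original post with the
     WordDelimiterFilterFactory line removed. Assumes the enclosing
     fieldType uses class="solr.TextField" and that the index analyzer
     does not split tokens such as cccc1 either. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

Whether dropping the word delimiter is acceptable depends on what else the
field has to match; it is the simplest variation to try first.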

b
_________________________
{Beto|Norberto|Numard} Meijome

Immediate success shouldn't be necessary as a motivation to do the right thing.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.

Re: Synonym format not working

Posted by prerna07 <pk...@sapient.com>.

The issue with synonyms arises when I have a number in the synonym definition:

ccc => cccc1,cccc2 gives the following result with debugQuery=true:
  <str name="parsedquery">MultiPhraseQuery(all:"cccc (1 cccc) (2 ccc cccc) 3")</str>
  <str name="parsedquery_toString">all:"cccc (1 cccc) (2 ccc cccc) 3"</str>

However, fooaaa => fooaaa,baraaa,bazaaa gives correct synonym results:

  <str name="parsedquery">all:fooaaa all:baraaa all:bazaaa</str>
  <str name="parsedquery_toString">all:fooaaa all:baraaa all:bazaaa</str>

Any pointers to solve the issue with numbers in synonyms?

Thanks,
Prerna







Re: Synonym format not working

Posted by Chris Hostetter <ho...@fucit.org>.
: I am not getting any search result for ccc. I have created indexes with
: string value.
: 
: Do i need to change anything in schema .xml ?
: 
:  String tag from Schema.xml : 
:  <fieldType name="string" class="solr.StrField" sortMissingLast="true"
: omitNorms="true">
: 	 <analyzer type="query">

StrField doesn't use an <analyzer> ... if you look at the values you've 
indexed with the LukeRequestHandler you'll see that the literal values are 
in your index ... you'll want to change that to "solr.TextField"

Most likely, you'll want to add an <analyzer type="index"> as well, 
otherwise the same analyzer will be used at index and at query time (so 
you'll get synonym and worddelim expansion in both cases, and things won't 
match the way you expect).
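
A minimal sketch of what that change could look like, assuming the filters
from the original post are kept as they are. The name "text_syn" is
hypothetical, and applying synonyms only at query time is one choice among
several, not something stated in this thread.

```xml
<!-- Sketch only: "text_syn" is a hypothetical field type name.
     solr.TextField runs the <analyzer> chains; solr.StrField does not. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: no synonym expansion in this sketch. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain plus the synonym filter from the original post. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

After pointing the searched field at a type like this, existing documents
need to be reindexed, since the stored terms were produced without analysis.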




-Hoss


Re: Synonym format not working

Posted by prerna07 <pk...@sapient.com>.
Actual synonyms:
ccc => cccc1,cccc2
ccc => cccc3

The result when I added &debugQuery=true is:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">ccc</str>
      <str name="debugQuery">true</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="debug">
    <str name="rawquerystring">ccc</str>
    <str name="querystring">ccc</str>
    <str name="parsedquery">MultiPhraseQuery(all:"cccc (1 cccc) (2 ccc cccc) 3")</str>
    <str name="parsedquery_toString">all:"cccc (1 cccc) (2 ccc cccc) 3"</str>
    <lst name="explain" />
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">8.0</double>
      <lst name="prepare">
        <double name="time">2.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">1.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">4.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">2.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">2.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>



Otis Gospodnetic wrote:
> 
> I can't see the problem at the moment.  What do you see when you use
> &debugQuery=true in the URL?  Do you see the query that includes synonyms? 
> Can you give us the actual query and actual synonyms?
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
