Posted to solr-user@lucene.apache.org by Qwerky <ne...@hmv.co.uk> on 2010/08/03 18:34:58 UTC

Multi word synonyms

I'm having trouble getting multi-word synonyms to work. As an example, I have
the following synonym:

exercise dvds => fitness

When I search for exercise dvds I want to return all docs in the index that
contain the keyword fitness. I've read the wiki page on
solr.SynonymFilterFactory, which recommends expanding synonyms at index
time, but I'm not sure this is what I want, as none of my documents contain
the keywords exercise dvds.
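
For readers weighing the index-time option mentioned above, here is a minimal sketch of what expansion at index time would do. It is a toy model in Python, not Solr code, and it assumes a two-way equivalence rule (`exercise dvds, fitness`) in place of the one-way rule shown above; the names `EQUIVALENTS`, `index_terms` and `matches` are made up for illustration.

```python
# Toy model of index-time expansion with an equivalence rule such as
#   exercise dvds, fitness
# (an assumed alternative to the one-way rule "exercise dvds => fitness").
EQUIVALENTS = [["exercise dvds", "fitness"]]


def index_terms(text, groups=EQUIVALENTS):
    """Return the set of terms indexed for a document, expanding synonyms."""
    tokens = text.lower().split()
    terms = set(tokens)
    joined = " ".join(tokens)
    for group in groups:
        # If any member of the group occurs in the text, index every member.
        if any(f" {member} " in f" {joined} " for member in group):
            for member in group:
                terms.update(member.split())
    return terms


def matches(query, doc_terms):
    """OR semantics: any query term found in the document counts as a hit."""
    return any(term in doc_terms for term in query.lower().split())


doc = index_terms("Beginner fitness workout")
print(sorted(doc))                    # ['beginner', 'dvds', 'exercise', 'fitness', 'workout']
print(matches("exercise dvds", doc))  # True
```

In this model a document that only mentions fitness also gets the terms exercise and dvds, so the query matches even if it is split into separate words. The trade-off is that a search for exercise alone would match fitness documents too.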

Here is the field definition from my schema.xml:

[field definition XML not preserved in the archive]

When I test my search with the analysis page on the admin console, it seems
to work fine:

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory   {}

term position       1          2
term text           exercise   dvds
term type           word       word
source start,end    0,8        9,13
payload

org.apache.solr.analysis.SynonymFilterFactory   {ignoreCase=true, synonyms=synonyms.txt, expand=true}

term position       1
term text           fitness
term type           word
source start,end    0,13
payload

org.apache.solr.analysis.TrimFilterFactory   {}

term position       1
term text           fitness
term type           word
source start,end    0,13
payload

org.apache.solr.analysis.StopFilterFactory   {ignoreCase=true, enablePositionIncrements=true, words=stopwords.txt}

term position       1
term text           fitness
term type           word
source start,end    0,13
payload

org.apache.solr.analysis.LowerCaseFilterFactory   {}

term position       1
term text           fitness
term type           word
source start,end    0,13
payload

org.apache.solr.analysis.SnowballPorterFilterFactory   {language=English, protected=protwords.txt}

term position       1
term text           fit
term type           word
source start,end    0,13
payload

...but when I perform the search, it doesn't seem to use the
SynonymFilterFactory:



 0
 0
 
  exercise dvds
  0

  on
  
  standard
  
  
  2.2
  standard

  on
  *,score
  10
 
.....

 exercise dvds
 exercise dvds
 PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds

 PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1019722.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Multi word synonyms

Posted by Markus Jelsma <ma...@buyways.nl>.
Hi,

 

This happens because your tokenizer will generate separate tokens for `exercise dvds`, so the SynonymFilter will try to find declared synonyms for `exercise` and `dvds` separately. Its behavior is documented [1] on the wiki.

 

[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

 

Cheers,
 

Re: Multi word synonyms

Posted by Qwerky <ne...@hmv.co.uk>.
It would be nice if you could configure some kind of filter to be processed
before the query string is passed to the parser. The QueryComponent class
seems a nice place for this; a filter could be run against the raw query and
ResponseBuilder's queryString value could be modified before the QParser is
created.
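
A minimal sketch of that idea, written as standalone Python rather than as a real QueryComponent hook: it rewrites known multi-word phrases in the raw query string before any parser splits it. The rule table and the function name `rewrite_raw_query` are hypothetical, and the sketch deliberately ignores query syntax such as field prefixes, quotes and boolean operators.

```python
import re

# Hypothetical rule table; in Solr these rules would live in synonyms.txt.
PHRASE_SYNONYMS = {"exercise dvds": "fitness"}


def rewrite_raw_query(raw_query, rules=PHRASE_SYNONYMS):
    """Replace known multi-word phrases before the query parser sees the string."""
    rewritten = raw_query
    # Longest phrases first so a shorter rule cannot pre-empt a longer one.
    for phrase in sorted(rules, key=len, reverse=True):
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        replacement = rules[phrase]
        rewritten = re.sub(
            pattern,
            lambda match, text=replacement: text,  # literal replacement text
            rewritten,
            flags=re.IGNORECASE,
        )
    return rewritten


print(rewrite_raw_query("Exercise  DVDs for beginners"))  # fitness for beginners
```

The rewritten string could then be handed to the normal parser, which would see the single word fitness instead of two separate terms.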
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1022461.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi word synonyms

Posted by Michael McCandless <lu...@mikemccandless.com>.
Unfortunately, Lucene's QueryParser pre-splits all incoming text on
whitespace, which means your search-time analyzer never has a chance
to detect the multi-word synonym. That is, your analyzer is invoked
twice: once with "exercise" and once with "dvds".
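
The difference between the analysis page and a real search can be sketched with a toy model in Python. This is not Lucene code: `analyze`, `analysis_page` and `query_parser` are made-up names, the synonym matcher is a simplified stand-in for SynonymFilter, and stemming and the other filters are left out.

```python
# Toy model (not Lucene code): a whitespace tokenizer followed by a
# synonym filter that can match multi-token rules inside ONE token stream.
SYNONYMS = {("exercise", "dvds"): ["fitness"]}  # exercise dvds => fitness


def analyze(text):
    """Tokenize on whitespace, then apply the longest matching synonym rule."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 0, -1):  # prefer the longest match
            key = tuple(tokens[i:i + length])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += length
                break
        else:
            out.append(tokens[i])  # no rule matched; keep the token as-is
            i += 1
    return out


def analysis_page(text):
    """The admin analysis page hands the whole string to the analyzer."""
    return analyze(text)


def query_parser(text):
    """The query parser splits on whitespace first, then analyzes each chunk."""
    terms = []
    for chunk in text.split():
        terms.extend(analyze(chunk))
    return terms


print(analysis_page("exercise dvds"))  # ['fitness']
print(query_parser("exercise dvds"))   # ['exercise', 'dvds']
```

In this model the synonym rule fires only when both words arrive in the same token stream, which matches the symptoms reported: fitness on the analysis page, but two separate terms in the parsed query.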

We need to fix that... but it's not exactly clear how.  The
QueryParser/Analyzer interaction is tricky.

Mike
