Posted to solr-user@lucene.apache.org by Qwerky <ne...@hmv.co.uk> on 2010/08/03 18:34:58 UTC
Multi word synonyms
I'm having trouble getting multi-word synonyms to work. As an example, I have
the following synonym:
exercise dvds => fitness
When I search for exercise dvds, I want to return all docs in the index that
contain the keyword fitness. I've read the wiki about
solr.SynonymFilterFactory, which recommends expanding the synonym when
indexing, but I'm not sure this is what I want, as none of my documents have
the keywords exercise dvds.
Here is the field definition from my schema.xml;
When I test my search with the analysis page on the admin console, it seems
to work fine:
Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position      1         2
  term text          exercise  dvds
  term type          word      word
  source start,end   0,8       9,13
  payload

org.apache.solr.analysis.SynonymFilterFactory {ignoreCase=true, synonyms=synonyms.txt, expand=true}
  term position      1
  term text          fitness
  term type          word
  source start,end   0,13
  payload

org.apache.solr.analysis.TrimFilterFactory {}
  term position      1
  term text          fitness
  term type          word
  source start,end   0,13
  payload

org.apache.solr.analysis.StopFilterFactory {ignoreCase=true, enablePositionIncrements=true, words=stopwords.txt}
  term position      1
  term text          fitness
  term type          word
  source start,end   0,13
  payload

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position      1
  term text          fitness
  term type          word
  source start,end   0,13
  payload

org.apache.solr.analysis.SnowballPorterFilterFactory {language=English, protected=protwords.txt}
  term position      1
  term text          fit
  term type          word
  source start,end   0,13
  payload
...but when I perform the search, it doesn't seem to use the
SynonymFilterFactory:
0
0
exercise dvds
0
on
standard
2.2
standard
on
*,score
10
.....
exercise dvds
exercise dvds
PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds
PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds
--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1019722.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Multi word synonyms
Posted by Markus Jelsma <ma...@buyways.nl>.
Hi,
This happens because your tokenizer will generate separate tokens for `exercise dvds`, so the SynonymFilter will try to find declared synonyms for `exercise` and `dvds` separately. Its behavior is documented [1] on the wiki.
[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
Cheers,
Re: Multi word synonyms
Posted by Qwerky <ne...@hmv.co.uk>.
It would be nice if you could configure some kind of filter to be processed
before the query string is passed to the parser. The QueryComponent class
seems a nice place for this; a filter could be run against the raw query and
ResponseBuilder's queryString value could be modified before the QParser is
created.
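
The idea of rewriting the raw query before it reaches the parser can be sketched outside of Solr. The following is only an illustration in Python, not Solr code: `MULTI_WORD_SYNONYMS` and `rewrite_raw_query` are hypothetical names, the rule table is hard-coded rather than read from synonyms.txt, and the sketch ignores query syntax such as field prefixes, quotes, and operators.

```python
import re

# Hypothetical rule table; a real setup would load these from synonyms.txt.
MULTI_WORD_SYNONYMS = {
    "exercise dvds": "fitness",
}


def rewrite_raw_query(raw_query, rules=MULTI_WORD_SYNONYMS):
    """Replace multi-word phrases in the raw query string before parsing."""
    rewritten = raw_query
    # Try longer phrases first so a long rule wins over a shorter overlapping one.
    for phrase in sorted(rules, key=len, reverse=True):
        # \b keeps the phrase from matching inside longer words;
        # \s+ tolerates repeated whitespace between the words.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        replacement = rules[phrase]
        # A function replacement avoids backslash interpretation in the target.
        rewritten = re.sub(
            pattern, lambda m, r=replacement: r, rewritten, flags=re.IGNORECASE
        )
    return rewritten


if __name__ == "__main__":
    print(rewrite_raw_query("exercise dvds"))
```

With this in place the parser would only ever see the single token, so per-token analysis would no longer matter for this rule. The cost is that the rewrite runs on unparsed text, so it would need care around quoted phrases and fielded terms.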
--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1022461.html
Re: Multi word synonyms
Posted by Michael McCandless <lu...@mikemccandless.com>.
Unfortunately, Lucene's QueryParser pre-splits all incoming text on
whitespace, which means your search-time analyzer never has a chance
to detect the multi-word synonym. That is, your analyzer is invoked
twice: once with "exercise" and once with "dvds".
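
A toy model may make the difference concrete. The sketch below is plain Python, not Lucene code: `apply_synonyms` is a simplified stand-in for a synonym filter (greedy longest match over a token list), and `analyze_whole` and `analyze_presplit` are hypothetical names for the two ways the analyzer can be called. It assumes whitespace tokenization and a single rule taken from the original post.

```python
# Toy rule table mirroring the synonym from the original post.
RULES = {("exercise", "dvds"): ["fitness"]}


def apply_synonyms(tokens, rules=RULES):
    """Greedy longest-match replacement over a list of tokens."""
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position; pass the token through.
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(text):
    """One analyzer call over the full text, as on the analysis page."""
    return apply_synonyms(text.split())


def analyze_presplit(text):
    """One analyzer call per whitespace-separated chunk, as described above."""
    out = []
    for chunk in text.split():
        out.extend(apply_synonyms([chunk]))
    return out


if __name__ == "__main__":
    print(analyze_whole("exercise dvds"))
    print(analyze_presplit("exercise dvds"))
```

In the first call the filter sees both tokens together and can match the two-word rule. In the second it only ever sees one token at a time, so the rule can never fire, which matches the parsed query shown in the original post.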
We need to fix that... but it's not exactly clear how. The
QueryParser/Analyzer interaction is tricky.
Mike