Posted to solr-user@lucene.apache.org by Scott Zientara <sc...@tekdata.com> on 2010/08/18 19:31:57 UTC

How to use synonyms on a faceted field with multiple words

I am trying to use solr.SynonymFilterFactory on a faceted field in Solr 1.3. 
I am using Solr to index resources from a media library. The data is coming from various 
sources, some of which I do not have control over. I need to be able to map resource 
types in the data to common terms for faceting. For example:
video,audio => digital media
film,laser disc, vhs video => other

I am using solr.KeywordTokenizerFactory for the analyzer, but Solr will not treat 
multiple words as a single token. 
A single-word to single-word map (e.g. film => other) works perfectly.
A single-word to two-word map (e.g. film => other stuff) becomes two terms, which is 
unsuitable for faceting.
A two-word to single-word map (e.g. vhs video => videotape) doesn't seem to match at 
all.

I've tried this with and without the tokenizerFactory="solr.KeywordTokenizerFactory" 
attribute in the synonym filter element. I've also tried to escape the space in the 
synonym file (e.g. video => digital\bmedia).

Is it possible to use the synonym filter to map multi-word terms for a faceted field? If 
so, what am I missing?


Re: How to use synonyms on a faceted field with multiple words

Posted by Chris Hostetter <ho...@fucit.org>.
: A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with 
: some other character/pattern. I used ## (e.g. video => digital##media). Then add the 
: solr.PatternReplaceFilterFactory after the synonym filter to replace that pattern with a 
: space. This works, but I'd love to know if there is a better way.

A feature was added a little while back (I think by Koji) to let you 
specify a "tokenizerFactory" attribute when you declare a 
SynonymFilterFactory. That tokenizer is then used to parse the synonyms file.

I think it was included in 1.4, but I may be wrong.

-Hoss
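
For illustration, here is a minimal sketch of how the tokenizerFactory attribute might be declared in schema.xml, assuming a Solr release that supports it (the reply above suggests 1.4 but does not confirm it). The field type name resource_type and the file name synonyms.txt are placeholders, not taken from the thread.

```xml
<!-- Hypothetical field type for faceting on normalized resource types. -->
<fieldType name="resource_type" class="solr.TextField">
  <analyzer>
    <!-- Keep the whole field value (for example "vhs video") as one token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory names the tokenizer used to parse entries in
         synonyms.txt, so a multi-word entry is read as a single token
         instead of being split on whitespace. -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Whether whitespace around commas and "=>" in synonyms.txt is trimmed before the keyword tokenizer sees each entry may depend on the Solr version. If a rule fails to match, it is worth trying it without that whitespace (e.g. vhs video=>videotape).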



Re: How to use synonyms on a faceted field with multiple words

Posted by Scott Zientara <sc...@tekdata.com>.
A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with 
some other character/pattern. I used ## (e.g. video => digital##media). Then add the 
solr.PatternReplaceFilterFactory after the synonym filter to replace that pattern with a 
space. This works, but I'd love to know if there is a better way.
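
A rough sketch of the workaround described above, for anyone wanting to try it. The field type name, the file name, and the synonym rules shown in the comment are illustrative placeholders. The first PatternReplaceFilterFactory is an added assumption that is not part of the workaround as described: it rewrites spaces in the incoming value to ## so that multi-word inputs such as "vhs video" can also match a rule.

```xml
<!-- synonyms.txt (hypothetical contents), with ## standing in for spaces:
       video,audio => digital##media
       film,laser##disc,vhs##video => other
-->
<fieldType name="resource_type" class="solr.TextField">
  <analyzer>
    <!-- Keep the whole field value as one token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Assumption, not in the original workaround: turn whitespace in the
         incoming value into ## so it lines up with the rules in the file. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement="##" replace="all"/>
    <!-- Each rule side contains no whitespace, so it parses as one token. -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true"/>
    <!-- Restore spaces so the facet value reads "digital media". -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="##" replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

The placeholder ## must not occur in real data, since the last filter turns every occurrence back into a space.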
