Posted to solr-user@lucene.apache.org by Scott Zientara <sc...@tekdata.com> on 2010/08/18 19:31:57 UTC
How to use synonyms on a faceted field with multiple words
I am trying to use solr.SynonymFilterFactory on a faceted field in Solr 1.3.
I am using Solr to index resources from a media library. The data is coming from various
sources, some of which I do not have control over. I need to be able to map resource
types in the data to common terms for faceting. For example:
video,audio => digital media
film,laser disc, vhs video => other
I am using solr.KeywordTokenizerFactory for the analyzer, but Solr will not treat
multiple words as a single token.
A single-word to single-word map (e.g. film => other) works perfectly.
A single-word to double-word map (e.g. film => other stuff) becomes 2 terms, which is
unfit for faceting.
A double-word to single-word map (e.g. vhs video => videotape) doesn't seem to match at
all.
I've tried this with and without the tokenizerFactory="solr.KeywordTokenizerFactory"
attribute in the synonym filter element. I've also tried to escape the space in the
synonym file (e.g. video => digital\bmedia).
Is it possible to use the synonym filter to map multi-word terms for a faceted field? If
so, what am I missing?
Re: How to use synonyms on a faceted field with multiple words
Posted by Chris Hostetter <ho...@fucit.org>.
: A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with
: some other character/pattern. I used ## (e.g. video => digital##media). Then add the
: solr.PatternReplaceFilterFactory after the synonym filter to replace the pattern with a
: space. This works, but I'd love to know if there is a better way.
A feature was added a little while back (I think by Koji) to let you
specify a "tokenizerFactory" attribute when you declare a
SynonymFilterFactory. It is then used to parse the synonyms file.
I think it was included in 1.4, but I may be wrong.
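
As a rough sketch of what that declaration might look like in schema.xml: the
field type name "resource_type_facet" and the file name "synonyms.txt" are
assumptions for illustration, and whether the tokenizerFactory attribute is
honored depends on the Solr version in use (the original poster reported no
luck with it on 1.3).

```xml
<!-- Hypothetical field type; the name and the synonyms file name are assumptions. -->
<fieldType name="resource_type_facet" class="solr.TextField">
  <analyzer>
    <!-- Keep the whole field value as one token so facet values stay intact. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory asks the filter to parse each synonyms.txt entry with the
         same tokenizer, so a multi-word entry stays a single token.
         Example synonyms.txt lines, written without spaces around "," and "=>"
         as a precaution, since whitespace handling may differ between versions:
           video,audio=>digital media
           film,laser disc,vhs video=>other
    -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```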
-Hoss
Re: How to use synonyms on a faceted field with multiple words
Posted by Scott Zientara <sc...@tekdata.com>.
A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with
some other character/pattern. I used ## (e.g. video => digital##media). Then add the
solr.PatternReplaceFilterFactory after the synonym filter to replace the pattern with a
space. This works, but I'd love to know if there is a better way.
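
For reference, a minimal sketch of how this workaround could be wired up in
schema.xml. The field type name "resource_type_facet" and the file name
"synonyms.txt" are assumptions for illustration, not the actual configuration.
It only covers multi-word terms on the right-hand side of a mapping, which is
the case described above.

```xml
<!-- Hypothetical field type; the name and the synonyms file name are assumptions. -->
<fieldType name="resource_type_facet" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole field value as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- synonyms.txt uses ## in place of spaces on the right-hand side, e.g.:
           video,audio => digital##media
    -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="false"/>
    <!-- Turn the ## placeholder back into a space after synonym mapping,
         so the facet value reads "digital media". -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="##"
            replacement=" "
            replace="all"/>
  </analyzer>
</fieldType>
```

The placeholder should be a sequence that never occurs in the real data,
since the last filter rewrites every occurrence of it in the token.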
From: "Scott Zientara" <sc...@tekdata.com>
Organization: Tek Data
To: solr-user@lucene.apache.org
Date sent: Wed, 18 Aug 2010 12:31:57 -0500
Subject: How to use synonyms on a faceted field with multiple words