Posted to solr-user@lucene.apache.org by MitchK <mi...@web.de> on 2010/03/21 17:28:16 UTC

Facetting with Synonyms

Hello out there,

I have a small problem:
Users decide what gets indexed and what does not, so the same thing can end
up in the index under different names.
For example, the artists "Snaga & Pillath" are also known as "S & P". At
index time I can handle this with the help of a SynonymFilter. However, when
I request facets for a result set, both "S&P" and "Snaga & Pillath" are
returned as separate facet values.
Is there a way to return only "S&P" OR "Snaga & Pillath"?

I think another example of this is "HP" and "Hewlett Packard". If one user
calls the manufacturer of his printer "HP" and another one says "Hewlett
Packard", faceting will return two separate terms.

But in fact, every HP facet and every Hewlett Packard facet, as well as every
Snaga & Pillath/S&P facet, should cover the same documents.

How would you solve this problem?

Kind regards
- Mitch
-- 
View this message in context: http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27976997.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facetting with Synonyms

Posted by MitchK <mi...@web.de>.
Hi Otis,

thank you for responding.
Hmm, since I am not omniscient, this does not seem feasible for me, because
it would mean I have to know everything about the artist at index time.
But your response gave me an idea: a synonym mapper.
The synonym mapper would work on the facets returned for the query.

It is not important to map S&P to Snaga & Pillath and force Solr to combine
both result sets.
The same goes for HP and Hewlett Packard. Returning only one of those terms
to the user is enough, since I can translate "HP" to "Hewlett Packard" at
query time with the help of a SynonymFilter if the user is interested in such
a facet.
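
A minimal sketch of such a synonym mapper, applied to the facet values after
Solr has returned them. It is not part of Solr. The CANONICAL table, the
collapse_facets name, and the sample counts are all made up for illustration,
and the input is assumed to have already been unpacked into (term, count)
pairs. Counts are merged with max on the assumption that index-time synonym
expansion put both variants into the same documents; if the variants occur in
different documents, operator.add may be the better choice.

```python
# Hypothetical synonym table: each variant maps to the label shown to the user.
CANONICAL = {
    "S&P": "Snaga & Pillath",
    "S & P": "Snaga & Pillath",
    "HP": "Hewlett Packard",
}


def collapse_facets(pairs, canonical=CANONICAL, combine=max):
    """Merge facet values that are synonyms of each other.

    pairs   -- iterable of (term, count) tuples in response order
    combine -- merges the counts of two variants; max assumes both variants
               were indexed into the same documents (assumption, see above)
    """
    merged = {}
    for term, count in pairs:
        label = canonical.get(term, term)
        if label in merged:
            merged[label] = combine(merged[label], count)
        else:
            merged[label] = count
    # Highest count first; the sort is stable, so ties keep first-seen order.
    return sorted(merged.items(), key=lambda item: -item[1])


# Made-up facet values for illustration only.
facets = [
    ("Snaga & Pillath", 12),
    ("S&P", 12),
    ("Hewlett Packard", 7),
    ("HP", 7),
    ("Canon", 3),
]
collapsed = collapse_facets(facets)
```

With this input, collapsed holds one entry per artist or manufacturer instead
of one per spelling.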

What do you think about this?
If I want to make such changes to Solr, I think I need to customize something
that directly computes the results for the responseWriter. Do you know which
classes are responsible for that?
If this would be too complicated because it requires changes in too many
classes, maybe I will contribute a tool that does this on an already built
response.
Another way would be to just create a new responseWriter, am I right?

If you think this is a good idea, I will follow up with some architectural
questions about saving memory and time. Maybe I will also open an issue for
that...

Any other ideas are welcome :-)!

Kind regards
- Mitch


Otis Gospodnetic wrote:
> 
> Hi Mitch,
> 
> You asked how others would solve this problem.  I would try to normalize
> the data before indexing it.  In other words, I'd clean it up myself to
> avoid a GIGO situation.
>  Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
-- 
View this message in context: http://old.nabble.com/Facetting-with-Synonyms-tp27976997p27982710.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facetting with Synonyms

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Mitch,

You asked how others would solve this problem.  I would try to normalize the data before indexing it.  In other words, I'd clean it up myself to avoid a GIGO situation.
 Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
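
One possible shape for the normalization suggested above, run on each
document before it is sent to Solr. This is only a sketch: the
CANONICAL_NAMES table, the "artist" field name, and the helper names are
assumptions for illustration, and a real table would have to be maintained by
whoever knows the data.

```python
import re

# Hypothetical lookup table, keyed by a simplified form of the name.
CANONICAL_NAMES = {
    "s&p": "Snaga & Pillath",
    "snaga&pillath": "Snaga & Pillath",
    "hp": "Hewlett Packard",
    "hewlettpackard": "Hewlett Packard",
}


def lookup_key(name):
    """Lower-case the name and drop whitespace so 'S & P' and 's&p' match."""
    return re.sub(r"\s+", "", name).lower()


def normalize_doc(doc, field="artist"):
    """Return a copy of doc with the field set to its canonical spelling.

    Unknown names are kept, with surrounding whitespace trimmed.
    """
    cleaned = dict(doc)
    value = cleaned.get(field)
    if value is not None:
        cleaned[field] = CANONICAL_NAMES.get(lookup_key(value), value.strip())
    return cleaned


# Made-up documents for illustration only.
docs = [
    {"id": "1", "artist": "S & P"},
    {"id": "2", "artist": "Snaga & Pillath"},
    {"id": "3", "artist": " Unknown Band "},
]
normalized = [normalize_doc(d) for d in docs]
```

Because every known variant is stored under one spelling, faceting on the
field would then return a single value per artist without any change to Solr.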


