Posted to solr-user@lucene.apache.org by climbingrose <cl...@gmail.com> on 2007/07/11 03:52:43 UTC

A few questions regarding multi-word synonyms and parameters encoding

Hi all,

I've been using Solr for the last few projects and the experience has been
great. I'll post the link to the website once it is finished. I just have a few
questions regarding synonyms and parameter encoding:

1) Are multi-word synonyms possible now in Solr? For example, can I have
synonyms like:
"I.T. & T", "IT & T", "Information Technologies", "Computer science"
I read a message on the mailing list some time ago (I think back in mid 2006)
saying that there is no clean way to implement this. Is it possible now? In
my case, I have two fields, category and location, where category is of the
default string type and location is of the default text type:
+Category field is used only for faceting by category, so no analysis
needs to be done. Can I use the synonyms config above to do a facet query on
the category field so that Solr combines items having any of these categories
into one facet category? For example:

I.T. & T (10)
IT & T (20)
Information Technologies (30)
Computer science (40)

Can I have something like:

I.T. & T (100)

Or do I have to manually run a filter query for each category (such as
category:"I.T. & T") and count the results?

+Location field is used for searching by city, state and post code. Since I
collect the data from different sources, the information might be
inconsistent. For example, one record might have "Inner Sydney, NSW"
while another record might have "Inner Sydney, New South Wales". In
Australia, NSW and New South Wales are used interchangeably, so when users
search for "NSW", I want "New South Wales" records to be returned and vice
versa. How could I achieve this? The "location" field is of the default text
type.

2) I'm having trouble using facet values in my URL. For example, I have a
"title" facet field in my query and it returns something like:

Software engineer
C++ Programmer
C Programmer & PHP developer

Now I want to create a link for each of these values so that the user can
filter the results by that title by clicking on the link. For example, if I
click on "Software Engineer", the results are narrowed down to just the
records with "Software Engineer" in their title. Since the "title" field can
contain special chars like '+', '&' ..., I really can't find a clean way to
do this. At the moment, I replace all the spaces with '+' and it seems to work
for values like "Software engineer" (converted to "Software+Engineer").
However, "C++ Programmer" is converted to "C+++Programmer", and it doesn't
seem to work (it returns no results). Any ideas?

Looking back, this is such a long email. If you reach this point, thanks a
lot for your time!!!

-- 
Regards,

Cuong Hoang

Re: A few questions regarding multi-word synonyms and parameters encoding

Posted by Chris Hostetter <ho...@fucit.org>.
: 1) Is multi-word synonyms possible now in Solr? For example, can I have
: things like synonyms like:
: "I.T. & T", "IT & T", "Information Technologies", "Computer science"

multi word synonyms are possible, but the only clean way to do it is at index
time using expansion.  there is a note about this in the docs..
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
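
To make the idea of index-time expansion concrete, here is a toy model in Python. It is not Solr or Lucene code: the synonym group, the function names, and the substring matching are all assumptions made for illustration, and a real analyzer works on token positions rather than whole strings. It only shows why expanding at index time lets a search for either form find both records.

```python
# Toy model of index-time synonym expansion. NOT Solr/Lucene code:
# the group below and both function names are hypothetical.
SYNONYM_GROUPS = [
    ["nsw", "new south wales"],
]

def expand_at_index_time(text, groups=SYNONYM_GROUPS):
    """Return every lowercase variant of `text` that would be indexed:
    the original plus one copy per synonym substitution."""
    base = text.lower()
    variants = {base}
    for group in groups:
        for term in group:
            if term in base:  # crude substring check, fine for a sketch
                for other in group:
                    variants.add(base.replace(term, other))
    return variants

def matches(query, indexed_variants):
    """A query matches if it occurs in any indexed variant."""
    q = query.lower()
    return any(q in variant for variant in indexed_variants)

doc_a = expand_at_index_time("Inner Sydney, NSW")
doc_b = expand_at_index_time("Inner Sydney, New South Wales")

print(matches("New South Wales", doc_a))  # True
print(matches("NSW", doc_b))              # True
```

Because the expansion happens when the record is indexed, the query side needs no synonym handling at all in this model.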

: needs to be done. Can I use the synonyms config above to do facet query on
: category field and the Solr will combine items having one of these category
: into one facet category? For example:

hmmm... if you want to facet on a field like this then you really want to
use synonym reduction (expand="false") which contradicts the general
solution for multi-word synonyms ... but frankly if you are faceting it
sounds like you want to use something like the KeywordTokenizer anyway, in
which case your multi-word synonyms aren't really multi-word so it should
be okay.

but honestly i haven't really tried anything like this ... the code for
parsing the synonyms.txt file probably splits the individual synonyms on
whitespace to produce multiple tokens which might screw you up ... you may
need to get creative (perhaps use a PatternReplaceFilter to encode your
spaces as "_" before the SynonymFilter and then another one to convert the
"_" back to " " after the SynonymFilter ... kludgy but it might work)

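The encode / reduce / decode idea can be simulated outside Solr to check that it would give the single combined facet asked about. The sketch below is plain Python, not a Solr configuration: the mapping stands in for a reduction rule in synonyms.txt, and all names in it are hypothetical. It assumes category values never contain "_" themselves, since the decode step would turn those into spaces.

```python
from collections import Counter

# Hypothetical stand-in for a synonym reduction rule (expand="false"):
# every variant is reduced to the first entry in the list.
VARIANTS = ["I.T. & T", "IT & T", "Information Technologies", "Computer science"]

def encode(value):
    """Encode spaces as "_" so the whole category stays one token."""
    return value.replace(" ", "_")

def decode(token):
    """Convert "_" back to " " after the synonym step."""
    return token.replace("_", " ")

CANONICAL = encode(VARIANTS[0])
REDUCE = {encode(v): CANONICAL for v in VARIANTS}

def normalize_category(value):
    """Encode, reduce to the canonical synonym if one exists, then decode."""
    token = encode(value)
    token = REDUCE.get(token, token)
    return decode(token)

def facet_counts(categories):
    """Count records per normalized category, like a facet on the field."""
    return Counter(normalize_category(c) for c in categories)

# The counts from the original question: 10 + 20 + 30 + 40 records.
docs = (["I.T. & T"] * 10 + ["IT & T"] * 20
        + ["Information Technologies"] * 30 + ["Computer science"] * 40)

print(facet_counts(docs))  # Counter({'I.T. & T': 100})
```

Categories that are not in the mapping pass through unchanged, so unrelated facet values keep their own counts.
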
: Now I want create a link for each of these value so that the user can filter
: the results by that title by clicking on the link. For example, if I click
: on "Software Engineer", the results are now narrowed down to just include
: records with "Software Engineer" in their title. Since "title" field can
: contain special chars like '+', '&' ..., I really can't find a clean way to
: do this. At the moment, I replace all the space by '+' and it seems to work
: for words like "Software engineer" (converted to "Software+Engineer").
: However, "C++ Programmer" is converted to "C+++Programmer", and it doesn't
: seem to work (return no results). Any ideas?

for starters you need to URL encode *all* of the characters, not just the
spaces ... space escapes to "+" but only because "+" escapes to %2B.

second, if you are dealing with multi-word values like this in your
facets, you need to make sure to quote them when doing fq queries
(before url encoding) ... so if you have a facet.field "skills" that lists
"C++ Programmer" as a value, the fq query you want to use would be...
     skills:"C++ Programmer"

when you URL encode that it should become...

     fq=skills%3A%22C%2B%2B+Programmer%22

...use the echoParams=explicit&debugQuery=true params to see exactly what
your params look like when they've been URL decoded and what your
query objects look like once they've been parsed.
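
As a sketch of the two steps above (quote the value, then URL encode everything), here is how the fq parameter could be built with Python's standard library. The helper name facet_filter_params is hypothetical, and the backslash escaping of embedded quotes is an assumption about how a quoted phrase should be protected; verify it against your query parser before relying on it.

```python
from urllib.parse import urlencode, parse_qs

def facet_filter_params(field, value):
    """Build a URL-encoded fq parameter for one exact facet value.
    (Hypothetical helper, not part of any Solr client library.)"""
    # Assumption: escape backslashes and double quotes so the value
    # stays inside a single quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    fq = f'{field}:"{escaped}"'
    # urlencode escapes every reserved character, not just spaces.
    return urlencode({"fq": fq})

encoded = facet_filter_params("skills", "C++ Programmer")
print(encoded)                  # fq=skills%3A%22C%2B%2B+Programmer%22
print(parse_qs(encoded)["fq"])  # ['skills:"C++ Programmer"']
```

Decoding the result with parse_qs gives back the original quoted query, which is the same round trip the server performs on the incoming request.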



-Hoss