Posted to solr-user@lucene.apache.org by solr-user <so...@hotmail.com> on 2014/04/02 02:01:21 UTC

Re: how do I get search for "fort st john" to match "ft saint john"

Hi Erick.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that Solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you? That is,
does it return results if you search for "ft st john", "ft saint john",
and "fort st john"?

My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work
for some or all of the words in a phrase.



--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128500.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how do I get search for "fort st john" to match "ft saint john"

Posted by solr-user <so...@hotmail.com>.
Thanks, guys.

Unfortunately, the Solr instance that contains this schema/data is part of a
legacy system that requires that the fields not be changed.

We will, hopefully in the near future, be able to look at redesigning the
schema.

Alternatively, I could brush up on Java (which I haven't used in a long
time) and see if I can write a sub-word synonym plugin of some sort to
perform this kind of synonym matching.

Thanks anyhow.



--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128914.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how do I get search for "fort st john" to match "ft saint john"

Posted by Jack Krupansky <ja...@basetechnology.com>.
And if you use the pf, pf2, and pf3 parameters of edismax, with boosting,
you can ensure that the closest matches always appear first.

That assumes you do index-time synonym expansion.

-- Jack Krupansky
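A minimal sketch of the edismax setup described above, written as request-handler defaults in solrconfig.xml. The field name `city` and the boost values are hypothetical and would need tuning against real queries. `pf` boosts documents in which all query terms occur together as a phrase; `pf3` and `pf2` boost documents that contain adjacent word triples and pairs taken from the query.

```xml
<!-- solrconfig.xml: hypothetical defaults; "city" is an assumed field name -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- field searched by the individual query terms -->
    <str name="qf">city</str>
    <!-- boost documents containing all query terms as a phrase -->
    <str name="pf">city^20</str>
    <!-- smaller boosts for adjacent word triples and pairs from the query -->
    <str name="pf3">city^10</str>
    <str name="pf2">city^5</str>
  </lst>
</requestHandler>
```

With index-time synonym expansion on a whitespace-tokenized field, a query such as ft st john should match "fort saint john", and documents matching the whole phrase should rank above documents that only share one or two words with the query.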

-----Original Message----- 
From: Erick Erickson
Sent: Wednesday, April 2, 2014 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

No, there isn't a tokenizer that'll do what you want that I know
about. Really, I suspect you need to back up a bit and re-think the
problem. It looks to me like you've taken a path that's going to cause
you endless grief when, as Jack says, phrase searches are built into
the tokenization process.

Best,
Erick


Re: how do I get search for "fort st john" to match "ft saint john"

Posted by Erick Erickson <er...@gmail.com>.
No, there isn't a tokenizer that'll do what you want that I know
about. Really, I suspect you need to back up a bit and re-think the
problem. It looks to me like you've taken a path that's going to cause
you endless grief when, as Jack says, phrase searches are built into
the tokenization process.

Best,
Erick


On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Query by phrase is a core feature of tokenized text in Lucene and Solr, so
> there is no need to use a pattern tokenizer for that purpose. And yes,
> doing so pretty much breaks most token filters that assume the
> text is tokenized.
>
> -- Jack Krupansky

Re: how do I get search for "fort st john" to match "ft saint john"

Posted by Jack Krupansky <ja...@basetechnology.com>.
Query by phrase is a core feature of tokenized text in Lucene and Solr, so
there is no need to use a pattern tokenizer for that purpose. And yes,
doing so pretty much breaks most token filters that assume the
text is tokenized.

-- Jack Krupansky
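A minimal sketch of the tokenized alternative suggested in this thread, assuming a whitespace tokenizer and the two-line city_index_synonyms.txt (saint,st,ste and fort,ft). The type name `text_city` and field name `city` are hypothetical. Because each word becomes its own token, the synonym filter can expand fort/ft and saint/st/ste at index time, while a quoted query such as "ft st john" still only matches those three words in sequence.

```xml
<!-- schema.xml: hypothetical field type; names are illustrative -->
<fieldType name="text_city" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- "fort saint john" becomes the tokens fort / saint / john -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true" indexes each synonym at the same position as the
         original word: fort|ft, saint|st|ste, john -->
    <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- the field itself, declared alongside the other fields -->
<field name="city" type="text_city" indexed="true" stored="true"/>
```

The trade-off raised in the thread remains: an unquoted single-word query such as st will match every city containing that word, which is where phrase boosting helps with ordering.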

-----Original Message----- 
From: solr-user
Sent: Wednesday, April 2, 2014 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", "former",
"fort", and "ord"), and so the SynonymFilterFactory does not create synonyms
for them (by design).

The original question remains: is there a tokenizer/plugin that will allow
me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data on whitespace is that it
would cause way too many results to be returned if I, for example, search on
"new" or "st". However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st john"
or ...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
Sent from the Solr - User mailing list archive at Nabble.com. 


Re: how do I get search for "fort st john" to match "ft saint john"

Posted by solr-user <so...@hotmail.com>.
Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", "former",
"fort", and "ord"), and so the SynonymFilterFactory does not create synonyms
for them (by design).

The original question remains: is there a tokenizer/plugin that will allow
me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data on whitespace is that it
would cause way too many results to be returned if I, for example, search on
"new" or "st". However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st john"
or ...



--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: how do I get search for "fort st john" to match "ft saint john"

Posted by al...@aim.com.
It seems to me that you are missing this line:

  <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true" />

under
 <analyzer type="query">

Alex.
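A sketch of where that line would sit, i.e. query-time synonym expansion. The whitespace tokenizer here is an assumption (the original field type, with its PatternTokenizerFactory pattern, is not reproduced in this thread), and, as the rest of the thread points out, the synonym filter only sees whole tokens. With single-word synonyms such as fort,ft, expansion of this kind should let "fort st john" match an indexed "ft saint john" even without index-time expansion.

```xml
<!-- hypothetical query analyzer; the tokenizer choice is an assumption -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- the line suggested above: expands fort -> fort|ft, st -> saint|st|ste -->
  <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true" />
</analyzer>
```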

 

 

-----Original Message-----
From: solr-user <so...@hotmail.com>
To: solr-user <so...@lucene.apache.org>
Sent: Tue, Apr 1, 2014 5:01 pm
Subject: Re: how do I get search for "fort st john" to match "ft saint john"


Hi Erick.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that Solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you? That is,
does it return results if you search for "ft st john", "ft saint john",
and "fort st john"?

My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work
for some or all of the words in a phrase.



--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128500.html
Sent from the Solr - User mailing list archive at Nabble.com.