Posted to solr-user@lucene.apache.org by Frank A <fs...@gmail.com> on 2010/06/03 03:32:12 UTC

Some basics

Hi,

I'm new to SOLR and have some basic questions that will hopefully steer me in
the right direction.

- I want my search to "auto" spell check - that is if someone types
"restarant" I'd like the system to automatically search for restaurant.
I've seen the SpellCheckComponent but that doesn't seem to have a simple way
to automatically do the "near" type comparison.  Is the SpellCheckComponent
the wrong one or do I just need to manually handle the situation in my
client code?

- Also, what is the proper analyzer if I want a search for "thai
food" or "thai restaurant" to actually match on Thai?  I can't totally
ignore words like food and restaurant but I want to ignore more general
terms and look for specific ones first (or I should say score them higher).

Any tips on what I should be reading up on will be greatly appreciated.

Thanks.

Re: Some basics

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Frank,

Is the following what you are after:

Here is a query for my last name, but misspelled: http://search-lucene.com/?q=gospodneticc

But if you look above the results, you will see this text:

  Search results for "gospodnetic" :

... and the search results are indeed for the auto-corrected query.

To get this functionality we built this:

  http://sematext.com/products/dym-researcher/index.html

Regarding your second question:
I don't think there is anything in Solr that automatically figures out which terms are the "more specific" ones and which are the "more general" ones.  Perhaps such assumptions could be based on the terms' occurrence frequency in the index, and here TermVectorsComponent can help.
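
To illustrate the frequency idea, here is a rough sketch in Python that ranks query terms by inverse document frequency, so that rarer (presumably more specific) terms come first. The function name and the document counts are made up for illustration, and how you would pull the counts out of your index is not shown. It is only one way to act on the assumption that rare means specific.

```python
import math


def rank_by_specificity(terms, doc_freq, num_docs):
    """Order terms from rarest to most common using log(num_docs / df)."""
    def idf(term):
        # Terms missing from the index are treated as appearing in one document.
        return math.log(num_docs / max(doc_freq.get(term, 0), 1))
    return sorted(terms, key=idf, reverse=True)


# Made-up document frequencies, for illustration only.
doc_freq = {"thai": 50, "food": 4000, "restaurant": 6000}
ranked = rank_by_specificity(["thai", "restaurant", "food"], doc_freq, 10000)
```

With these numbers "thai" ranks ahead of "food" and "restaurant", which matches the intuition in the original question.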

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: Some basics

Posted by Chris Hostetter <ho...@fucit.org>.

: - I want my search to "auto" spell check - that is if someone types
: "restarant" I'd like the system to automatically search for restaurant.
: I've seen the SpellCheckComponent but that doesn't seem to have a simple way
: to automatically do the "near" type comparison.  Is the SpellCheckComponent
: the wrong one or do I just need to manually handle the situation in my
: client code?

at the moment you need to handle this in your client -- if you get no 
results back (or too few results based on some expectation you have) 
but the spellcheck component returned a suggestion, then trigger a 
subsequent search using that suggestion.
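
A minimal sketch of that client-side fallback, in Python. The `search` callable and its return shape (a list of documents plus an optional suggested query) are hypothetical stand-ins for whatever your client library gives you; pulling the suggestion out of the spellcheck section of the response is left to that function. `min_results` is likewise an assumed name for the "too few results" threshold.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical shape: (matching documents, spellcheck suggestion or None).
SearchResult = Tuple[List[dict], Optional[str]]


def search_with_fallback(
    query: str,
    search: Callable[[str], SearchResult],
    min_results: int = 1,
) -> Tuple[str, List[dict]]:
    """Run the query; if it returns too few results and a suggestion is
    available, run one follow-up search with the suggestion."""
    docs, suggestion = search(query)
    if len(docs) >= min_results or not suggestion or suggestion == query:
        return query, docs
    # One retry only: any suggestion returned for the retry is ignored.
    retry_docs, _ = search(suggestion)
    return suggestion, retry_docs
```

Returning the query that was actually used lets the client tell the user which query the results are for, for example when "restarant" was silently replaced by "restaurant".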

: - Also, what is the proper analyzer if I want a search for "thai
: food" or "thai restaurant" to actually match on Thai?  I can't totally
: ignore words like food and restaurant but I want to ignore more general
: terms and look for specific ones first (or I should say score them higher).

the issue isn't so much your analyzer as how you structure your query -- i 
would suggest using the dismax query parser with a very low value for the 
'mm' param (i.e. '1', or something like '10%' if you expect a lot of queries 
with many many words) and a useful "pf" param -- that way two word queries 
will return matches for either word, but docs that match both words will 
score higher, and docs that match the full phrase will score the highest.
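
As a rough illustration, the request parameters for such a query could be assembled like this in Python. `defType`, `mm`, `qf` and `pf` are the dismax parameters discussed above; the helper name and the field names `name` and `description` are hypothetical, so substitute the fields from your own schema.

```python
from urllib.parse import urlencode


def dismax_params(user_query, query_fields, phrase_fields, mm="1"):
    """Build request parameters for a dismax query.

    mm: minimum number (or percentage) of query terms that must match.
    qf: fields searched for the individual terms.
    pf: fields in which a match on the whole phrase boosts the score.
    """
    return {
        "q": user_query,
        "defType": "dismax",
        "mm": mm,
        "qf": " ".join(query_fields),
        "pf": " ".join(phrase_fields),
    }


# Hypothetical field names; substitute the ones from your schema.
params = dismax_params("thai food", ["name", "description"], ["name"])
query_string = urlencode(params)
```

The resulting `query_string` would be appended to the select URL of your Solr instance; passing `mm="10%"` instead switches to the percentage form.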




-Hoss