Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/10/05 22:06:11 UTC

Offering search suggestions - a discussion of multi-term phrases

I am trying to figure out how we can begin offering search suggestions 
to people, especially when a user types in something that results in few 
or zero results.  For background, we have an archive of about 60 million 
objects, most of which are photographs.  There are also a number of text 
articles, and most recently, videos.  The metadata is kept in a 
database, and the database is used as the import source for Solr.

The first thing we're going to try is spellcheck, using the terms 
component to generate a wordlist from our catchall field and then 
running a program over that list to remove undesirable words.  I do not 
anticipate running into much trouble with this part.
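
As a rough sketch of that cleanup step, something like the following 
could work.  It assumes the terms have already been exported as 
(term, count) pairs; the thresholds, the blocklist and the function name 
are all made up for illustration and would need tuning against real data.

```python
import re

# Hypothetical thresholds and blocklist; tune them for the real data.
MIN_COUNT = 5
MIN_LENGTH = 3
BLOCKLIST = {"untitled", "jpg"}
WORD_RE = re.compile(r"^[a-z]+$")


def filter_wordlist(term_counts):
    """Keep the (term, count) pairs that look like usable dictionary words."""
    kept = []
    for term, count in term_counts:
        term = term.lower()
        if count < MIN_COUNT:
            continue  # rare terms are often misspellings themselves
        if len(term) < MIN_LENGTH:
            continue  # very short terms make poor spelling suggestions
        if not WORD_RE.match(term):
            continue  # drops numbers, ids and punctuation debris
        if term in BLOCKLIST:
            continue  # explicitly unwanted words
        kept.append((term, count))
    return kept
```

The surviving list could then be written to a file and used as the 
spellcheck dictionary.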

Another idea we have is search suggestions.  One aspect is autocomplete, 
the other is similar to the spell-check, but more sophisticated.  It 
would do things like offer "Nicole Kidman" if the user typed in "Tom 
Cruise" and didn't get many search results.

The problem I can see with all of these things is that single terms will 
not really be enough, and single terms are all I can get out of the 
index.  Our distributed index is already quite a bit larger than the 
available RAM on the machines that contain it, and it's growing 
steadily.  Adding analysis complexity or copyFields to the index is not 
much of an option, because we have no budget available for new hardware, 
but I won't completely rule it out.

Is there any way, even if it's offline analysis of either the index or 
the database, to come up with common short phrases specific to our 
data?  If there is, perhaps I can then give it to Solr and let it make 
suggestions with it.
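
To make the question concrete, the kind of offline analysis I have in 
mind is roughly the sketch below: pull the text out of the database, 
count every run of n adjacent words, and keep the ones that show up 
often.  The tokenizer, the function name and the cutoff are assumptions 
for illustration only, and an in-memory counter like this would probably 
need to be sharded or sampled for 60 million records.

```python
import re
from collections import Counter

# Simple tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def count_phrases(texts, n=2, min_count=2):
    """Count n-word phrases across texts; return those seen min_count+ times."""
    counts = Counter()
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]
```

With n=2 this would surface two-word names and places; running it again 
with n=3 would pick up slightly longer phrases.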

Thanks,
Shawn


Re: Offering search suggestions - a discussion of multi-term phrases

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Shawn,

Have you looked at http://www.sematext.com/products/dym-researcher/index.html as a solution to the ZeroHits problem?

If that doesn't work, then yes, offline word/phrase co-occurrence may work.
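
A minimal sketch of what offline co-occurrence could look like, assuming 
each record has already been reduced to a list of phrases (the function 
names and the record layout here are hypothetical, not part of Solr or 
any product):

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Map each phrase to a Counter of phrases seen in the same record."""
    related = defaultdict(Counter)
    for phrases in records:
        # Count each unordered pair once per record, in both directions.
        for a, b in combinations(sorted(set(phrases)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def suggest(related, phrase, limit=3):
    """Return the phrases that most often co-occur with the given one."""
    return [p for p, _ in related.get(phrase, Counter()).most_common(limit)]
```

Raw counts favor phrases that are common everywhere, so some form of 
normalization would likely be needed before showing these to users.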

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

