Posted to solr-user@lucene.apache.org by Rémy Loubradou <re...@hipsnip.com> on 2012/03/27 19:05:50 UTC

Auto-complete phrase

Hello, I am working on an auto-complete feature for my field
merchant_name, which is present in all my documents. I am using Solr 3.4
and I am trying to take advantage of the Suggester functionality.
Unfortunately, so far I haven't figured out how to make it work as I
expected.

Say the list of merchants present in my documents is the following (my real
list is much bigger and will change often, which is why I don't use a
dictionary file):
Redoute
Suisse Trois
Conforama
But
Cult Beauty
Bother Trois

I expect the Suggester component to match whole words or word prefixes and
return the full phrases that contain them.
For example, with /suggest?q=tro, I would like to get this:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="tro">
<int name="numFound">2</int>
<int name="startOffset">0</int>
<int name="endOffset">x</int>
<arr name="suggestion">
<str>Bother Trois</str>
<str>Suisse Trois</str>
</arr>
</lst>
</lst>
</lst>
</response>

I tried building suggestions from a field configured with either the
"solr.KeywordTokenizerFactory" or the "solr.WhitespaceTokenizerFactory"
tokenizer. As I see it, I have to find a way to handle 3 cases:
/suggest?q=bo ->(should return) bother trois
/suggest?q=tro ->(should return) bother trois, suisse trois
/suggest?q=bo%20tro ->(should return) bother trois

With the "solr.KeywordTokenizerFactory" I get:
/suggest?q=bo -> bother trois
/suggest?q=tro -> "nothing"
/suggest?q=bo%20tro -> "nothing"

With the "solr.WhitespaceTokenizerFactory" I get:
/suggest?q=bo -> bother
/suggest?q=troi -> trois
/suggest?q=bo%20tro -> bother, trois

Not exactly what I want ... :(

My configuration in the file solrconfig.xml for the suggester component:

<searchComponent class="solr.SpellCheckComponent" name="suggestMerchant">
  <lst name="spellchecker">
    <str name="name">suggestMerchant</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <!-- Alternatives to lookupImpl:
         org.apache.solr.spelling.suggest.fst.FSTLookup          [finite state automaton]
         org.apache.solr.spelling.suggest.fst.WFSTLookupFactory  [weighted finite state automaton]
         org.apache.solr.spelling.suggest.jaspell.JaspellLookup  [default, jaspell-based]
         org.apache.solr.spelling.suggest.tst.TSTLookup          [ternary trees]
    -->
    <!-- the indexed field to derive suggestions from -->
    <str name="field">merchant_name_autocomplete</str>
    <float name="threshold">0.0</float>
    <str name="buildOnCommit">true</str>
    <!--
    <str name="sourceLocation">american-english</str>
    -->
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest/merchant">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestMerchant</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <int name="spellcheck.maxCollations">10</int>
  </lst>
  <arr name="components">
    <str>suggestMerchant</str>
  </arr>
</requestHandler>

How can I implement autocomplete with the Suggester component to get what I
expect? Thanks for your help, I really appreciate it.

Re: Auto-complete phrase

Posted by Rémy Loubradou <re...@hipsnip.com>.
Thanks Otis, but that's not an option for me. This "should" be pretty easy to
do with Solr, so I will keep working on it.

Great, William, I will give this method a try. Thanks.

On 28 March 2012 06:11, William Bell <bi...@gmail.com> wrote:

> I am also very confused about the use case for the Suggester component.
> With collate on, it tries to combine unrelated words together rather than
> returning the actual phrases that are in the index.
>
> I get better mileage out of edge n-grams and tokenizing on whitespace,
> matching left to right, since that is how most people think.
>
> However, I would like Suggester to work as follows:
>
> Index:
> Chris Smith
> Tony Dawson
> Chris Leaf
> Daddy Golucky
>
> Query:
> 1. "Chris" returns "Chris Leaf", but not both "Chris Smith" and "Chris Leaf".
> 2. I seem to get collations (take the first word and combine it with the
> second word), so I see things like "Smith Leaf". Very strange and
> not what we expect. These are formal names.
>
> When I use n-grams, I can index:
>
> C
> Ch
> Chr
> Chri
> Chris
> S
> Sm
> Smi
> Smit
> Smith
>
> Thus if I search on "Smi" it will match Chris Smith, and if I search on
> "Chr" it will match both Chris Smith and Chris Leaf. Exactly what I want.

Re: Auto-complete phrase

Posted by William Bell <bi...@gmail.com>.
I am also very confused about the use case for the Suggester component.
With collate on, it tries to combine unrelated words together rather than
returning the actual phrases that are in the index.

I get better mileage out of edge n-grams and tokenizing on whitespace,
matching left to right, since that is how most people think.

However, I would like Suggester to work as follows:

Index:
Chris Smith
Tony Dawson
Chris Leaf
Daddy Golucky

Query:
1. "Chris" returns "Chris Leaf", but not both "Chris Smith" and "Chris Leaf".
2. I seem to get collations (take the first word and combine it with the
second word), so I see things like "Smith Leaf". Very strange and
not what we expect. These are formal names.

When I use n-grams, I can index:

C
Ch
Chr
Chri
Chris
S
Sm
Smi
Smit
Smith

Thus if I search on "Smi" it will match Chris Smith, and if I search on
"Chr" it will match both Chris Smith and Chris Leaf. Exactly what I want.
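
For reference, here is a minimal sketch of how the edge n-gram approach
described above could be set up in schema.xml. The names text_autocomplete
and merchant_name_ngram and the gram sizes are assumptions made up for
illustration and do not come from this thread; the tokenizer and filter
classes are the stock Solr ones.

```xml
<!-- Goes in the <types> section. The name "text_autocomplete" is hypothetical. -->
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split "Suisse Trois" into the tokens "Suisse" and "Trois" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index every prefix of each token: t, tr, tro, troi, trois
         (gram sizes here are illustrative assumptions) -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the typed text is matched against the indexed prefixes -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Goes in the <fields> section. "merchant_name_ngram" is a hypothetical field name. -->
<field name="merchant_name_ngram" type="text_autocomplete" indexed="true" stored="false"/>

<!-- Goes next to the other copyField declarations; fills the n-gram field from merchant_name -->
<copyField source="merchant_name" dest="merchant_name_ngram"/>
```

With a setup along these lines, the suggestions come from a normal search
request rather than from the Suggester, for example
/select?q=merchant_name_ngram:(bo AND tro)&fl=merchant_name, which should
only match names where one word starts with "bo" and another starts with
"tro". Because the results are documents, a merchant that appears in many
documents would come back many times unless the results are de-duplicated.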







-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076