Posted to solr-user@lucene.apache.org by Kuba Krzemień <kr...@gmail.com> on 2011/08/17 21:14:36 UTC

suggester issues

Hello, I am working on an auto-complete feature for my platform, which indexes large amounts of text (title + contents); there is too much data for a dictionary. I am using the latest version of Solr (3.3) and trying to take advantage of the Suggester functionality. Unfortunately, so far the outcome isn't that great.

The Suggester works only for single words or whole phrases (depending on the tokenizer). With the first option, I am unable to suggest any combined queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for 'new y' will be two separate lists, one for 'new' and one for 'y'. What's worse, querying 'new AND y' gives the same results (also when using collate), which means that the returned suggestion may give no results: what makes sense separately often doesn't work combined. I need a way to find only those suggestions that will return results when doing an AND query (for example 'new AND york' or 'new AND year', as long as they give results upon querying; 'new AND yeti' shouldn't be returned as a suggestion).
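
One way to picture the requirement above is a filter that keeps a completion of the last word only if some document contains it together with every earlier word. The sketch below is plain Java over a tiny in-memory list of documents. The class name, method names and sample documents are all hypothetical, and this is not how the Solr Suggester works; it only illustrates the behaviour being asked for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Solr.
final class AndSuggestSketch {

    /** Splits text into lowercase words on whitespace. */
    static List<String> words(String text) {
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /**
     * Completes the last word of the input, keeping only completions that
     * occur in a document which also contains every earlier word.
     */
    static List<String> suggest(List<String> docs, String input) {
        List<String> parts = words(input);
        List<String> result = new ArrayList<String>();
        if (parts.isEmpty()) {
            return result;
        }
        List<String> finished = parts.subList(0, parts.size() - 1);
        String prefix = parts.get(parts.size() - 1);

        Set<String> completions = new TreeSet<String>(); // sorted, no duplicates
        for (String doc : docs) {
            Set<String> docWords = new HashSet<String>(words(doc));
            if (!docWords.containsAll(finished)) {
                continue; // this document cannot satisfy the AND query
            }
            for (String word : docWords) {
                if (word.startsWith(prefix)) {
                    completions.add(word);
                }
            }
        }
        String head = String.join(" ", finished);
        for (String word : completions) {
            result.add(head.isEmpty() ? word : head + " " + word);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "new york city guide",
                "happy new year",
                "yeti sightings in nepal");
        System.out.println(suggest(docs, "new y"));
        System.out.println(suggest(docs, "ne"));
    }
}
```

With these sample documents, 'new y' yields 'new year' and 'new york' but not 'new yeti', because no document contains both 'new' and 'yeti'.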

When I use the second tokenizer and the suggestions return phrases, for 'ne' I will get 'new york' and 'new year', but for 'new y' I will get nothing. Also, for 'y' I will get nothing, so the issue remains. 
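
The phrase case can be pictured as a prefix lookup over whole phrases stored as single terms. The sketch below is an assumption-laden stand-in (a sorted set instead of the Suggester's own lookup structure, hypothetical class and method names), but it shows why 'ne' matches while 'y' cannot: only prefixes of the entire phrase are candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; a TreeSet stands in for the real lookup.
final class PhrasePrefixSketch {

    /** Returns every stored phrase that starts with the given prefix. */
    static List<String> complete(TreeSet<String> phrases, String prefix) {
        // All strings starting with `prefix` sort between `prefix` (inclusive)
        // and `prefix` followed by the largest char value (exclusive).
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<String>(phrases.subSet(prefix, true, upperBound, false));
    }

    public static void main(String[] args) {
        TreeSet<String> phrases =
                new TreeSet<String>(Arrays.asList("new york", "new year"));
        System.out.println(complete(phrases, "ne"));    // both phrases
        System.out.println(complete(phrases, "y"));     // nothing: no phrase starts with "y"
        System.out.println(complete(phrases, "new y")); // matches only if looked up unsplit
    }
}
```

In this sketch 'new y' does match, but only because it reaches the lookup as one string. As the replies in this thread describe, the input is split into words before the lookup happens, so the phrase prefix never arrives intact.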

If someone has experience working with the Suggester, or has built a well-working auto-suggester based on Solr, please help me. I've been trying to find a solution for this for quite some time.

Yours sincerely,
Jackob K

Re: suggester issues

Posted by aniljayanti <an...@gmail.com>.
Hi,

 I'm also facing the same issue while using the suggester (working in C# .NET).
Below is my configuration.

suggest/?q="michael ja"
-----------------------
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"
omitNorms="true">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
maxGramSize="15" side="front" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory" /> 
         <filter class="solr.LowerCaseFilterFactory" />     
   </analyzer>
  </fieldType>

<field name="empname" type="edgytext" indexed="true" stored="true"
omitNorms="true" omitTermFreqAndPositions="true" />

<field name="autocomplete_text" type="edgytext" indexed="true"
stored="false"  multiValued="true" omitNorms="true"
omitTermFreqAndPositions="false" />

<copyField source="empname" dest="autocomplete_text"/>
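
For readers unfamiliar with the index-time chain above (keyword tokenizer, lowercase, front edge n-grams with sizes 1 to 15), the following plain-Java sketch imitates what it would produce for one field value. It is a simplified stand-in with a hypothetical class name, not the Lucene filter itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; imitates keyword tokenizer + lowercase +
// front edge n-grams, without using Lucene.
final class EdgeNGramSketch {

    /** Returns the lowercase prefixes of the whole value, minGram..maxGram chars long. */
    static List<String> frontEdgeNGrams(String value, int minGram, int maxGram) {
        String term = value.toLowerCase(); // the whole value is one token
        List<String> grams = new ArrayList<String>();
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        for (String gram : frontEdgeNGrams("Michael Jackson", 1, 15)) {
            System.out.println(gram);
        }
    }
}
```

Because the value is kept as a single token, a prefix such as 'michael ja' is among the generated terms, while 'ja' on its own is not. In this sketch, a value longer than maxGram simply stops producing prefixes at maxGram characters.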

Response :

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="michael">
        <int name="numFound">10</int>
        <int name="startOffset">1</int>
        <int name="endOffset">8</int>
        <arr name="suggestion">
          <str>michael "bully" herbig</str>
          <str>michael bolton</str>
          <str>michael bolton: arias</str>
          <str>michael falch</str>
          <str>michael holm</str>
          <str>michael jackson</str>
          <str>michael neale</str>
          <str>michael penn</str>
          <str>michael salgado</str>
          <str>michael w. smith</str>
        </arr>
      </lst>
      <lst name="ja">
        <int name="numFound">10</int>
        <int name="startOffset">9</int>
        <int name="endOffset">11</int>
        <arr name="suggestion">
          <str>ja me tanssimme</str>
          <str>jacob andersen</str>
          <str>jacob haugaard</str>
          <str>jagged edge</str>
          <str>jaguares</str>
          <str>jamiroquai</str>
          <str>jamppa tuominen</str>
          <str>jane olivor</str>
          <str>janis joplin</str>
          <str>janne tulkki</str>
        </arr>
      </lst>
      <str name="collation">"michael "bully" herbig ja me tanssimme"</str>
    </lst>
  </lst>
</response>
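
The startOffset and endOffset values in this response appear to be character positions in the raw q string, with the opening quote at position 0. The small check below is plain Java under that assumption (the class name is hypothetical); it slices the query exactly as it was sent.

```java
// Hypothetical illustration only: checks the offsets against the raw query.
final class OffsetSketch {

    /** Returns the part of the query between the two reported offsets. */
    static String slice(String query, int startOffset, int endOffset) {
        return query.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        String q = "\"michael ja\""; // the q parameter, including both quotes
        System.out.println(slice(q, 1, 8));  // offsets reported for the first list
        System.out.println(slice(q, 9, 11)); // offsets reported for the second list
    }
}
```

If that reading is right, the quotes did not keep the phrase together: the input was still split into 'michael' and 'ja', and each word got its own suggestion list.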

Please Help,

AnilHayanti 



--
View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p4007205.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: suggester issues

Posted by Will Oberman <ob...@civicscience.com>.

Sent from my iPhone

On Aug 21, 2011, at 5:54 AM, "Kuba Krzemien" <kr...@aster.pl> wrote:

> Finally got it working - turns out you can't just add it to the lib  
> dir as the wiki suggests. Unfortunately the only way is adding it to  
> solr.war.
>
> Thanks for your help.

Re: suggester issues

Posted by Kuba Krzemien <kr...@aster.pl>.
Finally got it working - turns out you can't just add it to the lib dir as 
the wiki suggests. Unfortunately the only way is adding it to solr.war.

Thanks for your help.

--------------------------------------------------
From: "William Oberman" <ob...@civicscience.com>
Sent: Friday, August 19, 2011 5:07 PM
To: <so...@lucene.apache.org>
Subject: Re: suggester issues

> Hard to say, so I'll list the exact steps I took:
> -Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
> -Untar and cd
> -ant
> -Wrote my class below (under a peer directory in apache-solr-3.3.0)
> -javac -cp 
> ../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar 
> com/civicscience/SpellingQueryConverter.java
> -jar cf cs.jar com
> -Unzipped solr.war (under example)
> -Added my cs.jar to lib (under web-inf)
> -Rezipped solr.war
> -Added: <queryConverter name="queryConverter" 
> class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
> -Restarted jetty
>
> And, that seemed to all work.
>
> will

Re: suggester issues

Posted by William Oberman <ob...@civicscience.com>.
Hard to say, so I'll list the exact steps I took:
-Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
-Untar and cd
-ant
-Wrote my class below (under a peer directory in apache-solr-3.3.0)
-javac -cp ../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar com/civicscience/SpellingQueryConverter.java
-jar cf cs.jar com
-Unzipped solr.war (under example)
-Added my cs.jar to lib (under WEB-INF)
-Rezipped solr.war
-Added: <queryConverter name="queryConverter" class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
-Restarted jetty

And, that seemed to all work.

will

On Aug 19, 2011, at 10:44 AM, Kuba Krzemien wrote:

> As far as I checked creating a custom query converter is the only way to make this work.
> Unfortunately I have some problems with running it - after creating a JAR with my class (Im using your source code, obviously besides package and class names) and throwing it into the lib dir I've added <queryConverter name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to solrconfig.xml.
> 
> I get a "SEVERE: org.apache.solr.common.SolrException: Error Instantiating QueryConverter, mypackage.MySpellingQueryConverter is not a org.apache.solr.spelling.QueryConverter".
> 
> What am I doing wrong?


Re: suggester issues

Posted by Kuba Krzemien <kr...@aster.pl>.
As far as I can tell, creating a custom query converter is the only way to
make this work.
Unfortunately I have some problems running it. After creating a JAR
with my class (I'm using your source code, apart from the package and
class names) and putting it into the lib dir, I added <queryConverter
name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to
solrconfig.xml.

I get a "SEVERE: org.apache.solr.common.SolrException: Error Instantiating 
QueryConverter, mypackage.MySpellingQueryConverter is not a 
org.apache.solr.spelling.QueryConverter".

What am I doing wrong?

--------------------------------------------------
From: "William Oberman" <ob...@civicscience.com>
Sent: Thursday, August 18, 2011 10:35 PM
To: <so...@lucene.apache.org>
Subject: Re: suggester issues

> I tried this:
> package com.civicscience;
>
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.Collections;
>
> import org.apache.lucene.analysis.Token;
> import org.apache.solr.spelling.QueryConverter;
>
> /**
> * Converts the query string to a Collection of Lucene tokens.
> **/
> public class SpellingQueryConverter extends QueryConverter  {
>
>  /**
>   * Converts the original query string to a collection of Lucene Tokens.
>   * @param original the original query string
>   * @return a Collection of Lucene Tokens
>   */
>  @Override
>  public Collection<Token> convert(String original) {
>    if (original == null) {
>      return Collections.emptyList();
>    }
>    Collection<Token> result = new ArrayList<Token>();
>    Token token = new Token(original, 0, original.length(), "word");
>    result.add(token);
>    return result;
>  }
>
> }
>
> And added it to the classpath, and now it does what I expect.
>
> will

Re: suggester issues

Posted by William Oberman <ob...@civicscience.com>.
I tried this:
package com.civicscience;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

/**
 * Converts the query string to a Collection of Lucene tokens.
 **/
public class SpellingQueryConverter extends QueryConverter  {

  /**
   * Converts the original query string to a collection of Lucene Tokens.
   * @param original the original query string
   * @return a Collection of Lucene Tokens
   */
  @Override
  public Collection<Token> convert(String original) {
    if (original == null) {
      return Collections.emptyList();
    }
    Collection<Token> result = new ArrayList<Token>();
    Token token = new Token(original, 0, original.length(), "word");
    result.add(token);
    return result;
  }

}

And added it to the classpath, and now it does what I expect. 
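
To make the difference concrete without needing Solr on the classpath, here is a sketch that contrasts splitting the input on whitespace with passing it through as one token, as the converter above does. The Tok class is a simplified, hypothetical stand-in for Lucene's Token, and the whitespace splitter is only an approximation of the behaviour described in this thread, not Solr's actual default converter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; Tok is a stand-in for Lucene's Token.
final class ConverterSketch {

    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "'" + text + "'[" + start + "," + end + ")";
        }
    }

    /** One token per whitespace-separated word, with its character offsets. */
    static List<Tok> splitOnWhitespace(String original) {
        List<Tok> result = new ArrayList<Tok>();
        if (original == null) {
            return result;
        }
        Matcher m = Pattern.compile("\\S+").matcher(original);
        while (m.find()) {
            result.add(new Tok(m.group(), m.start(), m.end()));
        }
        return result;
    }

    /** The whole input as a single token, mirroring the converter above. */
    static List<Tok> wholeInput(String original) {
        List<Tok> result = new ArrayList<Tok>();
        if (original == null) {
            return result;
        }
        result.add(new Tok(original, 0, original.length()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("internet e"));
        System.out.println(wholeInput("internet e"));
    }
}
```

With the single-token version, a lookup sees 'internet e' as one prefix, which is what a phrase-based suggestion field needs.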

will


On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote:

> It can be done, I did that with shingles, but it's not the way it's meant to
> be. The main problem with suggester is that we want compound words and we
> never get them. I try to get "internet explorer" but when i enter in the
> second word, "internet e" the suggester never finds "explorer".


Re: suggester issues

Posted by Alexei Martchenko <al...@superdownloads.com.br>.
It can be done; I did that with shingles, but it's not the way it's meant to
be. The main problem with the suggester is that we want compound words and we
never get them. I try to get "internet explorer", but when I start typing the
second word ("internet e"), the suggester never finds "explorer".

2011/8/18 oberman_cs <ob...@civicscience.com>

> I was trying to deal with the exact same issue, with the exact same
> results.
> Is there really no way to feed a phrase into the suggester (spellchecker)
> without it splitting the input phrase into words?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
alexei@superdownloads.com.br | alexei@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533

Re: suggester issues

Posted by oberman_cs <ob...@civicscience.com>.
I was trying to deal with the exact same issue, with the exact same results. 
Is there really no way to feed a phrase into the suggester (spellchecker)
without it splitting the input phrase into words?

--
View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: suggester issues

Posted by "O. Klein" <kl...@octoweb.nl>.
The problem is that the Suggester, like the spellchecker, tokenizes on
whitespace. So while shingles might give you nice suggestions, the behaviour
of the Suggester makes it unusable.

Besides that, I never succeeded in getting the Suggester to show more
than one collation. The normal spellchecker on the same fields showed them
all right.

Unless I'm missing some hidden feature, I think the Suggester
might need some work before it behaves the way people expect.


--
View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3264740.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: suggester issues

Posted by Kuba Krzemień <kr...@aster.pl>.
What happens if you set spellcheck.maxCollations to more than 1?
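
For anyone who wants to try this, a request with that parameter could be assembled as in the sketch below. The base URL and core name are placeholders and the class name is hypothetical. spellcheck.collate and spellcheck.maxCollations are standard spellcheck request parameters, but whether the Suggester honours the latter is exactly the open question here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration only; URL and core name are placeholders.
final class CollationRequestSketch {

    /** Builds a suggest request that asks for up to maxCollations collations. */
    static String suggestUrl(String baseUrl, String query, int maxCollations) {
        try {
            return baseUrl
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&spellcheck=true"
                    + "&spellcheck.collate=true"
                    + "&spellcheck.maxCollations=" + maxCollations;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                suggestUrl("http://localhost:8983/solr/mycore/suggest", "drivers n", 3));
    }
}
```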

--------------------------------------------------
From: "Alexei Martchenko" <al...@superdownloads.com.br>
Sent: Wednesday, August 17, 2011 11:01 PM
To: <so...@lucene.apache.org>
Subject: Re: suggester issues

> I've been indexing and reindexing stuff here with Shingles. I don't 
> believe
> it's the best approach. Results are interesting, but I believe it's not 
> what
> the suggester is meant to be.
>
> I tried
>
> <fieldType name="textSuggestion" class="solr.TextField"
> positionIncrementGap="10" stored="false" multiValued="true">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
> outputUnigrams="true" outputUnigramsIfNoShingles="false" />
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StandardFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
>
> but I got compound words in the suggestion itself.
>
> If you query them like http://localhost:8983/solr/{mycore}/suggest/?q=dri 
> i
> get
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">1</int>
> </lst>
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="dri">
> <int name="numFound">6</int>
> <int name="startOffset">0</int>
> <int name="endOffset">3</int>
> <arr name="suggestion">
> <str>drivers</str>
> <str>drivers nvidia</str>
> <str>drivers intel</str>
> <str>drivers nvidia geforce</str>
> <str>drive</str>
> <str>driver</str>
> </arr>
> </lst>
> <str name="collation">drivers</str>
> </lst>
> </lst>
> </response>
>
> but when i enter the second word,
> http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
> it scrambles everything
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="drivers">
> <int name="numFound">4</int>
> <int name="startOffset">0</int>
> <int name="endOffset">7</int>
> <arr name="suggestion">
> <str>drivers</str>
> <str>drivers nvidia</str>
> <str>drivers intel</str>
> <str>drivers nvidia geforce</str>
> </arr>
> </lst>
> <lst name="n">
> <int name="numFound">10</int>
> <int name="startOffset">8</int>
> <int name="endOffset">9</int>
> <arr name="suggestion">
> <str>nvidia</str>
> <str>net</str>
> <str>nvidia geforce</str>
> <str>network</str>
> <str>new</str>
> <str>n</str>
> <str>ninja</str>
> </arr>
> </lst>
> <str name="collation">drivers nvidia</str>
> </lst>
> </lst>
> </response>
>
> Although the collation seems fine for this, it's not exactly what 
> suggester
> is supposed to do.
>
> Any thoughts?

Re: suggester issues

Posted by Alexei Martchenko <al...@superdownloads.com.br>.
I've been indexing and reindexing stuff here with Shingles. I don't believe
it's the best approach. Results are interesting, but I believe it's not what
the suggester is meant to be.

I tried

<fieldType name="textSuggestion" class="solr.TextField"
positionIncrementGap="10" stored="false" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="4"
outputUnigrams="true" outputUnigramsIfNoShingles="false" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>

but I got compound words in the suggestion itself.
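
The compound entries follow from what shingling does at index time: every run of up to maxShingleSize adjacent words becomes its own term, alongside the single words. The sketch below imitates that in plain Java, with whitespace splitting standing in for the standard tokenizer and a hypothetical class name; it is not the Lucene filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; imitates lowercase + shingles with unigrams.
final class ShingleSketch {

    /** Returns all word n-grams of size 1..maxShingleSize, grouped by start position. */
    static List<String> shingles(String text, int maxShingleSize) {
        List<String> out = new ArrayList<String>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int start = 0; start < tokens.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int n = 1; n <= maxShingleSize && start + n <= tokens.length; n++) {
                if (n > 1) {
                    shingle.append(' ');
                }
                shingle.append(tokens[start + n - 1]);
                out.add(shingle.toString()); // n words starting at `start`
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : shingles("Drivers Nvidia GeForce", 4)) {
            System.out.println(s);
        }
    }
}
```

For the input 'Drivers Nvidia GeForce' this yields 'drivers', 'drivers nvidia', 'drivers nvidia geforce', 'nvidia', 'nvidia geforce' and 'geforce', which matches the mix of single words and compounds seen in the suggestion lists.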

If I query it like http://localhost:8983/solr/{mycore}/suggest/?q=dri I
get

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="dri">
<int name="numFound">6</int>
<int name="startOffset">0</int>
<int name="endOffset">3</int>
<arr name="suggestion">
<str>drivers</str>
<str>drivers nvidia</str>
<str>drivers intel</str>
<str>drivers nvidia geforce</str>
<str>drive</str>
<str>driver</str>
</arr>
</lst>
<str name="collation">drivers</str>
</lst>
</lst>
</response>

but when I enter the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
it scrambles everything

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="drivers">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
        </arr>
      </lst>
      <lst name="n">
        <int name="numFound">10</int>
        <int name="startOffset">8</int>
        <int name="endOffset">9</int>
        <arr name="suggestion">
          <str>nvidia</str>
          <str>net</str>
          <str>nvidia geforce</str>
          <str>network</str>
          <str>new</str>
          <str>n</str>
          <str>ninja</str>
        </arr>
      </lst>
      <str name="collation">drivers nvidia</str>
    </lst>
  </lst>
</response>

Although the collation seems fine in this case, it's not exactly what the
suggester is supposed to do.

Any thoughts?



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
alexei@superdownloads.com.br | alexei@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533

Re: suggester issues

Posted by Alexei Martchenko <al...@superdownloads.com.br>.
I have the very same problem. I could copy+paste your message as mine. I've
discovered so far that bigger dictionaries work better for me, and that
controlling the threshold works much better than avoiding indexing one or
two fields. Of course I'm still polishing this.
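
For reference, the threshold is set on the suggester definition in
solrconfig.xml; it is the minimum fraction of documents a term must appear in
to be added to the lookup. A minimal sketch modeled on the Solr wiki's
Suggester example, assuming a hypothetical source field named suggestion; the
0.005 value and the count of 10 are only illustrative:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <!-- Assumed field name; point this at the indexed field that
         feeds the suggestions. -->
    <str name="field">suggestion</str>
    <!-- Terms found in fewer than 0.5% of documents are left out. -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Raising the threshold trims rare terms from a large dictionary without
having to drop whole fields from the index.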

At this very moment I am looking into Shingles. Are you using them?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

How are your fields defined?




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
alexei@superdownloads.com.br | alexei@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533