Posted to java-user@lucene.apache.org by Ankit Murarka <an...@rancoretech.com> on 2013/08/02 10:16:21 UTC

Complete phrase Suggest Feature in Apache Lucene

Hello All,

Just like the spellcheck feature, which was implemented after a lot of
trouble, is it possible to implement a complete phrase suggest feature in
Lucene 4.3? If I enter an incorrect phrase, it should suggest a few
possible valid phrases.

One way could be to get a suggestion for each word in the sentence by
calling SpellChecker.suggestSimilar on each word. This can be done, but
it won't help me build a close match for the whole phrase.

If I input "Wanna chk Luc Fetre", I will get separate spelling
suggestions for each word, but this won't help me build a near-exact phrase.
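
The per-word approach described above can be sketched in plain Java, without Lucene, to show where it falls short. The class name, the tiny word list and the plain edit-distance lookup are assumptions made for illustration only; SpellChecker.suggestSimilar uses its own n-gram index and scoring.

```java
import java.util.List;
import java.util.Locale;

/** Illustration only: corrects a phrase one word at a time against a word list. */
final class PerWordCorrection {

  /** Classic Levenshtein edit distance using two rolling rows. */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Returns the dictionary word closest to the given word (the first one wins on ties). */
  static String closest(String word, List<String> dictionary) {
    String lower = word.toLowerCase(Locale.ROOT);
    String best = word; // falls back to the input if the dictionary is empty
    int bestDistance = Integer.MAX_VALUE;
    for (String candidate : dictionary) {
      int d = editDistance(lower, candidate);
      if (d < bestDistance) {
        best = candidate;
        bestDistance = d;
      }
    }
    return best;
  }

  /** Corrects each whitespace-separated word independently and joins the results. */
  static String correctEachWord(String phrase, List<String> dictionary) {
    StringBuilder out = new StringBuilder();
    for (String word : phrase.trim().split("\\s+")) {
      if (out.length() > 0) {
        out.append(' ');
      }
      out.append(closest(word, dictionary));
    }
    return out.toString();
  }
}
```

With the hypothetical word list "want", "check", "lucene", "feature", correctEachWord("Wanna chk Luc Fetre", ...) returns "want check lucene feature". Every word is fixed on its own, but nothing ties the result to a phrase that actually occurs in the indexed documents.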

Is there any way of doing this? I have gone through the Javadoc but
could not find anything related to it. Any help in suggesting a
possible approach or an alternative will be highly appreciated.

-- 
Regards

Ankit

"What lies behind us and what lies before us are tiny matters compared with what lies within us"


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Complete phrase Suggest Feature in Apache Lucene

Posted by Ankit Murarka <an...@rancoretech.com>.
Hello Koji. I understand only English. Unfortunately, the link you shared
contains slides in a language I cannot identify.
If possible, please share the slides in English. Many thanks for
the help.

On 8/2/2013 9:04 PM, Koji Sekiguchi wrote:
> I wrote a program that extracts buzz phrases from a Lucene index.
>
> http://www.slideshare.net/KojiSekiguchi/lucene-terms-extraction
>
> Using it, I got a phrase list. The phrase list can be used for
> autocomplete and "did you mean" features.
>
> koji


-- 
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within us"




Re: Complete phrase Suggest Feature in Apache Lucene

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(13/08/02 17:16), Ankit Murarka wrote:
> Hello All,
>
> Just like the spellcheck feature, which was implemented after a lot of trouble, is it possible to
> implement a complete phrase suggest feature in Lucene 4.3? If I enter an incorrect phrase, it should
> suggest a few possible valid phrases.
>
> One way could be to get a suggestion for each word in the sentence by calling
> SpellChecker.suggestSimilar on each word. This can be done, but it won't help me build a close
> match for the whole phrase.
>
> If I input "Wanna chk Luc Fetre", I will get separate spelling suggestions for each word, but this
> won't help me build a near-exact phrase.
>
> Is there any way of doing this? I have gone through the Javadoc but could not find anything
> related to it. Any help in suggesting a possible approach or an alternative will be highly appreciated.
>

I wrote a program that extracts buzz phrases from a Lucene index.

http://www.slideshare.net/KojiSekiguchi/lucene-terms-extraction

Using it, I got a phrase list. The phrase list can be used for autocomplete and
"did you mean" features.

koji
-- 
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html



Re: Complete phrase Suggest Feature in Apache Lucene

Posted by Ivan Krišto <iv...@gmail.com>.
On 08/06/2013 02:50 PM, Ankit Murarka wrote:
> This does not seem to help. As per the suggestion, here's what I did:
> a. Indexed the document line by line. Verified with Luke that it is
> actually indexed line by line.
> b. Effectively, each line is a phrase here.
>
> I don't understand how to index a whole phrase as a
> SpellChecker suggestion. When I passed the index as is, the
> SpellChecker returned only word suggestions rather than
> phrase suggestions.

If the index writer did a good job, then you must have
configured the spellchecker the wrong way.
To avoid guessing at each point of configuration, I'm sending you a
complete working example.

Check these lines:
// @indexing
SpellChecker phraseRecommender = new SpellChecker(spellDir);
IndexReader reader = DirectoryReader.open(dir);
phraseRecommender.indexDictionary(
    new LuceneDictionary(reader, REC_FIELD_NAME), iwc, true);

// @query recommendation
SpellChecker phraseRecommender = new SpellChecker(spellDir);
phraseRecommender.setAccuracy(0.3f);

Complete working code:

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class PhraseSuggestion {
  public static final String REC_FIELD_NAME = "recommendation";
 
  public static void main(String[] args) throws IOException {
   RAMDirectory phrasesDir = new RAMDirectory();
   RAMDirectory spellDir = new RAMDirectory();
   // Index time
   indexPhrases(phrasesDir, spellDir,
       "What have the Romans ever done for us?",
       "This parrot is no more.",
       "A tiger... in Africa?",
       "That Rabbit's Dynamite!!",
       "Lovely spam! Wonderful spam!",
       "Spam spam spam spam...",
       "A duck",
       "Strange ladies lying in pools distributing swords is no basis "
           + "for government",
       "Nobody expects the Spanish Inquisition");
  
   // Query suggestion time
   SpellChecker phraseRecommender = new SpellChecker(spellDir);
   phraseRecommender.setAccuracy(0.3f);
   System.out.println(
       getSuggestion("I like spamming with a spam", phraseRecommender));
   System.out.println(
       getSuggestion("I want parrot and a rabbit", phraseRecommender));
   System.out.println(getSuggestion("rabbit dynamite", phraseRecommender));
  }
 
  public static void indexPhrases(Directory dir, Directory spellDir,
      String... phrases) throws IOException {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
        new StandardAnalyzer(Version.LUCENE_43));
    IndexWriter writer = new IndexWriter(dir, iwc);
   
    for (int i = 0; i < phrases.length; i++) {
      addRecommendation(phrases[i], writer);
    }
   
    writer.close();
   
    SpellChecker phraseRecommender = new SpellChecker(spellDir);
   
    IndexReader reader = DirectoryReader.open(dir);
    phraseRecommender.indexDictionary(
        new LuceneDictionary(reader, REC_FIELD_NAME), iwc, true);
    phraseRecommender.close();
    reader.close();
  }
 
  private static void addRecommendation(String phrase, IndexWriter writer)
      throws CorruptIndexException, IOException {
    Document doc = new Document();
   
    FieldType ft = new FieldType(StringField.TYPE_NOT_STORED);
    ft.setOmitNorms(false);
    Field f = new Field(REC_FIELD_NAME, phrase, ft);
    doc.add(f);
   
    writer.addDocument(doc);
  }
 
  public static String getSuggestion(String query,
      SpellChecker phraseRecommender) throws IOException {
    String[] suggestions = phraseRecommender.suggestSimilar(query, 5);
    if (suggestions.length > 0) return suggestions[0];
    else return null;
  }
}

It prints:
Lovely spam! Wonderful spam!
This parrot is no more.
That Rabbit's Dynamite!!
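
A note on setAccuracy(0.3f): as far as the SpellChecker Javadoc and source indicate, the accuracy is a minimum similarity score between the query and a suggestion (0.5 by default), and the default StringDistance is a Levenshtein distance normalized by the length of the longer string. The plain-Java sketch below reimplements that normalization as an assumption for illustration (PhraseSimilarity is a hypothetical class, not Lucene code), to show why whole phrases may need a lower threshold than single words.

```java
/** Illustration only: normalized edit-distance similarity in the range [0, 1]. */
final class PhraseSimilarity {

  /** Classic Levenshtein edit distance using two rolling rows. */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** 1.0 means identical strings; values near 0 mean almost nothing in common. */
  static float similarity(String a, String b) {
    int maxLen = Math.max(a.length(), b.length());
    if (maxLen == 0) {
      return 1.0f; // two empty strings are treated as identical
    }
    return 1.0f - (float) editDistance(a, b) / maxLen;
  }

  /** Mirrors the idea of an accuracy threshold: keep a suggestion only if it scores high enough. */
  static boolean passes(String query, String suggestion, float minScore) {
    return similarity(query, suggestion) >= minScore;
  }
}
```

For example, "kitten" versus "sitting" needs 3 edits over a maximum length of 7, giving a score of about 0.57. A query that omits or adds whole words accumulates many edits, so its score can fall below the default threshold. Other measures can be plugged in with SpellChecker.setStringDistance, for example JaroWinklerDistance or NGramDistance from the same package.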


  Regards,
    Ivan Krišto





Re: Complete phrase Suggest Feature in Apache Lucene

Posted by Ankit Murarka <an...@rancoretech.com>.
Hello,

This does not seem to help. As per the suggestion, here's what I did:

a. Indexed the document line by line. Verified with Luke that it is
actually indexed line by line.
b. Effectively, each line is a phrase here.

I don't understand how to index a whole phrase as a
SpellChecker suggestion. When I passed the index as is, the
SpellChecker returned only word suggestions rather than
phrase suggestions.

There has to be some different way of indexing the whole phrase as a
spellchecker suggestion. Please note that the phrases were extracted from
the document by indexing it line by line. Each phrase is actually a line.

On 8/2/2013 7:58 PM, Ivan Krišto wrote:
> On 08/02/2013 10:16 AM, Ankit Murarka wrote:
>    
>> is it possible to implement a complete phrase suggest feature in Lucene
>> 4.3? If I enter an incorrect phrase, it should suggest a few possible
>> valid phrases.
>>
>> One way could be to get a suggestion for each word in the sentence by
>> calling SpellChecker.suggestSimilar on each word. This can be done,
>> but it won't help me build a close match for the whole phrase.
>>
>> If I input "Wanna chk Luc Fetre", I will get separate spelling
>> suggestions for each word, but this won't help me build a near-exact
>> phrase.
>
> I did something similar some time ago (I used Lucene 4.0 trunk before
> its release, and I don't know if the spellchecker API has changed since then).
>
> The idea is simple:
> - Take a list of valid phrases and index whole phrases as spellchecker
> suggestions.
>
> My implementation:
> - As a list of valid phrases, I took queries from a search engine query log.
> - At index time, besides saving the phrases, I also saved the number of
> occurrences of each phrase.
> - My phrase suggester would take the 5 phrases most similar to the given
> query and return the most common of them in the index.
> It's very simple and works quite well.
>
> A few tips:
> - Think about when to show a phrase suggestion, e.g. show a suggestion only
> if the most common suggested phrase occurs 10 times more often than the
> given query.
> - Explore different distance measures and their parameters.
> - Maybe it would be good to use only word 3-grams as phrases (if you
> have the query "how to use lucene", you would index "how to use" and "to use
> lucene" as phrases) -- then you would "fix" the given query in parts.
> - To explore more solutions to this problem, search for papers on "related
> query suggestion".
> - Twitter came to a similar idea as I did:
> https://blog.twitter.com/2012/related-queries-and-spelling-corrections-search
>
>
>    Regards,
>      Ivan Krišto


-- 
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within us"


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Complete phrase Suggest Feature in Apache Lucene

Posted by Ivan Krišto <iv...@gmail.com>.
On 08/02/2013 10:16 AM, Ankit Murarka wrote:
> is it possible to implement a complete phrase suggest feature in Lucene
> 4.3? If I enter an incorrect phrase, it should suggest a few possible
> valid phrases.
>
> One way could be to get a suggestion for each word in the sentence by
> calling SpellChecker.suggestSimilar on each word. This can be done,
> but it won't help me build a close match for the whole phrase.
>
> If I input "Wanna chk Luc Fetre", I will get separate spelling
> suggestions for each word, but this won't help me build a near-exact
> phrase.

I did something similar some time ago (I used Lucene 4.0 trunk before
its release, and I don't know if the spellchecker API has changed since then).

The idea is simple:
- Take a list of valid phrases and index whole phrases as spellchecker
suggestions.

My implementation:
- As a list of valid phrases, I took queries from a search engine query log.
- At index time, besides saving the phrases, I also saved the number of
occurrences of each phrase.
- My phrase suggester would take the 5 phrases most similar to the given
query and return the most common of them in the index.
It's very simple and works quite well.
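
The re-ranking step of this implementation could look roughly like the plain-Java sketch below. It assumes the candidates have already been fetched (for example with SpellChecker.suggestSimilar(query, 5)) and that occurrence counts were saved in a map at index time. PhraseReranker, mostCommon and the counts map are hypothetical names, not Lucene API and not the original implementation.

```java
import java.util.List;
import java.util.Map;

/** Illustration only: picks the most frequent phrase among already-retrieved candidates. */
final class PhraseReranker {

  /**
   * @param candidates  the most similar phrases, ordered by similarity to the query
   * @param occurrences phrase -> how often it was seen at index time (assumed to exist)
   * @return the candidate with the highest count, or null if there are no candidates
   */
  static String mostCommon(List<String> candidates, Map<String, Integer> occurrences) {
    String best = null;
    int bestCount = -1;
    for (String candidate : candidates) {
      // Phrases without a recorded count are treated as never seen.
      int count = occurrences.getOrDefault(candidate, 0);
      // Strict '>' keeps the earlier, more similar candidate on ties.
      if (count > bestCount) {
        best = candidate;
        bestCount = count;
      }
    }
    return best;
  }
}
```

Keeping the similarity search and the frequency lookup separate means the counts can live anywhere (a stored field, a side file, a database) without touching the spellchecker index.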

A few tips:
- Think about when to show a phrase suggestion, e.g. show a suggestion only
if the most common suggested phrase occurs 10 times more often than the
given query.
- Explore different distance measures and their parameters.
- Maybe it would be good to use only word 3-grams as phrases (if you
have the query "how to use lucene", you would index "how to use" and "to use
lucene" as phrases) -- then you would "fix" the given query in parts.
- To explore more solutions to this problem, search for papers on "related
query suggestion".
- Twitter came to a similar idea as I did:
https://blog.twitter.com/2012/related-queries-and-spelling-corrections-search
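
Two of the tips above, the word 3-grams and the "10 times more often" rule, can be sketched in plain Java as follows. QuerySuggestionTips and its method names are hypothetical, and keeping queries shorter than n words as a single phrase is an assumption of this sketch. Inside a Lucene analysis chain, ShingleFilter produces comparable word n-grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: helpers for splitting queries into word n-grams and gating suggestions. */
final class QuerySuggestionTips {

  /** Splits a query into overlapping word n-grams, e.g. n = 3 for word 3-grams. */
  static List<String> wordNGrams(String query, int n) {
    List<String> result = new ArrayList<>();
    String trimmed = query.trim();
    if (trimmed.isEmpty() || n <= 0) {
      return result;
    }
    String[] words = trimmed.split("\\s+");
    if (words.length <= n) {
      // Assumption: a short query is kept whole rather than dropped.
      result.add(String.join(" ", words));
      return result;
    }
    for (int i = 0; i + n <= words.length; i++) {
      result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
    }
    return result;
  }

  /** Shows a suggestion only if it is at least 'factor' times more common than the query. */
  static boolean shouldSuggest(long queryCount, long suggestionCount, long factor) {
    return suggestionCount > 0 && suggestionCount >= factor * queryCount;
  }
}
```

For the query "how to use lucene" and n = 3 this yields "how to use" and "to use lucene", matching the example in the tip.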


  Regards,
    Ivan Krišto
