Posted to java-user@lucene.apache.org by dreampeppers99 <le...@gmail.com> on 2008/04/08 17:11:21 UTC
Questions about use of SpellChecker: Constructor and Simillarity...
Hi,
I have two questions about this GREAT tool (framework, library...
"whatever").
I decided to add a spell checker to my applications, started reading some
papers, and "found out" about the Lucene project...
Anyway, I got it working, but I just want to know...
1º Why do I have to pass a Directory object (it is mandatory) to the
constructor of SpellChecker?
2º Suppose that my dictionary contains these entries:
"The Lord of the Rings: The Two Towers"
"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"
I just want to know how I can code something so that, when the user queries
"The Lord of the Rings: The Two Towers", the application suggests:
"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"
Is this possible using only Lucene?
################ My Test Class ######################
SpellChecker spell;
spell = new SpellChecker(FSDirectory.getDirectory(".")); // why this... ?!!
spell.indexDictionary(new Dicionario());
String[] l = spell.suggestSimilar(args[0], 5);
for (String vl : l) {
    System.out.println("Suggested : " + vl);
}
###############################################
############### My Dictionary ######################
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class Dicionario implements org.apache.lucene.search.spell.Dictionary {
    // Returns the entries that SpellChecker will index
    public Iterator getWordsIterator() {
        List<String> lista = new ArrayList<String>();
        lista.add("peter");
        lista.add("spider man 3");
        lista.add("johnny depp");
        lista.add("the edge");
        lista.add("monk");
        lista.add("arnold schwarzenegger");
        return lista.iterator();
    }
}
###############################################
Thanks in advance... :D
--
View this message in context: http://www.nabble.com/Questions-about-use-of-SpellChecker%3A-Constructor-and-Simillarity...-tp16559731p16559731.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Karl Wettin <ka...@gmail.com>.
Mathieu Lecarme wrote:
> On 8 Apr 08, at 18:34, Karl Wettin wrote:
>> dreampeppers99 wrote:
>>> 2º Suposse that in my dictonary I had these words:
>>> "The Lord of the Rings: The Two Towers"
>>> "The Lord of the Rings: The Fellowship of the Ring"
>>> "The Lord of the Rings: The Return of the King"
>>> I just want to know how can I code something to "suggest" when user
>>> query
>>> "The Lord of the Rings: The Two Towers" the application suggest: "The
>>> Lord of the Rings: The Fellowship of the Ring"
>>> "The Lord of the Rings: The Return of the King"
>>> It is possible just using the Lucene?
>>
>> There are no typos in your example so you really don't even need a
>> spell checker for that. Using OR clauses in your query would be
>> enough. Perhaps you want to combine one variant with MUST clauses that
>> has a bit more boost than the OR clauses.
> A classical OR query will match shuffled data : "The king of lord got a
> ring" should match.
> With shingle, you will match title in the right order.
Appending a SHOULD clause containing a phrase or span query with a bit
of boost also works.
karl
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Leandro <le...@gmail.com>.
>
> Mainly because it is a nasty peice of code. But it does a good job.
> >
> Because spellChecker use a directory to store data. It can be FSDirectory,
> RAMDirectory ....
Perfect explanation!!!
So using a RAMDirectory is better (performance-wise):
spell = new SpellChecker(FSDirectory.getDirectory("."));
spell = new SpellChecker(new RAMDirectory());
The second is better (faster) for a small amount of data...
Thanks so much, now I understand... This should be in the real documentation...
> A classical OR query will match shuffled data : "The king of lord got a
> ring" should match.
> With shingle, you will match title in the right order.
Shingles will split it into pairs of words... so I can use them with OR...
(Good one... I'll try this.)
Thanks so much!!!
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Mathieu Lecarme <ma...@garambrogne.net>.
On 8 Apr 08, at 18:34, Karl Wettin wrote:
> dreampeppers99 wrote:
>> 1º Why need I pass a Directory objecto (obligatory) on constructor of
>> SpellChecker?
>
> Mainly because it is a nasty peice of code. But it does a good job.
Because SpellChecker uses a Directory to store its data. It can be an
FSDirectory, a RAMDirectory, ...
>
>
>> 2º Suposse that in my dictonary I had these words:
>> "The Lord of the Rings: The Two Towers"
>> "The Lord of the Rings: The Fellowship of the Ring"
>> "The Lord of the Rings: The Return of the King"
>> I just want to know how can I code something to "suggest" when user
>> query
>> "The Lord of the Rings: The Two Towers" the application suggest:
>> "The Lord of the Rings: The Fellowship of the Ring"
>> "The Lord of the Rings: The Return of the King"
>> It is possible just using the Lucene?
>
> There are no typos in your example so you really don't even need a
> spell checker for that. Using OR clauses in your query would be
> enough. Perhaps you want to combine one variant with MUST clauses
> that has a bit more boost than the OR clauses.
A classic OR query will also match shuffled data: "The king of lord got
a ring" would match.
With shingles, you match titles whose words are in the right order.
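To make "shingle" concrete: here it means a pair of adjacent words kept together as one term. The following is only a rough, library-free sketch in plain Java. It is not Lucene's ShingleFilter, and the class and method names (ShingleSketch, words, shingles) are invented for the illustration. It just shows which word pairs come out of a title:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration, not part of Lucene: builds word pairs ("shingles"). */
class ShingleSketch {

    /** Lower-cases the text and splits it into words (ASCII letters and digits only). */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) { // split() may yield an empty leading token
                out.add(w);
            }
        }
        return out;
    }

    /** Returns every pair of adjacent words, in their original order. */
    static List<String> shingles(String text) {
        List<String> w = words(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < w.size(); i++) {
            out.add(w.get(i) + " " + w.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the word pairs of one example title
        System.out.println(shingles("The Lord of Rings Return of King"));
    }
}
```

For "The Lord of Rings Return of King" this gives the six pairs "the lord", "lord of", "of rings", "rings return", "return of" and "of king". Because each pair keeps two words in order, a title with the same words shuffled shares far fewer pairs than it shares single words.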
M.
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Mathieu Lecarme <ma...@garambrogne.net>.
>> I'm cool :) I just think you are overcomplicating things.
>>
>>
> Yes... I can use two words and OR
> Suposse I query on this
>
> The Lord of Rings: Return of King
> The Lord of Rings: Fellowship
> The Lord of Rings: The Two towers
> The Lord of Weapons
> The Lord of War
>
> Suposse an user search: "The Lord of Rings Return of King"
> WHERE
> name like '%the lord%' or
> name like '%lord of%' or
> name like '%of rings%' or
> name like '%rings return%' or
> name like '%return of%' or
> name like '%of king%'
Lucene syntax is prettier.
With the movie title indexed as "title", with LowerCaseFilter (and shingles, so that each word pair is one term):
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("title", "the lord")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "lord of")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "of rings")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "rings return")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "return of")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "of king")), BooleanClause.Occur.SHOULD);
> So will show all lines... the question now is which is best
> 'ranking' ...
> However you all help me so much , THANKS SO MUCH!!!
> (now I won't say bad about the constructor of SpellChecker)
The more words matched, the better the score.
You should use a threshold (number of common words / number of words) or
something like that to exclude titles that are too far off.
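As a rough illustration of that threshold idea, here is a library-free sketch in plain Java. It is not Lucene's scoring: the class name TitleOverlap, its methods and the 0.5 cut-off are all assumptions made up for the example. Each title is scored by the fraction of the query's word pairs it contains, and anything below the cut-off is dropped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Lucene: filters titles by shared word pairs. */
class TitleOverlap {

    /** Lower-cases the text and returns its set of adjacent word pairs (ASCII only). */
    static Set<String> pairs(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + " " + words.get(i + 1));
        }
        return out;
    }

    /** Fraction of the query's word pairs that also occur in the title (0.0 to 1.0). */
    static double score(String query, String title) {
        Set<String> q = pairs(query);
        if (q.isEmpty()) {
            return 0.0; // a one-word query has no pairs to compare
        }
        Set<String> common = new HashSet<>(q);
        common.retainAll(pairs(title));
        return (double) common.size() / q.size();
    }

    /** Keeps the titles whose score reaches the threshold, in their original order. */
    static List<String> similar(String query, List<String> titles, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String t : titles) {
            if (score(query, t) >= threshold) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "The Lord of Rings: Return of King",
                "The Lord of Rings: Fellowship",
                "The Lord of Rings: The Two towers",
                "The Lord of Weapons",
                "The Lord of War");
        // 0.5 is an arbitrary cut-off chosen for this example
        for (String t : similar("The Lord of Rings Return of King", titles, 0.5)) {
            System.out.println(t + " -> " + score("The Lord of Rings Return of King", t));
        }
    }
}
```

With the five titles from the example and a cut-off of 0.5, the three "Lord of Rings" titles are kept (they share at least 3 of the query's 6 pairs), while "The Lord of Weapons" and "The Lord of War" share only 2 of 6 and are dropped. The exact title itself scores 1.0, so it would have to be filtered out separately if only the other titles are wanted.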
M.
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Leandro <le...@gmail.com>.
>
>
>
> I'm cool :) I just think you are overcomplicating things.
>
>
Yes... I can use two words and OR.
Suppose I query against this:
The Lord of Rings: Return of King
The Lord of Rings: Fellowship
The Lord of Rings: The Two towers
The Lord of Weapons
The Lord of War
Suppose a user searches for: "The Lord of Rings Return of King"
WHERE
name like '%the lord%' or
name like '%lord of%' or
name like '%of rings%' or
name like '%rings return%' or
name like '%return of%' or
name like '%of king%'
So it will return all the rows... the question now is which one gets the best 'ranking'...
Anyway, you have all helped me so much, THANKS SO MUCH!!!
(now I won't say bad things about the constructor of SpellChecker)
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Karl Wettin <ka...@gmail.com>.
Leandro wrote:
>> Sorry, I missunderstood your question. See other reply.
>>
>
> Yes I got it. thanks
>
>> Are you sure about that? Did you benchmark? Can we see the results?
>
>
> Hey man take it easy, I just imagine. But I guess use the ShingleFilter will
> help.
I'm cool :) I just think you are overcomplicating things.
karl
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Leandro <le...@gmail.com>.
> Sorry, I missunderstood your question. See other reply.
>
Yes I got it. thanks
> Are you sure about that? Did you benchmark? Can we see the results?
Hey man, take it easy, I was just guessing. But I guess using the ShingleFilter
will help.
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Karl Wettin <ka...@gmail.com>.
Leandro wrote:
>>
>> 1º Why need I pass a Directory objecto (obligatory) on constructor of
>>> SpellChecker?
>>>
>> Mainly because it is a nasty peice of code. But it does a good job.
>>
>
> How can we suggest it (create an normal constructor without param) to the
> team?
Sorry, I misunderstood your question. See other reply.
>> There are no typos in your example so you really don't even need a spell
>> checker for that. Using OR clauses in your query would be enough.
>
> I guess no, because user will enter : "The Lord of the Rings: The Return of
> the King" ... and the system should response with:
>
>
> Similar:
> The Lord of the Rings: The Two Towers
> The Lord of the Rings: The Fellowship of the Ring
>
> I can't see how can I do that? (just using the OR statement)
> For example:
>
> name like '%the%'
> or
> name like '%Lord%'
> or
> name like '%of%'
> or
> name like '%the%'
> or
> name like '%Rings%'
>
> will produce so much results besides to be non-performatic...
Are you sure about that? Did you benchmark? Can we see the results?
karl
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Leandro <le...@gmail.com>.
>
>
> 1º Why need I pass a Directory objecto (obligatory) on constructor of
> > SpellChecker?
> >
>
> Mainly because it is a nasty peice of code. But it does a good job.
>
Thanks.
How can we suggest this (adding a normal constructor without parameters) to the
team?
>
>
> 2º Suposse that in my dictonary I had these words:
> >
> > "The Lord of the Rings: The Two Towers"
> > "The Lord of the Rings: The Fellowship of the Ring"
> > "The Lord of the Rings: The Return of the King"
> >
> > I just want to know how can I code something to "suggest" when user
> > query
> > "The Lord of the Rings: The Two Towers" the application suggest:
> > "The Lord of the Rings: The Fellowship of the Ring"
> > "The Lord of the Rings: The Return of the King"
> >
> > It is possible just using the Lucene?
> >
>
> There are no typos in your example so you really don't even need a spell
> checker for that. Using OR clauses in your query would be enough.
I guess not, because the user will enter: "The Lord of the Rings: The Return of
the King" ... and the system should respond with:
Similar:
The Lord of the Rings: The Two Towers
The Lord of the Rings: The Fellowship of the Ring
I can't see how I can do that (just using OR statements).
For example:
name like '%the%'
or
name like '%Lord%'
or
name like '%of%'
or
name like '%the%'
or
name like '%Rings%'
will produce too many results, besides performing poorly...
> Perhaps you want to combine one variant with MUST clauses that has a bit
> more boost than the OR clauses.
>
> karl
Thanks so much Karl!!!
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Karl Wettin <ka...@gmail.com>.
dreampeppers99 wrote:
> 1º Why need I pass a Directory objecto (obligatory) on constructor of
> SpellChecker?
Mainly because it is a nasty piece of code. But it does a good job.
> 2º Suposse that in my dictonary I had these words:
>
> "The Lord of the Rings: The Two Towers"
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
>
> I just want to know how can I code something to "suggest" when user query
> "The Lord of the Rings: The Two Towers" the application suggest:
>
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
>
> It is possible just using the Lucene?
There are no typos in your example so you really don't even need a spell
checker for that. Using OR clauses in your query would be enough.
Perhaps you want to combine one variant with MUST clauses that has a bit
more boost than the OR clauses.
karl
Re: Questions about use of SpellChecker: Constructor and Simillarity...
Posted by Mathieu Lecarme <ma...@garambrogne.net>.
Use ShingleFilter.
I'm working on a wider SpellChecker, I'll post a third patch soon.
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java
M.
dreampeppers99 wrote:
> Hi,
>
> I have two question about this GREAT tool.. (framework, library...
> "whatever")
> Well I decide put spell checker on my applications and I start to read some
> papers and "found out" the Lucene project...
>
> Anyway, I make it works, but I just want to know...
>
> 1º Why need I pass a Directory objecto (obligatory) on constructor of
> SpellChecker?
> 2º Suposse that in my dictonary I had these words:
>
> "The Lord of the Rings: The Two Towers"
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
>
> I just want to know how can I code something to "suggest" when user query
> "The Lord of the Rings: The Two Towers" the application suggest:
>
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
>
> It is possible just using the Lucene?
>
> ################ My Test Class ######################
> SpellChecker spell;
> spell= new SpellChecker(FSDirectory.getDirectory(".")); //why this... ?!!
> spell.indexDictionary(new Dicionario());
>
> String[] l = spell.suggestSimilar(args[0],5);
>
> for (String vl : l ){
> System.out.println("Suggested : " + vl);
> }
> ###############################################
>
>
>
> ############### My Dictionary######################
> public class Dicionario implements
> org.apache.lucene.search.spell.Dictionary{
>
> public Iterator getWordsIterator(){
> List<String> lista = new ArrayList<String>();
> lista.add("peter");
> lista.add("spider man 3");
> lista.add("johnny depp");
> lista.add("the edge");
> lista.add("monk");
> lista.add("arnold schwarzenegger");
> return lista.iterator();
> }
> }
> ###############################################
>
> Thanks in advance... :D
>