Posted to java-user@lucene.apache.org by dreampeppers99 <le...@gmail.com> on 2008/04/08 17:11:21 UTC

Questions about use of SpellChecker: Constructor and Similarity...

Hi,

I have two questions about this GREAT tool (framework, library...
"whatever").
Well, I decided to put a spell checker in my applications, started reading some
papers, and "found out" about the Lucene project...

Anyway, I made it work, but I just want to know...

1º Why do I need to pass a Directory object (mandatory) to the
SpellChecker constructor?
2º Suppose that my dictionary had these words:

"The Lord of the Rings: The Two Towers"
"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"

I just want to know how I can code something so that, when the user queries
"The Lord of the Rings: The Two Towers", the application suggests:

"The Lord of the Rings: The Fellowship of the Ring"
"The Lord of the Rings: The Return of the King"

Is it possible using just Lucene?

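################ My Test Class ######################
import java.io.IOException;

import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.FSDirectory;

public class SpellTest {  // hypothetical wrapper class name
  public static void main(String[] args) throws IOException {
    SpellChecker spell = new SpellChecker(FSDirectory.getDirectory(".")); //why this... ?!!
    spell.indexDictionary(new Dicionario());  // builds the spelling index from the dictionary

    // up to 5 suggestions for the word passed on the command line
    String[] l = spell.suggestSimilar(args[0], 5);
    for (String vl : l) {
      System.out.println("Suggested : " + vl);
    }
  }
}
###############################################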



############### My Dictionary######################
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Supplies the words the SpellChecker indexes (Lucene 2.x Dictionary interface).
public class Dicionario implements org.apache.lucene.search.spell.Dictionary {

  public Iterator getWordsIterator() {
    List<String> lista = new ArrayList<String>();
    lista.add("peter");
    lista.add("spider man 3");
    lista.add("johnny depp");
    lista.add("the edge");
    lista.add("monk");
    lista.add("arnold schwarzenegger");
    return lista.iterator();
  }
}
###############################################

Thanks in advance... :D
-- 
View this message in context: http://www.nabble.com/Questions-about-use-of-SpellChecker%3A-Constructor-and-Simillarity...-tp16559731p16559731.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Karl Wettin <ka...@gmail.com>.
Mathieu Lecarme wrote:
> On 8 Apr 2008, at 18:34, Karl Wettin wrote:
>> dreampeppers99 wrote:

>>> 2º Suppose that my dictionary had these words:
>>> "The Lord of the Rings: The Two Towers"
>>> "The Lord of the Rings: The Fellowship of the Ring"
>>> "The Lord of the Rings: The Return of the King"
>>> I just want to know how I can code something so that, when the user
>>> queries "The Lord of the Rings: The Two Towers", the application suggests:
>>> "The Lord of the Rings: The Fellowship of the Ring"
>>> "The Lord of the Rings: The Return of the King"
>>> Is it possible using just Lucene?
>>
>> There are no typos in your example so you really don't even need a 
>> spell checker for that. Using OR clauses in your query would be 
>> enough. Perhaps you want to combine one variant with MUST clauses that 
>> has a bit more boost than the OR clauses.
> A classical OR query will also match shuffled text: "The king of lord got a
> ring" would match.
> With shingles, you will only match titles whose words are in the right order.

Appending a SHOULD clause containing a phrase or span query with a bit 
of boost also works.
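To see why that extra clause helps, here is a plain-Java toy model of the scoring idea. It is not Lucene's scoring formula: as a simplifying assumption, each distinct query word found in a title adds 1 (the OR/SHOULD clauses), and finding the whole query as a contiguous, in-order phrase adds a hypothetical PHRASE_BOOST of 2.0 (the boosted phrase clause). The class and constant names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of "OR the words, plus a boosted SHOULD phrase clause".
// Not Lucene scoring: +1 per distinct query word present in the title,
// +PHRASE_BOOST if the query words appear contiguously and in order.
class PhraseBoostScorer {
    static final double PHRASE_BOOST = 2.0; // hypothetical boost value

    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static double score(String query, String title) {
        List<String> q = tokens(query);
        List<String> t = tokens(title);
        Set<String> titleWords = new HashSet<>(t);
        double score = 0;
        for (String w : new HashSet<>(q)) { // one SHOULD clause per distinct word
            if (titleWords.contains(w)) {
                score += 1;
            }
        }
        if (!q.isEmpty() && Collections.indexOfSubList(t, q) >= 0) { // the phrase clause
            score += PHRASE_BOOST;
        }
        return score;
    }

    public static void main(String[] args) {
        String query = "the lord of the rings";
        System.out.println(score(query, "The Lord of the Rings: The Two Towers")); // 6.0
        System.out.println(score(query, "The king of lord got a ring"));           // 3.0
    }
}
```

The shuffled title still collects partial credit from the plain OR clauses, but only the title with the words in order earns the phrase bonus, so it ranks first.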



    karl



Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Leandro <le...@gmail.com>.
>
> Mainly because it is a nasty piece of code. But it does a good job.
> >
> Because SpellChecker uses a Directory to store its data. It can be an
> FSDirectory, a RAMDirectory, ...


Perfect explanation!!!
So using a RAMDirectory is better (performance-wise):

spell = new SpellChecker(FSDirectory.getDirectory("."));  // spelling index stored on disk
spell = new SpellChecker(new RAMDirectory());              // spelling index kept in memory

The second is better (faster) for small amounts of data...
Thanks so much, now I understand... This should be in the real documentation...



> A classical OR query will also match shuffled text: "The king of lord got a
> ring" would match.
> With shingles, you will only match titles whose words are in the right order.


Shingles will split it into pairs of words... so I can use them with OR...
(That's the good one... I'll try this.)
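As a rough illustration of what shingling does, here is a plain-Java sketch (not the Lucene ShingleFilter itself) that splits text into adjacent word pairs. It assumes a shingle size of 2, which we understand to be ShingleFilter's default; the real filter also emits the single words unless configured otherwise. The class name is made up for this sketch. Comparing shared words with shared pairs shows why shingles resist shuffled text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Word-pair shingles in plain Java (an illustration, not Lucene code).
class Shingles {
    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Adjacent word pairs, e.g. "a b c" -> ["a b", "b c"].
    static List<String> pairs(String text) {
        List<String> w = words(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < w.size(); i++) {
            out.add(w.get(i) + " " + w.get(i + 1));
        }
        return out;
    }

    // Number of distinct items two lists have in common.
    static int common(List<String> a, List<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(new HashSet<>(b));
        return shared.size();
    }

    public static void main(String[] args) {
        String query = "The Lord of the Rings";
        String shuffled = "The king of lord got a ring";
        System.out.println(pairs(query));                          // [the lord, lord of, of the, the rings]
        System.out.println(common(words(query), words(shuffled))); // 3 shared words
        System.out.println(common(pairs(query), pairs(shuffled))); // 0 shared pairs
    }
}
```

A word-level OR sees three matches in the shuffled title, while the pair-level comparison sees none, which is the ordering effect described above.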


Thanks so much!!!

Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
On 8 Apr 2008, at 18:34, Karl Wettin wrote:
> dreampeppers99 wrote:
>> 1º Why do I need to pass a Directory object (mandatory) to the
>> SpellChecker constructor?
>
> Mainly because it is a nasty piece of code. But it does a good job.
Because SpellChecker uses a Directory to store its data. It can be an
FSDirectory, a RAMDirectory, ...
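To make the "it stores data" point concrete: as we understand it, the contrib SpellChecker writes character n-grams of every dictionary word into that Directory, looks up candidate words that share n-grams with the input, and ranks them by edit distance. The plain-Java sketch below mimics that idea, with a HashMap standing in for the Directory. It is not Lucene's code; the class name and the n-gram size of 2 are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of an n-gram spell checker. The HashMap plays the role of the
// Directory that holds the n-gram index. Not Lucene code.
class NGramSpeller {
    private final Map<String, Set<String>> gramIndex = new HashMap<>(); // n-gram -> words
    private final int n;

    NGramSpeller(int n) {
        this.n = n;
    }

    // All character n-grams of a word, e.g. n=2: "monk" -> {mo, on, nk}.
    static Set<String> grams(String word, int n) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // "Indexing the dictionary": remember which words contain each n-gram.
    void add(String word) {
        for (String g : grams(word, n)) {
            gramIndex.computeIfAbsent(g, k -> new HashSet<>()).add(word);
        }
    }

    // Classic two-row Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Candidates share at least one n-gram with the input; closest edit distance first.
    List<String> suggest(String word, int max) {
        Set<String> candidates = new HashSet<>();
        for (String g : grams(word, n)) {
            candidates.addAll(gramIndex.getOrDefault(g, Collections.emptySet()));
        }
        candidates.remove(word); // never suggest the input itself
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((x, y) -> {
            int d = Integer.compare(levenshtein(word, x), levenshtein(word, y));
            return d != 0 ? d : x.compareTo(y); // ties: alphabetical
        });
        return ranked.subList(0, Math.min(max, ranked.size()));
    }

    public static void main(String[] args) {
        NGramSpeller speller = new NGramSpeller(2);
        for (String w : new String[] {"peter", "spider man 3", "johnny depp",
                "the edge", "monk", "arnold schwarzenegger"}) {
            speller.add(w);
        }
        System.out.println(speller.suggest("jonny depp", 5)); // "johnny depp" comes first
    }
}
```

The n-gram map is what has to live somewhere between indexDictionary and suggestSimilar, which is why the constructor asks for a Directory; whether that is on disk or in memory is the FSDirectory vs RAMDirectory choice.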

>
>
>> 2º Suppose that my dictionary had these words:
>> "The Lord of the Rings: The Two Towers"
>> "The Lord of the Rings: The Fellowship of the Ring"
>> "The Lord of the Rings: The Return of the King"
>> I just want to know how I can code something so that, when the user
>> queries "The Lord of the Rings: The Two Towers", the application suggests:
>> "The Lord of the Rings: The Fellowship of the Ring"
>> "The Lord of the Rings: The Return of the King"
>> Is it possible using just Lucene?
>
> There are no typos in your example so you really don't even need a  
> spell checker for that. Using OR clauses in your query would be  
> enough. Perhaps you want to combine one variant with MUST clauses  
> that has a bit more boost than the OR clauses.
A classical OR query will also match shuffled text: "The king of lord got
a ring" would match.
With shingles, you will only match titles whose words are in the right order.

M.




Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
>> I'm cool :) I just think you are overcomplicating things.
>>
>>
> Yes... I can use word pairs and OR.
> Suppose I query over these:
>
> The Lord of Rings: Return of King
> The Lord of Rings: Fellowship
> The Lord of Rings: The Two towers
> The Lord of Weapons
> The Lord of War
>
> Suppose a user searches: "The Lord of Rings Return of King"
> WHERE
> name like '%the lord%' or
> name like '%lord of%' or
> name like '%of rings%' or
> name like '%rings return%' or
> name like '%return of%' or
> name like '%of king%'
Lucene syntax is prettier.
With the movie title indexed as "title", with a LowerCaseFilter:


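import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Each term is a word pair, so "title" must have been indexed through a
// ShingleFilter (plus LowerCaseFilter) for these TermQuerys to match.
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("title", "the lord")), Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "lord of")), Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "of rings")), Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "rings return")), Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "return of")), Occur.SHOULD);
bq.add(new TermQuery(new Term("title", "of king")), Occur.SHOULD);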

> So it will show all lines... the question now is which 'ranking' is
> best...
> However, you all helped me so much, THANKS SO MUCH!!!
> (now I won't speak badly about the SpellChecker constructor)
The more words matched, the better the score.
You should use a threshold (number of common words / number of words) or
something like that to exclude titles that are too far off.
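One possible reading of that threshold, sketched in plain Java: score each title by the fraction of the query's word pairs it contains, and drop titles below a cut-off. The pairs-over-query-pairs ratio, the 0.5 threshold and the class name are assumptions for illustration, not Lucene scoring; the titles are the five from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Ranks titles by the share of the query's word pairs they contain,
// dropping titles below a threshold. Plain Java, not Lucene scoring.
class ShingleRanker {
    // Distinct adjacent word pairs of a lower-cased text.
    static Set<String> pairs(String text) {
        List<String> w = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                w.add(t);
            }
        }
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < w.size(); i++) {
            out.add(w.get(i) + " " + w.get(i + 1));
        }
        return out;
    }

    // Fraction (0..1) of the query's distinct pairs that also occur in the title.
    static double overlap(String query, String title) {
        Set<String> q = pairs(query);
        if (q.isEmpty()) {
            return 0;
        }
        Set<String> common = new HashSet<>(q);
        common.retainAll(pairs(title));
        return (double) common.size() / q.size();
    }

    static List<String> rank(String query, List<String> titles, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String t : titles) {
            if (overlap(query, t) >= threshold) {
                kept.add(t);
            }
        }
        // Best overlap first; List.sort is stable, so ties keep their input order.
        kept.sort((a, b) -> Double.compare(overlap(query, b), overlap(query, a)));
        return kept;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "The Lord of Rings: Return of King",
                "The Lord of Rings: Fellowship",
                "The Lord of Rings: The Two towers",
                "The Lord of Weapons",
                "The Lord of War");
        // Weapons and War share only 2 of the 6 query pairs (0.33), below 0.5.
        System.out.println(rank("The Lord of Rings Return of King", titles, 0.5));
    }
}
```

With these inputs the exact title scores 1.0, the other two Rings titles 0.5, and the Weapons/War titles are cut. Raising or lowering the threshold trades recall for precision.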

M.




Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Leandro <le...@gmail.com>.
>
>
>
> I'm cool :) I just think you are overcomplicating things.
>
>
Yes... I can use word pairs and OR.
Suppose I query over these:

The Lord of Rings: Return of King
The Lord of Rings: Fellowship
The Lord of Rings: The Two towers
The Lord of Weapons
The Lord of War

Suppose a user searches: "The Lord of Rings Return of King"
WHERE
name like '%the lord%' or
name like '%lord of%' or
name like '%of rings%' or
name like '%rings return%' or
name like '%return of%' or
name like '%of king%'


So it will show all lines... the question now is which 'ranking' is best...
However, you all helped me so much, THANKS SO MUCH!!!
(now I won't speak badly about the SpellChecker constructor)

Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Karl Wettin <ka...@gmail.com>.
Leandro wrote:
>> Sorry, I misunderstood your question. See other reply.
>>
> 
> Yes, I got it. Thanks.
> 
>> Are you sure about that? Did you benchmark? Can we see the results?
> 
> 
> Hey man, take it easy, I just imagined it. But I guess using the ShingleFilter
> will help.


I'm cool :) I just think you are overcomplicating things.


    karl




Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Leandro <le...@gmail.com>.
> Sorry, I misunderstood your question. See other reply.
>

Yes, I got it. Thanks.

> Are you sure about that? Did you benchmark? Can we see the results?


Hey man, take it easy, I just imagined it. But I guess using the ShingleFilter
will help.

Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Karl Wettin <ka...@gmail.com>.
Leandro wrote:
>>
>>> 1º Why do I need to pass a Directory object (mandatory) to the
>>> SpellChecker constructor?
>>>
>> Mainly because it is a nasty piece of code. But it does a good job.
>>
> 
> How can we suggest it (a normal constructor without parameters) to the
> team?

Sorry, I misunderstood your question. See other reply.

>> There are no typos in your example so you really don't even need a spell
>> checker for that. Using OR clauses in your query would be enough.
> 
> I guess not, because the user will enter: "The Lord of the Rings: The Return of
> the King"... and the system should respond with:
> 
> 
> Similar:
> The Lord of the Rings: The Two Towers
> The Lord of the Rings: The Fellowship of the Ring
> 
> I can't see how I can do that (just using the OR statement).
> For example:
> 
> name like '%the%'
> or
> name like '%Lord%'
> or
> name like '%of%'
> or
> name like '%the%'
> or
> name like '%Rings%'
> 
> will produce too many results, besides performing poorly...

Are you sure about that? Did you benchmark? Can we see the results?




     karl



Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Leandro <le...@gmail.com>.
>
>
> > 1º Why do I need to pass a Directory object (mandatory) to the
> > SpellChecker constructor?
> >
>
> Mainly because it is a nasty piece of code. But it does a good job.
>

Thanks.
How can we suggest it (a normal constructor without parameters) to the
team?

>
>
> > 2º Suppose that my dictionary had these words:
> >
> > "The Lord of the Rings: The Two Towers"
> > "The Lord of the Rings: The Fellowship of the Ring"
> > "The Lord of the Rings: The Return of the King"
> >
> > I just want to know how I can code something so that, when the user
> > queries "The Lord of the Rings: The Two Towers", the application suggests:
> > "The Lord of the Rings: The Fellowship of the Ring"
> > "The Lord of the Rings: The Return of the King"
> >
> > Is it possible using just Lucene?
> >
>
> There are no typos in your example so you really don't even need a spell
> checker for that. Using OR clauses in your query would be enough.


I guess not, because the user will enter: "The Lord of the Rings: The Return of
the King"... and the system should respond with:


Similar:
The Lord of the Rings: The Two Towers
The Lord of the Rings: The Fellowship of the Ring

I can't see how I can do that (just using the OR statement).
For example:

name like '%the%'
or
name like '%Lord%'
or
name like '%of%'
or
name like '%the%'
or
name like '%Rings%'

will produce too many results, besides performing poorly...

> Perhaps you want to combine one variant with MUST clauses that has a bit
> more boost than the OR clauses.
>
>     karl


Thanks so much Karl!!!

Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Karl Wettin <ka...@gmail.com>.
dreampeppers99 wrote:
> 1º Why do I need to pass a Directory object (mandatory) to the
> SpellChecker constructor?

Mainly because it is a nasty piece of code. But it does a good job.

> 2º Suppose that my dictionary had these words:
> 
> "The Lord of the Rings: The Two Towers"
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
> 
> I just want to know how I can code something so that, when the user queries
> "The Lord of the Rings: The Two Towers", the application suggests:
> 
> "The Lord of the Rings: The Fellowship of the Ring"
> "The Lord of the Rings: The Return of the King"
> 
> Is it possible using just Lucene?

There are no typos in your example so you really don't even need a spell 
checker for that. Using OR clauses in your query would be enough. 
Perhaps you want to combine one variant with MUST clauses that has a bit 
more boost than the OR clauses.
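A plain-Java toy model of that combination, under simplifying assumptions (this is not Lucene's scoring): each query word present in a title adds 1, as the OR/SHOULD clauses would, and a title containing every query word gets a hypothetical ALL_WORDS_BOOST of 3.0, standing in for the boosted all-MUST variant. The class and constant names are made up for this sketch. Run against the "Return of the King" query, the exact title comes out on top while the other two titles still score from the shared words.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of: (all words MUST match, boosted) OR (each word SHOULD match).
class MustPlusShould {
    static final double ALL_WORDS_BOOST = 3.0; // hypothetical boost value

    // Distinct lower-cased words, splitting on anything not a letter or digit.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    static double score(String query, String title) {
        Set<String> q = words(query);
        Set<String> t = words(title);
        Set<String> matched = new HashSet<>(q);
        matched.retainAll(t);
        double score = matched.size();          // the OR (SHOULD) part
        if (!q.isEmpty() && t.containsAll(q)) {  // the boosted all-MUST variant
            score += ALL_WORDS_BOOST;
        }
        return score;
    }

    public static void main(String[] args) {
        String query = "The Lord of the Rings: The Return of the King";
        String[] titles = {
            "The Lord of the Rings: The Two Towers",
            "The Lord of the Rings: The Fellowship of the Ring",
            "The Lord of the Rings: The Return of the King"};
        for (String title : titles) {
            System.out.println(score(query, title) + "  " + title);
        }
    }
}
```

Sorting by this score gives the exact title first and the two other Rings titles right after it, which is the "similar" list asked for, without a spell checker.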



      karl



Re: Questions about use of SpellChecker: Constructor and Similarity...

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
Use ShingleFilter.

I'm working on a broader SpellChecker; I'll post a third patch soon.
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java

M.
