Posted to java-user@lucene.apache.org by uddam chukmol <ud...@yahoo.com> on 2004/05/31 20:10:16 UTC

similarity of two texts

Hi,
 
I'm new to Lucene and have heard that it helps with information retrieval. However, my problem is not really information retrieval but the comparison of two texts. I think Lucene may help resolve it.

I would like some pointers on how to compare two given texts and say how similar they are.

Has anyone had this kind of experience? I would be very grateful to hear your ideas and recommendations.

Thanks in advance!
 
Uddam CHUKMOL
 
 

		
---------------------------------
Do you Yahoo!?
Friends.  Fun. Try the all-new Yahoo! Messenger

Re: similarity of two texts

Posted by uddam chukmol <ud...@yahoo.com>.
Thanks, everyone, for your invaluable help and ideas. I'll take a look at Lucene 1.4 and let you know whether it can handle my problem.
 

		

Re: similarity of two texts

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 1, 2004, at 6:06 AM, sg@media-style.com wrote:

> Quoting Erik Hatcher <er...@ehatchersolutions.com>:
>
>> On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
>>> Lucene can't help you.
>>
>> What about using term vectors though?  I've been able to do
>> rudimentary document similarity calculations using the new support
>> in Lucene 1.4.
>
> Oops! Is it built into Lucene 1.4? Cool, I missed that!
> That would be very useful for similarity calculations.

Indeed it is!

Look at the new Field constructors and the overloaded methods on Field.Text
and the like.  Also look at the IndexReader.getTermFreq* methods.
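The term vector Erik refers to is, in essence, the list of distinct terms in a document's field together with how often each occurs. As a rough, Lucene-free illustration of that data (this is not Lucene's API), the hypothetical TermFreqSketch class below counts terms using a deliberately naive tokenizer, an assumption of this sketch: lowercase the text and split on anything that is not a letter or digit. A real Lucene Analyzer would tokenize differently (stop words, stemming, and so on).

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: builds a term -> frequency map from raw text.
 * Assumption: naive tokenization (lowercase, split on non-letter/non-digit
 * characters); this only approximates what an Analyzer would produce.
 */
public class TermFreqSketch {

    /** Counts each term; the TreeMap keeps terms in sorted order. */
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("The cat sat on the mat."));
    }
}
```

Each document then becomes a sparse vector indexed by term, which is the form the similarity calculation works on.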

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: similarity of two texts

Posted by sg...@media-style.com.
Quoting Erik Hatcher <er...@ehatchersolutions.com>:

> On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
> > Lucene can't help you.
> 
> What about using term vectors though?  I've been able to do rudimentary
> document similarity calculations using the new support in Lucene 1.4. 

Oops! Is it built into Lucene 1.4? Cool, I missed that!
That would be very useful for similarity calculations.
 



Re: similarity of two texts

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
> Lucene can't help you.

What about using term vectors though?  I've been able to do rudimentary 
document similarity calculations using the new support in Lucene 1.4.  
Search the 'net for more info on term vectors and the formulas needed 
(elementary vector angle calculation, actually).
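The "vector angle calculation" is usually the cosine similarity: cos(theta) = (a . b) / (|a| |b|), where a and b are the two documents' term-frequency vectors. The sketch below is one minimal way to compute it, assuming each document arrives as parallel arrays of terms and counts. That is the layout we believe Lucene 1.4's TermFreqVector exposes through getTerms() and getTermFrequencies(), but the class and method names here are hypothetical. Raw counts are used as weights; no idf weighting is applied.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of cosine ("vector angle") similarity between two documents.
 * Assumption: each document is given as parallel arrays of terms and
 * frequencies; raw counts are the vector components.
 */
public class CosineSimilaritySketch {

    /** Converts parallel term/frequency arrays into a term -> frequency map. */
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have the same length");
        }
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            map.merge(terms[i], freqs[i], Integer::sum);
        }
        return map;
    }

    /** Euclidean length of a frequency vector. */
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    /** 1.0 = same direction (same term proportions), 0.0 = no terms in common. */
    public static double cosine(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        Map<String, Integer> a = toMap(termsA, freqsA);
        Map<String, Integer> b = toMap(termsB, freqsB);

        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                // Only terms present in both documents contribute to the dot product.
                dot += e.getValue().doubleValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        String[] t1 = {"cat", "mat", "sat"};
        int[] f1 = {1, 1, 1};
        String[] t2 = {"cat", "dog", "sat"};
        int[] f2 = {1, 1, 1};
        // Two of three terms shared, so the result is close to 2/3.
        System.out.println(cosine(t1, f1, t2, f2));
    }
}
```

Because the cosine divides by both vector lengths, a long document and a short one with the same term proportions still score 1.0. That is usually what "how similar are these two texts" means.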

	Erik





Re: similarity of two texts

Posted by Stefan Groschupf <sg...@media-style.com>.
Lucene can't help you.
Search for text classification or text clustering.

Browse the tools section at www.text-mining.org; you may find tools there
that can help you with this task.
In general, some keywords for further searching:

Feature extraction from text.
Data mining algorithms for clustering or classification.
One algorithm you may find useful is the "Support Vector Machine".
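Feature extraction usually means turning each text into a weighted term vector before any clustering or classification algorithm sees it. One widely used weighting, shown here as a swapped-in example rather than anything Stefan specified, is tf-idf: terms that appear in every document get little or no weight. The sketch assumes documents are already tokenized into term-frequency maps and uses idf(t) = ln(N / df(t)), which is only one of several idf variants. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal tf-idf sketch.
 * Assumptions: input maps hold positive term counts per document;
 * idf(t) = ln(N / df(t)), so a term found in every document weighs 0.
 */
public class TfIdfSketch {

    public static List<Map<String, Double>> tfIdf(List<Map<String, Integer>> docs) {
        int n = docs.size();

        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }

        // Weight = term frequency * inverse document frequency.
        List<Map<String, Double>> weighted = new ArrayList<>();
        for (Map<String, Integer> doc : docs) {
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                w.put(e.getKey(), e.getValue() * idf);
            }
            weighted.add(w);
        }
        return weighted;
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 2);
        d1.put("text", 1);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("text", 3);
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(d1);
        docs.add(d2);
        // "text" occurs in every document, so its weight drops to 0.
        System.out.println(tfIdf(docs));
    }
}
```

The resulting weighted vectors can be fed to a cosine comparison, a clustering algorithm, or a classifier such as an SVM.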

HTH
Stefan

P.S.:
Support your local bookstore and order:
http://www.amazon.com/exec/obidos/tg/detail/-/1558605525/qid=1086027371/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/103-6852557-3809420?v=glance&s=books&n=507846
This book has a section that should interest you.



---------------------------------------------------------------
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org

