Posted to java-user@lucene.apache.org by Ciprian URSU <ur...@gmail.com> on 2010/11/13 15:23:57 UTC

Can I use Lucene for this?

Hi Guys,

        I just found out about Lucene. After reading the main pages on the
wiki, it seems to be a great tool, but I still haven't figured out how to use
it for my needs. I want to build a small tool that stores some documents
(mainly text) and, when I give it a new document as input, compares it with
all the stored ones and gives me back the similarity as a percentage.
I have read this page:
http://wiki.apache.org/lucene-java/ScoresAsPercentages but it is still not
very clear to me how to use Lucene for that. Does anyone have some sample
code for this?
        Thanks a lot, and I apologize if this looks like a stupid post to
many of you :).

Best Regards,
Ciprian.

Re: Can I use Lucene for this?

Posted by Lance Norskog <go...@gmail.com>.
The Lucene MoreLikeThis tool in lucene/contrib/similar will do one
variant of what you want.

You can do this particular test in Solr; you'll find it much, much
easier to put together.
For other text similarities, you'll have to code them directly.

Lance

On Sat, Nov 13, 2010 at 7:07 AM, Shashi Kant <sk...@sloan.mit.edu> wrote:
> There are multiple measures of similarity for documents: Cosine similarity
> is a frequently used one.



-- 
Lance Norskog
goksron@gmail.com



Re: Can I use Lucene for this?

Posted by Shashi Kant <sk...@sloan.mit.edu>.
There are multiple measures of similarity for documents: Cosine similarity
is a frequently used one.
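
A minimal sketch of the cosine measure, in plain Java with no Lucene
dependency. The class and method names (CosineSimilarity, termFrequencies,
cosine) are made up for illustration and are not part of any Lucene API, and
raw term counts are assumed here in place of the TF-IDF weights a search
index would typically use:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
class CosineSimilarity {

    // Count how often each lower-cased word occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sum of squared counts, i.e. the squared length of the term vector.
    static double sumOfSquares(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return sum;
    }

    // Cosine of the angle between the two term-count vectors, in [0, 1].
    // Returns 0 when either text contains no words.
    static double cosine(String a, String b) {
        Map<String, Integer> fa = termFrequencies(a);
        Map<String, Integer> fb = termFrequencies(b);

        // Dot product over the terms the two texts share.
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : fa.entrySet()) {
            Integer other = fb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }

        double normA = sumOfSquares(fa);
        double normB = sumOfSquares(fb);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        String stored = "Lucene is a search library written in Java";
        String input = "Lucene is a Java search library";
        // Scale to a percentage, as asked for in the original question.
        System.out.printf("similarity: %.1f%%%n", 100.0 * cosine(stored, input));
    }
}
```

Multiplying the result by 100 gives a percentage. Unlike a raw Lucene score,
this value is bounded between 0 and 1 by construction, which is why it can be
read as a percentage at all; whether raw counts are a good enough weighting
depends on the documents.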

