Posted to dev@lucene.apache.org by siddharth gupta <gu...@gmail.com> on 2015/08/14 13:52:32 UTC

Cosine Document Similarity using More than one field

Hello,

I have just started using Lucene to calculate document similarity.
I am indexing around 1000 documents with 5 fields per document.

I need to calculate the cosine similarity between a particular
document and all other documents.

Currently I am able to do that using the *getTermVector(docId, String field)*
method.

But what if I want to use *two fields together* to calculate the cosine
value?

Can anyone please help me solve this issue?

Thanks & Regards,
Sidd.

Re: Cosine Document Similarity using More than one field

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Sidd,

You can add a third field that combines the contents of the two fields,
as Solr's <copyField/> does, and then calculate the cosine value on that third field.

regards,

Koji
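
A minimal sketch of the arithmetic behind the combined-field idea, for illustration only. It assumes the term frequencies of each field have already been read into plain maps (for example from the term vectors returned by getTermVector); that extraction step is not shown. The class name CombinedFieldCosine and the helpers merge and cosine are hypothetical, and the weights are raw term frequencies rather than tf-idf. Summing the per-field frequencies gives the same counts a combined field would hold, provided both fields are analyzed the same way.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinedFieldCosine {

    // Combine two per-field term-frequency maps into one vector by summing
    // the frequencies of terms that occur in both fields.
    static Map<String, Integer> merge(Map<String, Integer> field1,
                                      Map<String, Integer> field2) {
        Map<String, Integer> combined = new HashMap<>(field1);
        field2.forEach((term, freq) -> combined.merge(term, freq, Integer::sum));
        return combined;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) fa * fb;
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical term frequencies for two fields of two documents.
        Map<String, Integer> doc1Title = new HashMap<>();
        doc1Title.put("lucene", 1);
        doc1Title.put("search", 1);
        Map<String, Integer> doc1Body = new HashMap<>();
        doc1Body.put("lucene", 2);
        doc1Body.put("index", 1);

        Map<String, Integer> doc2Title = new HashMap<>();
        doc2Title.put("lucene", 1);
        Map<String, Integer> doc2Body = new HashMap<>();
        doc2Body.put("search", 1);
        doc2Body.put("index", 1);

        double sim = cosine(merge(doc1Title, doc1Body), merge(doc2Title, doc2Body));
        System.out.println("cosine over both fields = " + sim);
    }
}
```

With only about 1000 documents, merging at comparison time like this avoids reindexing, while the combined field keeps the comparison code identical to the single-field case. If the two fields should stay distinguishable, one option is to prefix each term with its field name before merging instead of summing the counts.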



