Posted to java-user@lucene.apache.org by Christian Reuschling <ch...@gmail.com> on 2014/03/06 19:34:08 UTC

tf/idf similarity with modified document similarity

Hello,

What is the best way to score documents much like the default similarity does, but with the
document frequency computed per query against the set of matching result documents, rather than
statically against the whole corpus?

I haven't found a good, performant solution yet.

Thank you!

Christian

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: tf/idf similarity with modified document similarity

Posted by Jack Krupansky <ja...@basetechnology.com>.
Do you expect to have relatively large or relatively small result sets? For
the former, are you willing to accept slow performance? I mean, your logic
will have to scan all of the matching documents and fetch and check their
term frequencies to count up the df for each desired term. Maybe at least
some of that info is hanging around as part of the query matching process.
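
To make the cost concrete, here is a minimal sketch of that two-pass idea in plain Java, with no Lucene dependency. It assumes the matching documents are already available as in-memory term-frequency maps, which is a simplification; in a real index the frequencies would have to be read per hit. The class and method names (ResultSetIdf, resultSetDocFreq, idf, score) are made up for illustration. The tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)) formulas follow the classic tf/idf similarity as I understand it, and norms, boosts, coord and queryNorm are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResultSetIdf {

    /** Pass 1: for each query term, count the matching documents that contain it. */
    static Map<String, Integer> resultSetDocFreq(List<Map<String, Integer>> matches,
                                                 List<String> queryTerms) {
        Map<String, Integer> df = new HashMap<>();
        for (String term : queryTerms) {
            df.put(term, 0);
        }
        for (Map<String, Integer> doc : matches) {
            for (String term : queryTerms) {
                if (doc.getOrDefault(term, 0) > 0) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    /** Classic-style idf, but fed with result-set counts instead of corpus counts. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Pass 2: simplified tf/idf score of one matching document (no norms or boosts). */
    static double score(Map<String, Integer> doc, List<String> queryTerms,
                        Map<String, Integer> df, int numMatches) {
        double sum = 0.0;
        for (String term : queryTerms) {
            int freq = doc.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // term absent from this document
            }
            double w = idf(df.get(term), numMatches);
            sum += Math.sqrt(freq) * w * w; // tf * idf^2
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical result set: term -> frequency within each matching document.
        List<Map<String, Integer>> matches = List.of(
                Map.of("lucene", 2, "score", 1),
                Map.of("lucene", 1),
                Map.of("score", 4));
        List<String> queryTerms = List.of("lucene", "score");

        Map<String, Integer> df = resultSetDocFreq(matches, queryTerms);
        for (Map<String, Integer> doc : matches) {
            System.out.println(doc + " -> " + score(doc, queryTerms, df, matches.size()));
        }
    }
}
```

The first pass touches every matching document once per query term, which is where the slowdown on large result sets comes from; scores can only be computed after that pass has finished.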

Still, that is a reasonable feature to want and it has been requested 
before. Worth a Jira.

-- Jack Krupansky
