Posted to java-user@lucene.apache.org by Christian Reuschling <ch...@gmail.com> on 2014/03/06 19:34:08 UTC
tf/idf similarity with modified document similarity
Hello,
what is the best way to score documents as the default similarity does, but with the
document frequency calculated per query against the matching result set rather than
statically against the whole corpus?
I haven't found a good, performant solution yet.
Thank you!
Christian
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: tf/idf similarity with modified document similarity
Posted by Jack Krupansky <ja...@basetechnology.com>.
Do you expect to have relatively large or relatively small result sets? For
large ones, are you willing to accept slow performance? I mean, your logic
will have to scan all of the matching documents and fetch and check their
term frequencies to count up the df for each desired term. Maybe at least
some of that info is hanging around as part of the query matching process.
Still, that is a reasonable feature to want and it has been requested
before. Worth a Jira.
-- Jack Krupansky
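The two-pass idea described in the reply (first scan the matching documents to count a
per-query df, then score) could be sketched roughly as below. This is only an illustration
under stated assumptions, not a Lucene API and not a solution proposed in the thread: the
class ResultSetIdf and its methods are hypothetical, each hit is modeled as a plain
term -> frequency map, and the scoring is a simplified tf/idf (sqrt(tf) * idf^2, with idf
in the form used by Lucene 4.x DefaultSimilarity) that ignores norms, coord, queryNorm and
boosts.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of Lucene: each hit is a term -> frequency map.
final class ResultSetIdf {

    // idf in the form used by Lucene 4.x DefaultSimilarity:
    // 1 + ln(numDocs / (docFreq + 1)). Here numDocs is the size of the result set.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Pass 1: for each query term, count how many matching documents contain it.
    // This scans every hit, which is the cost the reply warns about.
    static Map<String, Integer> resultSetDocFreq(List<Map<String, Integer>> hits,
                                                 Collection<String> queryTerms) {
        Set<String> terms = new LinkedHashSet<>(queryTerms); // drop duplicate terms
        Map<String, Integer> df = new HashMap<>();
        for (String term : terms) {
            df.put(term, 0);
        }
        for (Map<String, Integer> termFreqs : hits) {
            for (String term : terms) {
                if (termFreqs.getOrDefault(term, 0) > 0) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    // Pass 2: simplified score for one hit, sum over terms of sqrt(tf) * idf^2.
    // Norms, coord, queryNorm and boosts are deliberately left out of this sketch.
    static double score(Map<String, Integer> termFreqs,
                        Map<String, Integer> resultSetDf,
                        int numHits) {
        double sum = 0.0;
        for (Map.Entry<String, Integer> entry : resultSetDf.entrySet()) {
            int tf = termFreqs.getOrDefault(entry.getKey(), 0);
            if (tf == 0) {
                continue; // term does not occur in this hit
            }
            double termIdf = idf(entry.getValue(), numHits);
            sum += Math.sqrt(tf) * termIdf * termIdf;
        }
        return sum;
    }
}
```

Because df is only known once all hits have been seen, scoring cannot happen while the
hits are being collected; that second pass over the result set is where the extra cost
for large result sets comes from.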