Posted to java-user@lucene.apache.org by PlusPlus <r....@gmail.com> on 2010/02/25 05:14:21 UTC

Fuzzy membership of a term to the document

Hi,

   I want to change Lucene's similarity so that I can add fuzzy
memberships to the terms of a document. Thus, the TF value of a term in a
document is not always 1; a term can instead add, say, 0.7 to the TF. (In my
application, each term is contained in a document at most once.) This
membership value is available before index time.

   Likewise, each occurrence of a word should not be counted as 1 toward the
document frequency in the IDF formula.

   I was wondering if I can change the TF and IDF values of the terms like
this. So far, I know that I can change the impact of TF values on the
scoring, but not how to do what I'm describing here.

Best,
Reza
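
A minimal sketch of the arithmetic described above, in plain Java and without
any Lucene classes. It assumes each document is a map from term to a membership
degree in [0, 1], uses that degree directly as the TF, and sums the degrees
across documents to get a fuzzy document frequency. The class name FuzzyTfIdf,
its methods, and the idf form 1 + ln(N / (df + 1)) are illustrative assumptions,
not something taken from this thread.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: illustrates fuzzy tf/idf arithmetic only, not a Lucene Similarity.
public class FuzzyTfIdf {

    /** Fuzzy document frequency: the sum of a term's membership degrees over all documents. */
    public static double fuzzyDf(String term, List<Map<String, Double>> docs) {
        double df = 0.0;
        for (Map<String, Double> doc : docs) {
            df += doc.getOrDefault(term, 0.0); // a document without the term contributes 0
        }
        return df;
    }

    /** Assumed idf form 1 + ln(N / (df + 1)), with the fuzzy df plugged in. */
    public static double fuzzyIdf(String term, List<Map<String, Double>> docs) {
        return 1.0 + Math.log(docs.size() / (fuzzyDf(term, docs) + 1.0));
    }

    /** Score of one term in one document: membership degree (fuzzy tf) times fuzzy idf. */
    public static double score(String term, Map<String, Double> doc,
                               List<Map<String, Double>> docs) {
        return doc.getOrDefault(term, 0.0) * fuzzyIdf(term, docs);
    }

    public static void main(String[] args) {
        // Each term appears at most once per document, carrying its membership degree.
        List<Map<String, Double>> docs = List.of(
                Map.of("lucene", 0.7, "search", 1.0),
                Map.of("lucene", 0.5),
                Map.of("search", 0.2));

        System.out.println("fuzzy df(lucene)  = " + fuzzyDf("lucene", docs));
        System.out.println("fuzzy idf(lucene) = " + fuzzyIdf("lucene", docs));
        System.out.println("score(lucene, d0) = " + score("lucene", docs.get(0), docs));
    }
}
```

With crisp memberships (every degree equal to 1) the fuzzy df reduces to the
ordinary count of documents containing the term.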


-- 
View this message in context: http://old.nabble.com/Fuzzy-membership-of-a-term-to-the-document-tp27714347p27714347.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Fuzzy membership of a term to the document

Posted by Robert Muir <rc...@gmail.com>.
Hello Reza,

I've seen some similar stuff to what you mention, such as
http://ece.ut.ac.ir/dbrg/Hamshahri/Papers/FuFaIR.ppt
In that experiment, the membership was calculated with tf/idf parameters (it
looks like that gave the best results).

I am scratching my head as to how this model could be easily implemented in
Lucene, but please report back if you figure something out... it's
interesting!



-- 
Robert Muir
rcmuir@gmail.com

Re: Fuzzy membership of a term to the document

Posted by Chris Hostetter <ho...@fucit.org>.
:    I want to change the Lucene's similarity in a way that I can add Fuzzy
: memberships to the terms of a document. Thus, TF value of a term in one
: document is not always 1, it can add 0.7 to the value of the TF (In my
: application, each term is contained in a document at most once). This
: membership value is available before index time.

At first glance, i would suspect that what you really want to look at is
using payloads, but i'm not certain, because you've been focused on your 
presumed solution to the problem, without supplying any details on
what your end goal is...
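
For context, a payload is a small byte array stored alongside a term position,
so a per-term membership degree could in principle travel with the term as four
bytes. The sketch below shows only that encoding step in plain Java; the class
MembershipPayload and its methods are hypothetical, it does not use Lucene's own
payload classes, and how the decoded value would feed into scoring is left open.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs a membership degree into a 4-byte array and back.
public class MembershipPayload {

    /** Encodes a membership degree in [0, 1] as 4 big-endian bytes. */
    public static byte[] encode(float membership) {
        if (!(membership >= 0f && membership <= 1f)) { // also rejects NaN
            throw new IllegalArgumentException("membership must be in [0, 1]: " + membership);
        }
        return ByteBuffer.allocate(4).putFloat(membership).array();
    }

    /** Decodes the 4 bytes produced by encode back into the membership degree. */
    public static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.7f);
        System.out.println("payload length = " + payload.length);
        System.out.println("decoded value  = " + decode(payload));
    }
}
```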

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss

