Posted to java-user@lucene.apache.org by hemal bhatt <bh...@rediffmail.com> on 2004/04/28 15:50:11 UTC

Count for a keyword occurrence in a file

Hi,

How can I get a count from the score given by Hits.score()?
i.e., I want to know how many times a keyword occurs in a file.
Any help on this would be appreciated.
  
regards
Hemal Bhatt




RE: Count for a keyword occurrence in a file

Posted by "Nader S. Henein" <ns...@bayt.net>.
So even an educated calculation won't do it because you'd need to know how
many documents the word occurs in (you could do a search, but that would be
overkill and impractical).

Cool

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org







Re: Count for a keyword occurrence in a file

Posted by Ype Kingma <yk...@xs4all.nl>.
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the word
> as opposed to the number of words in the file in general (somebody correct
> me if I'm wrong), so short of an educated approximation, you could hack

Lucene uses two frequencies for a term: the number of documents in the index
in which it occurs (the basis for IDF), and the number of times the term
occurs in a given document.

> the indexer to dynamically store the frequency of a word (oh so
> unadvisable). Personally I recommend the educated approximation, because
> you could index the document with the number of words in it (you would
> have to make sure you're not using a stop-word analyzer or the Porter
> stemmer) and then, based on the score, reverse-engineer the result you want.
>
> Nader Henein
>
> -----Original Message-----
> From: hemal bhatt [mailto:bhatthemal@rediffmail.com]
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurrence in a file
>
>
> Hi,
>
> How can I get a count from the score given by Hits.score()?
> i.e., I want to know how many times a keyword occurs in a file. Any help on
> this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by file
(index or document), but you can get both frequencies I mentioned above
from an IndexReader, if needed using skipTo() to go to the document.
The methods are docFreq(Term), for the number of documents containing the
term, and termDocs(Term), for the per-document occurrence counts.
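
To make the two frequencies concrete, here is a minimal sketch in plain Java.
It is NOT Lucene's API: the class TermCounts, its methods and the sample text
are all made up for illustration, and the tokenizer is a naive split on
non-alphanumeric characters. It only mimics what docFreq(Term) reports (how
many documents contain the term) and what the TermDocs enumeration returned
by termDocs(Term) reports through freq() (how often the term occurs in one
document), with a ceilingEntry() lookup standing in for skipTo().

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory postings (hypothetical, not Lucene): term -> (docId -> count). */
public class TermCounts {
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();

    /** Index one document: lower-case it and split on non-alphanumerics. */
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Number of documents that contain the term (the role of docFreq). */
    public int docFreq(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Number of times the term occurs in one document, 0 if it is absent. */
    public int termFreq(String term, int docId) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        // Jump to the first posting with id >= docId, similar in spirit to skipTo().
        Map.Entry<Integer, Integer> entry = docs.ceilingEntry(docId);
        return (entry != null && entry.getKey() == docId) ? entry.getValue() : 0;
    }

    public static void main(String[] args) {
        TermCounts index = new TermCounts();
        index.add(0, "Lucene is a search library. Lucene is fast.");
        index.add(1, "A keyword search example.");
        System.out.println("docFreq(lucene)     = " + index.docFreq("lucene"));
        System.out.println("termFreq(lucene, 0) = " + index.termFreq("lucene", 0));
        System.out.println("docFreq(search)     = " + index.docFreq("search"));
    }
}
```

With a real index the counts come from the analyzed terms, so stop-word
removal or stemming at indexing time changes what is counted.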

Regards,
Ype





RE: Count for a keyword occurrence in a file

Posted by "Nader S. Henein" <ns...@bayt.net>.
Tricky, scoring has to do with the frequency of the occurrence of the word
as opposed to the number of words in the file in general (somebody correct
me if I'm wrong), so short of an educated approximation, you could hack the
indexer to dynamically store the frequency of a word (oh so unadvisable).
Personally I recommend the educated approximation, because you could index
the document with the number of words in it (you would have to make sure
you're not using a stop-word analyzer or the Porter stemmer) and then, based
on the score, reverse-engineer the result you want.

Nader Henein



