Posted to java-user@lucene.apache.org by sog <so...@gmail.com> on 2006/02/22 05:35:32 UTC

How can I get a term's frequency?

I search the index with a group of terms. I want to get every term's 
frequency in each document of the search result.

How can I do this?


Thanks,

sog 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


phrase frequency??

Posted by sog <so...@gmail.com>.
I searched the mail archive for my question and found that what I really want
is a phrase frequency. It is an old question that was never resolved well.

I traced the Lucene source code and discovered that I can get a phrase's IDF
from the Hits object:

weight= PhraseQuery$PhraseWeight  (id=62)
 idf= 8.3973465
 queryNorm= 0.11908524
 queryWeight= 1.0
 similarity= DefaultSimilarity  (id=66)
 this$0= PhraseQuery  (id=29)
 value= 8.3973465

From this we can get an approximate formula: score = tf * idf

so: tf(phrase) = score / idf(phrase)


Is this correct?
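
A minimal sketch of why this division may not yield a raw occurrence count.
It assumes the classic DefaultSimilarity behaviour, where tf = sqrt(freq) and
a single-clause score is tf * idf^2 * queryNorm * fieldNorm. Those formulas
are an assumption here, not something confirmed in this thread. The class
name, method names, and the fieldNorm value of 0.5 are hypothetical; idf and
queryNorm are the values from the dump above.

```java
public class PhraseScoreSketch {
    // Assumed DefaultSimilarity-style tf: square root of the raw count.
    public static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed raw score of a single-clause query (boost and coord taken as 1).
    public static double rawScore(double freq, double idf, double queryNorm, double fieldNorm) {
        double queryWeight = idf * queryNorm;            // about 1.0 for the dump above
        double fieldWeight = tf(freq) * idf * fieldNorm; // document-side part
        return queryWeight * fieldWeight;
    }

    // Inverts rawScore under the same assumptions to get the count back.
    public static double recoverFreq(double rawScore, double idf, double queryNorm, double fieldNorm) {
        double tfValue = rawScore / (idf * idf * queryNorm * fieldNorm);
        return tfValue * tfValue;
    }

    public static void main(String[] args) {
        double idf = 8.3973465;        // from the dump
        double queryNorm = 0.11908524; // from the dump
        double fieldNorm = 0.5;        // hypothetical value, for illustration only
        double score = rawScore(4, idf, queryNorm, fieldNorm);

        System.out.println("score / idf    = " + (score / idf));
        System.out.println("recovered freq = " + recoverFreq(score, idf, queryNorm, fieldNorm));
    }
}
```

Under these assumptions, score / idf gives roughly sqrt(freq) * fieldNorm
rather than freq, because idf * queryNorm is about 1.0 (which agrees with the
queryWeight in the dump). The field norm is stored with reduced precision and
Hits may rescale scores, so a count recovered this way is approximate at best.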





Re: How can I get a term's frequency?

Posted by Grant Ingersoll <gs...@syr.edu>.
You need to make sure you are indexing with Term Vectors in order for
IndexReader.getTermFreqVector to return anything meaningful. You do not
need to implement it.

QueryTermVector is meant to provide, for queries, the kind of information
that term vectors provide for documents.

For an example demo of indexing and using term vectors, go to
http://www.cnlp.org/apachecon2005. All the examples are under Apache
license and there is some documentation too.

-Grant
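
A rough illustration of what "indexing with term vectors" buys you: the
per-document term counts are computed once, at index time, and stored, so
nothing has to be re-tokenized at search time. This is a plain-Java stand-in,
not Lucene code; the class name, method name, and the whitespace tokenizer
are hypothetical. In Lucene itself, term vectors are requested per field when
the document is built, so check the Field documentation for your version.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorBuilder {
    /**
     * Builds a sorted term -> frequency map for one field value.
     * Tokenization here is a simple lowercase + whitespace split,
     * which is only a placeholder for a real analyzer.
     */
    public static Map<String, Integer> build(String text) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            vector.merge(token, 1, Integer::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        // prints {and=1, indexes=1, lucene=2, searches=1, text=2}
        System.out.println(build("Lucene indexes text and Lucene searches text"));
    }
}
```

If a field was indexed without this information, there is nothing stored to
return, which is why the lookup only gives meaningful results for fields that
were indexed with term vectors enabled.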


-- 
------------------------------------------------------------------- 
Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 




Re: How can I get a term's frequency?

Posted by Daniel Noll <da...@nuix.com.au>.
sog wrote:
> Hmm, but IndexReader.getTermFreqVector is an abstract method, and I do not
> know how to implement it efficiently. Does anyone have good advice?

You probably don't need to implement it; it has already been implemented.
Just call the method.

> I can do it in this way:
> 
> QueryTermVector vector = new QueryTermVector(Document.getValues(field));
> int[] freq = vector.getTermFrequencies();

I'm not sure because I've never used QueryTermVector before, but the
fact that QueryTermVector doesn't take an IndexReader as a parameter is
a good indication that it can't tell you anything about the frequency of
the term in your documents.

Daniel




-- 
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.



Re: How can I get a term's frequency?

Posted by sog <so...@gmail.com>.
Hmm, but IndexReader.getTermFreqVector is an abstract method, and I do not
know how to implement it efficiently. Does anyone have good advice?

I search with a group of query terms and get documents back in the search
result:

Query(term1, term2, term3)-->search index-->Hits(doc1, doc2, doc3, ......)

How can I get term1's frequency in doc1?

I think the tf value is calculated during indexing. Can I get the
tf (term frequency) value of term1 directly?

I can do it in this way:


QueryTermVector vector = new QueryTermVector(Document.getValues(field));
int[] freq = vector.getTermFrequencies();


but I think this is very inefficient.

Can anyone help me? Thanks.


sog






Re: How can I get a term's frequency?

Posted by sog <so...@gmail.com>.
Hmm, let me describe my question more clearly:

I search with a group of query terms and get documents back in the search
result:

Query(term1, term2, term3)-->search index-->Hits(doc1, doc2, doc3, ......)

I want to get term1's frequency in doc1, i.e. a result shaped like this:

Hits(docs1((term1,freq),(term2,freq),(term3,freq)),
     docs2((term1,freq),(term2,freq),(term3,freq)),......)

 
I think the tf value is calculated during indexing. Can I get the
tf (term frequency) value of term1 directly?

I can do it in this way:


QueryTermVector vector = new QueryTermVector(Document.getValues(field));
int[] freq = vector.getTermFrequencies();


but I think this is very inefficient.

Can anyone help me? Thanks.


sog




Re: How can I get a term's frequency?

Posted by Daniel Noll <da...@nuix.com.au>.
sog wrote:
> 
> I search the index with a group of terms. I want to get every term's 
> frequency in each document of the search result.

Are you looking for this?

TermFreqVector vector = IndexReader.getTermFreqVector(docNum, "field");

That gives you the frequency of every term, but you can just look up the
ones you're interested in.

Daniel
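
The "look up the ones you're interested in" step can be sketched as follows.
This is a plain-Java stand-in, not Lucene code: it assumes the term vector is
exposed as two parallel arrays (sorted terms and their counts), which is how
TermFreqVector presents its data through getTerms() and getTermFrequencies().
The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TermVectorLookup {
    /**
     * Returns the frequency of each query term in one document's term vector.
     * terms must be sorted ascending; freqs[i] is the count for terms[i].
     * Query terms that do not occur in the document are reported as 0.
     */
    public static Map<String, Integer> frequenciesFor(String[] terms, int[] freqs, String[] queryTerms) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String q : queryTerms) {
            int idx = Arrays.binarySearch(terms, q); // negative when the term is absent
            result.put(q, idx >= 0 ? freqs[idx] : 0);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical term vector for one document and field.
        String[] terms = {"apache", "index", "lucene", "search"};
        int[] freqs = {1, 3, 5, 2};

        // prints {lucene=5, search=2, solr=0}
        System.out.println(frequenciesFor(terms, freqs, new String[] {"lucene", "search", "solr"}));
    }
}
```

Running this once per hit, with the hit's document number and the query's
terms, gives the per-document (term, freq) pairs asked about in the question.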

