Posted to java-user@lucene.apache.org by sog <so...@gmail.com> on 2006/02/22 05:35:32 UTC
How can I get a term's frequency?
I search the index with a group of terms, and I want to get each term's
frequency in every document of the search results.
How can I do that?
Thanks,
sog
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
phrase frequency??
Posted by sog <so...@gmail.com>.
I searched the mail archive for my question and found that what I really want
is a phrase frequency. It is an old question that was never resolved well.
I traced the Lucene source code and discovered that I can get a phrase's IDF from
the Hits object:
weight= PhraseQuery$PhraseWeight (id=62)
  idf= 8.3973465
  queryNorm= 0.11908524
  queryWeight= 1.0
  similarity= DefaultSimilarity (id=66)
  this$0= PhraseQuery (id=29)
  value= 8.3973465
From this we can get an approximate formula: score = tf * idf
so: tf(phrase) = score / idf(phrase)
Is this correct?
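One way to sanity-check the idea is to invert the formula by hand. The sketch below is not Lucene's code. It assumes DefaultSimilarity, where tf is the square root of the raw frequency, and it assumes the raw score is that tf multiplied by the weight's value and by a per-document field norm. The class name PhraseFreqEstimate, the fieldNorm parameter, and every number except 8.3973465 are hypothetical. It also ignores any rescaling that Hits may apply to scores, so treat the result as an estimate at best.

```java
public class PhraseFreqEstimate {

    /**
     * Inverts score = sqrt(freq) * weightValue * fieldNorm for freq.
     * All three inputs are assumed to be known for the document in question.
     */
    static double estimateFreq(double rawScore, double weightValue, double fieldNorm) {
        double tf = rawScore / (weightValue * fieldNorm); // tf = sqrt(freq) under the assumption
        return tf * tf;
    }

    public static void main(String[] args) {
        double weightValue = 8.3973465; // "value" from the debugger dump above
        double fieldNorm = 0.5;         // hypothetical field norm
        // Hypothetical raw score of a document in which the phrase occurs 4 times.
        double rawScore = Math.sqrt(4.0) * weightValue * fieldNorm;
        System.out.println(estimateFreq(rawScore, weightValue, fieldNorm)); // close to 4.0
    }
}
```

Under these assumptions, score / idf alone would give sqrt(freq) scaled by the field norm rather than the raw phrase frequency.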
Re: How can I get a term's frequency?
Posted by Grant Ingersoll <gs...@syr.edu>.
You need to make sure you are indexing with term vectors enabled in order for
IndexReader.getTermFreqVector to return anything meaningful. You do not
need to implement it.
QueryTermVector is meant to provide, for queries, information similar to what
term vectors provide for documents.
For a demo of indexing and using term vectors, go to
http://www.cnlp.org/apachecon2005. All the examples are under the Apache
license, and there is some documentation too.
-Grant
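As a rough mental model of what storing term vectors provides (a toy sketch, not Lucene's implementation): for one field of one document, the index keeps the distinct terms in ascending order plus a parallel array of counts, and that is what a call such as getTermFreqVector can later hand back. The class ToyTermVector and its simple tokenizing are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class ToyTermVector {
    final String[] terms; // distinct terms in ascending order
    final int[] freqs;    // freqs[i] is how often terms[i] occurs

    ToyTermVector(String fieldText) {
        // TreeMap keeps the terms sorted while we count them.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : fieldText.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        for (int count : counts.values()) {
            freqs[i++] = count;
        }
    }

    public static void main(String[] args) {
        ToyTermVector vector = new ToyTermVector("the cat and the hat");
        System.out.println(Arrays.toString(vector.terms)); // [and, cat, hat, the]
        System.out.println(Arrays.toString(vector.freqs)); // [1, 1, 1, 2]
    }
}
```

If a field was indexed without term vectors, there is no such per-document record to return, which is why the indexing step matters.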
--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
Re: How can I get a term's frequency?
Posted by Daniel Noll <da...@nuix.com.au>.
sog wrote:
> Hmm, but IndexReader.getTermFreqVector is an abstract method, and I do not
> know how to implement it efficiently. Does anyone have good advice?
You probably don't need to implement it; it has already been implemented.
Just call the method.
> I can do it this way:
>
> QueryTermVector vector = new QueryTermVector(doc.getValues(field));
> int[] freq = vector.getTermFrequencies();
I'm not sure, because I've never used QueryTermVector before, but the
fact that QueryTermVector doesn't take an IndexReader as a parameter is
a good indication that it can't tell you anything about the frequency of
the term in your documents.
Daniel
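A toy illustration of that point, not QueryTermVector's actual code: a counter that only ever sees an array of strings can count repeats within that array and nothing else. Assuming each array element is treated as one whole term, feeding it a document's stored field values would count identical values rather than term occurrences in the indexed text. ValueCountDemo and its sample data are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ValueCountDemo {

    /** Counts how often each whole string occurs in the array; no tokenizing, no index access. */
    static Map<String, Integer> countWholeValues(String[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stored values of a multi-valued field.
        String[] storedValues = {"red apple", "green pear", "red apple"};
        System.out.println(countWholeValues(storedValues)); // {red apple=2, green pear=1}
    }
}
```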
--
Daniel Noll
Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax: (02) 9212 6902
This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
Re: How can I get a term's frequency?
Posted by sog <so...@gmail.com>.
Hmm, but IndexReader.getTermFreqVector is an abstract method, and I do not know
how to implement it efficiently. Does anyone have good advice?
I search with a group of query terms and get documents back in the search
results:
Query(term1, term2, term3) --> search index --> Hits(doc1, doc2, doc3, ...)
How can I get term1's frequency in doc1?
I think the tf value is calculated during indexing. Can I get the
tf (term frequency) value of term1 directly?
I can do it this way:
QueryTermVector vector = new QueryTermVector(doc.getValues(field));
int[] freq = vector.getTermFrequencies();
but I think this is very inefficient.
Can anyone help me? Thanks,
sog
Re: How can I get a term's frequency?
Posted by sog <so...@gmail.com>.
Let me describe my question more clearly:
I search with a group of query terms and get documents back in the search
results:
Query(term1, term2, term3) --> search index --> Hits(doc1, doc2, doc3, ...)
How can I get term1's frequency in doc1? What I want is:
Hits(doc1((term1,freq),(term2,freq),(term3,freq)),
doc2((term1,freq),(term2,freq),(term3,freq)), ...)
I think the tf value is calculated during indexing. Can I get the
tf (term frequency) value of term1 directly?
I can do it this way:
QueryTermVector vector = new QueryTermVector(doc.getValues(field));
int[] freq = vector.getTermFrequencies();
but I think this is very inefficient.
Can anyone help me? Thanks,
sog
Re: How can I get a term's frequency?
Posted by Daniel Noll <da...@nuix.com.au>.
sog wrote:
>
> I search the index with a group of terms. I want to get every term's
> frequency in each document of the search result.
Are you looking for this?
TermFreqVector vector = indexReader.getTermFreqVector(docNum, "field");
That gives you the frequency of every term, but you can just look up the
ones you're interested in.
Daniel
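Looking up just the interesting terms might go roughly like this. The sketch assumes the vector exposes its terms as an ascending array with a parallel array of counts (as TermFreqVector's getTerms() and getTermFrequencies() do), and it works on plain arrays so that it runs without an index. The class QueryTermLookup and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryTermLookup {

    /**
     * Picks the frequencies of the query terms out of one document's vector.
     * terms must be sorted ascending and freqs must run parallel to it.
     */
    static Map<String, Integer> lookup(String[] terms, int[] freqs, String[] queryTerms) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String queryTerm : queryTerms) {
            int index = Arrays.binarySearch(terms, queryTerm);
            result.put(queryTerm, index >= 0 ? freqs[index] : 0); // absent term counts as 0
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical vector contents for one hit.
        String[] terms = {"and", "cat", "hat", "the"};
        int[] freqs = {1, 1, 1, 2};
        System.out.println(lookup(terms, freqs, new String[] {"the", "dog"})); // {the=2, dog=0}
    }
}
```

Repeating this for the document number of each hit would give one term-to-frequency map per document in the result set.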