Posted to java-user@lucene.apache.org by Pablo Gomes Ludermir <go...@gmail.com> on 2005/04/14 17:15:26 UTC

getting the number of occurrences within a document

Hello all,

I would like to get the following information from the index:

1. Given a term, how many times the term occurs in each document.
Something like a triple:
<Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...

Is it possible to do that?


Regards,
Pablo

-- 
Pablo Gomes Ludermir
gomesp@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: getting the number of occurrences within a document

Posted by Andy Roberts <ma...@andy-roberts.net>.
On Thursday 14 Apr 2005 15:15, Pablo Gomes Ludermir wrote:
> Hello all,
>
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> <Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
>
> Is it possible to do that?
>
>
> Regards,
> Pablo

Off the top of my head... assuming you have an IndexReader (or MultiReader) called reader:

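import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

// reader is an open IndexReader (or MultiReader)
TermEnum te = reader.terms();
try {
	while (te.next()) {
		Term currentTerm = te.term();
		TermDocs docs = reader.termDocs(currentTerm);
		try {
			while (docs.next()) {
				// docs.doc() is the document number; docs.freq() is how often the term occurs in it
				System.out.println(currentTerm.text() + ", doc" + docs.doc() + ", " + docs.freq());
			}
		} finally {
			docs.close();
		}
	}
} finally {
	te.close();
}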

HTH,

Andy


Re: getting the number of occurrences within a document

Posted by Paul Libbrecht <pa...@activemath.org>.
On 14 Apr. 05, at 17:15, Pablo Gomes Ludermir wrote:

> I would like to get the following information from the index:
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> <Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
> Is it possible to do that?

Luke (the Lucene index browser) did this on my index quite fast, so I
presume the relevant code is easy to find in Luke's source.

paul


RE: getting the number of occurrences within a document

Posted by Pasha Bizhan <fc...@ok.ru>.
Hi,

> From: Pablo Gomes Ludermir [mailto:gomesp@gmail.com] 

> I would like to get the following information from the index:
> 
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> <Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
> 
> Is it possible to do that?

See IndexReader.termDocs(Term t) and TermDocs.freq().
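The methods Pasha points to (in Java, IndexReader.termDocs(Term) and TermDocs.freq()) expose, for one term, a list of (document number, frequency) pairs. To show the shape of that data without depending on a particular Lucene version, here is a toy in-memory inverted index in plain Java. Everything in it (TermFreqDemo, addDocument, triplesFor, allTriples, the whitespace tokenizer) is hypothetical: it only models what the postings contain and is not how Lucene stores or reads them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy inverted index; NOT Lucene's API, just a model of its postings.
class TermFreqDemo {
    // term -> (document number -> frequency of the term in that document), both sorted
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();

    // Index one document; lowercasing + whitespace splitting stand in for a real Analyzer.
    void addDocument(int docNum, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue; // skip the empty token produced by leading whitespace
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docNum, 1, Integer::sum);
        }
    }

    // Rough analogue of walking termDocs(term): one <term, doc, freq> triple per matching document.
    List<String> triplesFor(String term) {
        List<String> out = new ArrayList<>();
        Map<Integer, Integer> docs = postings.getOrDefault(term, new TreeMap<>());
        for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
            out.add("<" + term + ", doc" + e.getKey() + ", " + e.getValue() + ">");
        }
        return out;
    }

    // Rough analogue of the TermEnum loop: triples for every term, in term order.
    List<String> allTriples() {
        List<String> out = new ArrayList<>();
        for (String term : postings.keySet()) {
            out.addAll(triplesFor(term));
        }
        return out;
    }

    public static void main(String[] args) {
        TermFreqDemo index = new TermFreqDemo();
        index.addDocument(0, "lucene in action lucene");
        index.addDocument(1, "java and lucene");
        // prints <lucene, doc0, 2> then <lucene, doc1, 1>
        for (String triple : index.triplesFor("lucene")) {
            System.out.println(triple);
        }
    }
}
```

triplesFor corresponds to the single-term case in the original question, and allTriples to Andy's loop over every term. In a real index the frequencies are counted over the tokens produced by the Analyzer used at indexing time, not over raw words.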

Pasha Bizhan