Posted to java-user@lucene.apache.org by Pablo Gomes Ludermir <go...@gmail.com> on 2005/04/14 17:15:26 UTC
getting the number of occurrences within a document
Hello all,
I would like to get the following information from the index:
1. Given a term, how many times the term occurs in each document.
Something like a triple:
< Term, Doc1, Freq> , <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
Is it possible to do that?
Regards,
Pablo
--
Pablo Gomes Ludermir
gomesp@gmail.com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: getting the number of occurrences within a document
Posted by Andy Roberts <ma...@andy-roberts.net>.
On Thursday 14 Apr 2005 15:15, Pablo Gomes Ludermir wrote:
> Hello all,
>
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> < Term, Doc1, Freq> , <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
>
> Is it possible to do that?
>
>
> Regards,
> Pablo
Off the top of my head... assuming you have an IndexReader (or MultiReader) called reader:
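    // needs: import org.apache.lucene.index.*; these calls can throw java.io.IOException
    TermEnum te = reader.terms();
    while (te.next()) {
        Term currentTerm = te.term();
        TermDocs docs = reader.termDocs(currentTerm);
        while (docs.next()) {
            // docs.doc() is the document number, docs.freq() the term's count in that document
            System.out.println(currentTerm.text() + ", doc" + docs.doc() + ", " + docs.freq());
        }
        docs.close();
    }
    te.close();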
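To see what the nested loop above enumerates without having a Lucene index at hand, here is a toy sketch in plain Java. It is not Lucene code and not how Lucene stores its postings: the class name TermFrequencySketch, the methods buildIndex and triples, and the whitespace-only tokenizing are all assumptions made up for illustration. It only models the idea of walking every term and, for each term, every document with its frequency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an index; none of these names come from Lucene.
public class TermFrequencySketch {

    // term -> (document number -> occurrences of the term in that document).
    // TreeMap keeps both terms and document numbers in sorted order.
    public static Map<String, Map<Integer, Integer>> buildIndex(List<String> documents) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            // Naive analysis (an assumption): lower-case, split on whitespace.
            for (String token : documents.get(doc).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(token, t -> new TreeMap<>())
                     .merge(doc, 1, Integer::sum);
            }
        }
        return index;
    }

    // Flatten the index into "<term, docN, freq>" triples, term by term.
    public static List<String> triples(Map<String, Map<Integer, Integer>> index) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> termEntry : index.entrySet()) {
            for (Map.Entry<Integer, Integer> posting : termEntry.getValue().entrySet()) {
                out.add("<" + termEntry.getKey() + ", doc" + posting.getKey()
                        + ", " + posting.getValue() + ">");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("to be or not to be", "to do");
        Map<String, Map<Integer, Integer>> index = buildIndex(docs);

        // All triples, analogous to the outer loop over terms.
        for (String triple : triples(index)) {
            System.out.println(triple);
        }

        // A single term, analogous to asking for the documents of one term.
        System.out.println("to -> " + index.get("to"));
    }
}
```

For the two sample documents, this prints the triples <be, doc0, 2>, <do, doc1, 1>, <not, doc0, 1>, <or, doc0, 1>, <to, doc0, 2>, <to, doc1, 1>, followed by the per-document counts for "to".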
HTH,
Andy
Re: getting the number of occurrences within a document
Posted by Paul Libbrecht <pa...@activemath.org>.
On 14 Apr 05, at 17:15, Pablo Gomes Ludermir wrote:
> I would like to get the following information from the index:
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> < Term, Doc1, Freq> , <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
> Is it possible to do that?
Luke, the Lucene index browser, did this on my index at good speed... I presume
one should be able to find how it does so in its source fairly easily.
paul
RE: getting the number of occurrences within a document
Posted by Pasha Bizhan <fc...@ok.ru>.
Hi,
> From: Pablo Gomes Ludermir [mailto:gomesp@gmail.com]
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> < Term, Doc1, Freq> , <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
>
> Is it possible to do that?
See IndexReader.termDocs(Term t) and TermDocs.freq().
Pasha Bizhan