Posted to java-user@lucene.apache.org by Mark Gunnels <ma...@gmail.com> on 2005/04/08 04:00:55 UTC

list all terms in a field

Is there a simple way to list all terms in a field?
The only approach that I see is to use the IndexReader.terms()  method
and then iterate over all the results and build my list by manually
filtering. This seems inefficient and there must be a better way that
my newbie eyes don't see.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: list all terms in a field

Posted by Chris Lamprecht <cl...@gmail.com>.
Mark, here's a small piece of code that outputs a list of all terms
for a given field, in order of decreasing document frequency (the
number of documents that contain each term):

--- Requires Java 1.5 for PriorityQueue, or you can use Doug Lea's version ---


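String field = "myfield".intern();             // set your field here; intern required for != below
IndexReader reader = IndexReader.open(indexPath);
Term startTerm = new Term(field, "");          // sorts before every other term in the field
TermEnum termEnum = reader.terms(startTerm);   // already positioned at the first term >= startTerm
PriorityQueue queue = new PriorityQueue();     // using java 1.5's PriorityQueue

// do/while, because calling next() before reading would skip the first term
do {
   Term term = termEnum.term();                // null when there is no term at or after startTerm
   if (term == null || term.field() != field) break;   // lucene interns fields so != works
   int freq = reader.docFreq(term);            // number of documents containing the term
   queue.add(new TermFreq(term, freq));
} while (termEnum.next());
termEnum.close();
reader.close();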

while (!queue.isEmpty()) {
   TermFreq termFreq = (TermFreq) queue.remove();
   System.out.println(termFreq.term.text()+": "+termFreq.freq);
}

/* inner class for priority queue */
class TermFreq implements Comparable {
    Term term;
    int freq;
    public TermFreq(Term term, int freq) { this.term = term; this.freq = freq; }
    public int compareTo(Object o) {
        if (this == o) return 0;
        TermFreq other = (TermFreq) o;
        return other.freq - this.freq;
    }
}


On Apr 7, 2005 9:00 PM, Mark Gunnels <ma...@gmail.com> wrote:
> Is there a simple way to list all terms in a field?
> The only approach that I see is to use the IndexReader.terms()  method
> and then iterate over all the results and build my list by manually
> filtering. This seems inefficient and there must be a better way that
> my newbie eyes don't see.



Re: list all terms in a field

Posted by Chris Hostetter <ho...@fucit.org>.
: Is there a simple way to list all terms in a field?
: The only approach that I see is to use the IndexReader.terms()  method
: and then iterate over all the results and build my list by manually
: filtering. This seems inefficient and there must be a better way that
: my newbie eyes don't see.

it's the most efficient way i can think of, the trick is in the specifics
of the documentation:

1) TermEnum says...

   Term enumerations are always ordered by Term.compareTo(). Each term in
   the enumeration is greater than all that precede it.

2) Term.compareTo(Term) says...

   The ordering of terms is first by field, then by text.

3) IndexReader.terms(Term) says...

   Returns an enumeration of all terms after a given term.


...which means that if you want all of the Terms for a given field, you
can call IndexReader.terms(new Term("field","")) and get a TermEnum
starting at the very first Term in that field.  you can then iterate over
the Terms until you encounter one for a different field (or hit the end
of the TermEnum).

I think that's about as efficient as it gets.
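
To make the ordering argument above concrete, here is a minimal sketch
that does not use Lucene at all. A java.util.TreeSet of hypothetical
(field, text) pairs stands in for the term dictionary, and the class and
method names (FieldTermScan, T, termsInField) are made up for illustration.
It only shows why seeking to (field, "") and reading forward until the
field changes visits exactly the terms of one field:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class FieldTermScan {

    /** Hypothetical stand-in for a term: ordered first by field, then by text. */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(T other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }
    }

    /** Seek to (field, "") and collect term texts until the field changes. */
    static List<String> termsInField(TreeSet<T> dict, String field) {
        List<String> out = new ArrayList<>();
        // The inclusive tail view starts at the first entry >= (field, ""),
        // so the first term of the field is not skipped.
        for (T t : dict.tailSet(new T(field, ""), true)) {
            if (!t.field.equals(field)) {
                break; // we have left the field, nothing further can match
            }
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<T> dict = new TreeSet<>();
        dict.add(new T("author", "smith"));
        dict.add(new T("title", "lucene"));
        dict.add(new T("body", "index"));
        dict.add(new T("title", "action"));
        dict.add(new T("body", "term"));

        System.out.println(termsInField(dict, "body"));  // prints [index, term]
        System.out.println(termsInField(dict, "title")); // prints [action, lucene]
    }
}
```

The scan touches only the terms of the requested field plus the one term
that ends it, which is why no "manual filtering" over the whole
dictionary is needed.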

take a look at RangeFilter for an example in use...

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/RangeFilter.java

-Hoss

