Posted to java-user@lucene.apache.org by Sven <sv...@gmail.com> on 2008/11/12 20:08:04 UTC

How to get the terms within 5 words of another term?

Hi everyone,

I have a term "foo", and I want to count the occurrences of every term
that appears within 5 words of "foo", across all the documents that
contain "foo".  For simplicity's sake, this is only for a single field.
So if I have 3 documents (each with a single field) that look like this:

Once upon a time, foo lived far, far away in a magical kingdom.

"The Life and Time of the Hero Called Foo" is, by far, the best novel 
about spam I have ever read.

I theorize that over time, foo will gradually move far away from bar.

I would like to generate a list of terms and hits based on their 
proximity to "foo" in all the documents.  So I'll end up with something 
like:

far : 4
time : 3
away : 2
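
To pin down the counting being asked for, here is a minimal sketch in plain Java. It does not use Lucene at all: the class ProximityCounts and its methods tokenize and countNear are hypothetical names made up for this illustration, and the tokenizer (lowercase, split on anything that is not a letter or digit, no stop-word removal) is an assumption rather than the behaviour of any particular Lucene analyzer. Each token position is counted at most once per document, even if it falls within the window of several occurrences of the target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts terms near a target term.
final class ProximityCounts {

    // Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // For every document, count each token that lies within `window` positions
    // of an occurrence of `target` (which must already be lowercase).
    static Map<String, Integer> countNear(List<String> docs, String target, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            boolean[] near = new boolean[tokens.size()];
            // Mark every position inside the window of any target occurrence.
            for (int i = 0; i < tokens.size(); i++) {
                if (!tokens.get(i).equals(target)) {
                    continue;
                }
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size() - 1, i + window);
                for (int j = from; j <= to; j++) {
                    near[j] = true;
                }
            }
            // Count marked positions once each, skipping the target itself.
            for (int j = 0; j < tokens.size(); j++) {
                if (near[j] && !tokens.get(j).equals(target)) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Once upon a time, foo lived far, far away in a magical kingdom.",
            "\"The Life and Time of the Hero Called Foo\" is, by far, the best novel about spam I have ever read.",
            "I theorize that over time, foo will gradually move far away from bar.");
        Map<String, Integer> counts = countNear(docs, "foo", 5);
        for (String term : Arrays.asList("far", "time", "away")) {
            System.out.println(term + " : " + counts.get(term));
        }
    }
}
```

Run on the three sample documents, this gives far : 4, time : 3 and away : 2, matching the list above. Because nothing is filtered, the full map also contains common words such as "the" and "a". Against a real index the positions would have to come from the index (or from re-analyzing the stored field) rather than from raw strings.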

Any help would be greatly appreciated.

Thanks much!
-Sven

