Posted to java-user@lucene.apache.org by Sven <sv...@gmail.com> on 2008/11/12 20:08:04 UTC
How to get the terms within 5 words of another term?
Hi everyone,
I have a term "foo" and I want to count all the occurrences of all the
terms that are within 5 words of "foo" in all the documents which
contain "foo". For simplicity's sake, this is only for a single field.
So if I have 3 documents (each with a single field) that look like this:
Once upon a time, foo lived far, far away in a magical kingdom.
"The Life and Time of the Hero Called Foo" is, by far, the best novel
about spam I have ever read.
I theorize that over time, foo will gradually move far away from bar.
I would like to generate a list of terms and hits based on their
proximity to "foo" in all the documents. So I'll end up with something
like:
far : 4
time : 3
away : 2
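To make the counting I have in mind precise, here is a rough plain-Java
sketch of it over in-memory strings. It does not use Lucene at all: the
class and method names (ProximityCounts, countNear) are made up for
illustration, and the tokenization is a naive lowercase split on
non-alphanumeric characters rather than a real Analyzer. What I am after
is the equivalent against an index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProximityCounts {

    // Counts every term that occurs within `window` positions of `target`,
    // summed over all documents. Hypothetical helper, not a Lucene API.
    // Note: if `target` occurs twice close together, a neighbouring term
    // falls into both windows and is counted once per window.
    static Map<String, Integer> countNear(String[] docs, String target, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenizer: lowercase, split on non-alphanumerics, drop empties.
            List<String> tokens = new ArrayList<>();
            for (String t : doc.toLowerCase().split("[^a-z0-9]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            for (int i = 0; i < tokens.size(); i++) {
                if (!tokens.get(i).equals(target)) {
                    continue;
                }
                // Clamp the window to the document boundaries.
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size() - 1, i + window);
                for (int j = from; j <= to; j++) {
                    String term = tokens.get(j);
                    if (term.equals(target)) {
                        continue; // do not count the target term itself
                    }
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Once upon a time, foo lived far, far away in a magical kingdom.",
            "\"The Life and Time of the Hero Called Foo\" is, by far, the best novel"
                + " about spam I have ever read.",
            "I theorize that over time, foo will gradually move far away from bar."
        };
        Map<String, Integer> counts = countNear(docs, "foo", 5);
        System.out.println("far : " + counts.get("far"));   // far : 4
        System.out.println("time : " + counts.get("time")); // time : 3
        System.out.println("away : " + counts.get("away")); // away : 2
    }
}
```

This sketch also counts stop words such as "the" and "a", which I would
want to filter out of the final list.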
Any help would be greatly appreciated.
Thanks much!
-Sven