Posted to java-user@lucene.apache.org by Bill Janssen <ja...@parc.com> on 2007/02/09 19:13:46 UTC

Re: Reduction based "more like this"?

> For example, given terms "female", "John" and "London" - all 3 may
> have equal IDF but should a document representing a female in London
> be given equal weighting to a document representing the rarer example
> of a female who happens to be called "John"?

Not to mention multi-word phrase tokenization, like the difference
between a document which contains the text

  "...should not be allowed to possess a lethal weapon like a..."

and a document which contains the phrase

  "...should not be allowed to see Lethal Weapon until at least the age of..."

In the first case, "lethal weapon" should be tokenized into two separate
terms, while in the second case we need to preserve the phrase as a single term.
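
As a rough illustration of the distinction, here is a minimal sketch of a
tokenizer that keeps known phrases whole. It is not Lucene code. The
KNOWN_PHRASES list and the tokenize() function are hypothetical names made
up for this example, and it assumes that a case-sensitive match against a
list of known titles is enough to tell the two cases apart, which will not
always hold (a sentence-initial or all-lowercase title, for instance).

```python
import re

# Hypothetical list of multi-word names to keep as single terms.
# Matching is case-sensitive, so "lethal weapon" in running text is not caught.
KNOWN_PHRASES = ["Lethal Weapon"]


def tokenize(text, phrases=KNOWN_PHRASES):
    """Return lowercased tokens, emitting each known phrase as one token."""
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    parts = [r"\b" + re.escape(p) + r"\b" for p in ordered]
    # Fall back to ordinary single-word tokens when no phrase matches.
    parts.append(r"\w+")
    pattern = re.compile("|".join(parts))
    return [m.group(0).lower() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(tokenize("should not be allowed to possess a lethal weapon like a"))
    print(tokenize("should not be allowed to see Lethal Weapon until at least"))
```

With this sketch the first text yields separate "lethal" and "weapon"
tokens, while the second yields a single "lethal weapon" token that can
carry its own document frequency.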

Bill
