Posted to user@nutch.apache.org by te...@gmail.com on 2007/03/16 12:16:11 UTC

Problem with stemmer

We have a slight problem with the default stemmer: some words are
stemmed in a way that makes them impossible to search for later.
For example, suppose a web page contains the phrase "iron ore".
Nutch fetches the page and stores the stemmed form of every word in its
index, so "iron ore" becomes "iron or".

As a result, we cannot search for "ore": Nutch returns pages that
contain the plain word "or", because "ore" and "or" are stemmed to
exactly the same term. So a search for "Iron Ore" actually returns
web pages containing "iron" and "or".
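
The collision can be reproduced outside Nutch with a small sketch. It is not Nutch or Lucene code. It assumes the stemmer behaves like the classic Porter algorithm, and implements only that algorithm's final-"e" rule (step 5a), which is enough to turn "ore" into "or". The names toy_stem, build_index, search and the two sample pages are made up for illustration.

```python
# Toy illustration only: NOT Nutch's or Lucene's implementation.
# Assumes a Porter-style stemmer; only the final-"e" rule (step 5a) is modeled.

VOWELS = set("aeiou")


def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m: the number of vowel-run -> consonant-run transitions."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def ends_cvc(stem):
    """True if stem ends consonant-vowel-consonant, last one not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")


def toy_stem(word):
    """Lowercase, then drop a final 'e' if m > 1, or m == 1 and not ends_cvc."""
    word = word.lower()
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


def build_index(pages):
    """Map each stemmed term to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every stemmed query term."""
    results = None
    for token in query.split():
        hits = index.get(toy_stem(token), set())
        results = hits if results is None else results & hits
    return results or set()


# Hypothetical sample pages: only page-a is actually about ore.
pages = {"page-a": "iron ore prices", "page-b": "iron or steel"}
index = build_index(pages)
print(sorted(search(index, "ore")))       # ['page-a', 'page-b']
print(sorted(search(index, "Iron Ore")))  # ['page-a', 'page-b']
```

Because the index stores only the stemmed term "or", the query has no way to tell the two pages apart. Any fix therefore has to keep "ore" from being stemmed, both when indexing and when parsing the query.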

Does anybody know how to fix this?