Posted to solr-user@lucene.apache.org by Royi Ronen <ro...@gmail.com> on 2011/08/09 16:20:53 UTC

problem with terms component results ?

Hi,
I am using the terms component.
The terms it returns are often missing a trailing 'e'.
E.g., it returns 'googl' instead of 'google' and 'youtub' instead of 'youtube'.
Other words ending in 'e' are not affected.
Any idea why this happens?
Royi

Re: problem with terms component results ?

Posted by Erick Erickson <er...@gmail.com>.
The TermsComponent is looking at *indexed* terms that have
been passed through the analysis chain. So I suspect you're
seeing the results of stemming.

WordDelimiterFilterFactory will also split terms into parts, as will
other tokenizers and filters. If you want your original input back,
you'll need a pretty bare-bones analysis chain for that field.

Best
Erick
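
As a rough illustration of why only some words lose their final 'e':
assuming the field uses a Porter-style English stemmer (the thread does
not say which stemmer is configured), the relevant rule is Porter's
step 5a. It drops a trailing 'e' when the remaining stem is "long
enough", measured in vowel-consonant groups. The sketch below implements
only that one step in Python. The function names are made up for this
example, and it is not Solr's code.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant only at the start or after a vowel."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-run -> consonant-run transitions (Porter's 'm')."""
    kinds = ["c" if is_consonant(stem, i) else "v" for i in range(len(stem))]
    return sum(1 for prev, cur in zip(kinds, kinds[1:]) if prev == "v" and cur == "c")


def ends_cvc(stem):
    """True if the stem ends consonant-vowel-consonant and the last letter is not w, x or y."""
    if len(stem) < 3:
        return False
    n = len(stem)
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")


def strip_final_e(word):
    """Apply only Porter step 5a: drop a final 'e' if m > 1, or m == 1 and not cvc."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


for w in ["google", "youtube", "like", "home"]:
    print(w, "->", strip_final_e(w))
# google -> googl
# youtube -> youtub
# like -> like
# home -> home
```

Under this rule 'google' and 'youtube' lose the 'e', while short
consonant-vowel-consonant stems such as 'lik' and 'hom' keep it, which
would match the behaviour described in the question.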

Re: problem with terms component results ?

Posted by Erik Hatcher <er...@gmail.com>.
Because you've got a stemmer in the analysis chain for those fields. If you want unstemmed terms, either remove the stemmer or copyField the content into a separate, unstemmed field and use that field for the terms component.

	Erik
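
Once an unstemmed copy of the field exists, the terms component can be
pointed at it with the terms.fl parameter. The following is a minimal
Python sketch that only builds the request URL. The base URL, the
/terms handler path and the field name text_unstemmed are assumptions
made for this example and need to be adapted to the actual setup.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix="", limit=10):
    """Build a TermsComponent request URL for the given field."""
    params = {
        "terms": "true",        # enable the terms component
        "terms.fl": field,      # field whose indexed terms are listed
        "terms.limit": limit,   # maximum number of terms to return
        "wt": "json",           # response format
    }
    if prefix:
        params["terms.prefix"] = prefix  # only terms starting with this prefix
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical host and field name, for illustration only.
print(terms_url("http://localhost:8983/solr", "text_unstemmed", prefix="goo"))
```

Because the component reports indexed terms, the field passed in
terms.fl must itself be analyzed without a stemmer for the full words
to come back.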
