Posted to java-user@lucene.apache.org by ya...@bloglines.com on 2005/11/22 00:54:04 UTC
Strange tokenization with StandardFilter
I'm using a StandardFilter and seeing some strange tokenization.
Here's the input:
apache.org hosts lucene at apache.org.
Here are the tokens it outputs:
apache.org
hosts
lucene
at
apacheorg
Is it a bug that "apache.org" and "apache.org." don't convert to the same token?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Strange tokenization with StandardFilter
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 21 Nov 2005, at 18:54, yahootintin.11533894@bloglines.com wrote:
> I'm using a StandardFilter and seeing some strange tokenization.
>
> Here's the input:
> apache.org hosts lucene at apache.org.
>
> Here's the tokens it outputs:
> apache.org
> hosts
> lucene
> at
> apacheorg
>
> Is this a bug that apache.org and apache.org. don't convert to the same token?
Didn't you just report this same issue?
The behavior certainly is not sensible in this case, so I'd call it
a bug, yes. Again, the trailing '.' is the culprit.
Erik
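To make the point about the trailing '.' concrete, here is a simplified Java model of the two paths a dotted word can take. It is a hypothetical sketch, not Lucene's actual JavaCC grammar: it assumes the tokenizer types a run of dot-separated letter groups that ends in '.' (like "U.S.A.") as an acronym, and that StandardFilter strips every dot from acronym-typed tokens, while a dotted word without the trailing dot is kept whole as a host name. The class and method names (TrailingDotModel, normalize, tokens) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of StandardTokenizer + StandardFilter
 * behavior on dotted words. Not Lucene code; the rules below are assumptions.
 */
final class TrailingDotModel {

    // Assumed rule: letters-dot-letters... ending in '.' is typed as an acronym,
    // and acronym-typed tokens have all their dots removed by the filter.
    static String normalize(String raw) {
        boolean endsWithDot = raw.endsWith(".");
        String body = endsWithDot ? raw.substring(0, raw.length() - 1) : raw;
        if (endsWithDot && body.matches("[A-Za-z]+(\\.[A-Za-z]+)+")) {
            // Acronym path: "apache.org." -> "apacheorg", "U.S.A." -> "USA".
            return raw.replace(".", "");
        }
        // Host/word path: internal dots are kept ("apache.org"),
        // and a trailing sentence period, if any, is dropped.
        return body;
    }

    // Whitespace splitting stands in for the real tokenizer's word boundaries.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                out.add(normalize(raw));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("apache.org hosts lucene at apache.org."));
    }
}
```

Under these assumptions, running main prints [apache.org, hosts, lucene, at, apacheorg], the same token list reported in the original message: the only difference between the two "apache.org" occurrences is which path the trailing '.' sends them down.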