Posted to java-user@lucene.apache.org by ya...@bloglines.com on 2005/11/22 00:54:04 UTC

Strange tokenization with StandardFilter

I'm using a StandardFilter and seeing some strange tokenization.

Here's the input:
apache.org hosts lucene at apache.org.

Here's the tokens it outputs:
 apache.org
 hosts
 lucene
 at 
 apacheorg

Is this a bug that apache.org and apache.org. don't convert to the same token?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Strange tokenization with StandardFilter

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 21 Nov 2005, at 18:54, yahootintin.11533894@bloglines.com wrote:

> I'm using a StandardFilter and seeing some strange tokenization.
>
> Here's the input:
> apache.org hosts lucene at apache.org.
>
> Here's the tokens it outputs:
>  apache.org
>  hosts
>  lucene
>  at
>  apacheorg
>
> Is this a bug that apache.org and apache.org. don't convert to the same token?


Didn't you just report this same issue?

The behavior certainly is not sensible in this case. So I'd call it
a bug, yes. Again, the trailing '.' is the culprit.

	Erik
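
For anyone who wants to see why a trailing '.' could produce "apacheorg",
here is a rough, self-contained simulation. It is not Lucene code. It
assumes (this is not confirmed in the thread) that the tokenizer tags a
run of letter groups that each end in '.' as an acronym, and it relies on
the documented StandardFilter behavior of removing dots from acronyms.
The class name TrailingDotSketch and the two regexes are made up for
illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TrailingDotSketch {
    // Hypothetical "acronym" rule: two or more letter runs, each followed
    // by a dot, e.g. "u.s.a." and also "apache.org." (note the final dot).
    private static final Pattern ACRONYM =
            Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");

    // Hypothetical "host" rule: alphanumeric runs joined by dots, with no
    // trailing dot, e.g. "apache.org".
    private static final Pattern HOST =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            if (ACRONYM.matcher(raw).matches()) {
                // Filter step: dots are removed from acronym tokens.
                tokens.add(raw.replace(".", ""));
            } else if (HOST.matcher(raw).matches()) {
                // Host tokens are passed through unchanged.
                tokens.add(raw);
            } else {
                // Plain word: drop any trailing punctuation.
                String stripped = raw.replaceAll("[^A-Za-z0-9]+$", "");
                if (!stripped.isEmpty()) {
                    tokens.add(stripped);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("apache.org hosts lucene at apache.org.")) {
            System.out.println(t);
        }
    }
}
```

Under those assumptions, the first "apache.org" only fits the host rule
and is kept as is, while "apache.org." fits the acronym rule because of
its final dot and so loses all of its dots.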

