Posted to dev@lucenenet.apache.org by "Digy (JIRA)" <ji...@apache.org> on 2010/04/11 19:52:41 UTC

[jira] Closed: (LUCENENET-354) The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers are present in the original string

     [ https://issues.apache.org/jira/browse/LUCENENET-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Digy closed LUCENENET-354.
--------------------------

    Resolution: Won't Fix

> The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers are present in the original string
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-354
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-354
>             Project: Lucene.Net
>          Issue Type: Bug
>         Environment: Lucene.Net 2.9.1
>            Reporter: Matt Dufrasne
>
> The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers are present in the original string.
> I think there is a bug in the tokenizer in Lucene 2.9.1, and it was probably present in earlier versions too. Indexing "BB_HHH_FFFF5_SSSS", which contains a digit, returns the following tokens:
> "bb hhh_ffff5_ssss"
> After some testing, I've found that this is because of the number. If I input
> "BB_HHH_FFFF_SSSS", I get
> "bb hhh ffff ssss"
> At this point, I'm leaning towards a tokenizer bug, unless the presence of a number is supposed to cause this behavior, in which case I fail to see why.
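
The reported output is consistent with how the StandardTokenizer grammar appears to treat "number-like" tokens. The sketch below is a simplified Python model, not Lucene.Net code. It assumes a NUM rule in which alphanumeric segments joined by punctuation ("_", "-", "/", ".", ",") form a single token when at least every other segment contains a digit. The names ALPHANUM, HAS_DIGIT, P and NUM follow my recollection of the JFlex grammar and should be read as assumptions. Letters are restricted to ASCII, and the other grammar rules (acronyms, hosts, e-mail addresses and so on) are omitted.

```python
import re

# Assumed building blocks, modelled on the StandardTokenizer grammar.
ALPHANUM = r"[A-Za-z0-9]+"                       # any run of letters/digits
HAS_DIGIT = r"[A-Za-z0-9]*[0-9][A-Za-z0-9]*"     # a run with at least one digit
P = r"[_\-/.,]"                                  # joining punctuation

# Assumed NUM alternatives: every other segment must contain a digit.
NUM_ALTERNATIVES = [
    rf"{ALPHANUM}{P}{HAS_DIGIT}",
    rf"{HAS_DIGIT}{P}{ALPHANUM}",
    rf"{ALPHANUM}(?:{P}{HAS_DIGIT}{P}{ALPHANUM})+",
    rf"{HAS_DIGIT}(?:{P}{ALPHANUM}{P}{HAS_DIGIT})+",
    rf"{ALPHANUM}{P}{HAS_DIGIT}(?:{P}{ALPHANUM}{P}{HAS_DIGIT})+",
    rf"{HAS_DIGIT}{P}{ALPHANUM}(?:{P}{HAS_DIGIT}{P}{ALPHANUM})+",
]

RULES = [re.compile(p) for p in [ALPHANUM] + NUM_ALTERNATIVES]


def tokenize(text):
    """Emit lowercased tokens, taking the longest rule match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        matches = (rule.match(text, pos) for rule in RULES)
        best_end = max((m.end() for m in matches if m), default=pos)
        if best_end > pos:
            tokens.append(text[pos:best_end].lower())
            pos = best_end
        else:
            pos += 1  # skip a character that no rule matches (e.g. "_")
    return tokens


print(tokenize("BB_HHH_FFFF5_SSSS"))  # ['bb', 'hhh_ffff5_ssss']
print(tokenize("BB_HHH_FFFF_SSSS"))   # ['bb', 'hhh', 'ffff', 'ssss']
```

Under these assumptions, "BB" cannot join "HHH" because neither segment contains a digit, while "HHH_FFFF5_SSSS" matches as one token because the middle segment does. If the real grammar behaves this way, the reported output is by design rather than a tokenizer defect.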
