Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/01/26 13:11:45 UTC

[jira] Closed: (LUCENE-1161) Punctuation handling in StandardTokenizer (and WikipediaTokenizer)

     [ https://issues.apache.org/jira/browse/LUCENE-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler closed LUCENE-1161.
---------------------------------

    Resolution: Won't Fix

The old StandardTokenizer behaviour was deprecated in Lucene 3.1 and replaced by a new implementation that performs [Unicode Standard Annex #29|http://unicode.org/reports/tr29/] segmentation. The deprecated code will not receive any further fixes.

> Punctuation handling in StandardTokenizer (and WikipediaTokenizer)
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1161
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1161
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Priority: Minor
>
> It would be useful, in the StandardTokenizer, to have more control over how in-word punctuation is handled.  For instance, it is not always desirable to split on dashes or other punctuation.  In other cases, one may want to output the split tokens plus a collapsed version of the token that removes the punctuation.
> For example, Solr's WordDelimiterFilter provides some nice capabilities here, but it can't do its job when using the StandardTokenizer because the StandardTokenizer has already decided how to handle the punctuation without giving the user any choice.
> I think, in JFlex, we can have a back-compatible way of letting users make decisions about punctuation that occurs inside of a token, such as e-bay or i-pod, thus allowing for matches on iPod and eBay.
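
The "split tokens plus a collapsed version" behaviour requested above can be illustrated with a minimal sketch in plain Java. It does not use Lucene or Solr APIs. The class name InWordPunctuation and the method expand are hypothetical, and only the dash is treated as in-word punctuation, which is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class InWordPunctuation {

    // Hypothetical helper, not part of Lucene or Solr.
    // Splits a token on dashes and, if a split happened,
    // also emits the collapsed form with the dashes removed.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        StringBuilder collapsed = new StringBuilder();
        for (String part : token.split("-")) {
            if (!part.isEmpty()) {
                out.add(part);          // one of the split tokens
                collapsed.append(part); // build the dash-free form
            }
        }
        // Add the collapsed form only when the token really was split.
        if (out.size() > 1) {
            out.add(collapsed.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("e-bay"));
        System.out.println(expand("i-pod"));
        System.out.println(expand("lucene"));
    }
}
```

With this sketch, "e-bay" yields the parts "e" and "bay" plus the collapsed "ebay". Matching a query such as "eBay" would additionally depend on case normalization, which is assumed to happen elsewhere in the analysis chain.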
