Posted to oak-issues@jackrabbit.apache.org by "Amit Jain (JIRA)" <ji...@apache.org> on 2015/11/27 04:22:11 UTC

[jira] [Closed] (OAK-3648) Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer

     [ https://issues.apache.org/jira/browse/OAK-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Jain closed OAK-3648.
--------------------------

Bulk close for 1.3.11

> Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer
> ----------------------------------------------------------------
>
>                 Key: OAK-3648
>                 URL: https://issues.apache.org/jira/browse/OAK-3648
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>             Fix For: 1.3.11
>
>
> This is related to OAK-3276, where the intent was to use {{StandardAnalyzer}} by default (instead of {{OakAnalyzer}}). As discussed there, we need a specific word delimiter, which isn't possible with {{StandardAnalyzer}}, so we should instead switch to {{StandardTokenizer}} in {{OakAnalyzer}} itself.
> A few motivations to do that:
> * Better Unicode support
> * ClassicTokenizer is the old (~Lucene 3.1) implementation of the standard tokenizer
> One of the key differences between the classic and standard tokenizers is the way they delimit words (the standard tokenizer follows Unicode text segmentation rules), but that difference is nullified because we have our own WordDelimiterFilter.
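
To illustrate why a later word delimiter step can cancel out tokenizer differences, here is a minimal plain-Java sketch. It does not use Lucene and is not the real WordDelimiterFilter: the class name, the split rules (non-alphanumeric characters, letter/digit boundaries, lower-to-upper case changes), and the two sample token streams are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a word delimiter step (not Lucene code).
class WordDelimiterSketch {

    // Splits one token at non-alphanumeric characters, letter/digit
    // boundaries, and lower-to-upper case changes.
    static List<String> splitParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(current, parts);          // delimiter character: end the part
                continue;
            }
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                boolean typeChange = Character.isLetter(prev) != Character.isLetter(c);
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                if (typeChange || caseChange) {
                    flush(current, parts);      // boundary inside the token
                }
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    // Moves the pending part (if any) into the result list.
    static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    // Applies splitParts to every token of a token stream.
    static List<String> splitAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.addAll(splitParts(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two assumed token streams for the same text: one tokenizer keeps
        // the term whole, the other has already split it at the hyphen.
        List<String> keptWhole = Arrays.asList("wi-fi_2015");
        List<String> preSplit = Arrays.asList("wi", "fi_2015");
        System.out.println(splitAll(keptWhole));
        System.out.println(splitAll(preSplit));
    }
}
```

In this sketch both streams end up as the same parts after the delimiter step, which is the sense in which the tokenizers' different word boundaries stop mattering downstream.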



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)