Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/07/19 08:07:20 UTC

[jira] [Created] (OAK-4575) Oak 1.0.x fulltext search with ideographic space (U+3000) as separator

Thomas Mueller created OAK-4575:
-----------------------------------

             Summary: Oak 1.0.x fulltext search with ideographic space (U+3000) as separator
                 Key: OAK-4575
                 URL: https://issues.apache.org/jira/browse/OAK-4575
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: query
    Affects Versions: 1.0.32
            Reporter: Thomas Mueller
            Assignee: Thomas Mueller
             Fix For: 1.0.33


In Oak 1.0, the Lucene index uses its own tokenizer. That tokenizer does not recognize the ideographic space (U+3000) as a word separator.

In Oak 1.2 and later, the Lucene tokenizer is used, which works as expected.

Backporting all the relevant changes from Oak 1.2 to the 1.0 branch would be a large change set, and the risk of regression would be high (too high, in my view). An alternative is to add support for the ideographic space in the query engine by replacing it with a regular space character. Note that the behavior is still not exactly the same as in Oak 1.2, but for this specific use case it is expected to work correctly.
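A minimal sketch of the proposed workaround, assuming the fix amounts to a plain character substitution applied to the fulltext condition before tokenization. The class and method names (IdeographicSpaceNormalizer, normalize, naiveTokenize) are hypothetical and not Oak's actual code. naiveTokenize is only a stand-in for a tokenizer that, like the Oak 1.0 one described above, does not treat U+3000 as a separator.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Oak's implementation): replace the ideographic
 * space U+3000 with an ASCII space so that a tokenizer which only knows
 * regular spaces still splits the words.
 */
class IdeographicSpaceNormalizer {

    /** U+3000 IDEOGRAPHIC SPACE. */
    static final char IDEOGRAPHIC_SPACE = '\u3000';

    /** Replaces every ideographic space with a regular space (U+0020). */
    static String normalize(String fulltextCondition) {
        if (fulltextCondition == null) {
            return null;
        }
        return fulltextCondition.replace(IDEOGRAPHIC_SPACE, ' ');
    }

    /**
     * Hypothetical stand-in tokenizer: splits only on runs of ASCII spaces,
     * so it does not recognize U+3000 as a word separator.
     */
    static String[] naiveTokenize(String text) {
        return text.trim().split(" +");
    }

    public static void main(String[] args) {
        String query = "apache\u3000oak";
        // Without normalization, the stand-in tokenizer sees one token.
        System.out.println(naiveTokenize(query).length);
        // After normalization, the two words become separate tokens.
        System.out.println(Arrays.toString(naiveTokenize(normalize(query))));
    }
}
```

Doing the substitution in the query engine leaves the Oak 1.0 tokenizer itself untouched, which matches the low-risk intent of the proposal; other Unicode separators would still not be handled, consistent with the note that behavior differs from Oak 1.2.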



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)