Posted to oak-issues@jackrabbit.apache.org by "Dave Hughes (Jira)" <ji...@apache.org> on 2020/07/21 01:03:00 UTC

[jira] [Commented] (OAK-9145) OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order

    [ https://issues.apache.org/jira/browse/OAK-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161641#comment-17161641 ] 

Dave Hughes commented on OAK-9145:
----------------------------------

A PR has been opened on the GitHub project:
[https://github.com/apache/jackrabbit-oak/pull/242]

> OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order
> --------------------------------------------------------------------------
>
>                 Key: OAK-9145
>                 URL: https://issues.apache.org/jira/browse/OAK-9145
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing, jcr, lucene
>         Environment: Discovered while performing DAM searches in Adobe Experience Manager. 
> A search for _savings_ against the damAssetLucene index (which uses the default OakAnalyzer) does not find an asset named _savingsAccount.svg_.
> After the index's analyzers (_/oak:index/damAssetLucene/analyzers_) were configured to apply WordDelimiterFilter before LowerCaseFilter, the asset was found as expected.
> {noformat}
> {
>   "jcr:primaryType": "nt:unstructured",
>   "default": {
>     "jcr:primaryType": "nt:unstructured",
>     "tokenizer": {
>       "jcr:primaryType": "nt:unstructured",
>       "name": "Standard"
>     },
>     "filters": {
>       "jcr:primaryType": "nt:unstructured",
>       "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
>       "LowerCase": {"jcr:primaryType": "nt:unstructured"}
>     }
>   }
> }
> {noformat}
>            Reporter: Dave Hughes
>            Priority: Minor
>              Labels: easyfix
>
> I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag, which splits camelCase/PascalCase tokens into multiple terms, but because LowerCaseFilter is applied first, the mixed case is already lost and the terms can't be split.
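
The ordering problem described above can be illustrated with a small plain-Python sketch. It does not use the Lucene classes: lower_case_filter and word_parts_filter are hypothetical stand-ins that only mimic lowercasing and splitting on delimiters and lower-to-upper case changes. It also assumes the tokenizer hands the file name to the filters as a single token.

```python
import re


def lower_case_filter(tokens):
    """Stand-in for a lowercasing filter: lowercase every token."""
    return [t.lower() for t in tokens]


def word_parts_filter(tokens):
    """Stand-in for word-part splitting (NOT the real WordDelimiterFilter).

    Splits each token on non-alphanumeric characters and on
    lower-to-upper case changes, e.g. 'savingsAccount' -> 'savings', 'Account'.
    """
    parts = []
    for token in tokens:
        # Mark each lowercase-to-uppercase boundary with a space.
        spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token)
        # Keep runs of letters/digits; everything else acts as a delimiter.
        parts.extend(re.findall(r"[A-Za-z0-9]+", spaced))
    return parts


def analyze_lower_first(text):
    """Order reported in the issue: lowercase first, then split."""
    return word_parts_filter(lower_case_filter([text]))


def analyze_split_first(text):
    """Order proposed in the issue: split first, then lowercase."""
    return lower_case_filter(word_parts_filter([text]))


if __name__ == "__main__":
    name = "savingsAccount.svg"  # assumed to arrive as a single token
    print(analyze_lower_first(name))  # ['savingsaccount', 'svg']
    print(analyze_split_first(name))  # ['savings', 'account', 'svg']
```

With lowercasing first, no case boundary remains to split on, so the term savings is never produced and a search for it cannot match. With splitting first, savings is emitted as its own term.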



--
This message was sent by Atlassian Jira
(v8.3.4#803005)