
Posted to oak-issues@jackrabbit.apache.org by "Dave Hughes (Jira)" <ji...@apache.org> on 2020/11/22 18:43:00 UTC

[jira] [Commented] (OAK-9145) OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order

    [ https://issues.apache.org/jira/browse/OAK-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236996#comment-17236996 ] 

Dave Hughes commented on OAK-9145:
----------------------------------

I opened this issue in July and emailed the dev mailing list in September, but I've failed to gain any traction on it. I've probably failed to follow your contribution guidelines, but those weren't super clear when I went searching for the process in July.

In a last-ditch effort, I'm going to mention a bunch of people who have tickets on the current agile board, in the hope that one of you can take this on, or at least guide me to the correct process. Thanks in advance.

[~thomasm] [~mreutegg] [~baedke] [~mattvryan] [~teofili] [~angela] [~adulceanu]

> OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order
> --------------------------------------------------------------------------
>
>                 Key: OAK-9145
>                 URL: https://issues.apache.org/jira/browse/OAK-9145
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing, jcr, lucene
>         Environment: Discovered while performing DAM searches in Adobe Experience Manager. 
> Searching for _savings_, the damAssetLucene index (which uses the default OakAnalyzer) does not find an asset named _savingsAccount.svg_.
> Upon configuring the index's analyzers (_/oak:index/damAssetLucene/analyzers_) to apply WordDelimiterFilter before LowerCaseFilter, the correct behaviour was seen.
> {noformat}
> {
>   "jcr:primaryType": "nt:unstructured",
>   "default": {
>     "jcr:primaryType": "nt:unstructured",
>     "tokenizer": {
>       "jcr:primaryType": "nt:unstructured",
>       "name": "Standard"
>     },
>     "filters": {
>       "jcr:primaryType": "nt:unstructured",
>       "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
>       "LowerCase": {"jcr:primaryType": "nt:unstructured"}
>     }
>   }
> }
> {noformat}
>            Reporter: Dave Hughes
>            Priority: Minor
>              Labels: easyfix, pull-request-available
>
> I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag, which splits camelCase/PascalCase into multiple terms, but since LowerCaseFilter is applied first, the mixed case is already lost by the time WordDelimiterFilter runs and the terms can't be split.
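
To illustrate why the order matters, here is a minimal, self-contained sketch in plain Java. It does not use Lucene or Oak: the class FilterOrderDemo and its methods are hypothetical stand-ins that mimic only the lower-to-upper case-transition split and the lowercasing step, so treat it as an approximation of the behaviour described above rather than the actual OakAnalyzer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for the two filters; this is NOT Lucene code.
class FilterOrderDemo {

    // Assumption: mimics only the split on a lower-to-upper case transition
    // (e.g. "savingsAccount" -> "savings", "Account").
    static List<String> splitOnCaseChange(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int start = 0;
            for (int i = 1; i < token.length(); i++) {
                boolean boundary = Character.isLowerCase(token.charAt(i - 1))
                        && Character.isUpperCase(token.charAt(i));
                if (boundary) {
                    out.add(token.substring(start, i));
                    start = i;
                }
            }
            out.add(token.substring(start));
        }
        return out;
    }

    // Lowercases every token, as a lowercasing filter would.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Order described in the issue: lowercase first, then try to split.
    static List<String> lowerThenSplit(String term) {
        return splitOnCaseChange(lowerCase(List.of(term)));
    }

    // Order used in the workaround: split first, then lowercase.
    static List<String> splitThenLower(String term) {
        return lowerCase(splitOnCaseChange(List.of(term)));
    }

    public static void main(String[] args) {
        System.out.println(lowerThenSplit("savingsAccount")); // [savingsaccount]
        System.out.println(splitThenLower("savingsAccount")); // [savings, account]
    }
}
```

In this sketch, lowercasing first leaves a single term, savingsaccount, which a query for savings cannot match, while splitting first yields the separate terms savings and account.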



--
This message was sent by Atlassian Jira
(v8.3.4#803005)