Posted to oak-issues@jackrabbit.apache.org by "David Gonzalez (JIRA)" <ji...@apache.org> on 2017/02/16 16:02:41 UTC

[jira] [Created] (OAK-5692) Oak Lucene analyzers docs unclear on viable configurations

David Gonzalez created OAK-5692:
-----------------------------------

Summary: Oak Lucene analyzers docs unclear on viable configurations
Key: OAK-5692
URL: https://issues.apache.org/jira/browse/OAK-5692
Project: Jackrabbit Oak
Issue Type: Documentation
Reporter: David Gonzalez

The Analyzers section of the Oak Lucene docs [1] would benefit from clarification:

Combining analyzer-based topics into a single ticket

* If no analyzer is specified, what analyzer setup is used? (At the very least, some tokenizer must be used.)
* The docs mention the "default" analyzer ([oak:queryIndexDefinition]/analyzers/default). Can other analyzers be defined? How are they selected for use? Is the selection configurable?
* How are languages handled? Ex. language-specific stop words, synonyms, char mapping, and stemming.
* If [oak:queryIndexDefinition]/analyzers/default@class=org.apache.lucene.analysis.standard.StandardAnalyzer is set, it appears that the Standard Tokenizer and the Standard Lowercase and Stop Filters are used. The Stop filter can be augmented with the well-named stopwords file.
** Can other charFilters/filters be layered on top of this "named" Analyzer? (It seems not.)
* When the Stop Filter is used, it provides the OOTB language-based stop words. If a custom stopwords file is provided, that list replaces the OOTB language-based list, requiring the developer to provide their own language-based stop words. Is this correct? This should be called out, with a link to the catalog of OOTB stopword txt files for easy inclusion.
* The Stop filter's words property must be a String, not a String[], and its value is a comma-delimited String. It would be good to call this out.
* What are all the CharFilters/Filters available? Is there a concise list with their params? (Ex. I think the PorterStem filter might support an ignoreCase param?)
* Synonym Filter syntax is unclear. It seems like there are 2 formats: directional (x -> y) and bi-directional (comma delimited). I could only get the latter to work.
* Are all the options in the link [2] supported? It's unclear if there is a 1:1 mapping between Oak Lucene's and Solr's capabilities, or if [2] is a loose example of the "types" of supported analyzers.
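
To make the synonym and stop-word points above concrete, here is a minimal Python sketch of how I understand the two synonym rule styles and the comma-delimited words value. It is a toy illustration only, not Oak's or Lucene's implementation. It assumes the Solr synonyms-file notation, in which the directional form is written with "=>" (an assumption on my part; the docs should confirm which arrow Oak expects), and the terms and file names (stop1.txt, stop2.txt) are hypothetical.

```python
# Toy illustration only: NOT Oak's or Lucene's implementation.
# Assumption: directional rules use "=>" as in Solr-style synonym files.


def parse_synonym_rules(text):
    """Return a dict mapping each input term to the list of terms it expands to."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Directional rule: terms on the left are replaced by terms on the right.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in (s.strip() for s in left.split(",")):
                if source:
                    mapping.setdefault(source, []).extend(targets)
        else:
            # Bi-directional rule: every term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                mapping.setdefault(term, []).extend(group)
    return mapping


def parse_words_property(value):
    """Split a single comma-delimited String value into individual file names."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Hypothetical rules: one bi-directional group and one directional mapping.
rules = parse_synonym_rules("""
# bi-directional
tv, television
# directional
plane => airplane, aircraft
""")
print(rules["tv"])
print(rules["plane"])
print(parse_words_property("stop1.txt, stop2.txt"))
```

In this sketch the bi-directional rule makes "tv" and "television" expand to each other, while the directional rule maps "plane" to the right-hand terms without a reverse mapping. It would help if the docs stated explicitly whether Oak follows these semantics.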

[1] http://jackrabbit.apache.org/oak/docs/query/lucene.html
[2] https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)