Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/06/12 16:09:20 UTC
[jira] [Updated] (LUCENE-7318) Graduate StandardAnalyzer out of
analyzers module into core
[ https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-7318:
---------------------------------------
Attachment: LUCENE-7318.patch
Rote patch, moving {{StandardAnalyzer}} and {{StandardTokenizer}}, and
the utility classes they use, into core under the
org.apache.lucene.analysis package.
I left {{ClassicAnalyzer}}, {{UAX29URLEmailTokenizer}}, and the
factories in the analysis/common module.
"ant test" passes, but precommit is still angry about some javadocs
... I'll iterate.
The one non-rote change I made was to move
{{ENGLISH_STOP_WORDS_SET}} from {{StopAnalyzer}} (still in the
analyzers module) to {{StandardAnalyzer}}.
I also added a "jflex" target to core's build.xml, to regenerate the
tokenizer.
> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
> Key: LUCENE-7318
> URL: https://issues.apache.org/jira/browse/LUCENE-7318
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode Text Segmentation). It's also much faster than it used to be, since it switched to JFlex a while back. Many bug fixes, etc.
> I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to get started with Lucene ... we don't make them dig through the codecs module to find a good default codec ...