Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/06/12 16:09:20 UTC

[jira] [Updated] (LUCENE-7318) Graduate StandardAnalyzer out of analyzers module into core

     [ https://issues.apache.org/jira/browse/LUCENE-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7318:
---------------------------------------
    Attachment: LUCENE-7318.patch

Rote patch, moving {{StandardAnalyzer}}, {{StandardTokenizer}}, and the
utility classes they use to core's oal.analysis package.

I left {{ClassicAnalyzer}} and {{UAX29URLEmailTokenizer}} in the
analysis/common module.

"ant test" passes, but precommit still complains about some javadocs;
I'll iterate.

The one non-rote change I made was to move
{{ENGLISH_STOP_WORDS_SET}} from {{StopAnalyzer}} (which stays in the
analyzers module) to {{StandardAnalyzer}}.
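For context on what that stop set is used for: {{StandardAnalyzer}} tokenizes, lowercases, and then drops any token found in its stop set. Below is a minimal JDK-only sketch of that last filtering step. It is not Lucene code; the class and method names are hypothetical, and the word list is assumed to mirror the {{ENGLISH_STOP_WORDS_SET}} contents (check the Lucene source for the authoritative list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordSketch {
    // Assumption: mirrors the English stop words Lucene ships as
    // ENGLISH_STOP_WORDS_SET; verify against the actual source.
    static final Set<String> ENGLISH_STOP_WORDS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it", "no", "not", "of",
        "on", "or", "such", "that", "the", "their", "then", "there",
        "these", "they", "this", "to", "was", "will", "with")));

    // Hypothetical helper: keeps tokens not in the stop set, preserving order.
    // Tokens are assumed to be lowercased already, as the analyzer chain does
    // before stop filtering.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox", "is", "in", "the", "index");
        System.out.println(removeStopWords(tokens, ENGLISH_STOP_WORDS)); // prints [quick, fox, index]
    }
}
```

A {{Set}} lookup keeps the per-token check constant-time; Lucene itself uses its own {{CharArraySet}} for this, which the sketch does not attempt to reproduce.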

I also added a "jflex" target to core's build.xml to regenerate the
tokenizer.

The factories also stay in the analysis/common module.


> Graduate StandardAnalyzer out of analyzers module into core
> -----------------------------------------------------------
>
>                 Key: LUCENE-7318
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7318
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7318.patch
>
>
> Spinoff from LUCENE-7314:
> {{StandardAnalyzer}} has progressed substantially since we broke out the analyzers module ... it now follows a real Unicode standard (UAX #29 Unicode Text Segmentation).  It's also much faster than it used to be, since it switched to JFlex a while back, and it has had many bug fixes.
> I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for {{IndexWriter}}.
> It's really quite crazy that users must go digging in the analyzers module to get started with Lucene ... we don't make them dig through the codecs module to find a good default codec ...
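As a rough illustration of word-boundary segmentation followed by lowercasing (the first steps of the {{StandardAnalyzer}} chain), here is a JDK-only sketch built on {{java.text.BreakIterator}}. This is an assumption-laden stand-in, not Lucene's JFlex-generated {{StandardTokenizer}}: the JDK's word rules are similar in spirit to UAX #29 but are not guaranteed to match it, and the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSegmentSketch {
    // Hypothetical helper: splits text at the JDK's word boundaries, keeps
    // only segments containing a letter or digit (dropping whitespace and
    // punctuation runs), and lowercases each kept segment.
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [graduate, standardanalyzer, into, core]
        System.out.println(tokenize("Graduate StandardAnalyzer into core!"));
    }
}
```

Feeding these tokens into a stop-word filter would complete a toy version of the analysis chain; the real tokenizer handles many cases (e.g. numbers, emoji, ideographs) that this sketch leaves to the JDK's defaults.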



