Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/09/12 14:33:00 UTC

[jira] [Created] (LUCENE-10098) Add note/link to GermanAnalyzer for decompounding nouns

Robert Muir created LUCENE-10098:
------------------------------------

             Summary: Add note/link to GermanAnalyzer for decompounding nouns
                 Key: LUCENE-10098
                 URL: https://issues.apache.org/jira/browse/LUCENE-10098
             Project: Lucene - Core
          Issue Type: Task
            Reporter: Robert Muir


The GermanAnalyzer doesn't split compound nouns.

Doing this requires some auxiliary data files with strange licenses. But [~uschindler] has documented and packaged everything up to make this easy: https://github.com/uschindler/german-decompounder

We added a Lucene API example (using CustomAnalyzer) to the README: https://github.com/uschindler/german-decompounder/pull/6

So I think it would be nice to link to this from the javadocs: it makes it really easy to download the data files and configure an appropriate analyzer, if you are OK with the LaTeX and LGPL licenses of the data files (which many folks might be).
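To show what "splitting compound nouns" means, here is a toy dictionary-based splitter in plain Java. It is a hypothetical illustration only. The class name ToyDecompounder, the tiny hard-coded word list, the minSubwordSize parameter, and the linking-"s" rule are all assumptions made for this sketch. It is not Lucene's API, and it is not how the german-decompounder data files are used; those are combined with Lucene's compound-word filters (e.g. HyphenationCompoundWordTokenFilter) and a real dictionary. Unlike the Lucene filters, this sketch also returns only the parts and does not keep the original token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy decompounder: NOT Lucene's API, purely illustrative.
class ToyDecompounder {
    private final Set<String> dictionary;   // lowercase known words (toy data)
    private final int minSubwordSize;       // shortest part we accept

    ToyDecompounder(Set<String> dictionary, int minSubwordSize) {
        this.dictionary = dictionary;
        this.minSubwordSize = minSubwordSize;
    }

    /** Returns the parts of a compound, or the lowercased word itself if it cannot be split. */
    List<String> split(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        if (segment(w, 0, parts) && parts.size() > 1) {
            return parts;
        }
        return List.of(w);
    }

    // Backtracking segmentation: try the longest dictionary word at 'start' first.
    // On failure, 'parts' is left exactly as it was on entry.
    private boolean segment(String w, int start, List<String> parts) {
        if (start == w.length()) {
            return true;
        }
        for (int end = w.length(); end - start >= minSubwordSize; end--) {
            String candidate = w.substring(start, end);
            if (!dictionary.contains(candidate)) {
                continue;
            }
            parts.add(candidate);
            // Either continue directly after the part...
            if (segment(w, end, parts)) {
                return true;
            }
            // ...or skip an assumed German linking "s" (Fugen-s), e.g. "arbeit|s|platz".
            if (end < w.length() && w.charAt(end) == 's' && segment(w, end + 1, parts)) {
                return true;
            }
            parts.remove(parts.size() - 1);
        }
        return false;
    }
}
```

With a dictionary containing donau, dampf, schiff, arbeit and platz and a minSubwordSize of 4, split("Donaudampfschiff") yields [donau, dampf, schiff] and split("Arbeitsplatz") yields [arbeit, platz]. A word the dictionary cannot cover, such as "Haus", comes back unchanged (lowercased). This is exactly why real decompounding depends on good dictionary and hyphenation data rather than on code alone.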



--
This message was sent by Atlassian Jira
(v8.3.4#803005)