Posted to dev@lucene.apache.org by bu...@apache.org on 2005/04/22 12:05:27 UTC
DO NOT REPLY [Bug 34570] New: - A new Greek Analyzer for Lucene
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34570>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=34570
Summary: A new Greek Analyzer for Lucene
Product: Lucene
Version: 1.4
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Analysis
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: past@ebs.gr
I would like to contribute a Greek analyzer for Lucene. It is based on the
existing Russian analyzer and features:
- support for the most common Greek character sets, such as Unicode,
  ISO-8859-7 and Windows-1253
- a collection of common Greek stop words
- conversion of characters with diacritics (accent, diaeresis) in the lower case
  filter, as well as handling of special characters, such as the small final sigma
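
To illustrate the lower case filter behaviour described in the last item, here
is a minimal sketch in plain Java. It is not the contributed code: the class
name GreekFoldSketch and the methods foldChar and fold are hypothetical, and the
sketch assumes Unicode input and that accented or diaeresis-bearing vowels
should fold to the bare lower case vowel and final sigma to ordinary sigma.

```java
// Hypothetical sketch, not the analyzer attached to this bug.
final class GreekFoldSketch {

    // Map one character to lower case, dropping tonos/dialytika
    // and replacing small final sigma with small sigma.
    static char foldChar(char c) {
        switch (c) {
            case '\u0386': case '\u03AC':            // alpha with tonos
                return '\u03B1';
            case '\u0388': case '\u03AD':            // epsilon with tonos
                return '\u03B5';
            case '\u0389': case '\u03AE':            // eta with tonos
                return '\u03B7';
            case '\u038A': case '\u03AF':            // iota with tonos
            case '\u03AA': case '\u03CA':            // iota with dialytika
            case '\u0390':                           // iota with both
                return '\u03B9';
            case '\u038C': case '\u03CC':            // omicron with tonos
                return '\u03BF';
            case '\u038E': case '\u03CD':            // upsilon with tonos
            case '\u03AB': case '\u03CB':            // upsilon with dialytika
            case '\u03B0':                           // upsilon with both
                return '\u03C5';
            case '\u038F': case '\u03CE':            // omega with tonos
                return '\u03C9';
            case '\u03C2':                           // small final sigma
                return '\u03C3';
            default:
                // Plain letters (and non-Greek text) use the JDK mapping.
                return Character.toLowerCase(c);
        }
    }

    // Apply foldChar to every character of a token.
    static String fold(String token) {
        StringBuilder out = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            out.append(foldChar(token.charAt(i)));
        }
        return out.toString();
    }
}
```

Folding both the indexed text and the query terms this way lets accented and
unaccented spellings of the same word match.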
For the character sets I used RFC 1947 (Greek Character Encoding for Electronic
Mail Messages) as a reference. I have incorporated this analyzer in Luke and
have used it successfully in a recent project at my company (EBS Ltd.).
I hope you will find it a useful addition to the project.