You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by bu...@apache.org on 2005/04/22 12:05:27 UTC

DO NOT REPLY [Bug 34570] New: - A new Greek Analyzer for Lucene

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34570>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34570

           Summary: A new Greek Analyzer for Lucene
           Product: Lucene
           Version: 1.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: past@ebs.gr


I would like to contribute a greek analyzer for lucene. It is based on the
existing Russian analyzer and features:

- most common greek character sets, such as Unicode, ISO-8859-7 and Windows-1253
- a collection of common greek stop words
- conversion of characters with diacritics (accent, diaeresis) in the lower case
filter, as well as handling of special characters, such as small final sigma

For the character sets I used RFC 1947 (Greek Character Encoding for Electronic
Mail Messages) as a reference. I have incorporated this analyzer in Luke as well
as used it successfully in a recent project of my company (EBS Ltd.).

I hope you will find it a useful addition to the project.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org