Posted to dev@lucene.apache.org by bu...@apache.org on 2003/10/13 17:54:47 UTC
DO NOT REPLY [Bug 23784] New: - [PATCH] Arabic Analyzer, Stemmer
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784
Summary: [PATCH] Arabic Analyzer, Stemmer
Product: Lucene
Version: unspecified
Platform: Other
OS/Version: Other
Status: NEW
Severity: Enhancement
Priority: Other
Component: Analysis
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: otis@apache.org
CC: pierrick.brihaye@wanadoo.fr
September 28th 2003 contribution from "Pierrick Brihaye" <pi...@wanadoo.fr>.
Original email:
Hi all,
I have written a Lucene Analyzer for Arabic. You will find it here:
http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar
(provisional address; is anybody interested in hosting it?)
This work is still in beta, but it gives quite good results :-)
In order to make it work, you need:
1) a 1.4+ JVM (because of its native support for regular expressions,
which are heavily used in the program; I've been too lazy to use an
external package)
2) Apache Jakarta Commons-Collections:
http://jakarta.apache.org/commons/collections.html
3) a recent Lucene distribution ;-)
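As an illustration only of the first requirement, here is a minimal sketch of the java.util.regex package that ships with JVM 1.4 and later. It is not taken from the jar: the class name RegexSketch, the tokenize method and the whitespace pattern are all hypothetical, and the generics syntax needs a newer JVM than 1.4. It simply splits a transliterated string into whitespace-delimited tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of ArabicAnalyzer.jar.
public class RegexSketch {
    // Assumed pattern: a token is any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Returns the whitespace-delimited tokens of the input, in order.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Transliterated sample input, split into two tokens.
        System.out.println(tokenize("mE AlslAmap"));
    }
}
```

On a pre-1.4 JVM the same task would need an external regular expression package, which is the dependency the requirement above avoids.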
All this work is based on Tim Buckwalter's amazing Arabic Morphological
Analyzer Version 1.0
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49),
originally written in Perl and released under the GPL.
The jar contains:
a) the compiled classes
b) the required data files (dictionaries and compatibility tables)
c) 2 command-line test programs
d) 3 test documents with different encodings
e) the source code
f) a README file that will give you a little more information :-)
To Lucene developers: I plan to offer this work to Lucene (see the jar
hierarchy... and the source file headers ;-). Any objections?
Feedback is very welcome: there are quite a lot of unresolved issues,
with the analyzer itself as well as with Lucene.
mE AlslAmap, cheers,
p.b.