Posted to java-user@lucene.apache.org by jo...@teamware.co.uk on 2002/02/01 17:44:00 UTC

Finnish Stemmer / Analyzer source code

Apologies for submitting my source code in a rushed manner - but I have been
very busy leading up to my maternity leave.  I am about to take leave for
approximately 6 months, but I would like to submit some of the source code I
have developed recently to use Lucene for web sites written in Finnish.

This mail contains a zip file which contains the relevant Java classes, some
design documentation and also some C code - since our solution involved using
some third party commercial software called Morfo by a company called Kielikone.

To make use of this Finnish Analyzer, we also had to make 2 relatively small
changes to the Lucene core code.  This was to allow more than one token to be
stored at the same position within a document.  These changes may be useful to
others and I would be grateful if someone could consider adding them to the
Lucene core code.

The 2 Lucene files that were modified are:
/analysis/TokenStream.java
/index/DocumentWriter.java

Our updated versions are also attached.
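
For readers without the attachments, here is a minimal sketch of what "more
than one token at the same position" can look like.  It is not the attached
code and does not use the Lucene API: the class and field names
(StackedTokensSketch, Tok, positionIncrement) are hypothetical, and it assumes
that the extra token is the base form of an inflected word.

```java
import java.util.ArrayList;
import java.util.List;

class StackedTokensSketch {

    /** Hypothetical token: a term plus its offset from the previous token's position. */
    static final class Tok {
        final String term;
        final int positionIncrement; // 1 = next position, 0 = same position as previous token

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Returns the absolute position assigned to each token, in stream order. */
    static List<Integer> positions(List<Tok> stream) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // so that a first token with increment 1 lands on position 0
        for (Tok t : stream) {
            position += t.positionIncrement;
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the inflected form "taloissa" is indexed together with
        // its base form "talo", both at the same position.
        List<Tok> stream = new ArrayList<>();
        stream.add(new Tok("taloissa", 1));
        stream.add(new Tok("talo", 0));
        stream.add(new Tok("asuu", 1));
        System.out.println(positions(stream)); // prints [0, 0, 1]
    }
}
```

Because both forms share position 0, a phrase search could match either the
inflected word or its base form without shifting the positions of the words
that follow.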

I have submitted this code at the last minute - so unfortunately, I shall not
be able to respond to any queries.  But perhaps one day someone will search the
archive for a Finnish Stemmer and find these attachments useful?

Kind regards

Joanne Sproston
Teamware Group
joanne.sproston@teamware.co.uk
phone: +44 (0)1782 794879  fax: +44 (0)1782  776667

intra / extra / Internet solutions at www.teamware.com