Posted to dev@uima.apache.org by Thilo Goetz <tw...@gmx.de> on 2007/10/23 18:11:14 UTC

HMM-based POS tagger for the sandbox

All,

I'm currently working with a student, Eugenie Giesbrecht,
who is implementing an HMM-based part-of-speech tagger
for inclusion in the sandbox.  This is entirely Eugenie's
original work for Apache, and we'll start checking
in code during the next few days.

The only data Eugenie currently has for experimentation
is the Brown corpus of American English.  If you have
any POS-tagged data that we could use for training
(English or other languages), please let us know.
The usual license restrictions apply.  I don't think we
can use any data that's only free for research purposes.

Please let us know if you have any suggestions or would
like to help ;-)  Once Eugenie has something running,
we'll make an announcement on the user list.

Eugenie has an ICLA on file with the ASF.

--Thilo