Posted to user@uima.apache.org by Richard Eckart de Castilho <ri...@gmail.com> on 2013/05/15 11:24:44 UTC

Re: About whether the UIMA supports Chinese

Hi,

> I have recently been studying UIMA. From the documentation I understand that UIMA can support Chinese, but the documentation does not explain how, so I am still confused about this part. I hope you can point me to some detailed documentation or material that would help clear this up.
> Thank you very much!

We have some support for Chinese in DKPro Core.

DKPro Core ASL [1]:

- Tokenizer/segmenter using LanguageTool [2, 3]
- Part-of-speech tagger using TreeTagger [4, 5] (TreeTagger is research only)

DKPro Core GPL [6]:

- Part-of-speech tagger using Stanford NLP [7,8]
- Parser using Stanford NLP [7,9]
- Parser using Berkeley Parser [10, 11]

Some of these components may only be available in the SVN trunk version.

That said, we do not really work on Chinese data ourselves, so this is more of a proof of concept (checking that the character set is handled correctly, that the models load properly, etc.). If you try it out and have feedback, please tell us :)
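
As a rough illustration of the kind of character-set check meant here: the sketch below is not DKPro Core code, it uses only the Java standard library, and the class name, method names, and sample strings are made up for the example. It checks that Chinese text survives a UTF-8 encode/decode round trip, and shows that the number of Java chars can differ from the number of Unicode code points for rarer characters, which matters because UIMA annotation offsets are counted in Java chars.

```java
import java.nio.charset.StandardCharsets;

public class ChineseCharsetCheck {

    // True if the text is unchanged after encoding to UTF-8 and decoding again.
    // Malformed input (e.g. an unpaired surrogate) is replaced during encoding,
    // so it does not round-trip and this returns false.
    public static boolean roundTripsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.equals(text);
    }

    // Number of Unicode code points; may be smaller than text.length(),
    // which counts UTF-16 code units (Java chars).
    public static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source compiles regardless of file encoding.
        String common = "\u4e2d\u6587\u5206\u8bcd"; // four common Chinese characters
        String rare = "\uD840\uDC00";               // U+20000, one character stored as two Java chars

        System.out.println("round trip ok: " + roundTripsUtf8(common));
        System.out.println("chars=" + common.length() + " codePoints=" + codePoints(common));
        System.out.println("chars=" + rare.length() + " codePoints=" + codePoints(rare));
    }
}
```

If a check like this fails on your input, the problem is most likely in how the text was read (for example, a wrong file encoding) rather than in the analysis components.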

-- Richard

[1] http://code.google.com/p/dkpro-core-asl

[2] http://www.languagetool.org

[3] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.languagetool-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/languagetool/LanguageToolSegmenterTest.java

[4] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

[5] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.treetagger-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/treetagger/TreeTaggerPosLemmaTT4JTest.java#128

[6] http://code.google.com/p/dkpro-core-gpl

[7] http://www-nlp.stanford.edu/software/index.shtml

[8] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/test/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordPosTaggerTest.java#64

[9] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/test/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordParserTest.java#316

[10] http://code.google.com/p/berkeleyparser/

[11] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.berkeleyparser-gpl/src/test/java/de/tudarmstadt/ukp/dkpro/core/berkeleyparser/BerkeleyParserTest.java#114