Posted to dev@nutch.apache.org by Ken Krugler <kk...@transpac.com> on 2005/08/12 23:47:48 UTC
Language detection
Given the recent discussion on this list regarding charset/language
detection, people might find this IBM research paper interesting:
ftp://ftp.software.ibm.com/software/globalization/documents/linguini.pdf
Linguini: Language Identification for Multilingual Documents
John M. Prager
-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200