Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/01/11 23:04:39 UTC

[Nutch Wiki] Update of "LanguageIdentifier" by JeromeCharron

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/LanguageIdentifier

The comment on the change is:
Add some doc about generating new language profiles

------------------------------------------------------------------------------
  
  == Generating some NGrams profiles ==
  
- TODO
+ Generating a new language profile in Nutch is straightforward.
+ Run the following command:
+ {{{
+ java org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
+ }}}
+ where:
+  * '''profile-name''' is the [http://www.w3.org/WAI/ER/IG/ert/iso639.htm ISO-639 2-letter code] of the new language.
+  * '''filename''' is the name of the file used to build the new language profile (the larger the file, and the more varied its sources and subjects, the better the resulting profile).
+  * '''encoding''' is the character encoding of '''filename''' (for example, UTF-8).
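
To show roughly what "building a profile" from a text file involves, here is a minimal Java sketch of a character n-gram frequency profile. It is NOT Nutch's NGramProfile code. The class name NGramProfileSketch, the methods countNGrams and rank, the n-gram lengths (1 to 3), the '_' word padding and the 300-entry cutoff are all hypothetical choices made for illustration. It does not write Nutch's profile file format; it only prints the most frequent n-grams. It takes the same three arguments as the real command: a profile name, a training file and that file's encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of an n-gram language profile; NOT Nutch's NGramProfile. */
public class NGramProfileSketch {

    static final int MIN_N = 1; // assumed shortest n-gram length
    static final int MAX_N = 3; // assumed longest n-gram length

    /** Counts the character n-grams of every word, padding each word with '_' on both sides. */
    public static Map<String, Integer> countNGrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of non-letters, so punctuation and digits are ignored.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            String padded = "_" + word + "_";
            for (int n = MIN_N; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns at most 'limit' n-grams, most frequent first (ties broken alphabetically). */
    public static List<String> rank(Map<String, Integer> counts, int limit) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(grams.subList(0, Math.min(limit, grams.size())));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: NGramProfileSketch <profile-name> <filename> <encoding>");
            System.exit(1);
        }
        String profileName = args[0];
        // Decode the training file with the caller-supplied encoding.
        String text = new String(Files.readAllBytes(Paths.get(args[1])), Charset.forName(args[2]));
        Map<String, Integer> counts = countNGrams(text);
        System.out.println("# profile " + profileName);
        for (String gram : rank(counts, 300)) {
            System.out.println(gram + " " + counts.get(gram));
        }
    }
}
```

The encoding argument matters because the same bytes decode to different characters under different charsets, and a mis-decoded file would yield n-grams that never occur in real text of that language. A larger and more varied training file gives more reliable frequency ranks, which is why the advice on '''filename''' emphasizes size and variety.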
+ 
  
  == Open Issues ==