Posted to user@nutch.apache.org by nalgonda <de...@gmail.com> on 2008/08/21 06:50:32 UTC

How to create a new language profile in Nutch

Hi,
Can anyone tell me how to create a new language profile in Nutch?
In the language identifier I saw that they mention:
java org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
Please explain clearly what profile-name, filename and encoding are,
and give an example.

Thanks,
-- 
View this message in context: http://www.nabble.com/how-to-crate-Generating-a-new-language-profile-in-Nutch-tp19082403p19082403.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: How to create a new language profile in Nutch

Posted by Tomislav Poljak <tp...@gmail.com>.
Hi,
what you actually do when you create a profile is train the identifier
(classifier) on a sample text, so that it learns the language's most
frequent n-grams and their statistics.

These n-gram statistics are then written to a file named langCode.ngp.
The <profile-name> param names this output file (example: en.ngp).
You need to provide a sample text in the language you are trying to
identify; this is the <filename> param (example: bigEngText.txt).
The text should be as big as you can provide: bigger text -> better
statistics -> better language identifier. The encoding of this sample
text is the last param, <encoding> (example: UTF-8).
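
To give a rough idea of what "training on sample text" means, here is a
minimal sketch of counting character n-grams and ranking them by
frequency. It is only an illustration, not Nutch's actual code: the class
NGramSketch, its method names, and the '_' word-boundary padding are all
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Nutch.
class NGramSketch {

    // Count every character n-gram of length minLen..maxLen inside each word.
    // Words are lowercased and wrapped in '_' so that word boundaries show up
    // in the n-grams (an assumption of this sketch).
    public static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a Unicode letter.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first token if text starts with a separator
            }
            String padded = "_" + word + "_";
            for (int n = minLen; n <= maxLen; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return up to 'limit' n-grams, most frequent first; ties are broken
    // alphabetically so the order is deterministic.
    public static List<String> mostFrequent(Map<String, Integer> counts, int limit) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return grams.subList(0, Math.min(limit, grams.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("the theory of the thing", 2, 3);
        for (String gram : mostFrequent(counts, 5)) {
            System.out.println(gram + " " + counts.get(gram));
        }
    }
}
```

With a tiny sample like the one in main() the ranking is mostly noise,
which is why a big sample text matters: the more text you count, the
closer the top n-grams get to what is typical for the language.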


Hope this helps,

Tomislav


Re: How to create a new language profile in Nutch

Posted by nalgonda <de...@gmail.com>.
Hi,
Thanks for your reply.
I tried to create an Arabic language profile this way:
java org.apache.nutch.analysis.lang.NGramProfile -create ar Arabic arabicUTF8
The file Arabic contains some news text in Arabic. When I run the command
from the command prompt, I get a NoClassDefFoundError for
org/apache/nutch/analysis/lang/Ngramprofile
Do I need to do any configuration for this?

Thanks,

