Posted to user@nutch.apache.org by Volkan Ebil <vo...@pecya.com> on 2008/01/31 09:38:36 UTC

Help needed!!

Hi everyone,

 

I am from Turkey. My language has a special character, "ğ", which is
characteristic of Turkish, and I have to build a language identifier. I thought
that instead of using n-grams I could simply check whether the HTML content
includes "ğ" or not. For this reason I need an if check that does the
following:

 

Fetch the URL

if the content of the URL includes "ğ" or "Ğ"

            then parse and index the URL

else

            skip the URL.
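A minimal Java sketch of the if check described above, assuming it runs on the page's already-extracted text rather than on raw HTML bytes. The class and method names (TurkishFilter, shouldIndex) are hypothetical and are not part of Nutch. The text is normalized to NFC first, so a decomposed "g" followed by a combining breve (U+0306) is caught as well.

```java
import java.text.Normalizer;

// Hypothetical helper, not a Nutch API: decides index vs. skip
// based on the presence of 'ğ' (U+011F) or 'Ğ' (U+011E).
public class TurkishFilter {

    private static final char SMALL_G_BREVE = '\u011F';   // ğ
    private static final char CAPITAL_G_BREVE = '\u011E'; // Ğ

    /** Returns true if the text contains 'ğ' or 'Ğ' (after NFC normalization). */
    public static boolean shouldIndex(String text) {
        if (text == null) {
            return false; // no content: skip
        }
        // Compose "g" + U+0306 into U+011F so both spellings match.
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.indexOf(SMALL_G_BREVE) >= 0
            || nfc.indexOf(CAPITAL_G_BREVE) >= 0;
    }

    public static void main(String[] args) {
        // Escapes avoid depending on the source file's encoding.
        String[] samples = { "Da\u011Flar", "DA\u011E", "Hello world" };
        for (String s : samples) {
            System.out.println(s + " -> " + (shouldIndex(s) ? "index" : "skip"));
        }
    }
}
```

Checking both characters explicitly avoids calling toLowerCase(), whose result depends on the default locale (the Turkish dotted/dotless i is the classic pitfall).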

 

 

Where should I look in the source code? How can I implement a restriction
like that?
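One point that likely affects where the check belongs: "ğ" is encoded as different bytes depending on the page's charset, so scanning raw fetched bytes is unreliable. Nutch's HTML parsing step should already decode the page using its detected encoding, so running the check on the parsed text (for example from an indexing filter plugin; check the IndexingFilter interface of your Nutch version) is probably safer than checking the fetched content directly. The sketch below only illustrates the encoding issue; CharsetDemo and containsGBreve are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same word yields different bytes per charset,
// so the 'ğ' check only works after decoding with the right charset.
public class CharsetDemo {

    /** Decodes raw bytes with the given charset, then looks for 'ğ' or 'Ğ'. */
    public static boolean containsGBreve(byte[] raw, Charset charset) {
        String text = new String(raw, charset);
        return text.indexOf('\u011F') >= 0 || text.indexOf('\u011E') >= 0;
    }

    public static void main(String[] args) {
        String word = "da\u011F"; // "dağ"
        Charset latin5 = Charset.forName("ISO-8859-9"); // Turkish Latin-5

        byte[] utf8Bytes = word.getBytes(StandardCharsets.UTF_8); // ğ = 0xC4 0x9F
        byte[] latin5Bytes = word.getBytes(latin5);              // ğ = 0xF0

        System.out.println(utf8Bytes.length);   // 4
        System.out.println(latin5Bytes.length); // 3

        System.out.println(containsGBreve(utf8Bytes, StandardCharsets.UTF_8)); // true
        System.out.println(containsGBreve(latin5Bytes, latin5));               // true
        // Decoding Latin-5 bytes as UTF-8 turns 0xF0 into U+FFFD, so the check fails.
        System.out.println(containsGBreve(latin5Bytes, StandardCharsets.UTF_8)); // false
    }
}
```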



Thanks in advance 

 

Volkan..