Posted to user@nutch.apache.org by Volkan Ebil <vo...@pecya.com> on 2008/01/31 09:38:36 UTC
Help needed!!
Hi everyone,
I am from Turkey. My language has a special character, "ğ". This character is
used only in Turkish, and I have to build a language identifier. I thought that,
instead of using n-grams, I could simply check
whether the HTML content includes "ğ" or not. For this I need an if check
that does the following:
Fetch the URL
if the content of the URL includes "ğ" or "Ğ"
    then parse and index the URL
else
    skip the URL.
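
The check itself can be sketched in plain Java, independent of Nutch. This is
only a minimal sketch: the class name TurkishCharCheck and the method
containsSoftG are hypothetical, not part of Nutch, and it assumes the fetched
bytes can be decoded with a known charset (UTF-8 in the usage below). Pages
served in another encoding, or pages that write the letter as an HTML entity,
would need extra handling. Where to call it inside Nutch is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: reports whether a page's
// content contains the Turkish letter "ğ" (U+011F) or "Ğ" (U+011E).
public class TurkishCharCheck {
    private static final char SOFT_G_LOWER = '\u011F'; // ğ
    private static final char SOFT_G_UPPER = '\u011E'; // Ğ

    // Check already-decoded text.
    public static boolean containsSoftG(String text) {
        if (text == null) {
            return false;
        }
        return text.indexOf(SOFT_G_LOWER) >= 0
            || text.indexOf(SOFT_G_UPPER) >= 0;
    }

    // Check raw fetched bytes. The charset is an assumption supplied by
    // the caller; decoding with the wrong one can hide the character.
    public static boolean containsSoftG(byte[] rawContent, Charset charset) {
        if (rawContent == null) {
            return false;
        }
        return containsSoftG(new String(rawContent, charset));
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>da\u011F</body></html>"
            .getBytes(StandardCharsets.UTF_8);
        if (containsSoftG(page, StandardCharsets.UTF_8)) {
            System.out.println("parse and index");
        } else {
            System.out.println("skip");
        }
    }
}
```

The decision then reduces to one boolean per page: parse and index when
containsSoftG returns true, skip otherwise.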
Where should I look in the source code? How can I add a restriction like
this?
Thanks in advance
Volkan..