Posted to user@nutch.apache.org by Honorez Dylan <Dy...@cronos.be> on 2007/04/18 17:30:51 UTC

Language Identification

Hi,

I'm having issues with the language identifier plugin in the specific
scenario where no language attribute is set on the HTML tag and no
language metadata is available. As I understand it, two further steps
follow in that case: extraction from the HTTP headers, then
statistical analysis. I'd like to skip the HTTP header check, because
I suspect my HTTP server sends back a default Content-Language value,
namely the system language, and this is not correct. I'd like to
proceed directly to statistical analysis.
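
To make the question concrete, here is a rough Java sketch of the
order of checks as I understand it, with the header step made
optional. This is not the plugin's actual code: the class name, the
resolve method, the skipHeader flag and the stubbed statistical step
are all hypothetical names used only for illustration.

```java
import java.util.function.Function;

// Hypothetical sketch; NOT the Nutch language identifier plugin's code.
final class LanguageResolutionSketch {

    // Returns the first usable language code, checking the sources in the
    // order described above. When skipHeader is true, the HTTP
    // Content-Language value is ignored and the statistical step runs instead.
    static String resolve(String htmlLang, String metaLang, String headerLang,
                          String text, boolean skipHeader,
                          Function<String, String> statisticalIdentifier) {
        if (isUsable(htmlLang)) return htmlLang.trim();
        if (isUsable(metaLang)) return metaLang.trim();
        if (!skipHeader && isUsable(headerLang)) return headerLang.trim();
        return statisticalIdentifier.apply(text);
    }

    // A value counts as usable when it is neither null nor blank.
    private static boolean isUsable(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Stand-in for the statistical analysis step; it always answers "nl".
        Function<String, String> stub = text -> "nl";
        // Header ignored: the stubbed statistical step decides.
        System.out.println(resolve(null, null, "en", "Dit is een voorbeeld.", true, stub));
        // Header honoured: the (possibly wrong) server default wins.
        System.out.println(resolve(null, null, "en", "Dit is een voorbeeld.", false, stub));
    }
}
```

With skipHeader set to true the header value "en" is never consulted,
which is the behaviour I am after.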

 

Is it possible to do this?

Dylan Honorez
R & D Consultant
4C Technologies / kZen
+32 (0)485 / 69.28.12
dylan.honorez@kzen.be