Posted to user@nutch.apache.org by Doğacan Güney <do...@gmail.com> on 2009/07/10 12:14:01 UTC

Re: how to change encoding

On Fri, Jul 10, 2009 at 12:43, Saurabh Suman<sa...@rediff.com> wrote:
>
> Hi
> How can I change the encoding of text content? In nutch-default.xml,
> parser.character.encoding.default is windows-1252. What does that mean?

Nutch tries to detect a page's encoding automatically. A particularly
useful option here is encodingdetector.charset.min.confidence; I
generally get good results when it is set to 70 or so.

But if for some reason Nutch can't figure out the encoding,
parser.character.encoding.default is the encoding we fall back to.
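
As a rough sketch, assuming you keep local overrides in conf/nutch-site.xml
rather than editing nutch-default.xml directly, the two properties could be
set like this. The utf-8 value is only an illustration; pick whatever
fallback suits the pages you crawl.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Minimum confidence for accepting an auto-detected charset.
       70 is the value that tends to work well for me. -->
  <property>
    <name>encodingdetector.charset.min.confidence</name>
    <value>70</value>
  </property>

  <!-- Fallback encoding used when detection fails.
       utf-8 is an assumed example; nutch-default.xml ships windows-1252. -->
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
</configuration>
```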




-- 
Doğacan Güney