Posted to user@nutch.apache.org by Saurabh Suman <sa...@rediff.com> on 2009/07/10 11:43:33 UTC

how to change encoding

Hi
How can I change the encoding of text content? In nutch-default.xml,
parser.character.encoding.default is set to windows-1252. What does that mean?
-- 
View this message in context: http://www.nabble.com/how-to-change--encoding-tp24424482p24424482.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: how to change encoding

Posted by Doğacan Güney <do...@gmail.com>.
On Fri, Jul 10, 2009 at 12:43, Saurabh Suman<sa...@rediff.com> wrote:
>
> Hi
> how can i change encoding of text content. In nutch-default.xml  for
> parser.character.encoding.default is windows-1252. what does it mean?

Nutch tries to detect a page's encoding automatically. A particularly
useful option here is encodingdetector.charset.min.confidence; I generally
get good results when it is 70 or so.

But if for some reason Nutch can't figure out the encoding,
parser.character.encoding.default is the encoding we fall back to.
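
As a rough sketch, assuming the usual practice of overriding
nutch-default.xml values in conf/nutch-site.xml, the two properties
mentioned above could be set along these lines. The values utf-8 and 70
are only illustrative; pick whatever suits the pages you crawl.

```xml
<!-- conf/nutch-site.xml: values here override those in nutch-default.xml -->
<configuration>
  <!-- Fallback encoding used when detection fails (utf-8 is an example value) -->
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
  <!-- Minimum confidence for accepting a detected charset (70 as suggested above) -->
  <property>
    <name>encodingdetector.charset.min.confidence</name>
    <value>70</value>
  </property>
</configuration>
```

Pages fetched and parsed before the change would probably need to be
re-parsed for the new settings to show up in their text.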




-- 
Doğacan Güney