Posted to user@nutch.apache.org by Saurabh Suman <sa...@rediff.com> on 2009/07/10 11:43:33 UTC
how to change encoding
Hi,
How can I change the encoding of text content? In nutch-default.xml,
parser.character.encoding.default is set to windows-1252. What does that mean?
--
View this message in context: http://www.nabble.com/how-to-change--encoding-tp24424482p24424482.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: how to change encoding
Posted by Doğacan Güney <do...@gmail.com>.
On Fri, Jul 10, 2009 at 12:43, Saurabh Suman<sa...@rediff.com> wrote:
>
> Hi
> how can i change encoding of text content. In nutch-default.xml for
> parser.character.encoding.default is windows-1252. what does it mean?
Nutch tries to detect a page's encoding automatically. A particularly
useful option here is encodingdetector.charset.min.confidence; I generally
get good results when it is set to 70 or so. If for some reason Nutch
can't figure out the encoding, parser.character.encoding.default is the
encoding we fall back to.
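
To change either setting, the usual approach is to override the property in
conf/nutch-site.xml rather than editing nutch-default.xml. A minimal sketch,
assuming you want UTF-8 as the fallback (that value is only an example, pick
whatever suits your pages) and the confidence threshold of 70 mentioned above:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Fallback encoding, used only when detection fails.
       utf-8 is an assumed example value. -->
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>

  <!-- Minimum confidence for trusting an automatically detected charset.
       70 is the value suggested in this thread, not a verified default. -->
  <property>
    <name>encodingdetector.charset.min.confidence</name>
    <value>70</value>
  </property>
</configuration>
```

Values in nutch-site.xml take precedence over those in nutch-default.xml, so
the shipped defaults stay untouched.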
--
Doğacan Güney