Posted to user@nutch.apache.org by jazz <ja...@me.com> on 2013/02/25 21:37:53 UTC

Nutch 2.1 MySQL setup character encoding

Hi,

How do I set up Nutch to crawl correctly using the UTF-8 character set?

The setup described here does not work for me: http://nlp.solutions.asia/?p=180

I am using Nutch 2.1, Solr 4.0 and MySQL 5.5.30. This is the error I get during the parse job:

Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xBB\xBF Ir...' for column 'text' at row 1
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
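For reference, the three bytes quoted in the error, \xEF\xBB\xBF, are the UTF-8 encoding of U+FEFF, the byte order mark (BOM). That suggests the parsed text starts with a BOM, and that MySQL rejects it because it cannot store it in the target column. The BOM is a valid UTF-8 sequence, so the rejection is consistent with the column or connection not using UTF-8. A minimal Java check of the decoding (the class name BomCheck is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // The three bytes quoted in the MySQL "Incorrect string value" error.
    static final byte[] ERROR_BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decode a byte array as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decodeUtf8(ERROR_BYTES);
        // Prints "length=1, code point=U+FEFF": one character, the byte order mark.
        System.out.printf("length=%d, code point=U+%04X%n", s.length(), s.codePointAt(0));
    }
}
```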

The problem seems to be that the JDBC connection is not using UTF-8. How do I change that in Nutch? I have set the following property, but it does not seem to affect the JDBC connection:

<property>
	<name>parser.character.encoding.default</name>
	<value>utf-8</value>
</property>
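The property above controls the fallback encoding Nutch uses when parsing fetched content, not the database connection. MySQL Connector/J takes its connection encoding from the useUnicode and characterEncoding parameters on the JDBC URL. The sketch below only builds such a URL string. It assumes that Nutch 2.1 reads the MySQL URL from gora.sqlstore.jdbc.url in gora.properties, and the base URL shown is an assumed example rather than a confirmed default. The helper withUtf8 is hypothetical:

```java
public class JdbcUrlSketch {
    // Hypothetical helper: append Connector/J encoding parameters to a base JDBC URL,
    // using "?" if the URL has no query string yet and "&" otherwise.
    static String withUtf8(String baseUrl) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Assumed base URL, similar to what Nutch 2.x MySQL guides place in gora.properties.
        String base = "jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true";
        System.out.println(withUtf8(base));
    }
}
```

The connection parameters alone may not be enough. The database, table and column (here 'text') would also need a UTF-8 character set for MySQL to accept the value.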


Thanks for your help,

Bart