Posted to user@nutch.apache.org by jazz <ja...@me.com> on 2013/02/25 21:37:53 UTC
Nutch 2.1 MySQL setup character encoding
Hi,
How do I set up Nutch to crawl correctly using the UTF-8 character set?
The approach described here does not work for me: http://nlp.solutions.asia/?p=180
I am using Nutch 2.1, Solr 4.0, and MySQL 5.5.30. The parser job fails with this error:
Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xBB\xBF Ir...' for column 'text' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
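For reference, the bytes \xEF\xBB\xBF at the start of the rejected value are the UTF-8 byte order mark (U+FEFF), so the page text begins with a BOM. Below is a minimal Java sketch, independent of Nutch, that checks this. The class BomCheck and its helpers hasBom and stripBom are hypothetical names used only for illustration, and stripping the BOM is not necessarily the right fix for the MySQL error.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // The three bytes MySQL printed in the "Incorrect string value" message.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the text begins with U+FEFF, which is what the BOM bytes decode to.
    static boolean hasBom(String text) {
        return !text.isEmpty() && text.charAt(0) == '\uFEFF';
    }

    // Hypothetical helper: return the text without a leading BOM, if any.
    static String stripBom(String text) {
        return hasBom(text) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Java's UTF-8 decoder keeps the BOM as a leading U+FEFF character.
        String decoded = new String(UTF8_BOM, StandardCharsets.UTF_8) + " Ir";
        System.out.println(hasBom(decoded));
        System.out.println(stripBom(decoded).equals(" Ir"));
    }
}
```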
The problem seems to be that the JDBC connection is not using UTF-8. How do I change that in Nutch? I have set the following property, but it does not seem to affect the JDBC connection:
<property>
  <name>parser.character.encoding.default</name>
  <value>utf-8</value>
</property>
Thanks for your help,
Bart