Posted to dev@nutch.apache.org by "Nathan Gass (JIRA)" <ji...@apache.org> on 2012/11/05 14:12:12 UTC
[jira] [Created] (NUTCH-1490) Data Truncation exceptions when using mysql
Nathan Gass created NUTCH-1490:
----------------------------------
Summary: Data Truncation exceptions when using mysql
Key: NUTCH-1490
URL: https://issues.apache.org/jira/browse/NUTCH-1490
Project: Nutch
Issue Type: Bug
Affects Versions: 2.1
Reporter: Nathan Gass
Nutch does not enforce the configured (or implicit) maximum length for the following columns:
title
urls (id, baseUrl, reprUrl,
typ (contentType)
inlinks
outlinks
Trying to store too much data in one of these columns results in an exception similar to this (copied from GORA-24; I will be able to add a newer stack trace later today):
java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2018)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1449)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
... 5 more
I'll add my current fixes in later comments.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1490) Data Truncation exceptions when using mysql
Posted by "Nathan Gass (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490636#comment-13490636 ]
Nathan Gass commented on NUTCH-1490:
------------------------------------
Additionally, the given maximum lengths for URLs and title need to be enforced somewhere in the Nutch code.
I'd suggest some new config values for this.
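A minimal sketch of what such enforcement could look like, assuming a limit that would be read from a new config value. The class name FieldLimiter and the limit values are hypothetical and do not exist in Nutch or Gora. The limit is counted in Java chars (UTF-16 code units), which may not match how the database counts the column length.

```java
// Hypothetical helper, not part of Nutch or Gora.
final class FieldLimiter {

    // Returns value cut to at most maxChars Java chars.
    // A negative maxChars is treated as "no limit" (an assumption of this sketch).
    static String truncate(String value, int maxChars) {
        if (value == null || maxChars < 0 || value.length() <= maxChars) {
            return value;
        }
        int end = maxChars;
        // Do not cut between the two halves of a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        // Illustrative limit only; a real limit would come from configuration.
        int maxTitleChars = 10;
        System.out.println(truncate("A very long page title", maxTitleChars));
    }
}
```

Whether over-long URLs should be truncated at all, or whether such pages should rather be skipped, is a separate decision; truncating a URL changes what it points to.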
[jira] [Updated] (NUTCH-1490) Data Truncation exceptions when using mysql
Posted by "Nathan Gass (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nathan Gass updated NUTCH-1490:
-------------------------------
Attachment: patch
The actual length values I used are somewhat arbitrary. They need to be large enough to hold db.update.max.inlinks inlink entries and db.max.outlinks.per.page outlink entries.
I guessed that with utf8 each entry takes roughly (max-url-length + db.max.anchor.length) * 3 + c bytes, and just rounded the totals up to some MiB values (so I used 24MiB and 0.5MiB).
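As a rough illustration of that estimate, the sketch below computes entries * ((max-url-length + db.max.anchor.length) * 3 + c) and rounds it up to whole MiB. The factor 3 reflects MySQL's utf8 character set using at most 3 bytes per character. All numbers in main are assumed example values, not the ones behind the 24MiB and 0.5MiB in the patch, and the class name ColumnSizeEstimate is hypothetical.

```java
// Hypothetical helper for the back-of-the-envelope estimate; not part of Nutch.
final class ColumnSizeEstimate {

    static final long MIB = 1024L * 1024L;

    // entries * ((maxUrlChars + maxAnchorChars) * 3 + overheadBytes)
    static long estimateBytes(long entries, long maxUrlChars,
                              long maxAnchorChars, long overheadBytes) {
        long bytesPerEntry = (maxUrlChars + maxAnchorChars) * 3 + overheadBytes;
        return entries * bytesPerEntry;
    }

    // Rounds a byte count up to the next whole MiB.
    static long roundUpToMiB(long bytes) {
        return (bytes + MIB - 1) / MIB;
    }

    public static void main(String[] args) {
        // Assumed example values: 10000 entries, 512-char URLs,
        // 100-char anchors, 16 bytes of per-entry overhead (the "c" above).
        long bytes = estimateBytes(10000, 512, 100, 16);
        System.out.println(bytes + " bytes, about " + roundUpToMiB(bytes) + " MiB");
    }
}
```

The result depends heavily on the assumed maximum URL length, which is why that length needs to be enforced for the column sizes to be safe.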