Posted to dev@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/04/01 16:11:06 UTC

Re: java.sql.BatchUpdateException after fetch and wrong WebPage.protocolStatus in trunk

Any thoughts? 

On Tuesday 22 March 2011 17:19:57 Markus Jelsma wrote:
> Hi,
> 
> I did a few successful fetches to test trunk's solrclean. After removing
> some pages so that the WebDB (with HSQLDB as storage backend) would contain
> a few NOTFOUND entries, the following exception occurred:
> 
> 2011-03-22 16:53:49,727 INFO  fetcher.FetcherJob - -activeThreads=0
> 2011-03-22 16:53:51,036 WARN  mapred.LocalJobRunner - job_local_0001
> java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
>         at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
>         at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
>         at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
>         at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
>         ... 5 more
> 
> Also, when I execute solrclean and log WebPage.ProtocolStatus(), I see wrong
> values for pages that were removed: instead of ProtocolStatusCodes.NOTFOUND
> (13) they just got 0.
> 
> It smells like a bug, but I could be doing things the wrong way, of course
> ;)
> 
> 
> Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350