Posted to user@nutch.apache.org by Michael Coffey <mc...@yahoo.com.INVALID> on 2017/05/01 23:42:16 UTC

indexer "possible analysis error"

I know this might be more of a SOLR question, but I bet some of you know the answer.

I've been using Nutch 1.12 + Solr 5.4.1 successfully for several weeks, but suddenly I am having frequent problems. My recent changes were (1) indexing two segments at a time instead of one, and (2) indexing larger segments than before.

The segments are still not terribly large, just 24,000 documents each, for a total of 48,000 in the two-segment job.

Here is the exception I get:
17/05/01 07:29:34 INFO mapreduce.Job:  map 100% reduce 67%
17/05/01 07:29:42 INFO mapreduce.Job: Task Id : attempt_1491521848897_3507_r_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://coderox.xxx.com:8984/solr/popular: Exception writing document id http://0-0.ooo/ to the index; possible analysis error.
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:209)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:173)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:367)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:56)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


Of course, the document URL is different each time.

It looks to me like it's complaining about an individual document. This is surprising because it didn't happen at all for the first two million documents I indexed.
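
To dig out the root cause, I may try pushing that one document at Solr directly with a small standalone SolrJ program, something like the sketch below (the class name and field list are invented; the host and core are copied from the trace above). Whatever actually failed (for example an immense term or a field-type mismatch) should then show up in solr.log on the server side.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical one-off reproducer, not part of Nutch: send the single
// failing document straight to Solr so the real error surfaces.
public class ReproduceBadDoc {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://coderox.xxx.com:8984/solr/popular")) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "http://0-0.ooo/");
            // Add the other fields Nutch sends (url, title, content, ...)
            // one at a time to find which field value trips the analyzer.
            client.add(doc);
            client.commit();
        } catch (SolrException e) {
            // "possible analysis error" is only the client-side wrapper;
            // the root cause is logged server-side in solr.log.
            System.err.println("Indexing failed: " + e.getMessage());
        }
    }
}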

Have you any suggestions on how to debug this? Or how to make it ignore occasional single-document errors instead of failing the whole reduce task?
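
What I would like is something like the hand-rolled fallback below (a hypothetical helper, not an option I know of in Nutch 1.12): add the batch, and if it is rejected, retry document by document and skip only the offenders.

import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical fallback, not an existing Nutch option: if a whole batch
// is rejected, retry one document at a time and drop only the bad ones.
public class SkipBadDocs {
    static void addSkippingBadDocs(SolrClient client,
            List<SolrInputDocument> batch) throws Exception {
        try {
            client.add(batch); // fast path: the whole batch in one request
        } catch (SolrException batchFailure) {
            for (SolrInputDocument doc : batch) {
                try {
                    client.add(doc); // isolate the failing document(s)
                } catch (SolrException e) {
                    System.err.println("Skipping " + doc.getFieldValue("id")
                            + ": " + e.getMessage());
                }
            }
        }
    }
}

The per-document retry pass is slow, but it would only run on the rare batch that actually fails.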

Re: indexer "possible analysis error"

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Michael,

What do you have in your Solr logs?

Kind Regards,
Furkan KAMACI

