Posted to user@nutch.apache.org by "Richardson, Jacquelyn F." <fl...@ornl.gov> on 2016/09/21 13:54:37 UTC

RE: Error while attempting to add documents to Solr

Hi Markus,

Thanks very much for your response.  

I did what you suggested but did not see anything missing in the first few bytes.  

Because I have the same setup on my local machine, I was curious to see what would happen if I copied over the directory containing the segments created by a crawl of a seed file on my local machine.  Once it was copied, I indexed it into Solr from the server with the following steps:

	1. Double-click the Cygwin.bat file to open a command window.
	2. cd to the Nutch home directory.
	3. Issue command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
	4. Issue command: bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*

I received a slightly different error this time.  In hadoop.log I received:
	WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
	at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
	at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

And in solr.log I received:
	ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
	at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
	at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	... 23 more
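
One possible reading of position [1,0], offered as an assumption rather than a diagnosis: a parser that hits end of input before seeing any markup reports the error at the very first position, which is what an empty request body would produce. The sketch below uses Python's standard xml.etree.ElementTree, not the Woodstox parser from the trace, so the message text differs; parse_error_position is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_error_position(body: bytes):
    """Return (line, column) of the XML parse error, or None if body parses."""
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None


if __name__ == "__main__":
    # An empty body fails before any markup is read.
    print(parse_error_position(b""))
    # A minimal well-formed document parses without error.
    print(parse_error_position(b"<add/>"))
```

If the body Solr receives really is empty, the place to look would be whatever sits between Nutch and Solr rather than the document content itself.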

Now I am totally at a loss.  I thought it might be my setup, but when I compared the two setups, they were the same.

Any light you may be able to shed on what is wrong will be greatly appreciated.

Thanks,
Jackie

-----Original Message-----
From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
Sent: Friday, August 12, 2016 3:00 PM
To: user@nutch.apache.org
Subject: RE: Error while attempting to add documents to Solr

Hello Jacquelyn,

This is very odd:

> Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]

We fixed this problem a long time ago. It was caused by non-Unicode code points in the data sent to Solr. The Solr indexing plugin strips them all, and to my knowledge there are no other non-Unicode code points to strip.
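
For illustration only, a minimal Python sketch of the kind of stripping described above. It is not the Nutch plugin's code: it assumes "invalid" means anything outside the Char production of XML 1.0, and the function names are hypothetical.

```python
def is_valid_xml_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that an XML 1.0 parser would reject."""
    return "".join(ch for ch in text if is_valid_xml_char(ord(ch)))


if __name__ == "__main__":
    dirty = "title\x00 with\uffff control\x0b chars"
    print(repr(strip_invalid_xml_chars(dirty)))
```

Tab, line feed and carriage return are kept because XML 1.0 allows them; other control characters below 0x20 are removed.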

What you can do to analyze the problem is to use debug or even trace logging, so you can see the exact XML Nutch is sending on the wire, and use a hex editor to check position 1,0, that is, the first few bytes.
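
If no hex editor is at hand, the same check can be sketched in a few lines of Python. This assumes the request body has already been captured as bytes (for example from the trace log) and is expected to be UTF-8 XML; describe_leading_bytes is a hypothetical helper, not part of Nutch or Solr.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def describe_leading_bytes(payload: bytes, count: int = 16) -> str:
    """Summarize the first `count` bytes of a captured request body."""
    head = payload[:count]
    if not head:
        return "empty payload (0 bytes)"
    # Hex view of the leading bytes, as a hex editor would show them.
    hex_view = " ".join(f"{b:02x}" for b in head)
    note = "no UTF-8 byte-order mark"
    body = head
    if head.startswith(UTF8_BOM):
        note = "UTF-8 byte-order mark"
        body = head[len(UTF8_BOM):]
    # An XML document should begin with '<' once any BOM is skipped.
    starts_with_tag = body.startswith(b"<")
    return f"{hex_view} | {note} | starts with '<': {starts_with_tag}"


if __name__ == "__main__":
    print(describe_leading_bytes(b"<?xml version='1.0'?><add/>", count=5))
    print(describe_leading_bytes(b""))
```

An empty payload is reported separately because it has no first bytes to inspect at all.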

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> Sent: Friday 12th August 2016 19:37
> To: user@nutch.apache.org
> Subject: Error while attempting to add documents to Solr
> 
> Hi All,
> 
> Some background information that may be of some help.  I have Cygwin64, Solr 4.7, Apache Nutch 1.9 source and Tomcat configured in a Windows 7 environment.  This setup works well on my local machine.  I can crawl the specified web page(s) and Nutch can successfully index the content to Solr.
> 
> I moved this setup to one of our servers (except Tomcat 8; it was already on the server and the OS is Windows Server 2008).  I executed a crawl of a seed file using the individual Nutch commands.  Everything worked fine until I ran the command to index the content to Solr.  I issued the following command:
> bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> 
> I received the following error in hadoop.log:
>                 WARN  mapred.LocalJobRunner - job_local_0001 
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> 
> Solr.log reports this error:
>                 INFO  - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
>                 ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
>                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
>                 at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>                 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>                 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>                 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>                 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>                 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>                 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>                 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>                 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>                 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
>                 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>                 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>                 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
>                 at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
>                 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
>                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
>                 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>                 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                 at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog 
> at [row,col {unknown-source}]: [1,0]
>                 at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
>                 at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
>                 at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
>                 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>                 at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
>                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> 
> I have compared the setup on my local machine with the setup on the server machine and I cannot see a difference.  I thought perhaps it had something to do with the solrindex-mapping.xml file but what is on the server agrees with what I have on my local machine.
> 
> Any help you can provide will be most appreciated.
> 
> Thanks,
> Jackie
> 
> 


RE: Error while attempting to add documents to Solr

Posted by "Richardson, Jacquelyn F." <fl...@ornl.gov>.
Hi Markus,

If I upgrade to the latest version of Solr (6.2.1), is it advisable to also upgrade my current version of Nutch (1.9)?  If so, should I upgrade to the latest version of Nutch (1.12)?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
Sent: Wednesday, September 21, 2016 1:30 PM
To: user@nutch.apache.org
Subject: RE: Error while attempting to add documents to Solr

Hmm, a last option would be to upgrade your Solr instance. 4.x is really old and upgrading might do the trick.

Markus

 
 


RE: Error while attempting to add documents to Solr

Posted by Markus Jelsma <ma...@openindex.io>.
Hmm, a last option would be to upgrade your Solr instance. 4.x is really old and upgrading might do the trick.

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> Sent: Wednesday 21st September 2016 15:54
> To: user@nutch.apache.org
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hi Markus,
> 
> Thanks very much for your response.  
> 
> I did what you suggested but did not see anything missing in the first few bytes.  
> 
> Because I have the same setup on my local machine I was curious to see what would happen if I copied the directory containing the segments created from the crawl  (on my local machine) of a seed file.  Once copied I issued the following commands to index into Solr.  To do so on the server, I did:   
> 
>  
> 
> 1.  Double-click Cygwin.bat file to open command window. 
> 2. CD to nutch home directory. 
> 3. Issue command: export JAVA_HOME=F:/Programs/jdk1.7.0_80 
> 4. Issue command: bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> 
> I received a slightly different error this time.  In the Hadoop.log I received: 
> WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) 
> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155) 
> at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118) 
> at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44) 
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) 
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) 
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed! 
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114) 
> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176) 
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> 
> And in solr.log I received: 
> ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog
>  at [row,col {unknown-source}]: [1,0] 
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176) 
> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) 
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) 
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) 
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) 
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) 
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) 
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) 
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239) 
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212) 
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106) 
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141) 
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) 
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88) 
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521) 
> at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850) 
> at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674) 
> at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500) 
> at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489) 
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
> at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) 
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
>  at [row,col {unknown-source}]: [1,0] 
> at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686) 
> at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134) 
> at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040) 
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069) 
> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213) 
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) 
> ... 23 more
> 
> Now I am totally at a loss.  I thought it might be my setup, but when I compared them they are the same.  
> 
> Any light you may be able to shed on what is wrong will be greatly appreciated.
> 
> Thanks,
> Jackie
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
> Sent: Friday, August 12, 2016 3:00 PM
> To: user@nutch.apache.org
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hello Jacquelyn,
> 
> This is very odd:
> 
> > Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
> 
> We've fixed this problem a long time ago. It was a problem of non-unicode codepoints in the data sent to Solr. The Solr indexing plugin strips them all, and to my knowledge, there are no other non-unicode codepoints to strip.
> 
> What you can do to analyze the problem is to use debug or even trace logging, so you can see the exact XML Nutch is sending on the wire, and use a hexeditor to check for position 1,0, well, the first few bytes.
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From:Richardson, Jacquelyn F. <fl...@ornl.gov>
> > Sent: Friday 12th August 2016 19:37
> > To: user@nutch.apache.org
> > Subject: Error while attempting to add documents to Solr
> > 
> > Hi All,
> > 
> > Some background information that maybe of some help.  I have Cygwin64, Solr 4.7, apache Nutch 1.9 source and tomcat configured in a Windows 7 environment.  This setup works well on my local machine.  I can crawl the specified web page(s) and Nutch can successfully index the content to Solr.
> > 
> > I moved this setup to one of our servers (except tomcat 8; it was already on the server and the OS is Windows Server 2008).  I executed a crawl of a seed file using the individual Nutch commands.  Everything worked fine until I ran the command to index the content to Solr.  I issued the following command:
> > bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst 
> > crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb 
> > crawls/crawlsitemap/segments/*
> > 
> > I received the following error in haddoop.log:
> >                 WARN  mapred.LocalJobRunner - job_local_0001 
> > org.apache.solr.common.SolrException: Bad Request
> > 
> > Bad Request
> > 
> > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> > 
> > Solr.log reports this error:
> >                 INFO  - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> >                 ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> >                 at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >                 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >                 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >                 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >                 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >                 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> >                 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >                 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> >                 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> >                 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> >                 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> >                 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> >                 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> >                 at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> >                 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> >                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> >                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> >                 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >                 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >                 at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog 
> > at [row,col {unknown-source}]: [1,0]
> >                 at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> >                 at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> >                 at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> >                 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> >                 at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> >                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > 
> > I have compared the setup on my local machine with the setup on the server machine and I cannot see a difference.  I thought perhaps it had something to do with the solrindex-mapping.xml file but what is on the server agrees with what I have on my local machine.
> > 
> > Any help you can provide will be most appreciated.
> > 
> > Thanks,
> > Jackie
> > 
> > 
> 
>