Posted to user@nutch.apache.org by sc...@gmx.net, sc...@gmx.net on 2012/02/25 15:55:40 UTC

Re: Solr Indexing

Nutch 1.4

greetz,
Rafael.



What Nutch version are you using?

On 14/Dec/ 2011, at 12:29 , Rafael Pappert wrote:

> Hey Markus,
> 
> Nutch's log contains loads of errors like this:
> 
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: http://hadoop0:8080/apache-solr-3.5.0/update?wt=javabin&version=2
> 	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> 	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> 	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> 	at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:81)
> 	at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
> 	at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
> 	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:456)
> 	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:496)
> 	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:166)
> 	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:419)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> 
> 
> 
> 
> 
> 
> On 14/Dec/ 2011, at 12:25 , Markus Jelsma wrote:
> 
>> We also need Nutch's log.
>> 
>> On Wednesday 14 December 2011 12:14:15 Rafael Pappert wrote:
>>> Hello List,
>>> 
>>> during indexing I get errors like this:
>>> 
>>> Dec 14, 2011 5:00:11 AM org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.RuntimeException: [was class java.io.IOException] Invalid CRLF
>>>       at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>>       at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>>       at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>>       at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>>       at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:301)
>>>       at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
>>>       at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
>>>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>>>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>>>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>>>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>>>       at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
>>>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>>>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>>>       at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
>>>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>>>       at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
>>>       at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
>>>       at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>       at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.io.IOException: Invalid CRLF
>>>       at org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:356)
>>>       at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:147)
>>>       at org.apache.coyote.http11.InternalAprInputBuffer.doRead(InternalAprInputBuffer.java:548)
>>>       at org.apache.coyote.Request.doRead(Request.java:422)
>>>       at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
>>>       at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:429)
>>>       at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)
>>>       at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:167)
>>>       at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
>>>       at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
>>>       at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
>>>       at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
>>>       at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
>>>       at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
>>>       at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
>>>       at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>>>       at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>>>       at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>>>       ... 25 more
>>> 
>>> and this:
>>> 
>>> Dec 14, 2011 5:00:11 AM org.apache.solr.common.SolrException log
>>> SEVERE: org.apache.solr.common.SolrException: Unexpected EOF; was expecting a close tag for element <field> at [row,col {unknown-source}]: [1,2708390]
>>>       at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
>>>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>>>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>>>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>>>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>>>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>>>       at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
>>>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>>>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>>>       at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
>>>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>>>       at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
>>>       at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
>>>       at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1815)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>       at java.lang.Thread.run(Thread.java:662)
>>> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF; was expecting a close tag for element <field> at [row,col {unknown-source}]: [1,2708390]
>>>       at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
>>>       at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2730)
>>>       at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
>>>       at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)
>>>       at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
>>>       at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
>>>       ... 21 more
>>> 
>>> Fetching, parsing, etc. work without any problems. Any ideas?
>>> 
>>> Thanks in advance,
>>> Rafael.
>> 
>> -- 
>> Markus Jelsma - CTO - Openindex
>