Posted to solr-user@lucene.apache.org by xu cheng <xc...@gmail.com> on 2010/12/27 04:24:41 UTC

exception with xml file processing

Hi all,
 I use Solr to index my documents, and I put my text in a CDATA
section. However, Solr always throws an exception complaining about
the XML file processing.
 It seems that I can still index the documents successfully (actually, I'm
not sure, because there are far too many documents to check!)
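Since the text goes into CDATA sections, it is worth checking how the update message itself is assembled. A minimal Java sketch, assuming Solr's XML update format (`<add><doc><field name="...">`); the class `CdataAddMessage` and the field names `id` and `text` are hypothetical and must match your own schema. It splits any `]]>` inside the text (a CDATA section cannot contain that sequence) and writes nothing before `<add>`, so the request body starts with `<`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr XML <add> message with every field
// value wrapped in a CDATA section.
public class CdataAddMessage {

    // A CDATA section cannot contain "]]>", so each occurrence is split
    // across two adjacent sections: the first ends with "]]", the next starts with ">".
    static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Nothing is written before "<add>": the first character of the request
    // body must be '<'. Field names are assumed to need no escaping.
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append("<field name=\"").append(field.getKey()).append("\">")
              .append(cdata(field.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");                        // assumed uniqueKey field
        doc.put("text", "a < b && c > d, then ]]>"); // assumed text field
        System.out.println(addMessage(doc));
    }
}
```

A `LinkedHashMap` keeps the fields in insertion order, which makes the generated message easy to compare against a known-good sample.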


The stack trace looks like this, and all the exceptions are identical:

 Error processing "legacy" update command:com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
        at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:90)
        at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:130)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:79)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:619)

Any suggestions or references are appreciated. Thanks!

Re: exception with xml file processing

Posted by Lance Norskog <go...@gmail.com>.
Tomcat needs a flag that tells it to use UTF-8. If you don't set that,
various problems happen, including this one. Look on the Solr wiki for
Tomcat and UTF-8.
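On the client side, it also helps to send the body as UTF-8 bytes and say so in the `Content-Type` header, so the container does not have to guess the charset. A minimal sketch using `java.net.HttpURLConnection`; the class `Utf8Post` and the URL `http://localhost:8080/solr/update` are hypothetical, and this does not replace the Tomcat-side setting described on the wiki.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client: posts an XML update message as UTF-8 and declares the
// charset, so the servlet container does not fall back to its default.
public class Utf8Post {

    static final String CONTENT_TYPE = "text/xml; charset=UTF-8";

    // Always encode explicitly; String.getBytes() without an argument uses
    // the JVM's default charset, which on older JVMs may not be UTF-8.
    static byte[] encodeBody(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Sends the message and returns the HTTP status code of the response.
    static int post(String updateUrl, String xml) throws IOException {
        byte[] body = encodeBody(xml);
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", CONTENT_TYPE);
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Hypothetical URL, e.g. http://localhost:8080/solr/update
            System.out.println("usage: Utf8Post <update-url>");
            return;
        }
        String xml = "<add><doc><field name=\"id\">caf\u00e9</field></doc></add>";
        System.out.println("HTTP " + post(args[0], xml));
    }
}
```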

Also, there can't be any blank lines (or any other characters) at the top
of the XML file before the XML declaration (the <?xml ...?> line).
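A quick client-side check for this before posting: the hypothetical helper `PrologCheck.check` below (not part of Solr) reports a 1-based `[row,col]` in the style of the Woodstox message. Since whitespace before a plain `<add>` tag is legal XML, it flags whitespace only when an XML declaration follows, plus any other character that precedes the first `<`.

```java
// Hypothetical pre-flight check (not part of Solr): finds anything before the
// first '<' that would make an XML parser fail in the prolog.
public class PrologCheck {

    // Returns null if the message can start an XML document, otherwise a
    // description with the 1-based [row,col] of the problem.
    static String check(String xml) {
        int start = xml.startsWith("\uFEFF") ? 1 : 0; // a byte-order mark is allowed
        int row = 1;
        int col = 1;
        for (int i = start; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                // Whitespace before a plain tag is legal, but nothing may
                // precede the XML declaration "<?xml ...?>".
                if (i > start && isXmlDeclaration(xml, i)) {
                    return "whitespace before the XML declaration at [" + row + "," + col + "]";
                }
                return null;
            } else if (c == '\n') {
                row++;
                col = 1;
            } else if (c == ' ' || c == '\t') {
                col++;
            } else if (c != '\r') { // '\r' is skipped; the following '\n' advances the row
                return "unexpected character '" + c + "' (code " + (int) c + ") at [" + row + "," + col + "]";
            }
        }
        return "no '<' found";
    }

    // True if "<?xml" followed by whitespace starts at index i.
    private static boolean isXmlDeclaration(String xml, int i) {
        int end = i + 5; // index just past "<?xml"
        return xml.startsWith("<?xml", i)
                && end < xml.length()
                && Character.isWhitespace(xml.charAt(end));
    }

    public static void main(String[] args) {
        System.out.println(check("'<add><doc/></add>'"));
    }
}
```

Here `main` prints `unexpected character ''' (code 39) at [1,1]`, the same character and position as in the exception above.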

Can you post a very short XML file that has this problem?

On Mon, Dec 27, 2010 at 5:43 AM, Erick Erickson <er...@gmail.com> wrote:
> This often happens if there is some character at the very beginning
> of the XML document, outside of any tags, here:
>
> character ''' (code 39) in prolog; expected '<'
>  at [row,col {unknown-source}]: [1,1]
>
> But you indicate that this is happening for every document? If that's
> the case, it may be an encoding issue. Make sure your servlet container
> character encoding handles the character set in your
> documents (UTF-8?).
>
> If that doesn't help, please show us an example of a file that doesn't work,
> the version of Solr you're using, info about your servlet, etc.
>
> Best
> Erick

-- 
Lance Norskog
goksron@gmail.com

Re: exception with xml file processing

Posted by Erick Erickson <er...@gmail.com>.
This often happens if there is some character at the very beginning
of the XML document, outside of any tags, here:

character ''' (code 39) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
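The same failure can be reproduced outside Solr with the JDK's built-in StAX pull parser (`javax.xml.stream`), which is handy for testing individual files; the stack trace shows Solr's `XMLLoader` pulling events the same way through Woodstox. A minimal sketch; `StaxPrologDemo` is a hypothetical name, and the JDK's default parser is not Woodstox, so its error wording differs even though the cause (content before the first `<`) is the same.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo: pull-parses a message with the JDK's StAX implementation.
public class StaxPrologDemo {

    // Returns null if the whole message parses, otherwise the parser's error
    // message (which includes the position where parsing stopped).
    static String parseError(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return null;
        } catch (XMLStreamException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError("<add><doc/></add>"));
        System.out.println(parseError("'<add><doc/></add>'"));
    }
}
```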

But you indicate that this is happening for every document? If that's
the case, it may be an encoding issue. Make sure your servlet container's
character encoding matches the character set of your
documents (UTF-8?).
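One way to check whether the files really are UTF-8 is to decode their bytes with a strict decoder. A minimal sketch; `Utf8Check.isValidUtf8` is a hypothetical helper. `new String(bytes, UTF_8)` silently replaces malformed bytes with U+FFFD, whereas a `CharsetDecoder` set to `CodingErrorAction.REPORT` throws, so a file saved in another charset (for example ISO-8859-1) is detected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a byte array is well-formed UTF-8.
public class Utf8Check {

    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw on bad input instead of substituting
        // U+FFFD the way new String(bytes, UTF_8) does.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

To check a file on disk, pass `java.nio.file.Files.readAllBytes(path)` to `isValidUtf8`.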

If that doesn't help, please show us an example of a file that doesn't work,
the version of Solr you're using, info about your servlet container, etc.

Best
Erick
