Posted to solr-user@lucene.apache.org by Nick Way <ni...@southeastpublishing.com> on 2017/02/28 16:01:21 UTC

Invalid UTF-8 character 0xffff at char #17373581, byte #17539047

Hello everyone,

We use Solr (with Adobe ColdFusion) to index circa 60,000 PDFs; however, the
daily refresh has been failing with this error: "Invalid UTF-8 character
0xffff at char #17373581, byte #17539047...." [truncated; the full error
message is posted below]

   - Can Solr be configured to skip problematic documents (e.g. those
   containing an invalid character)?
   - Can Solr be configured to log which document it had a problem indexing?
   - If neither is possible, do you have any suggestions for how I can
   either detect the problematic document or stop Solr from failing on it?
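One possible approach to the last question, sketched under assumptions: U+FFFF is not a legal character in XML 1.0 (the Char production allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF), so extracted PDF text containing it will likely break an XML update request. The Python below is only a stand-in for whatever the ColdFusion side does; the function names are hypothetical, and it assumes the extracted text of each PDF is available, keyed by a document id, before it is sent to Solr.

```python
import re

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# (lone surrogates and U+FFFE/U+FFFF therefore match this pattern)
_INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Drop forbidden characters so the text can be embedded in an XML request."""
    return _INVALID_XML_CHARS.sub("", text)


def report_bad_documents(docs):
    """docs maps a (hypothetical) document id to its extracted text.

    Returns the ids whose text contains at least one forbidden character,
    so they can be logged, cleaned, or skipped before indexing.
    """
    return [doc_id for doc_id, text in docs.items() if find_invalid_xml_chars(text)]


if __name__ == "__main__":
    docs = {"report-1.pdf": "clean text", "report-2.pdf": "broken\uffff text"}
    print(report_bad_documents(docs))
    print(find_invalid_xml_chars(docs["report-2.pdf"]))
```

Stripping the characters is a design choice; replacing them with a space instead would keep character offsets stable if anything downstream depends on them.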


Thank you very much indeed.

Kind regards,


Nick Way

full error message:

[was class java.io.CharConversionException] Invalid UTF-8 character 0xffff at char #17373581, byte #17539047)
java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xffff at char #17373581, byte #17539047)
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:301)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(H

request: http://localhost:8985/solr/solr77b/update?commit=true&waitFlush=false&waitSearcher=false&wt=xml&version=2.2
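The request line shows the refresh posting to /update with commit=true, and the failing byte offset (#17539047) implies a payload of well over 17 MB, which suggests many documents travel in a single XML request, so one bad document fails the whole batch. A minimal sketch of isolating the culprits by bisection, assuming a hypothetical send(batch) callable that posts a batch to Solr and raises on any error: failing batches are split in half until single documents remain, and those are returned for logging or skipping. Documents ahead of the bad one in a failed request may already have been indexed; assuming each has a uniqueKey, re-sending them overwrites rather than duplicates them.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_failing_docs(docs: Sequence[T], send: Callable[[Sequence[T]], None]) -> List[T]:
    """Send docs via the hypothetical send(); return the documents that fail alone.

    A batch that succeeds is done. A batch that raises is split in half and
    each half is retried, so every good document is eventually sent and every
    bad one ends up isolated in a batch of size one.
    """
    if not docs:
        return []
    try:
        send(docs)
        return []
    except Exception:
        if len(docs) == 1:
            # This single document is the problem: report it instead of failing.
            return list(docs)
        mid = len(docs) // 2
        return find_failing_docs(docs[:mid], send) + find_failing_docs(docs[mid:], send)
```

With k bad documents in a batch of n, this costs on the order of k * log2(n) extra requests, so in practice it may be worth committing once at the end rather than on every request.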