Posted to solr-user@lucene.apache.org by Pranav Prakash <pr...@gmail.com> on 2011/09/19 10:46:31 UTC

java.io.CharConversionException While Indexing in Solr 3.4

Hi List,

I tried Solr 3.4.0 today and while indexing I got the error
java.lang.RuntimeException: [was class java.io.CharConversionException]
Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)

My earlier version was Solr 1.4, and this same document was indexed
successfully there. Looking around, I see issue
https://issues.apache.org/jira/browse/SOLR-2381 which seems to fix the
issue. I thought this patch was already applied to Solr 3.4.0. Is there
something I am missing?

Is there anything else I should provide, such as logs or details of my
document?

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>
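
For readers hitting the same message: UTF-8 continuation ("middle") bytes must fall in the range 0x80-0xBF, and 0x73 is the plain ASCII letter "s". One common way to get this error, though not necessarily what happened in this thread, is text encoded as Latin-1 being sent as if it were UTF-8. The following is a minimal Python sketch of that situation; the sample string is hypothetical and the decoder is Python's, not the XML parser Solr uses.

```python
# Hypothetical sample: "cafes" with an e-acute, encoded as Latin-1,
# so the accented letter becomes the single byte 0xE9.
latin1_bytes = "caf\u00e9s".encode("latin-1")   # b'caf\xe9s'

try:
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# In UTF-8, 0xE9 announces a three-byte sequence, so a strict decoder expects
# the next byte to be a continuation byte (0x80-0xBF). It finds 0x73 ("s").
bad_lead = latin1_bytes[error.start]
next_byte = latin1_bytes[error.start + 1]
print(f"lead byte 0x{bad_lead:02x} at offset {error.start}, followed by 0x{next_byte:02x}")
```

The byte that follows the failing lead byte here is 0x73, the same value reported in the error above.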

Re: java.io.CharConversionException While Indexing in Solr 3.4

Posted by Erik Hatcher <er...@gmail.com>.
To clarify further: the issue isn't in solr-ruby; it's in REXML (a lame Ruby XML library). Both rsolr and solr-ruby will use libxml instead of REXML if it is present.

	Erik

On Sep 20, 2011, at 03:46 , Pranav Prakash wrote:

> I managed to resolve this issue. It turned out that the problem was a
> faulty XML file generated by the solr-ruby gem. I installed libxml-ruby
> and rsolr, and used the rsolr gem instead of solr-ruby.
> 
> Also, if you face this kind of issue, the test-utf8.sh script included in
> exampledocs is a good way to test Solr's handling of UTF-8 characters.
> 
> Great work, Solr team, and special thanks to Erik Hatcher.
> 


Re: java.io.CharConversionException While Indexing in Solr 3.4

Posted by Pranav Prakash <pr...@gmail.com>.
I managed to resolve this issue. It turned out that the problem was a
faulty XML file generated by the solr-ruby gem. I installed libxml-ruby
and rsolr, and used the rsolr gem instead of solr-ruby.

Also, if you face this kind of issue, the test-utf8.sh script included in
exampledocs is a good way to test Solr's handling of UTF-8 characters.

Great work, Solr team, and special thanks to Erik Hatcher.
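
Before posting to Solr, it can also help to check the generated XML on the client side. The sketch below is one possible way to do that in Python, not something from this thread: the function name `find_invalid_utf8` and the sample bytes are made up for illustration. It reports the byte offset of each place where strict UTF-8 decoding fails.

```python
def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return (offset, byte value) for each spot where strict UTF-8 decoding fails."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the payload is valid UTF-8
        except UnicodeDecodeError as exc:
            offset = pos + exc.start
            problems.append((offset, data[offset]))
            pos += exc.end  # resume after the offending bytes
    return problems


# Hypothetical payload: a Latin-1 e-acute (0xE9) and a stray 0xFF byte.
sample = b"caf\xe9s ok \xff"
for offset, value in find_invalid_utf8(sample):
    print(f"invalid UTF-8 at byte {offset}: 0x{value:02x}")
```

For a real file, the bytes could be read with `open(path, "rb").read()`. The offsets are positions in that file, so they may not line up exactly with the "byte #" in Solr's message, which refers to the request body as the server received it.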

On Mon, Sep 19, 2011 at 15:54, Pranav Prakash <pr...@gmail.com> wrote:

>
> Just in case someone is interested, here is the log:
>
> SEVERE: java.lang.RuntimeException: [was class
> java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char
> #66641, byte #65289)

Re: java.io.CharConversionException While Indexing in Solr 3.4

Posted by Pranav Prakash <pr...@gmail.com>.
Just in case someone is interested, here is the log:

SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
    at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
    at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    ... 26 more


Also, is there a setting to change the depth of the stack trace? It would
help to see the complete stack instead of "... 26 more".
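
A note on the "... 26 more" line: in a Java stack trace, this shorthand under "Caused by" means the last 26 frames of the cause are identical to the last 26 frames of the enclosing trace, so nothing is actually missing from the log. In the trace above, those are the frames from BasicStreamReader.getText down to QueuedThreadPool$PoolThread.run. The Python sketch below only illustrates how the full cause trace can be rebuilt from what is printed; the function name and the short frame lists are hypothetical.

```python
def expand_cause(enclosing_frames, cause_frames, frames_in_common):
    """Rebuild the full cause trace from the Java "... n more" shorthand."""
    if frames_in_common == 0:
        return list(cause_frames)
    # The elided frames are the last n frames of the enclosing trace.
    return list(cause_frames) + list(enclosing_frames[-frames_in_common:])


# Hypothetical, shortened frame lists standing in for a real log.
enclosing = ["Outer.wrap", "Reader.getText", "Loader.load", "Thread.run"]
cause = ["Utf8.read", "Reader.finishToken"]   # printed with "... 3 more"
full = expand_cause(enclosing, cause, 3)
print("\n".join(full))
```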

