Posted to solr-user@lucene.apache.org by Charles Wardell <ch...@bcsolution.com> on 2011/04/06 22:26:39 UTC

unindexible Chars?

Once in a while, my post.jar seems to fail on commit. During the commit process, I have gotten a few errors. One is that an EOF character was found, and another is that a semicolon was expected after &the. I have also come across "a > was expected".

So my question is: what characters do I need to strip out of the source text to ensure all posts are successful?

One side note. I have placed the text fields within <![CDATA[ ]] before adding the document.

Thanks,
Charlie 
 


Re: unindexible Chars?

Posted by Markus Jelsma <ma...@openindex.io>.
> Once in a while, my post.jar seems to fail on commit. During the commit
> process, I have gotten a few errors. One is that an EOF character was found,
> and another is that a semicolon was expected after &the. I have also come
> across "a > was expected".
> 
> So my question is: what characters do I need to strip out of the source text
> to ensure all posts are successful?

The usual: whatever you post _must_ be valid XML.
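
As a rough illustration of what "valid" means for field text, here is a
minimal Python sketch. It drops the characters XML 1.0 does not allow at all
(most control characters, among others) and escapes &, < and >. The helper
names clean_field_text and solr_field are made up for this example and are
not part of Solr or post.jar; adapt the idea to whatever generates your XML.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# XML 1.0 allows only: #x9 | #xA | #xD | [#x20-#xD7FF] |
# [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Everything else is matched here.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_field_text(text):
    """Drop characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))


def solr_field(name, text):
    """Build one <field> element (hypothetical helper for illustration)."""
    # quoteattr() returns the attribute value already wrapped in quotes.
    return "<field name=%s>%s</field>" % (quoteattr(name), clean_field_text(text))
```

With escaping done this way, a stray "&the" in the source text becomes
"&amp;the" and no longer looks like an unterminated entity reference.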

> 
> One side note. I have placed the text fields within <![CDATA[ ]] before
> adding the document.

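That's not a bad idea: inside a CDATA section, characters such as & and < do 
not need escaping. The data still must not contain the sequence ]]> or 
characters that XML does not allow, such as most control characters. Usually 
these errors indicate invalid XML.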

Try running xmllint on one of the XML bodies that produces these errors.
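
If xmllint is not at hand, a similar well-formedness check can be sketched
with Python's standard library. This is only a stand-in for xmllint, not
what Solr itself does, and the function name check_well_formed is made up
for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise a short error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)
    return None
```

Running each document through a check like this before posting should point
at the line and column of the offending character, which is usually enough to
see whether it is an unescaped &, a stray control character, or a truncated
file.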


> 
> Thanks,
> Charlie