Posted to solr-user@lucene.apache.org by Charles Wardell <ch...@bcsolution.com> on 2011/04/06 22:26:39 UTC
unindexible Chars?
Once in a while, my post.jar seems to fail on commit. During the commit process, I have gotten a few errors. One is that an EOF character was found, and another is that a semicolon was expected after &the. I have also come across "a > was expected".
So my question is: what characters do I need to strip out of the source text to ensure all posts are successful?
One side note. I have placed the text fields within <![CDATA[ ]] before adding the document.
Thanks,
Charlie
Re: unindexible Chars?
Posted by Markus Jelsma <ma...@openindex.io>.
> Once in a while, my post.jar seems to fail on commit. During the commit
> process, I have gotten a few errors. One is that an EOF character was found,
> and another is that a semicolon was expected after &the. I have also come
> across "a > was expected".
>
> So my question is: what characters do I need to strip out of the source text
> to ensure all posts are successful?
The usual: whatever you post _must_ be well-formed XML, so a bare & or < in
the text has to be escaped as &amp; or &lt;.
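As a rough illustration of what "valid XML" means for field text, here is a minimal Python sketch. It assumes the source text is already a decoded Unicode string. The helper name clean_for_xml is hypothetical and not part of Solr or post.jar. It drops code points outside the XML 1.0 Char production (most control characters, unpaired surrogates, U+FFFE and U+FFFF) and then escapes &, < and >.

```python
import re
from xml.sax.saxutils import escape

# Everything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text: str) -> str:
    """Hypothetical helper: make text safe to place inside an XML element."""
    # Remove code points that no XML 1.0 document may contain, even in CDATA.
    stripped = _INVALID_XML_CHARS.sub("", text)
    # Escape the markup-significant characters &, < and >.
    return escape(stripped)
```

Text cleaned this way can go directly into a field element without a CDATA wrapper. If the text is kept in CDATA instead, only the stripping step applies, and a literal ]]> inside the text would still end the section early.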
>
> One side note. I have placed the text fields within <![CDATA[ ]] before
> adding the document.
That's not a bad idea; then at least nothing bad can happen with the data
embedded in the element. Usually these errors indicate invalid XML.
Try running xmllint on one of the XML bodies that gives errors.
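If xmllint is not at hand, a similar well-formedness check can be sketched in Python with the standard library parser. This is only an illustration, not what post.jar or Solr does internally, and the function name find_xml_error is hypothetical. It returns None for a well-formed body and otherwise a message with the line and column where parsing stopped.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Hypothetical helper: return None if xml_text parses, else a message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple for the failure point.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


if __name__ == "__main__":
    good = "<add><doc><field name='text'>fish &amp; chips</field></doc></add>"
    bad = "<add><doc><field name='text'>fish &the chips</field></doc></add>"
    for body in (good, bad):
        print(find_xml_error(body))
```

Running a failing body through a check like this before posting usually points straight at the offending character, such as an unescaped & or a stray control character.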
>
> Thanks,
> Charlie