Posted to solr-user@lucene.apache.org by neosky <ne...@yahoo.com> on 2012/03/14 09:34:07 UTC

How to avoid the unexpected character error?

I index my data by posting XML. One field might contain characters
like '' <=>
That seems to produce the error.
I changed that field so it is not indexed, but that doesn't help. I need to
store the field, but it does not need to be indexed.
Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3824726.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to avoid the unexpected character error?

Posted by Li Li <fa...@gmail.com>.
That's not the right place to fix it.
When you run java -Durl=http://... -jar post.jar data.xml,
the data.xml file must be a valid XML file. You should escape special
characters in this file.
I don't know how you generate this file.
If you use a Java program (or some other script) to generate it, you should
use XML tools to do so.
But if you build it by hand like this:
StringBuilder buf = new StringBuilder();
buf.append("<add>");
buf.append("<doc>");
buf.append("<field name=\"fname\">text content</field>");
then you have to escape the special characters yourself.
If you use Java, you can make use of the org.apache.solr.common.util.XML
class.
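
As a minimal sketch of the "use XML tools" route, the following assumes only
the JDK's built-in StAX API (javax.xml.stream). The class and method names
(AddXmlBuilder, buildAddXml) are made up for illustration; the field name
fname is taken from the snippet above. The writer takes care of escaping, so
the field value can be passed in raw.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper (not part of Solr): builds a one-field <add> message
// with the JDK's StAX writer instead of concatenating strings by hand.
final class AddXmlBuilder {

    // Returns <add><doc><field name="...">value</field></doc></add>.
    static String buildAddXml(String fieldName, String value) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("add");
        xml.writeStartElement("doc");
        xml.writeStartElement("field");
        xml.writeAttribute("name", fieldName);
        xml.writeCharacters(value);   // special characters such as '<' and '&' are escaped here
        xml.writeEndElement();        // </field>
        xml.writeEndElement();        // </doc>
        xml.writeEndElement();        // </add>
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The raw value contains the characters from the original question.
        System.out.println(buildAddXml("fname", "'' <=> & more"));
    }
}
```

The printed string could then be saved as data.xml and posted as before.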

On Fri, Mar 16, 2012 at 2:03 PM, neosky <ne...@yahoo.com> wrote:

> I'm sorry, but I don't understand what you mean.
> I tried HTMLStripCharFilter and PatternReplaceCharFilter, but neither
> works.
> Could you give me an example? Thanks!
>
>  <fieldType name="text_html" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
>  </fieldType>
>
> I also tried:
>
> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])"
> replacement=""
>                 maxBlockChars="10000" blockDelimiters="|"/>

Re: How to avoid the unexpected character error?

Posted by neosky <ne...@yahoo.com>.
I'm sorry, but I don't understand what you mean.
I tried HTMLStripCharFilter and PatternReplaceCharFilter, but neither
works.
Could you give me an example? Thanks!

 <fieldType name="text_html" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>

I also tried:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])"
            replacement="" maxBlockChars="10000" blockDelimiters="|"/>

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3831064.html

Re: How to avoid the unexpected character error?

Posted by Li Li <fa...@gmail.com>.
No, it has nothing to do with schema.xml.
post.jar just posts the file; it doesn't parse it.
Solr uses an XML parser to parse the file. If you don't escape special
characters, it's not a valid XML file and Solr will throw an exception.
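
To illustrate why the parser rejects the file before any analyzer or
charFilter in schema.xml is involved, here is a small sketch that uses the
JDK's DOM parser as a stand-in for whatever XML parser Solr uses internally
(an assumption made for illustration). The class name WellFormedCheck is
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical check (not Solr code): does a string parse as well-formed XML?
final class WellFormedCheck {

    static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser gave up: the input is not well-formed XML.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String raw = "<add><doc><field name=\"fname\">a <=> b</field></doc></add>";
        String escaped = "<add><doc><field name=\"fname\">a &lt;=&gt; b</field></doc></add>";
        System.out.println("raw:     " + isWellFormed(raw));
        System.out.println("escaped: " + isWellFormed(escaped));
    }
}
```

The raw version fails at the '<' inside the field text, which is the same
kind of failure as the "unexpected character" error; the escaped version
parses, and the field value seen after parsing is again "a <=> b".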

On Thu, Mar 15, 2012 at 12:33 AM, neosky <ne...@yahoo.com> wrote:

> Thanks!
> Does the schema.xml support this parameter? I am using the example post.jar
> to index my file.

Re: How to avoid the unexpected character error?

Posted by neosky <ne...@yahoo.com>.
Thanks!
Does the schema.xml support this parameter? I am using the example post.jar
to index my file.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html

Re: How to avoid the unexpected character error?

Posted by Li Li <fa...@gmail.com>.
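There is a class org.apache.solr.common.util.XML in Solr.
You can use this wrapper:
    import java.io.IOException;
    import java.io.StringWriter;
    import org.apache.solr.common.util.XML;

    // Escapes XML special characters in s so it can be used as element text.
    public static String escapeXml(String s) throws IOException {
        StringWriter sw = new StringWriter();
        XML.escapeCharData(s, sw);
        return sw.toString();
    }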
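
If the Solr jar is not on the classpath of the program that generates the
file, the same idea can be sketched by hand. The class below is a
hypothetical, dependency-free illustration, not Solr's implementation: it
escapes only '&', '<' and '>' in element text and does nothing about
characters that are not allowed in XML at all (such as most control
characters).

```java
// Hypothetical, dependency-free sketch of escaping element text by hand.
final class XmlEscapeSketch {

    // Replaces '&', '<' and '>' with their predefined XML entities.
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;");  break;
                case '>': sb.append("&gt;");  break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same StringBuilder style as earlier in the thread, with the value escaped.
        StringBuilder buf = new StringBuilder();
        buf.append("<add><doc>");
        buf.append("<field name=\"fname\">").append(escapeText("'' <=>")).append("</field>");
        buf.append("</doc></add>");
        System.out.println(buf); // the field text is printed as '' &lt;=&gt;
    }
}
```

Attribute values would additionally need quote characters escaped, which this
sketch leaves out.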

On Wed, Mar 14, 2012 at 4:34 PM, neosky <ne...@yahoo.com> wrote:

> I index my data by posting XML. One field might contain characters
> like '' <=>
> That seems to produce the error.
> I changed that field so it is not indexed, but that doesn't help. I need to
> store the field, but it does not need to be indexed.
> Thanks!