Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2010/07/11 17:56:21 UTC

problem to indexing

Hello users,

I have a lot of work :-)
I have to index mails, and it works, but sometimes I get errors and I
don't know why.

Maybe you can help.


My XML:

<?xml version="1.0" encoding="UTF-8" ?>
<add>
<doc>
<field name="FILE_ITEMS_MD5SUM">acd1a9416fe3ddaca1442e44e0ca8751</field>
<field
name="FILE_ITEMS_DATEINAME">0747046BCC415546B565108E3DA19406BF677F9715@XXXXXXXXXXXXXXXXXXXXXXXX
</field>
<field name="FILE_ITEMS_ARCHIVED">20100709140235</field>
<field name="FILE_ITEMS_SIZE">103261</field>
<field
name="FILE_ITEMS_PFAD">/2010/07/0747046BCC415546B565108E3DA19406BF677F9715@XXXXXXXXXXXXXXXXXXXXXXXX
</field>
<field
name="EMAIL_ID_ID">0747046BCC415546B565108E3DA19406BF677F9715@XXXXXXXXXXXXXXXXXXXXX
</field>
<field name="EMAIL_ID_IN_REPLY_TO">4B4366D9.60709@dueker.de</field>
<field
name="EMAIL_ID_REFERENCE">0747046BCC415546B565108E3DA19406BF6778C580@XXXXXXXXXXXXXXXXX
</field>
<field name="EMAIL_ID_REFERENCE">4B4366D9.60709@XXXXXXXXXXX</field>
<field name="EMAIL_HEADER_FROM">Ncorban@XXXXXXXXXXXXXXXXX</field>
<field name="EMAIL_HEADER_TO">tas@XXXXXXXXXXXXXXXXXXX</field>
<field name="EMAIL_HEADER_DATE">20100106064718</field>
<field name="EMAIL_HEADER_SUBJECT"><![CDATA[RE:LA
LALALALALALALALALALALALALALA (PART 1)]]></field>
<field name="EMAIL_BODY_BODY"><![CDATA[]]></field>
<field name="EMAIL_ATTACHMENT_ATTACHMENT">Butterfly</field>
<field name="EMAIL_ATTACHMENT_ATTACHMENT">Valves</field>
<field name="EMAIL_ATTACHMENT_ATTACHMENT">&</field>
<field name="EMAIL_ATTACHMENT_ATTACHMENT">Gate</field>
<field name="EMAIL_ATTACHMENT_ATTACHMENT">Valves.xls</field>
</doc>
</add>

My Error:

SimplePostTool: FATAL: Solr returned an error:
Unexpected_character__code_60_expected_a_name_start_character__at_rowcol_unknownsource_2044__javaioIOException_Unexpected_character__
code_60_expected_a_name_start_character__at_rowcol_unknownsource_2044__at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava73__
at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54__at_
orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1299__
at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava338__at_
orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava241__at_
orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_
orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_
orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_
orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_
orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_
orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__
at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_
orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_
orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayjettyHttpConnectionhandleHttpConnectionjava378__at_
orgmortbayjettybioSocketConnector$ConnectionrunSocketConnectorjava226__at_
orgmortbaythreadBoundedThreadPool$PoolThreadrunBoundedThreadPooljava442_Cau

I can't find anything wrong.

King

RE: problem to indexing

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Jörg,

Just guessing what the problem is, the following looks like it's not well-formed XML:

   <field name="EMAIL_ATTACHMENT_ATTACHMENT">&</field>

If you want just the char "&", that should instead read:

   <field name="EMAIL_ATTACHMENT_ATTACHMENT">&amp;</field>

Similarly, you should escape "<" and ">" chars in text: &lt; and &gt; respectively.
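
This guess fits the error message: character code 60 is "<", and after a bare "&" the parser expects the first character of an entity name but reads the "<" of "</field>" instead. If the XML is generated by a script, the escaping can be done once for every field value. Below is a minimal sketch in Python, assuming the documents are built as strings; the helper names make_field and make_add_doc are hypothetical and not part of Solr.

```python
from xml.sax.saxutils import escape, quoteattr


def make_field(name, value):
    """Build one <field> element with attribute and text safely escaped."""
    # quoteattr() returns the attribute value including its surrounding quotes.
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return "<field name={}>{}</field>".format(quoteattr(name), escape(value))


def make_add_doc(fields):
    """Wrap (name, value) pairs in the <add><doc>...</doc></add> envelope."""
    lines = ['<?xml version="1.0" encoding="UTF-8" ?>', "<add>", "<doc>"]
    lines += [make_field(name, value) for name, value in fields]
    lines += ["</doc>", "</add>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_add_doc([
        ("EMAIL_ATTACHMENT_ATTACHMENT", "Butterfly"),
        ("EMAIL_ATTACHMENT_ATTACHMENT", "&"),
        ("EMAIL_ATTACHMENT_ATTACHMENT", "Valves.xls"),
    ]))
```

Values that are already wrapped in a CDATA section, like the subject field, should not be escaped a second time.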

Steve
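
To find such a problem before posting the file to Solr, the XML can be checked for well-formedness locally. The following is only a sketch using Python's standard library parser, not what Solr does internally; the function name find_xml_error is hypothetical, and the exact wording of the message depends on the parser.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return (line, column, message) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based
        return line, column, str(err)
    return None


if __name__ == "__main__":
    broken = "\n".join([
        "<add>",
        "<doc>",
        '<field name="EMAIL_ATTACHMENT_ATTACHMENT">&</field>',
        "</doc>",
        "</add>",
    ])
    # The bare "&" on the third line is reported as an error.
    print(find_xml_error(broken))
    # With the ampersand escaped, the document parses and None is returned.
    print(find_xml_error(broken.replace(">&<", ">&amp;<")))
```

The reported line number points to the field that needs escaping.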
