Posted to solr-user@lucene.apache.org by Michael Belenki <va...@belenki.name> on 2012/07/06 10:58:59 UTC

Problem while indexing XML file with special characters represented ü

Dear community,

I am experiencing a strange problem while trying to index (import) an XML
document into SOLR via the DataImportHandler. The XML document contains some
special characters (e.g. German ü) that are represented as the XML entities
&uuml; or &auml;. There is also a DTD file that defines these entities
(<!ENTITY uuml    "&#252;" >); I tried both using the DTD file and
including the DTD definition in the XML itself. After I start the
full-import command, the import process throws an exception as soon as it
tries to parse &uuml;: "Undeclared general entity "uuml"". Has anyone
already faced such a problem?

best regards,

Michael


My data-config for importing is:


<dataConfig>
    <dataSource type="FileDataSource" encoding="ISO-8859-1" />
    <document>
        <!--  stream should be true since huge xml document is being parsed -->
        <entity name="article"
                processor="XPathEntityProcessor"
                stream="true"
                forEach="/dblp/article"
                url="documents/dblp.xml">
            <field column="key"   xpath="/dblp/article/@key" />
            <field column="title" xpath="/dblp/article/title" />
        </entity>
    </document>
</dataConfig>

The XML file looks like this, for example:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE dblp [

    <!ENTITY uuml    "&#252;" ><!-- small u, dieresis or umlaut mark -->
]>
<dblp>

<article key="journals/fm/Riccardi09" mdate="2011-10-27">
<author>Marco Riccardi</author>
<title>Solution of Cubic and Quartic Equations.&uuml;</title>
<pages>117-122</pages>
<year>2009</year>
<volume>17</volume>

<journal>Formalized Mathematics</journal>

<number>1-4</number>
<ee>http://dx.doi.org/10.2478/v10037-009-0012-z</ee><url>db/journals/fm/fm17.html#Riccardi09</url>
</article></dblp>

The stack trace is:

05.07.2012 17:37:19 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*,add=[persons/Codd71a, persons/Hall74]} 0 1
05.07.2012 17:37:19 org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
        ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:documents/dblp.xml rows processed in this xml:2 last row in this xml:{title=Common Subexpression Identification in General Algebraic Systems., $forEach=/dblp/article, key=persons/Hall74} Processing Document # 3
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:504)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor$3.next(XPathEntityProcessor.java:517)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:120)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:225)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
        ... 5 more
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "uuml"
 at [row,col {unknown-source}]: [26,42]
        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:187)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor$2.run(XPathEntityProcessor.java:427)
Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "uuml"
 at [row,col {unknown-source}]: [26,42]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467)
        at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431)
        at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661)
        at com.ctc.wstx.sr.StreamScanner.expandEntity(StreamScanner.java:1555)
        at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1523)
        at com.ctc.wstx.sr.BasicStreamReader.skipTokenText(BasicStreamReader.java:3568)
        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3342)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2622)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:376)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:310)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:346)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:310)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:346)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:310)
        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$200(XPathRecordReader.java:202)
        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:184)
        ... 1 more

05.07.2012 17:37:19 org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
05.07.2012 17:37:19 org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback


Re: Problem while indexing XML file with special characters represented ü

Posted by Mike Sokolov <so...@ifactory.com>.
I don't have any experience with DIH: maybe XPathEntityProcessor doesn't 
use a true XML parser?

You might want to try passing your documents through "xmllint -noent" 
(basically parse and reserialize) - that should inline the characters as 
UTF-8?
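For comparison, the same "parse and reserialize" idea can be sketched in Java with the JDK's standard StAX API (javax.xml.stream), without libxml2. The class ExpandEntities below is hypothetical and not part of Solr or DIH: it reads the file with DTD support and entity replacement switched on, drops the DOCTYPE, and writes the events back out as UTF-8. It streams rather than building a tree, which should suit a file as large as dblp.xml, and it assumes any external DTD can be found relative to the input file's location.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: a streaming, rough equivalent of "xmllint -noent" for one file. */
public class ExpandEntities {

    public static void expand(InputStream in, String systemId, OutputStream out)
            throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        // Read the DOCTYPE and its <!ENTITY> declarations, and deliver &uuml; as plain text.
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        inFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);

        // A system ID lets a relative external DTD (e.g. SYSTEM "dblp.dtd") resolve next to the input.
        XMLEventReader reader = (systemId == null)
                ? inFactory.createXMLEventReader(in)
                : inFactory.createXMLEventReader(systemId, in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            switch (event.getEventType()) {
                case XMLStreamConstants.START_DOCUMENT:
                    // Declare the encoding actually written, not the input's ISO-8859-1.
                    writer.add(events.createStartDocument("UTF-8", "1.0"));
                    break;
                case XMLStreamConstants.DTD:
                    // Entity references are already replaced by text, so the DOCTYPE is dropped.
                    break;
                default:
                    writer.add(event);
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ExpandEntities dblp.xml dblp-expanded.xml
        File input = new File(args[0]);
        try (InputStream in = new FileInputStream(input);
             OutputStream out = new FileOutputStream(args[1])) {
            expand(in, input.toURI().toString(), out);
        }
    }
}
```

Two caveats, both worth checking rather than taking on trust: the output is UTF-8, so the encoding attribute on the FileDataSource in the data-config would need to change to match; and on a file the size of dblp.xml the JDK's built-in parser may stop at its default entity-expansion limit, which can be raised with the jdk.xml.entityExpansionLimit system property.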

On 07/09/2012 03:18 PM, Michael Belenki wrote:
> Does anybody have an idea? Solr seems to ignore the DTD definition and therefore
> does not understand entities like &uuml; or &auml; that are defined in the
> DTD. Is that the problem? If so, how can I tell Solr to consider the DTD
> definition?
>
> On Fri, 06 Jul 2012 10:58:59 +0200, Michael Belenki <va...@belenki.name>
> wrote:
>> [...]

Re: Problem while indexing XML file with special characters represented ü

Posted by Mike Sokolov <so...@ifactory.com>.
I think the issue here is that DIH uses the Woodstox "BasicStreamReader"
(see
http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/BasicStreamReader.html),
which has only minimal DTD support. It might be best to use
ValidatingStreamReader
(http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/ValidatingStreamReader.html)
instead.

I think you could get this by requesting a validating XMLStreamReader; that's
a setting exposed on the factory (the XMLInputFactory) that creates the parser
(i.e. the XMLStreamReader). But then you would probably also get validation
turned on, which might not be desirable in all cases. This should probably be
a user setting for XPathEntityProcessor somewhere?

-Mike
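Mike's point that DTD handling is a factory-level setting can be illustrated with the standard StAX API, which Woodstox implements. The sketch below is hypothetical and is not how Solr configures its parser; it assumes the reader is obtained from an XMLInputFactory. It shows that DTD support (SUPPORT_DTD), entity replacement (IS_REPLACING_ENTITY_REFERENCES) and validation (IS_VALIDATING) are separate standard properties, so expanding &uuml; should not by itself require validation. The optional XMLResolver, which answers external-entity requests with empty content, is an assumption about how one might avoid fetching external DTDs during a large import.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not Solr code: a StAX factory that honors <!ENTITY> declarations. */
public class EntityAwareFactory {

    public static XMLInputFactory create(boolean loadExternalDtd) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Parse the DOCTYPE, including its <!ENTITY> declarations.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        // Deliver &uuml; as the text it is declared to stand for.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // DTD support is a separate switch from validation; leave validation off.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        if (!loadExternalDtd) {
            // Answer every external-entity request with empty content, so only the internal
            // subset is used. Whether an implementation routes the external DTD subset
            // through this resolver is implementation-specific.
            factory.setXMLResolver((publicId, systemId, baseUri, namespace) ->
                    new ByteArrayInputStream(new byte[0]));
        }
        return factory;
    }

    /** Returns the text of the first element named localName, or null if there is none. */
    public static String firstText(XMLInputFactory factory, InputStream in, String localName)
            throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    return reader.getElementText(); // concatenates the (already expanded) text
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```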

On 07/10/2012 07:10 PM, Chris Hostetter wrote:
> [...]

Re: Problem while indexing XML file with special characters represented ü

Posted by Chris Hostetter <ho...@fucit.org>.
: Does anybody have an idea? Solr seems to ignore the DTD definition and therefore
: does not understand entities like &uuml; or &auml; that are defined in the
: DTD. Is that the problem? If so, how can I tell Solr to consider the DTD
: definition?

Solr is just utilizing the built-in Java XML parser for this, so there's
nothing you can tell Solr to "consider the DTD", but it is odd that this
isn't working by default with Java's parser; I suspect there is some
"hint" XPathEntityProcessor should be giving the parser to ask it to look
at these ENTITY declarations.

I've filed a Jira issue to try and track this (and included a test case),
but unfortunately I don't really know what the fix is...

https://issues.apache.org/jira/browse/SOLR-3614



-Hoss

Re: Problem while indexing XML file with special characters represented ü

Posted by Michael Belenki <va...@belenki.name>.
Does anybody have an idea? Solr seems to ignore the DTD definition and therefore
does not understand entities like &uuml; or &auml; that are defined in the
DTD. Is that the problem? If so, how can I tell Solr to consider the DTD
definition?

On Fri, 06 Jul 2012 10:58:59 +0200, Michael Belenki <va...@belenki.name>
wrote:
> [...]