Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/01/16 11:04:09 UTC

DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Hello all, as the subject says:
   DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>
   
This is using a Solr nightly build from Monday.

INFO: Server startup in 3623 ms
Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrCore execute
INFO: [jdocs] webapp=/solr path=/walkj params={command=full-import} status=0 QTime=13 
Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 16, 2009 9:54:12 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2
	commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_c,version=1232026423291,generation=12,filenames=[segments_c, _4.fnm, _4.frq, _4.prx, _4.tis, _4.tii, _4.nrm, _4.fdx, _4.fdt]
	commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_d,version=1232026423292,generation=13,filenames=[segments_d]
Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1232026423292
Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: jcurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/j/dtd/jxml/data/news/2008/frp70450.xmlrows processed :0 Processing Document # 1
	at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
	... 9 more
Caused by: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
	at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:475)
	at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
	at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
	... 10 more
Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed

A fragment from the top of the failing document is

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="../../../../config/support/j-deliver.xsl"?>
<!DOCTYPE j:record SYSTEM "../../../../config/jml-delivery-norm-2.1.dtd">
<j:record xmlns:j="http://dtd.j.com/2002/Content/" id="frp70450"  urname="record">
  <j:metadata xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="" urname="metadata" xlink:type="simple">
    <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/" qualifier="pdate">20080131</dc:date>

The DTD does exist at the specified location. Removing the DOCTYPE directive
fixes everything. I know that use of DOCTYPE is out of fashion, and it does
not appear in our newer documents; however, there are lots of older XML docs
about!

Regards Fergus.
-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I have raised an issue and attached a patch.
Please confirm whether it helps:
https://issues.apache.org/jira/browse/SOLR-964

On Fri, Jan 16, 2009 at 3:52 PM, Noble Paul നോബിള്‍  नोब्ळ्
<no...@gmail.com> wrote:
> stax parser automatically tries to fetch the DTD. How can we disable
> that at the parser level?



-- 
--Noble Paul

Re: DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The StAX parser automatically tries to fetch the DTD. How can we disable
that at the parser level?
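
One possible answer, as a minimal sketch only: the standard javax.xml.stream
API defines the XMLInputFactory.SUPPORT_DTD property, and setting it to false
asks the parser not to process the DTD, so the external subset named in the
DOCTYPE should not be fetched. The class and method names below are
hypothetical and made up for illustration; this is not taken from the
SOLR-964 patch, and DIH may well solve it differently (for example with an
XMLResolver).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Solr or of the SOLR-964 patch.
public class DtdSkippingReader {

    /** Returns the local name of the root element without loading any external DTD. */
    public static String rootElementName(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property: turn off DTD processing, so the file named in
        // <!DOCTYPE ... SYSTEM "..."> is not opened.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        // Also do not resolve external parsed entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // Walk the prolog (XML declaration, DOCTYPE) until the first element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Same DOCTYPE as the failing document; the DTD path does not need to exist.
        String xml =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE j:record SYSTEM \"../../../../config/jml-delivery-norm-2.1.dtd\">\n"
          + "<j:record xmlns:j=\"http://dtd.j.com/2002/Content/\" id=\"frp70450\"/>";
        System.out.println(rootElementName(xml));
    }
}
```

The trade-off is that entities declared only in the DTD can no longer be
expanded, so documents that rely on such entities would need another approach.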




-- 
--Noble Paul

Re: DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Seems to work fine on this morning's 23-Jan-2009 nightly.

Thanks very much.



>On Wed, Jan 21, 2009 at 6:05 PM, Fergus McMenemie <fe...@twig.me.uk> wrote:
>
>>
>> After looking at http://issues.apache.org/jira/browse/SOLR-964,
>> where
>> it seems this issue has been addressed, I had another go at indexing
>> documents
>> containing DOCTYPE. It failed as follows.
>>
>>
>That patch has not been committed to the trunk yet. I'll take it up.
>
>-- 
>Regards,
>Shalin Shekhar Mangar.


Re: DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Jan 21, 2009 at 6:05 PM, Fergus McMenemie <fe...@twig.me.uk> wrote:

>
> After looking at http://issues.apache.org/jira/browse/SOLR-964,
> where
> it seems this issue has been addressed, I had another go at indexing
> documents
> containing DOCTYPE. It failed as follows.
>
>
That patch has not been committed to the trunk yet. I'll take it up.

-- 
Regards,
Shalin Shekhar Mangar.

Re: DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hello,

After looking at http://issues.apache.org/jira/browse/SOLR-964, where
it seems this issue has been addressed, I had another go at indexing documents
containing DOCTYPE. It failed as follows.

This was using the nightly build from 21-Jan-2009.

The comments section in JIRA suggested my initial message had been replied
to twice; I somehow missed the replies in my inbox!

Regards Fergus.

Jan 21, 2009 12:15:21 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 21, 2009 12:15:21 PM org.apache.solr.core.SolrCore execute
INFO: [jdocs] webapp=/solr path=/dataimport params={command=show-config} status=0 QTime=0 
Jan 21, 2009 12:15:22 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: jc document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/Volumes/spare/ts/j/dtd/jxml/data/news/f/f2008/frp70450.xmlrows processed :0 Processing Document # 1
	at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
	at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
	at java.lang.Thread.run(Thread.java:613)
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
	... 27 more
Caused by: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
	at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:475)
	at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
	at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
	... 28 more
Jan 21, 2009 12:15:22 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed




