Posted to user@manifoldcf.apache.org by Fred Schmitt <fr...@web.de> on 2010/10/12 08:46:10 UTC

MCF: XML parsing Error

Hi all,
I am having a problem while trying to index/crawl data. I configured a job with a Solr output connection and a web connection,
but after I run the "agentRun" command and start the job, a few exceptions are thrown.

On the console I'm getting:

[Fatal Error] :112:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: The element type "HR" must be terminated by th
 matching end-tag "</HR>".
 at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:369)
 at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
 at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:608)
 at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1647)
Caused by: org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
 at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:365)
 ... 3 more
Shutting down...

And in the log file I am getting this error:

[2010-10-11 15:32:02,515]ERROR Error connecting to update request API: 'HTTP/1.1 500 Internal Server Error
'
org.apache.manifoldcf.core.interfaces.ManifoldCFException:
 Error connecting to update request API: 'HTTP/1.1 500 Internal Server 
Error
'
at org.apache.manifoldcf.agents.output.solr.HttpPoster$CommitThread.run(HttpPoster.java:1665)


It would be great if you could help me.

best regards,
Fred

Re: MCF: XML parsing Error

Posted by Fred Schmitt <fr...@web.de>.
It seems the parsing of the HTTP response as XML on the way to Solr did not
work, and thus the error was thrown. After a few unsuccessful attempts to fix
that, I reinstalled Solr. It seems to work fine now. Thanks for your help and hints.

best regards,
Fred

-----Original Message-----
From: "Karl Wright" <da...@gmail.com>
Sent: 12.10.2010 11:06:25
To: connectors-user@incubator.apache.org
Subject: Re: MCF: XML parsing Error

>It looks like you are getting back HTML from whatever it is that you
>pointed your Solr connection at, rather than the XML that the Solr
>connector is expecting. Specifically, the Solr commit operation is
>failing. Can you get any feedback from the Solr instance?
>
>Karl

Re: MCF: XML parsing Error

Posted by Karl Wright <da...@gmail.com>.
Also, FWIW, I added support to the notification phase to allow
activities to be logged from there, but never hooked it up in the Solr
connector.  Seems it would be helpful to do that, since you could then
see the actual response back from Solr in the history.  I can't
guarantee I'll get to it today, but hopefully soon.

Karl


Re: MCF: XML parsing Error

Posted by Karl Wright <da...@gmail.com>.
It looks like you are getting back HTML from whatever it is that you
pointed your Solr connection at, rather than the XML that the Solr
connector is expecting. Specifically, the Solr commit operation is
failing. Can you get any feedback from the Solr instance?
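
To illustrate why an HTML error page produces exactly this kind of
exception, here is a minimal standalone sketch. It is not ManifoldCF code;
the class name HtmlAsXmlDemo, the helper parseError, and both sample bodies
are made up for illustration. It feeds a well-formed XML body and an
HTML-style error page (with an unclosed HR tag) to the same JAXP
DocumentBuilder that appears in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class HtmlAsXmlDemo {

    // Hypothetical helper: returns null if the body is well-formed XML,
    // otherwise the message of the SAXParseException raised by the parser.
    static String parseError(String body) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative XML body in the general shape of a Solr response.
        String xmlBody =
            "<response><lst name=\"responseHeader\">"
            + "<int name=\"status\">0</int></lst></response>";

        // Illustrative servlet-container error page: valid enough as HTML,
        // but <HR> is never closed, so it is not well-formed XML.
        String htmlBody =
            "<html><body><h1>HTTP ERROR 500</h1><HR size=\"1\">"
            + "<p>Internal Server Error</p></body></html>";

        System.out.println("XML body error:  " + parseError(xmlBody));
        System.out.println("HTML body error: " + parseError(htmlBody));
    }
}
```

The first call should report no error, while the second should report a
parse error about the unterminated HR element, which is consistent with the
server answering the commit with an HTML 500 page instead of XML.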

Karl
