Posted to j-users@xerces.apache.org by pe...@student.luc.ac.be on 2004/12/16 18:41:34 UTC

Socket timeout when parsing a big XML file (DOM/SAX)

Hello,

I wanted to parse an XML file of 30 megs, but I get the following error:

Exception in thread "main" java.net.ConnectException: Connection timed out: connect
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(Unknown Source)
	...
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	...
	at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
	at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
	...

It happens with both SAX and DOM:

// SAX, using the Xerces SAXParser class directly
SAXParser parser = new SAXParser();
parser.setContentHandler(this);
parser.parse(file);

// DOM, through JAXP
documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = documentBuilder.parse(file);

I also tried increasing the maximum memory available to the virtual machine,
but that didn't help. It seems to have something to do with the size of the
input XML file, since everything works perfectly with small files.

Does anyone have an idea what is wrong and how it can be fixed?

And why are sockets and the HTTP protocol used at all to load an XML file from
my hard disk?

Thanks a lot for your time,

-- 
Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Socket timeout when parsing a big XML file (DOM/SAX)

Posted by ra...@freddiemac.com.
This could very well be a file size issue.

HTTP servers can be configured to limit the upload size or even the
request parameter size. Which server are you using? Apache has a few
configurable parameters:

http://httpd.apache.org/docs-2.0/mod/core.html#limitrequestfieldsize
http://httpd.apache.org/docs-2.0/mod/core.html#limitxmlrequestbody
http://httpd.apache.org/docs-2.0/mod/core.html#limitrequestbody

The default for LimitXMLRequestBody is 1000000 (almost 1MB).

Limits on the request body or request fields are not part of the HTTP
protocol, so different server implementations may handle these limits
differently, or not at all.

Hope that helps.

-ramin


Re: Socket timeout when parsing a big XML file (DOM/SAX)

Posted by Joseph Kesselman <ke...@us.ibm.com>.



Does your document reference a DTD or Schema located on the web, perhaps?
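
If it does, that would explain the HTTP traffic: the parser tries to fetch the
external DTD named in the DOCTYPE declaration, regardless of where the XML
file itself is stored. Below is a minimal sketch of one common workaround,
assuming the DTD's content is not actually needed (no entities or default
attributes come from it): install an EntityResolver that hands back an empty
stream. The class name OfflineParse, the helper parse() and the
example.invalid URL are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class OfflineParse {
    // Answers every external entity request (including the external DTD)
    // with empty content, so the parser never opens a network connection.
    static final EntityResolver EMPTY_RESOLVER =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Hypothetical helper: DOM parse with the empty resolver installed.
    static Document parse(InputSource in) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Illustrative document whose DOCTYPE points at an unreachable host.
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root SYSTEM \"http://example.invalid/root.dtd\">\n"
            + "<root><item>ok</item></root>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The same resolver can be installed on a SAX parser with setEntityResolver().
If the DTD is needed, the resolver could return a local copy instead of an
empty stream. Xerces also documents the feature
http://apache.org/xml/features/nonvalidating/load-external-dtd, which can be
set to false when validation is turned off.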

______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk

