
Posted to j-users@xerces.apache.org by bg...@clever-age.com on 2002/08/30 13:28:45 UTC

Xerces2.1 SAX bad reading CDATA content (CORRECTION)

Hi,
Sorry for the mistake in my previous post.

I am seeing something very strange in my parser.


I am using Xerces 2.1 and I would like to extract the CDATA content with SAX
events. When I parse the content below, I get the right number of CDATA
sections.
But when the CDATA sections are larger, inside a big XML document, the parser
goes wrong and gives me the wrong number of dataelement entries.

So if you have any idea, it would be welcome.

Thanks

XML content:

<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
  <element1>dsds</element1>
  <element2>sdsd</element2>
  <node1>
    <dataelement>
      <![CDATA[
         <?xml version="1.0" encoding="ISO-8859-1"?>
         <subscription>
	   <requestValues>
		<toto>totoot</toto>
		<titi>tototo</titi>
		<tutu>jkjkkj</tutu>
	   </requestValues>
         </subscription>
      ]]>
    </dataelement>
    <dataelement>
      <![CDATA[
         <?xml version="1.0" encoding="ISO-8859-1"?>
         <subscription>
	   <requestValues>
		<toto>totoot</toto>
		<titi>tototo</titi>
		<tutu>jkjkkj</tutu>
	   </requestValues>
         </subscription>
      ]]>
    </dataelement>
    <dataelement>
      <![CDATA[
         <?xml version="1.0" encoding="ISO-8859-1"?>
         <subscription>
	   <requestValues>
		<toto>totoot</toto>
		<titi>tototo</titi>
		<tutu>jkjkkj</tutu>
	   </requestValues>
         </subscription>
      ]]>
    </dataelement>
<dataelement>
      <![CDATA[
         <?xml version="1.0" encoding="ISO-8859-1"?>
         <subscription>
	   <requestValues>
		<toto>totoot</toto>
		<titi>tototo</titi>
		<tutu>jkjkkj</tutu>
	   </requestValues>
         </subscription>
      ]]>
    </dataelement>
  </node1>
</root>

Here is my ContentHandler code:


import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

import java.util.Vector;

public class XMLRegHandler extends DefaultHandler
{

	private String node;
	private String content;
	private static Vector subscription = new Vector();
	private ReadProperties rp = new ReadProperties();
	/**
	* Instance used to store the registration data
	*/
	private DataBeans databeans = DataBeans.getInstance();

	/**
	* Get the partner subscriptions Vector
	*/
	public Vector getSubcription()
	{
		return subscription;
	}
	public void startDocument() throws SAXException {
	 }
	public void startElement(String uri, String localName, String qName, Attributes attributes)
		throws SAXException
	{
		// tag name
		node = qName;
	}
	public void characters(char[] ch, int start, int length)
	{
		// retrieve the tag values
		content = new String(ch, start, length).trim();

		if(node.equals(rp.getSubscriptions()) && !content.equals(""))
		{
			// here for the CDATA content
			subscription.add(content);
		}
	}
	public void endElement(String uri, String local, String qName)
	throws SAXException {
		node = "";
	 }
}





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

Re: Xerces2.1 SAX bad reading CDATA content (CORRECTION)

Posted by Joseph Kesselman <ke...@us.ibm.com>.

This is a FAQ. SAX may deliver contiguous text as multiple calls to 
characters(), for reasons having to do with parser efficiency and input 
buffering. It is the programmer's responsibility to deal with that 
appropriately, e.g. by accumulating text until the next non-characters()
event.
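
A minimal sketch of that accumulation, not taken from the original thread: the
class name DataElementCollector and the helper extract() are hypothetical, the
element name "dataelement" comes from the sample XML above, and the JDK's
default SAX parser stands in for a specific Xerces setup. Text is appended to a
buffer in characters() and only stored once endElement() fires, so it does not
matter how many chunks the parser delivers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of each <dataelement>.
class DataElementCollector extends DefaultHandler {
    private final List<String> subscriptions = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inDataElement = false;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("dataelement".equals(qName)) {
            inDataElement = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text: append only.
        if (inDataElement) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("dataelement".equals(qName)) {
            inDataElement = false;
            // The text is complete only now, so trim and store it here.
            String content = buffer.toString().trim();
            if (!content.isEmpty()) {
                subscriptions.add(content);
            }
        }
    }

    List<String> getSubscriptions() {
        return subscriptions;
    }

    // Convenience helper (an assumption for this sketch): parse a string.
    static List<String> extract(String xml) throws Exception {
        DataElementCollector handler = new DataElementCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getSubscriptions();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><node1>"
            + "<dataelement><![CDATA[<toto>totoot</toto>]]></dataelement>"
            + "<dataelement><![CDATA[<titi>tototo</titi>]]></dataelement>"
            + "</node1></root>";
        // One list entry per dataelement, however the text was chunked.
        System.out.println(extract(xml).size());
    }
}
```

Compared with the handler in the question, the only structural change is that
the add() call moves from characters() to endElement(); the trim and the
empty-string check are kept but applied to the whole accumulated text.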

______________________________________
Joe Kesselman  / IBM Research
