Posted to j-users@xerces.apache.org by jtao <jt...@cysive.com> on 2001/04/02 19:03:39 UTC

RE: Ignorable Whitespace

You can create your own parser by subclassing DOMParser, like this:

import org.apache.xerces.parsers.DOMParser;

/**
 * A DOM parser that drops text consisting only of whitespace
 * (spaces, CR/LF, and tab characters) while it builds the tree.
 */
public class IgnoreWhiteSpaceDOMParser extends DOMParser
{
    public IgnoreWhiteSpaceDOMParser() throws Exception
    {
        super.setIncludeIgnorableWhitespace(false);
    }

    // data is an index into the parser's string pool, not the text itself.
    public void characters (int data) throws Exception
    {
        // Pass the text on only if whitespace is wanted or it is not all whitespace.
        if (getIncludeIgnorableWhitespace ()
            || (fStringPool.toString (data).trim ().length () > 0))
            super.characters (data);
    }

}


Then you can use the new parser like this:
        try
        {
            DOMParser parser = new IgnoreWhiteSpaceDOMParser();
            Reader stringReader = new StringReader(XMLString);
            InputSource xmlSource = new InputSource(stringReader);
            parser.setValidating(false);
            parser.parse(xmlSource);
            this.doc = parser.getDocument();
        }
        catch (Exception ex)
        {
            doc = null;
            ex.printStackTrace();
        }

James

-----Original Message-----
From: Matt_Olsen@ovid.com [mailto:Matt_Olsen@ovid.com]
Sent: Monday, April 02, 2001 10:56 PM
To: xerces-j-user@xml.apache.org
Subject: Re: Ignorable Whitespace



I've noticed there is a public void characters(int dataIndex) method in
DOMParser which is called with an index into the string pool.  Apparently
you can retrieve that string from some string pool using dataIndex, but
how?  If someone could enlighten us on how to retrieve the string, you
could use that to check if the string is all whitespace, but then there is
no way to tell the DOMParser to exclude it from the DOM.

It would be nice if there were some sort of NodeFilter we could use when
building DOM trees to exclude or edit nodes as the DOM is being built
instead of hunting down nodes after it's constructed.

Matt
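
For the "hunting down nodes after it's constructed" route mentioned above,
a minimal sketch could look like the following. It uses only the standard
JAXP and W3C DOM APIs rather than the Xerces string pool, and the class
name WhitespaceStripper and its parse/strip helpers are made up for
illustration. Note the assumption: it removes every text node that trims
to empty, which is a blunter rule than "ignorable whitespace" in the XML
sense.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or JAXP.
final class WhitespaceStripper {

    /** Parses an XML string without validation. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    /** Recursively removes text nodes that contain only whitespace. */
    public static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>text</b>\n  <c/>\n</a>");
        strip(doc);
        // Only the <b> and <c> elements should remain under <a>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

CDATA sections have their own node type, so this sketch leaves them alone.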

From: "John J. Berkenpas" <john@ivorytower.com>
Sent: 04/02/01 02:07 PM
To: xerces-j-user@xml.apache.org
Subject: Re: Ignorable Whitespace
Please respond to xerces-j-user

If your DTD allows text in an element, then whitespace there is NOT
ignorable, even though it seems so to you. That is, if the element can
contain #text (#PCDATA in the DTD), the whitespace is legitimate content
and the parser doesn't ignore it. It can't know that you don't need it.
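
To make the distinction concrete, here is a small sketch using the
standard JAXP API instead of the Xerces 1.x DOMParser from this thread.
The class name IgnorableWhitespaceDemo and the sample document are
invented for illustration. It assumes a validating parse, because the
parser needs the DTD's content model to tell which whitespace is
ignorable: <root> is declared with element-only content, <item> with
#PCDATA.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Xerces or JAXP.
final class IgnorableWhitespaceDemo {

    // <root> has element-only content; <item> may contain text.
    public static final String XML =
        "<!DOCTYPE root [\n"
      + "  <!ELEMENT root (item)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "]>\n"
      + "<root>\n  <item> </item>\n</root>";

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // content model comes from the DTD
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of console output.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        // Whitespace directly inside <root> is ignorable and is not added to the tree.
        System.out.println("children of root: " + root.getChildNodes().getLength());
        // The single space inside <item> is character data and is kept.
        System.out.println("text in item: [" + root.getFirstChild().getTextContent() + "]");
    }
}
```

Without a DTD (or with an element declared to allow text), the same
whitespace is reported as ordinary character data and stays in the tree.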

"Dalia, Keith A - TOS-DITT1" wrote:

> I don't want white space included in my tree:
>
> I use  parser.setIncludeIgnorableWhitespace(false);
>
> but text nodes that can be considered "ignorable whitespace" still appear
> in the DOM tree.
> What am I doing wrong?
>
> DOMParser parser = new DOMParser();
>
>         try
>         {
>             //parser.setFeature("http://xml.org/sax/features/validation",
> true);
>             parser.setIncludeIgnorableWhitespace(false);
>             parser.parse("gi.xml");
>
>         }
>         catch (java.io.IOException ioe)
>         {
>             System.out.println(ioe.toString());
>         }
>
>         catch (SAXNotRecognizedException snre)
>         {
>             snre.printStackTrace();
>         }
>         catch (SAXException saxe)
>         {
>             saxe.printStackTrace();
>         }
>
>
>         Document document = parser.getDocument();
>
>
> TIA, Keith
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-user-help@xml.apache.org

