Posted to j-users@xerces.apache.org by Dawid Chodura <da...@gmail.com> on 2010/11/23 12:12:11 UTC

Transforming the stream of SAX events

Hello,
   I want to transform an XML document, but I can't use XSLT, because
I need to invoke Java code inside the transformation. If I understand
correctly, Xalan is not an option. I don't need to keep the whole XML
document in memory for the transformation, so I decided to use a SAX
parser instead of DOM. I need to create new elements during the
transformation.
   I read the sample xerces-2_10_0/samples/sax/Writer.java and it
generates the output document manually:

public void startElement(String uri, String local, String raw,
                         Attributes attrs) throws SAXException {
    //...
    fOut.print('<');
    fOut.print(raw);
    //...
    fOut.print('>');
    fOut.flush();
}

   I don't want to generate the output manually.
   I wrote my own example, which uses
org.cyberneko.html.parsers.SAXParser from NekoHTML parser:

package saxtransformexample;

import org.apache.xerces.util.AugmentationsImpl;
import org.apache.xerces.util.XMLAttributesImpl;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.cyberneko.html.filters.DefaultFilter;
import org.cyberneko.html.parsers.SAXParser;
import org.xml.sax.InputSource;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

public class SAXTransformExample {

    public static void main(String args[]) throws Exception {
        String inputString = "<div></div>";
        StringWriter out = new StringWriter();
        StreamResult result = new StreamResult(out);

        SAXTransformerFactory transformerFactory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();

        Transformer transformer = transformerFactory.newTransformer();

        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");

        XMLDocumentFilter[] filters = {new DefaultFilter() {

            @Override
            public void startElement(QName element, XMLAttributes attributes,
                                     Augmentations augs) throws XNIException {
                // Always forward the original start tag first.
                super.startElement(element, attributes, augs);
                if (element.localpart.toLowerCase().equals("div")) {
                    // Open a new <p> immediately inside every <div>.
                    super.startElement(new QName("", "p", "p", null),
                            new XMLAttributesImpl(), new AugmentationsImpl());
                }
            }

            @Override
            public void endElement(QName element, Augmentations augs)
                    throws XNIException {
                if (element.localpart.toLowerCase().equals("div")) {
                    // Close the inserted <p> before the enclosing <div> ends.
                    super.endElement(new QName("", "p", "p", null),
                            new AugmentationsImpl());
                }
                super.endElement(element, augs);
            }
        }};
        SAXParser parser = new SAXParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false);
        parser.setFeature(
                "http://cyberneko.org/html/features/balance-tags/document-fragment", true);
        parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
        parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");

        transformer.transform(
                new SAXSource(parser, new InputSource(new StringReader(inputString))), result);

        System.out.println("RESULT:" + out.getBuffer().toString() + ":");
    }
}

   It prints out:
RESULT:<div><p></p></div>

   The problem is that it uses XNI and since I'm not writing a parser
I think I shouldn't use XNI at all.

There is an example:
http://book.javanb.com/xml-and-java-developing-web-applications-2nd/0201770040_ch05lev1sec2.html

       OutputFormat format = new OutputFormat("xml", "UTF-8", false);
       format.setPreserveSpace(true);
       ContentHandler handler = new XMLSerializer(System.out, format);
       XMLReader parser = XMLReaderFactory.createXMLReader(
               "org.apache.xerces.parsers.SAXParser");
       XMLReader filter = new MailFilter(parser);
       filter.setContentHandler(handler);
       filter.parse(argv[0]);

MailFilter extends org.xml.sax.helpers.XMLFilterImpl.
The example uses org.apache.xml.serialize.XMLSerializer, which has been
deprecated since Xerces 2.9.0. The API documentation says:

Deprecated. This class was deprecated in Xerces 2.9.0. It is
recommended that new applications use the DOM Level 3 LSSerializer or
JAXP's Transformation API for XML (TrAX) for serializing XML. See the
Xerces documentation for more information.
http://xerces.apache.org/xerces2-j/javadocs/other/org/apache/xml/serialize/XMLSerializer.html

   If I don't want to use DOM, I assume I can't use DOM Level 3 LSSerializer.
   If I don't want to use XSLT, I assume I can't use JAXP's
Transformation API for XML (TrAX) for serializing XML.

   What is the proper way to transform the stream of SAX events into
another stream of SAX events, so that I don't need to write my own
parser or my own serializer?

Best regards,
   Dawid Chodura

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org
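
The question above (SAX events in, SAX events out, no XNI and no
hand-written serializer) could be approached roughly as follows. This is
only a sketch under assumptions, not something proposed in the thread: it
assumes the input is well-formed XML read by the JDK's default SAX parser
rather than NekoHTML, and the class names SaxFilterSketch and WrapDivFilter
are made up for illustration. It uses an XMLFilterImpl subclass for the
transformation and an identity TransformerHandler from TrAX as the
serializer, so no XSLT stylesheet is involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class name; not part of Xerces or the original post.
class SaxFilterSketch {

    /** Wraps the content of every div element in a p element. */
    static class WrapDivFilter extends XMLFilterImpl {
        WrapDivFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String raw,
                                 Attributes attrs) throws SAXException {
            super.startElement(uri, local, raw, attrs);
            if ("div".equals(raw)) {
                // Emit an extra start tag right after <div>.
                super.startElement("", "p", "p", new AttributesImpl());
            }
        }

        @Override
        public void endElement(String uri, String local, String raw)
                throws SAXException {
            if ("div".equals(raw)) {
                // Close the inserted <p> before </div>.
                super.endElement("", "p", "p");
            }
            super.endElement(uri, local, raw);
        }
    }

    static String transform(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // An identity TransformerHandler is a ContentHandler that serializes
        // the SAX events it receives; no stylesheet is compiled or applied.
        SAXTransformerFactory tf =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // parser -> filter -> serializer
        WrapDivFilter filter = new WrapDivFilter(reader);
        filter.setContentHandler(serializer);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<div>text</div>"));
    }
}
```

Because the filter is itself an XMLReader, several such filters could be
chained before the serializer, and the Java code invoked inside
startElement can be arbitrary.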


RE: Transforming the stream of SAX events

Posted by David Lee <dl...@calldei.com>.
I suggest StAX instead of SAX for this kind of transformation.
It provides both reader and writer APIs, so it can be used for both parsing
and generation/serialization.


----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org
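
A minimal sketch of what the StAX suggestion might look like, assuming
well-formed XML input and the StAX implementation bundled with the JDK.
The class name StaxWrapSketch is hypothetical, and the div-to-p wrapping
simply mirrors the example from the original question; it is not code
from the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name, used only for illustration.
class StaxWrapSketch {

    static String transform(String xml) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument() || event.isEndDocument()) {
                // Skip document events so no XML declaration is written.
                continue;
            }
            if (event.isStartElement()
                    && "div".equals(event.asStartElement().getName().getLocalPart())) {
                writer.add(event);
                // Insert a new <p> right after the opening <div>.
                writer.add(events.createStartElement("", "", "p"));
            } else if (event.isEndElement()
                    && "div".equals(event.asEndElement().getName().getLocalPart())) {
                // Close the inserted <p> before the closing </div>.
                writer.add(events.createEndElement("", "", "p"));
                writer.add(event);
            } else {
                // Everything else is copied through unchanged.
                writer.add(event);
            }
        }
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<div>text</div>"));
    }
}
```

The pull model keeps the loop in application code, so calling other Java
code (for example a Spring bean held in a field) at any point in the loop
needs no extension mechanism.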



Re: Transforming the stream of SAX events

Posted by Mukul Gandhi <mu...@apache.org>.
On Wed, Nov 24, 2010 at 4:21 PM, Dawid Chodura <da...@gmail.com> wrote:
> I checked the Xalan Java
> extensions and I couldn't find anything about integration with Spring.
> There is a possibility to create new instance and call a static method
> from XSLT stylesheet. Do you know if there is some way to obtain a
> reference to Spring Bean inside XSLT stylesheet?

This doesn't concern Xerces, which largely does XML parsing and
schema validation of instance documents. From an application design
point of view, though, asking for Spring bean references from an XSLT
stylesheet is not a good idea in my opinion: working with Java objects,
and even more so with a graph of extension objects, in the XSLT layer
can get hard. It looks to me as if this kind of approach does not use
XSLT, or even Xerces, appropriately.




-- 
Regards,
Mukul Gandhi



Re: Transforming the stream of SAX events

Posted by Dawid Chodura <da...@gmail.com>.
Thank you for the answers.

I agree with Mukul Gandhi that using XSLT can save design hours and
would be more maintainable in the future, but originally I thought
that it might be too heavyweight for my little task if there was some
other way to do it. I know that XSLT can be extended, although I have
never written an extension myself. I wrote "If I understand correctly,
Xalan is not an option." because I wanted to know whether Xalan offers
any method of transforming documents other than XSLT that I am unaware
of. I checked the Xalan Java extensions and couldn't find anything
about integration with Spring. It is possible to create a new instance
and to call a static method from an XSLT stylesheet. Do you know if
there is some way to obtain a reference to a Spring bean inside an
XSLT stylesheet?

StAX looks better than my solution, which uses XNI, so if I am unable
to use Spring Beans from XSLT, I will probably consider rewriting my
code to use StAX.

Best regards,
   Dawid Chodura



Re: Transforming the stream of SAX events

Posted by Mukul Gandhi <mu...@apache.org>.
On Tue, Nov 23, 2010 at 4:42 PM, Dawid Chodura <da...@gmail.com> wrote:
> I want to transform an XML document, but I can't use XSLT, because
> I need to invoke Java code inside the transformation.

I believe you need to make appropriate design and architectural
decisions for your use-case. Personally, I find XSLT's template-based
abstractions quite useful for XML document transformations.

> If I understand correctly, Xalan is not an option.

As Michael wrote, Xalan provides Java extensions: you can use virtually
any built-in Java API, or even write your own Java methods and invoke
them from XSLT stylesheets. With Java extensions, XSLT transformations
become almost arbitrarily extensible in a procedural way. Personally, I
would have gone this route first for this kind of use-case.

> I don't need to keep the whole XML
> document in the memory for the transformation, so I decided to use SAX
> parser instead of DOM.

You could try tuning the JVM heap size when running XSLT transforms,
which generally helps with memory use during XSLT transformations.

I normally try to get a good logical design for the application (for
example, in your kind of use-case, using XSLT may be better than doing
a SAX to SAX transformation) before worrying too much about the
physical memory available on the host system :)

Imagine a huge XML transformation use-case. A SAX to SAX transformation
will probably save physical memory at run-time, but using XSLT can save
numerous design hours and would help maintainability too.

> What is the proper way to transform the stream of SAX events into
> another stream of SAX events.

I would suggest doing a little design thinking before embarking on this route :)




-- 
Regards,
Mukul Gandhi



Re: Transforming the stream of SAX events

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Dawid Chodura <da...@gmail.com> wrote on 11/23/2010 06:12:11 AM:

> Hello,
>    I want to transform an XML document, but I can't use XSLT, because
> I need to invoke Java code inside the transformation. If I understand
> correctly, Xalan is not an option...

You're making an assumption but did you actually check? Xalan does have
support for Java extensions [1].

Thanks.

[1] http://xml.apache.org/xalan-j/extensions.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org