Posted to j-users@xerces.apache.org by Jim Henderson <jg...@metafile.com> on 2004/09/16 22:28:30 UTC

parsing complex objects using Java SAX

Topic: parsing complex objects using Java SAX
Developer background: newbie

I have a data stream that contains complex objects (call this CO1). The object
contains sub-structures (call this SS1) that are also used by other complex
objects (CO2).  Furthermore, the sub-structures themselves may at times be
standalone objects.

I would like to write a content handler that can process the sub-structures
when they are standalone.  I would like to reuse the sub-structure's content
handler when processing the more complex object from its own content handler.

The examples I have seen switch the XMLReader's content handler from the one
for CO1 to the one for SS1 in the startElement() method.  So far so good!
But the ugly stuff occurs in the content handler for the sub-structure SS1.
The SS1 content handler's endElement() method needs to know when its parent
wrapper ends.  At that point the XMLReader's content handler is set back to the
parent object's handler.

Is there an example of another solution?

By NO means am I trying to discredit the author of the article.  He has
presented the only solution I have seen after a day of searching.  I am
thankful for his work but I am wondering if there is a different solution to
the problem.

Referenced article:
http://www.javaworld.com/javaworld/jw-08-2000/jw-0804-sax-p5.html



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: parsing complex objects using Java SAX

Posted by Simon Kitching <si...@ecnetwork.co.nz>.

I don't fully understand your problem description, but perhaps it would
be worth having a look at http://jakarta.apache.org/commons/digester, in
which "rule sets" can be defined to parse various XML structures.

Regards,

Simon




RE: parsing complex objects using Java SAX

Posted by Nikhil Dinesh <ni...@seas.upenn.edu>.
This works for the most part when there is a single child. I'm assuming
that the childHandler will actually create the substructure. You need to
make minor modifications if you need multiple children. Also, making this
kind of API "very" extensible is non-trivial, because you have to
reflectively determine what object to create based on the current object
and the names of the children from the SAX events. If you have just one
kind of XML document then this should suffice.
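
For the multiple-children case, here is a minimal sketch of one possible modification. It swaps the reflective lookup mentioned above for a plain registry that maps element names to handler factories, which is an assumption made to keep the example short. All class names, the element names "address" and "phone", and the parse() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; a sketch, not a definitive design.
class MultiChildDemo {

    /** Child handler: counts nesting depth and collects text for one element. */
    static class ChildHandler extends DefaultHandler {
        private final String kind;
        private int depth = 0;
        private final StringBuilder text = new StringBuilder();

        ChildHandler(String kind) { this.kind = kind; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }

        /** True once the element that activated this handler has closed. */
        boolean isDone() { return depth == 0; }

        String result() { return kind + "=" + text.toString().trim(); }
    }

    /** Parent handler: picks a child handler by element name, then forwards events. */
    static class ParentHandler extends DefaultHandler {
        private final Map<String, Supplier<ChildHandler>> registry = new HashMap<>();
        final List<String> results = new ArrayList<>();
        private ChildHandler child = null;

        ParentHandler() {
            registry.put("address", () -> new ChildHandler("address"));
            registry.put("phone", () -> new ChildHandler("phone"));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (child == null) {
                Supplier<ChildHandler> factory = registry.get(qName);
                if (factory != null) {
                    child = factory.get();   // a registered child element starts here
                }
            }
            if (child != null) {
                child.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (child != null) {
                child.characters(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (child == null) {
                return;                      // the parent's own elements are ignored here
            }
            child.endElement(uri, localName, qName);
            if (child.isDone()) {
                results.add(child.result());
                child = null;
            }
        }
    }

    /** Parses the XML string and returns one "kind=text" entry per registered child. */
    static List<String> parse(String xml) {
        try {
            ParentHandler handler = new ParentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Supporting another kind of child then only needs one more registry entry. A reflective lookup could replace the map if the set of element names is not known in advance.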

For more extensible APIs, take a look at: JiBX -
http://jibx.sourceforge.net or XMLBeans - http://xmlbeans.apache.org/ for two
APIs which do this sort of thing very nicely albeit very differently.

On Thu, 16 Sep 2004, Jim Henderson wrote:

> OK.  It took a bit to understand what you were saying but it works for me!
> Thanks for the idea.
>
> Just to clarify:
>
>
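> import java.io.CharArrayWriter;
> import org.xml.sax.Attributes;
> import org.xml.sax.SAXException;
> import org.xml.sax.helpers.DefaultHandler;
>
> public class DocumentParser extends DefaultHandler
> {
>   // isDone() is not part of org.xml.sax.ContentHandler, so the child is typed
>   // as an assumed sub-structure handler class that provides it.
>   SubStructureContentHandler subStructureContentHandler =
>         new SubStructureContentHandler();
>   SubStructureContentHandler childCH = null;
>   // Assumed to be a java.io.CharArrayWriter collecting the parent's own text.
>   CharArrayWriter charArrayBuffer = new CharArrayWriter();
>
>   public void startElement(
>         String uri,
>         String localName,
>         String qName,
>         Attributes attributes) throws SAXException
>   {
>     // A child is active: every event belongs to it.
>     if (childCH != null) {
>        childCH.startElement(uri, localName, qName, attributes);
>        return;
>     }
>     // A sub-structure starts: activate its handler and pass the event on.
>     if (qName.equals("subStructure")) {
>        childCH = subStructureContentHandler;
>        childCH.startElement(uri, localName, qName, attributes);
>        return;
>     }
>     // if ( ...
>   }
>
>   public void characters(
>         char[] ch,
>         int start,
>         int length) throws SAXException
>   {
>     if (childCH != null) {
>        childCH.characters(ch, start, length);
>        return;
>     }
>     charArrayBuffer.write(ch, start, length);
>   }
>
>   public void endElement(
>         String uri,
>         String localName,
>         String qName) throws SAXException
>   {
>     if (childCH != null) {
>        childCH.endElement(uri, localName, qName);
>
>        if (childCH.isDone()) {
>           childCH = null;
>           // doSomething with the populated substructure;
>        }
>        // The event belonged to the child either way, so the parent skips it.
>        return;
>     }
>     // if ( ...
>   }
> }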


RE: parsing complex objects using Java SAX

Posted by Jim Henderson <jg...@metafile.com>.
OK.  It took a bit to understand what you were saying but it works for me!
Thanks for the idea.

Just to clarify:


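import java.io.CharArrayWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DocumentParser extends DefaultHandler
{
  // isDone() is not part of org.xml.sax.ContentHandler, so the child is typed
  // as an assumed sub-structure handler class that provides it.
  SubStructureContentHandler subStructureContentHandler =
        new SubStructureContentHandler();
  SubStructureContentHandler childCH = null;
  // Assumed to be a java.io.CharArrayWriter collecting the parent's own text.
  CharArrayWriter charArrayBuffer = new CharArrayWriter();

  public void startElement(
        String uri,
        String localName,
        String qName,
        Attributes attributes) throws SAXException
  {
    // A child is active: every event belongs to it.
    if (childCH != null) {
       childCH.startElement(uri, localName, qName, attributes);
       return;
    }
    // A sub-structure starts: activate its handler and pass the event on.
    if (qName.equals("subStructure")) {
       childCH = subStructureContentHandler;
       childCH.startElement(uri, localName, qName, attributes);
       return;
    }
    // if ( ...
  }

  public void characters(
        char[] ch,
        int start,
        int length) throws SAXException
  {
    if (childCH != null) {
       childCH.characters(ch, start, length);
       return;
    }
    charArrayBuffer.write(ch, start, length);
  }

  public void endElement(
        String uri,
        String localName,
        String qName) throws SAXException
  {
    if (childCH != null) {
       childCH.endElement(uri, localName, qName);

       if (childCH.isDone()) {
          childCH = null;
          // doSomething with the populated substructure;
       }
       // The event belonged to the child either way, so the parent skips it.
       return;
    }
    // if ( ...
  }
}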



Re: parsing complex objects using Java SAX

Posted by Nikhil Dinesh <ni...@seas.upenn.edu>.
> I have a data stream that contains complex objects (call this CO1). The object
> contains sub-structures (call this SS1) that are also used by other complex
> objects (CO2).  Furthermore, the sub-structures themselves may at times be
> standalone objects.
>
> I would like to write a content handler that can process the sub-structures
> when they are standalone.  I would like to reuse the sub-structure's content
> handler when processing the more complex object from its own content handler.
>
> The examples I have seen switch the XMLReader's content handler from the one
> for CO1 to the one for SS1 in the startElement() method.  So far so good!
> But the ugly stuff occurs in the content handler for the sub-structure SS1.
> The SS1 content handler's endElement() method needs to know when its parent
> wrapper ends.  At that point the XMLReader's content handler is set back to the
> parent object's handler.
>

This is not entirely true. The SAX events can always flow through the
parent ContentHandler. It is up to the parent to decide whether to pass them
on or not. For example, once the first startElement has been received, any
subsequent startElements that arrive before its matching endElement should be
passed on to the child, or used to create it. When all of these have received
their endElements (which are also passed on) and a further endElement is
received, the handler knows its content has ended and that its own parent
will not pass it any more events. Events in between can be passed on to the
child.

In other words, the ContentHandlers themselves can be organized in a
tree-like structure, with events flowing down a path in the tree and being
absorbed by the node they are intended for.

One way to accomplish this is to have your objects implement
ContentHandler, so that reuse can happen.
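
A minimal sketch of the event flow described above, assuming the child decides it is finished by counting open elements. The class names, the "subStructure" element name, and the isDone(), parse() and parseStandalone() helpers are hypothetical and not taken from the thread or from any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; a sketch, not a definitive implementation.
class NestedHandlerDemo {

    /** Reusable handler for one sub-structure; tracks its own nesting depth. */
    static class SubStructureHandler extends DefaultHandler {
        private int depth = 0;   // number of currently open elements
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }

        /** True once every element this handler has seen opened is closed again. */
        boolean isDone() { return depth == 0; }

        String getText() { return text.toString().trim(); }
    }

    /** Parent handler: all events arrive here and are passed down while a child is active. */
    static class DocumentHandler extends DefaultHandler {
        final List<String> subStructures = new ArrayList<>();
        private SubStructureHandler child = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (child == null && qName.equals("subStructure")) {
                child = new SubStructureHandler();
            }
            if (child != null) {
                child.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (child != null) {
                child.characters(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (child == null) {
                return;                       // the parent's own elements are ignored here
            }
            child.endElement(uri, localName, qName);
            if (child.isDone()) {             // the child saw its own closing tag
                subStructures.add(child.getText());
                child = null;
            }
        }
    }

    private static void run(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses a document that embeds sub-structures. */
    static List<String> parse(String xml) {
        DocumentHandler handler = new DocumentHandler();
        run(xml, handler);
        return handler.subStructures;
    }

    /** Reuses the same child handler for a standalone sub-structure. */
    static String parseStandalone(String xml) {
        SubStructureHandler handler = new SubStructureHandler();
        run(xml, handler);
        return handler.getText();
    }
}
```

The child never needs to know about its parent or the XMLReader: the parent keeps forwarding events until isDone() reports that the depth is back to zero, and the same child class can be handed straight to the parser for the standalone case.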

