Posted to j-dev@xerces.apache.org by Aleksander Slominski <as...@cs.indiana.edu> on 2002/05/05 19:03:45 UTC

Re: XML Pull-Parsing (was: "How to start writing a non-blocking SAX parser")

Andy Clark wrote:

> I would take the same approach that we use in XNI which is
> that objects are never orphaned by their creator. For example,
> when a handler receives a struct like QName or XMLString in a
> callback, it must make a copy of the contents because the
> object will be re-used by the component that created it and
> passed it along to the handler.
>
> Applying that choice to my pull-parsing API would mean that
> only one event object (of each type) would ever be created. So
> the memory footprint is not really an issue.

agreed, as then those objects are simply containers to pass in/out
arguments, and it is very similar to C/C++ as the user will need to make a copy
if they want to keep object values longer than the time of one callback.
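
a minimal sketch of that copy-if-you-keep rule, to make it concrete. everything
in it is an assumption made for illustration: NameEvent is a hypothetical
struct-like class (it is not the XNI QName), and produce() stands in for a
parser component that reuses one object for every callback.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedEventDemo {

    /** Hypothetical struct-like event with public fields; not the XNI QName class. */
    static final class NameEvent {
        public String prefix;
        public String localpart;

        void setValues(String prefix, String localpart) {
            this.prefix = prefix;
            this.localpart = localpart;
        }

        /** Snapshot of the current values, safe to keep after the callback returns. */
        NameEvent copy() {
            NameEvent c = new NameEvent();
            c.setValues(prefix, localpart);
            return c;
        }
    }

    interface Handler {
        void startElement(NameEvent name);
    }

    /** Hypothetical producer: one NameEvent is created and overwritten for each callback. */
    static void produce(String[] names, Handler handler) {
        NameEvent shared = new NameEvent();
        for (String n : names) {
            shared.setValues("", n);
            handler.startElement(shared);
        }
    }

    /** Keeps the reference it was handed, so every entry aliases the same object. */
    public static List<String> keptByReference(String[] names) {
        List<NameEvent> kept = new ArrayList<>();
        produce(names, e -> kept.add(e));
        return localparts(kept);
    }

    /** Copies inside the callback, so each entry keeps its own values. */
    public static List<String> keptByCopy(String[] names) {
        List<NameEvent> kept = new ArrayList<>();
        produce(names, e -> kept.add(e.copy()));
        return localparts(kept);
    }

    private static List<String> localparts(List<NameEvent> events) {
        List<String> out = new ArrayList<>();
        for (NameEvent e : events) {
            out.add(e.localpart);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c"};
        System.out.println(keptByReference(names)); // [c, c, c]
        System.out.println(keptByCopy(names));      // [a, b, c]
    }
}
```

the handler that keeps the reference ends up seeing only the last values
written, which is the price paid for creating a single object.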

the interesting thing with immutable objects (such as element name
represented as interned String) is that they can be kept indefinitely
and shared very efficiently between parser and the user code

however, for other-level objects such as a start tag event the
benefits, as you describe, are not that clear ...


> > better to keep event objects similar to all Java API and
> > expose get/set methods instead of public fields.
>
> But if you assume that the method is going to be inlined and
> it's not, then you lose some performance. Because pushing and
> popping the method call stack takes time. If the data fields
> are public on the object returned from "next", then it's
> just an object access.

that makes Java programming slightly lower level
and feel more like C/C++ :-)

> > all of those functions can be easily built with XMLPULL API
> > and exposed as a utility class instead of requiring a too detailed
> > description of method implementation in the interface ...
>
> It's just a question of deciding what functionality is the
> most useful. If 90% of the users end up using this convenience
> method, then it should be part of the core API. And I think
> that this functionality (and some others) are that useful.

the problem is to resist temptation of adding too much too fast,
we think that in XMLPULL API we have some useful methods
(like nextText/nextTag) and i personally think that more is needed
but it is good to wait a bit and see what is _really_ needed.
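
a rough sketch of how such a convenience method can sit in a utility class on
top of a bare next() call. PullSource, Event and Type below are hypothetical
stand-ins invented for this example; they are not the real XMLPULL interfaces,
and the behaviour shown is only an approximation of what nextText is meant to do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NextTextSketch {

    enum Type { START_TAG, TEXT, END_TAG, END_DOCUMENT }

    /** Hypothetical event: a type plus a tag name or text value. */
    static final class Event {
        final Type type;
        final String value;

        Event(Type type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Hypothetical minimal pull source; not the real XmlPullParser interface. */
    interface PullSource {
        Event next();
    }

    /** Test double that replays a fixed list of events, then END_DOCUMENT. */
    static PullSource fromList(List<Event> events) {
        Iterator<Event> it = events.iterator();
        return () -> it.hasNext() ? it.next() : new Event(Type.END_DOCUMENT, null);
    }

    /**
     * Utility built only on next(): call it right after a START_TAG was read.
     * Returns the element text ("" for an empty element) and consumes the END_TAG.
     */
    public static String nextText(PullSource source) {
        Event e = source.next();
        if (e.type == Type.END_TAG) {
            return "";
        }
        if (e.type != Type.TEXT) {
            throw new IllegalStateException("expected TEXT or END_TAG, got " + e.type);
        }
        String text = e.value;
        e = source.next();
        if (e.type != Type.END_TAG) {
            throw new IllegalStateException("expected END_TAG, got " + e.type);
        }
        return text;
    }

    /** Reads the text of a single <title> element from a replayed event list. */
    public static String demo() {
        PullSource source = fromList(Arrays.asList(
                new Event(Type.START_TAG, "title"),
                new Event(Type.TEXT, "Pull parsing"),
                new Event(Type.END_TAG, "title")));
        source.next(); // consume the START_TAG
        return nextText(source);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Pull parsing
    }
}
```

since the helper needs nothing beyond next(), it can live outside the core
interface until it is clear that most users really want it.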

> > > event queueing. Due to the pipeline nature of the XNI parser
> >
> > that sounds like a good engineering decision and i will
> > try to implement it in xni2xmlpull - it will make the
> > implementation more robust :-)
>
> Yep. Let me know if you need any pointers understanding
> how the Xerces2 components work within the XNI framework.

thanks! i have read the Xerces2 code and i have a general grasp of how
it works (in general ...)

> > > the character buffers. That way I would not have to copy
> > > any characters at all because I would know that the contents
> > > of the char buffers would not be over-written.
> >
> > that sounds like a great addition to Xerces 2  - i had something similar
>
> I may try to hack up my pull-parsing ideas just to see
> how they work. And if I do, then I'll definitely be adding
> this feature to Xerces2. In fact, I should add it anyway
> because I know other people would use it as well. For
> example, way back when the Xalan folks were talking about
> having better control over the char buffers so that they
> didn't constantly have to copy chars around.

i think it is good for performance and will make writing code
that gathers text for element content much easier...

> It's fun to be able to program what I want again. :)

what can i say :-))))

alek




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: XML Pull-Parsing (was: "How to start writing a non-blocking SAX parser")

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Andy Clark wrote:

> Aleksander Slominski wrote:
> > the interesting thing with immutable objects (such as element name
> > represented as interned String) is that they can be kept indefinitely
> > and shared very efficiently between parser and the user code
> >
> > however, for other-level objects such as a start tag event the
> > benefits, as you describe, are not that clear ...
>
> Well, we're talking about specific kinds of programs. Which
> is why you've been working on xml-pull APIs and implementations.
> In this work, have you found much need for the application to
> keep parts of the docs around (e.g. the event contents) for
> very long?

it depends: when i need to deserialize a more complex
data structure (such as a SOAP graph) and parts of the XML are
not yet converted, then one needs to keep event objects
(as some kind of ultra lightweight partial DOM ...)

> > the problem is to resist temptation of adding too much too fast,
>
> Yep, I agree. There would certainly be a tendency to add
> everything but the kitchen sink. :)

with a public API: once something is added it must be supported forever ...

> > we think that in XMLPULL API we have some useful methods
> > (like nextText/nextTag) and i personally think that more is needed
> > but it is good to wait a bit and see what is _really_ needed.
>
> Yep. It's like when they erect a new building but don't
> make the sidewalks -- they want to see where people walk
> *before* laying the concrete.

so we can depend on users to show us the way to go :-)))

> > i think it is good for performance and will make writing code
> > that gathers text for element content much easier...
>
> I don't know about performance... I guess it really
> depends on the application.

otherwise one needs to use StringBuffer to keep content,
as it is not possible to know if there is one characters() callback or
multiple for element content. even if there was only one callback
i would still have to copy the characters() buffer as it may be invalid
when the next endTag() callback is delivered ...
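
to show the copying i mean, here is a small sketch using plain SAX from the
JDK rather than XNI. TextCollector and gatherText are names made up for this
example, and it uses StringBuilder (the unsynchronized successor of
StringBuffer); the handler appends every chunk because it cannot assume the
text arrives in a single characters() call.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextGatherDemo {

    /** Copies character data out of the parser's buffer on every callback. */
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // The array belongs to the parser and may be reused after this
            // method returns, so the characters are copied immediately.
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    /** Parses the given XML and returns all character data it contains. */
    public static String gatherText(String xml) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return collector.text();
    }

    public static void main(String[] args) {
        System.out.println(gatherText("<a>one &amp; two</a>")); // one & two
    }
}
```

if the parser could promise that the buffer contents stay valid, the append
in characters() is the copy that could be avoided.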

thanks,

alek





Re: XML Pull-Parsing (was: "How to start writing a non-blocking SAX parser")

Posted by Andy Clark <an...@apache.org>.
Aleksander Slominski wrote:
> the interesting thing with immutable objects (such as element name
> represented as interned String) is that they can be kept indefinitely
> and shared very efficiently between parser and the user code
> 
> however, for other-level objects such as a start tag event the
> benefits, as you describe, are not that clear ...

Well, we're talking about specific kinds of programs. Which
is why you've been working on xml-pull APIs and implementations.
In this work, have you found much need for the application to 
keep parts of the docs around (e.g. the event contents) for 
very long?

> > popping the method call stack takes time. If the data fields
> > are public on the object returned from "next", then it's
> > just an object access.
> 
> that makes Java programming slightly lower level
> and feel more like C/C++ :-)

I don't have a problem with that. :)

> > It's just a question of deciding what functionality is the
> > most useful. If 90% of the users end up using this convenience
> > method, then it should be part of the core API. And I think
> > that this functionality (and some others) are that useful.
> 
> the problem is to resist temptation of adding too much too fast,

Yep, I agree. There would certainly be a tendency to add
everything but the kitchen sink. :)

> we think that in XMLPULL API we have some useful methods
> (like nextText/nextTag) and i personally think that more is needed
> but it is good to wait a bit and see what is _really_ needed.

Yep. It's like when they erect a new building but don't
make the sidewalks -- they want to see where people walk 
*before* laying the concrete.

> > Yep. Let me know if you need any pointers understanding
> > how the Xerces2 components work within the XNI framework.
> 
> thanks! i have read Xerces2 code and i have general grasp of its
> working (in general ...)

Okay. Just thought I'd offer my assistance, if needed. 
I know that there aren't many people out there writing 
directly to XNI (or if they are, then they're awfully
quiet about it ;).


> > > > the character buffers. That way I would not have to copy
> > > > any characters at all because I would know that the contents
> > > > of the char buffers would not be over-written.
>
> i think it is good for performance and will make writing code
> that gathers text for element content much easier...

I don't know about performance... I guess it really
depends on the application.

-- 
Andy Clark * andyc@apache.org
