Posted to j-dev@xerces.apache.org by Aleksander Slominski <as...@cs.indiana.edu> on 2002/07/22 19:53:02 UTC

feature to reuse buffer [Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available]

Andy Clark wrote:

> So I'm thinking of adding a feature to Xerces2 that
> allows applications to set whether the scanner re-
> uses its character buffers. If the scanner doesn't
> reuse buffers, then my converter doesn't have to
> copy anything. Therefore, the performance of the
> pull-parser driven by Xerces2 should be better.

hi,

that would be great - i have hit this problem when
implementing XMLPULL API on top of Xerces 2 XNI.

another great feature would be ability to "lock" parser
buffer content so when i receive multiple characters() call-backs
i know that XMLString offset and length will be valid
as parser will just create bigger buffer to keep larger content.

then later i unlock allowing parser to reuse current buffer
(so parser can start writing to buffer from beginning).
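A minimal sketch of the lock/unlock idea described above, assuming a stand-alone growable buffer. The class LockableCharBuffer and its methods are hypothetical names for illustration only; nothing like this is confirmed to exist in Xerces2 or XNI.

```java
import java.util.Arrays;

/** Hypothetical growable buffer with a lock flag; not part of Xerces or XNI. */
final class LockableCharBuffer {
    private char[] data;
    private int length;      // number of valid chars currently stored in data
    private boolean locked;

    LockableCharBuffer(int initialCapacity) {
        data = new char[initialCapacity];
    }

    /** While locked, appended content is kept and earlier offsets stay valid. */
    void lock() { locked = true; }

    /** After unlock, the next append starts writing again at offset 0. */
    void unlock() { locked = false; }

    /**
     * Stores count chars from src and returns the offset where they were placed.
     * Unlocked: previous content is discarded and the buffer is reused from the start.
     * Locked: content is appended, and the array grows when it is too small.
     */
    int append(char[] src, int srcOffset, int count) {
        if (!locked) {
            length = 0; // reuse: overwrite from the beginning
        }
        if (length + count > data.length) {
            // copyOf keeps the existing chars at the same indices, so old offsets remain valid
            data = Arrays.copyOf(data, Math.max(data.length * 2, length + count));
        }
        System.arraycopy(src, srcOffset, data, length, count);
        int start = length;
        length += count;
        return start;
    }

    /** Current backing array; it may be replaced when the buffer grows. */
    char[] array() { return data; }

    int length() { return length; }
}
```

In this sketch only offsets survive growth, not the array reference, so a client would call array() again when it finally reads the content.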

it would be interesting to compare if such reuse of growable
buffer is more efficient than fixed buffer (AFAIK it is current
implementation in Xerces) and the option you describe when
scanner never reuses the buffer - how does it sound?

> Of course, you could always write an implementation
> only for the pull-parsing API that wouldn't suffer
> this problem at all. You could even drive the API
> from other pull-parsing APIs like XPP, etc. :)

that is where good layering becomes important for
good performance (lower level must be efficient :-))
and i think about XML pull parsing as being
quite low level and very efficient (similarly to SAX),
natural layer just above XML tokenizer that
can be used to build higher levels efficiently
(like incremental XML node trees).

thanks,

alek



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


allowing client to control buffer use policy [Re: feature to reuse buffer]

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Andy Clark wrote:

> Aleksander Slominski wrote:
> > i have implemented something like this already by wrapping
> > real reader with special reader that is cumulating input when
> > asked for. that allows me to keep buffer offsets from
> > XMLString and uses content in cumulative reader buffer
> > even though Xerces may reuse its internal buffer
> > (see below how i implemented CumulativeReader - it
> > works but is inefficient as it adds extra layer of indirection
> > and another buffer potentially for all IO operations).
>
> The inefficiency is precisely the reason why I wouldn't
> want to use this type of approach. Besides, this approach
> has a hard limit of Integer.MAX_VALUE.

hi,

typical usage pattern is to lock buffer when interesting
event happens (for example first call to characters())
and then unlock to retrieve content (for example in endElement())
- that means that client does not need to copy
buffer content (no need to accumulate characters()
content in StringBuffer etc.).

when unlocked the content of the buffer has exactly
the same semantics as currently (when unlocked buffer can be shrunk)
however when locked it is then guaranteed that offsets
into buffer content are valid until buffer is unlocked.
therefore only if buffer is locked permanently it will grow indefinitely.
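The usage pattern above could look roughly like this on the client side. This is only a sketch: LockableBufferView and TextCollector are hypothetical names, not Xerces or XNI API, and it assumes that while the buffer is locked the parser stores consecutive characters() chunks contiguously.

```java
/** Hypothetical view of the parser's buffer; names are illustrative only. */
interface LockableBufferView {
    void lock();
    void unlock();
    char[] array();
}

/** Collects the text of one element without copying each chunk as it arrives. */
final class TextCollector {
    private final LockableBufferView buffer;
    private int start = -1; // offset of the first chunk, -1 while no text has been seen
    private int end;        // offset just past the last chunk

    TextCollector(LockableBufferView buffer) {
        this.buffer = buffer;
    }

    /** Called for every chunk of character data; only offsets are remembered. */
    void characters(int offset, int length) {
        if (start < 0) {
            buffer.lock();   // first chunk: ask the parser to keep content from here on
            start = offset;
        }
        end = offset + length;
    }

    /** Called at the end tag; reads the whole text once, then lets the parser reuse the buffer. */
    String endElement() {
        if (start < 0) {
            return "";
        }
        String text = new String(buffer.array(), start, end - start);
        buffer.unlock();
        start = -1;
        return text;
    }
}
```

The only copy happens once in endElement(), when the String is created, instead of once per characters() call.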


anyway i think it is very useful feature.


> The approach I'm thinking of simply manages the creation
> of buffers, not what is done with them. So the various
> parts of the code would still be in control of their own
> buffers but would simply request new buffers from the
> buffer manager/factory/whatever.

however that means that making sure that
buffer can be safely reused will be very difficult - right?

i think that if buffers are not reused then xerces may
be facing memory problems when parsing large XML input
(for example if client keeps reference to buffers
passed in characters())?

thanks,

alek





Re: feature to reuse buffer [Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available]

Posted by Andy Clark <an...@apache.org>.
Aleksander Slominski wrote:
> i have implemented something like this already by wrapping
> real reader with special reader that is cumulating input when
> asked for. that allows me to keep buffer offsets from
> XMLString and uses content in cumulative reader buffer
> even though Xerces may reuse its internal buffer
> (see below how i implemented CumulativeReader - it
> works but is inefficient as it adds extra layer of indirection
> and another buffer potentially for all IO operations).

The inefficiency is precisely the reason why I wouldn't
want to use this type of approach. Besides, this approach
has a hard limit of Integer.MAX_VALUE.

The approach I'm thinking of simply manages the creation
of buffers, not what is done with them. So the various
parts of the code would still be in control of their own
buffers but would simply request new buffers from the
buffer manager/factory/whatever.

-- 
Andy Clark * andyc@apache.org




Re: feature to reuse buffer [Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available]

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Andy Clark wrote:

> Aleksander Slominski wrote:
> > another great feature would be ability to "lock" parser
> > buffer content so when i receive multiple characters() call-backs
> > i know that XMLString offset and length will be valid
> > as parser will just create bigger buffer to keep larger content.
>
> Sounds like you're asking for a more advanced buffer
> management system built into the parser. I was looking
> at implementing the "don't re-use character buffers"
> feature today and it's not as easy as I had hoped. So...

i have implemented something like this already by wrapping
real reader with special reader that is cumulating input when
asked for. that allows me to keep buffer offsets from
XMLString and uses content in cumulative reader buffer
even though Xerces may reuse its internal buffer
(see below how i implemented CumulativeReader - it
works but is inefficient as it adds extra layer of indirection
and another buffer potentially for all IO operations).

> I may end up implementing a system where the application
> can register a buffer manager and all of the components
> would ask the manager for new buffers, etc. Then, using
> this method, the Xerces2 parser would by default re-use
> buffers but a pull parser could install a new buffer
> manager that always created new buffers.
>
> But I need to do some more thinking on the topic...

that would be great!

i think that this is important feature to tune for performance.

thanks,

alek

ps. here is how i implemented cumulative reader:

    // requires: import java.io.IOException; import java.io.Reader;
    protected class CumulativeReader extends Reader
    {
        private Reader source;

        private boolean cumulative;

        private int bufAbsoluteStart;
        private int bufAbsoluteEnd;

        private char[] buf = new char[10 * 1024];
        private int bufStart;
        private int bufEnd;



        /** Constructs this reader from another reader. */
        public CumulativeReader(Reader reader) {
            source = reader;
        }

        public void setCumulative(boolean value) {
            cumulative = value;
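            if(cumulative) {
                // move current content into a private, larger buffer
                char[] newBuf = new char[buf.length + 10 * 1024];
                if(bufEnd > bufStart) {
                    System.arraycopy(buf, bufStart, newBuf, 0, bufEnd - bufStart);
                }
                // install the copy; until now buf may alias the caller's array
                buf = newBuf;
                bufEnd = bufEnd - bufStart;
                bufStart = 0;
            }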
        }
        public boolean getCumulative() { return cumulative; }

        public char[] getCumulativeBuffer() {
            return buf;
        }

        public int getCumulativeBufferAbsoluteStart() {
            return bufAbsoluteStart;
        }

        public int getCumulativeBufferAbsoluteEnd() {
            return bufAbsoluteEnd;
        }

        public int getCumulativeBufferStart() {
            return bufStart;
        }

        public int getCumulativeBufferEnd() {
            return bufEnd;
        }

        //
        // Reader methods
        //


        // ignore closing
        public void close() { }

        public int read(char[] ch, int offset, int length)
            throws IOException
        {

            // read from original
            int ret = source.read(ch, offset, length);

            if(ret > 0) {
                if(!cumulative) {
                    buf = ch;
                    bufStart = offset;
                    bufEnd = offset + ret;
                    bufAbsoluteStart = bufAbsoluteEnd;
                } else {
                    // append the newly read chars to buf at bufEnd, growing buf if needed
                    int newLen = bufEnd + ret;
                    if(buf.length < newLen) {
                        char[] newBuf = new char[newLen + 10 * 1024];
                        System.arraycopy(buf, bufStart, newBuf, 0, bufEnd - bufStart);
                        buf = newBuf; // switch to the larger buffer before appending
                        bufEnd = bufEnd - bufStart;
                        bufStart = 0;
                    }
                    System.arraycopy(ch, offset, buf, bufEnd, ret);
                    bufEnd += ret;
                }
                bufAbsoluteEnd += ret;
            }

            return ret;

        }

    }





Re: feature to reuse buffer [Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available]

Posted by Andy Clark <an...@apache.org>.
Aleksander Slominski wrote:
> another great feature would be ability to "lock" parser
> buffer content so when i receive multiple characters() call-backs
> i know that XMLString offset and length will be valid
> as parser will just create bigger buffer to keep larger content.

Sounds like you're asking for a more advanced buffer
management system built into the parser. I was looking
at implementing the "don't re-use character buffers"
feature today and it's not as easy as I had hoped. So...

I may end up implementing a system where the application
can register a buffer manager and all of the components
would ask the manager for new buffers, etc. Then, using
this method, the Xerces2 parser would by default re-use
buffers but a pull parser could install a new buffer
manager that always created new buffers.
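A rough sketch of what such a pluggable buffer manager might look like, with one policy that re-uses buffers and one that always creates new ones. The interface and class names (CharBufferManager, ReusingBufferManager, FreshBufferManager) are assumptions made for illustration; no such API is confirmed to exist in Xerces2.

```java
import java.util.ArrayDeque;

/** Hypothetical factory that parser components would ask for character buffers. */
interface CharBufferManager {
    char[] acquire(int minSize);
    void release(char[] buffer);
}

/** Re-use policy: released buffers are pooled and handed out again. */
final class ReusingBufferManager implements CharBufferManager {
    private final ArrayDeque<char[]> pool = new ArrayDeque<>();

    public char[] acquire(int minSize) {
        char[] candidate = pool.poll(); // null when the pool is empty
        if (candidate != null && candidate.length >= minSize) {
            return candidate;
        }
        return new char[minSize]; // pool empty or pooled buffer too small
    }

    public void release(char[] buffer) {
        pool.push(buffer);
    }
}

/** Pull-parser policy: every request gets a fresh array, so clients may keep references. */
final class FreshBufferManager implements CharBufferManager {
    public char[] acquire(int minSize) {
        return new char[minSize];
    }

    public void release(char[] buffer) {
        // nothing to recycle; the garbage collector reclaims unreferenced arrays
    }
}
```

With the fresh policy, memory use depends on how long the application holds on to the arrays it was given, since nothing is ever recycled.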

But I need to do some more thinking on the topic...

-- 
Andy Clark * andyc@apache.org

