Posted to dev@cocoon.apache.org by Berin Loritsch <bl...@apache.org> on 2003/04/04 05:56:18 UTC

[RT:Long] Initial Results and comments (was Re: Compiling XML, and its replacement)

Stefano Mazzocchi wrote:
> I'll also be interested to see how different the performance gets on 
> hotspot server/client and how much it changes with several subsequent runs.

Well, with HotSpot client and a 15.4 KB (15,798 bytes) test document
(my build.xml file), I got the following results:

      [junit] Parsed 873557 times in 10005ms
      [junit] Average of 0.011453173633775472ms per parse

Compare that to a much smaller test document of 170 bytes:

      [junit] Parsed 16064210 times in 10004ms
      [junit] Average of 6.227508231030347E-4ms per parse


The two documents are of completely different complexities,
but the ratio of results is:

      170b      .000623ms
   --------- = -----------
    15,800b      .0115ms

That's a size increase of 92.9 times

compared to a time increase of 18.4 times


Times were comparable to Server Hotspot for this solution--although it
was only run for 10 seconds.

Considering we have a 5:1 size-to-time scaling ratio, it would be
interesting to see if it carries over to a much larger XML file--
if only I had one.  If scalability were linear, then a 1,580,000
byte file should only take .23 ms to parse.
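As a quick sanity check on the arithmetic above (the figures are taken
straight from the test output; the class itself is just illustrative):

```java
// Sanity check of the benchmark arithmetic quoted above.
public class ScalingCheck {
    public static void main(String[] args) {
        double sizeRatio = 15798.0 / 170.0;                              // ~92.9x larger
        double timeRatio = 0.011453173633775472 / 6.227508231030347E-4; // ~18.4x slower
        double scaling   = sizeRatio / timeRatio;                        // ~5:1 size-to-time

        System.out.printf("size ratio: %.1f, time ratio: %.1f, scaling: %.1f%n",
                          sizeRatio, timeRatio, scaling);
    }
}
```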

I also tried the test with the -Xint (interpreted mode only) option
set, and there was no appreciable difference.  As best I can tell,
this is largely because the code is already as optimized as it
possibly can be.  This is in line with your observations of unrolled
"loops".

In this instance, though, I believe that we are dealing with more than
just "unrolled loops".  We are dealing with file reading overhead, and
interpretation overhead.  Your *compressed* XML addresses the second
issue, but in the end I believe it will behave very similarly to my
solution.
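To make that concrete, here is a rough sketch (illustrative names, not
the actual generated code) of what the compiled form of a small document
like <demo><entry name="foo">bar</entry></demo> might look like: the SAX
events are hard-coded, so there is nothing left to read or interpret at
runtime.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Rough illustration: what a compiler might emit for
// <demo><entry name="foo">bar</entry></demo> -- direct SAX calls, no parsing.
public class CompiledDemo {

    public static void toSAX(ContentHandler handler) throws SAXException {
        handler.startElement("", "demo", "demo", new AttributesImpl());

        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", "foo");
        handler.startElement("", "entry", "entry", attrs);

        char[] text = "bar".toCharArray();
        handler.characters(text, 0, text.length);

        handler.endElement("", "entry", "entry");
        handler.endElement("", "demo", "demo");
    }

    // Demonstration helper: replay the events into a string
    // (the recorder below ignores attributes for brevity).
    public static String replay() {
        final StringBuilder sb = new StringBuilder();
        try {
            toSAX(new DefaultHandler() {
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    sb.append('<').append(qName).append('>');
                }
                public void characters(char[] ch, int start, int length) {
                    sb.append(ch, start, length);
                }
                public void endElement(String uri, String local, String qName) {
                    sb.append("</").append(qName).append('>');
                }
            });
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }
}
```

In the real thing the events would of course be driven into whatever
ContentHandler the pipeline supplies.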

Also keep in mind that improvements in the compiler design (far future)
can allow for repetitive constructs to be moved into a separate method.
For instance, the following XML is highly repetitive:

<demo>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
</demo>

As documents become very large it becomes critical to do something
other than my very simplistic compilation.  However there are plenty
of opportunities to optimize the XML compiler.  For example, we could
easily reduce the above XML to something along the lines of:

startElement("demo")

for (int i = 0; i < 6; i++)
{
      outputEntry()
}

endElement("demo")

Even if the attribute and element values were different
but the same structure remained, the compiler would
(theoretically) be able to reduce it to a method with parameters:

startElement("demo")

outputEntry("foo", "bar");
outputEntry("ego", "centric");
outputEntry("gas", "bag");
outputEntry("I", "am");
outputEntry("just", "kidding");
outputEntry("my", "peeps");

endElement("demo")

Still allowing for some level of hotspot action.

However, I believe the true power of Binary XML will be with its
support for XMLCallBacks and (in the mid term future) decorators.
The decorator concept will allow us to set a series of SAX events
for a common object.  This will render the XSLT stage a moot point
as we can apply pre-styled decorators to the same set of objects.
These will call for some alterations to the compiler as it stands
now, and will be required before a 1.0 release.

I am trying to keep the library lean and mean.



Re: [RT:Long] Initial Results and comments (was Re: Compiling XML, and its replacement)

Posted by Berin Loritsch <bl...@apache.org>.
Stefano Mazzocchi wrote:
>> Considering we have a 5:1 size-to-time scaling ratio, it would be
>> interesting to see if it carries over to a much larger XML file--
>> if only I had one.  If scalability were linear, then a 1,580,000
>> byte file should only take .23 ms to parse.
> 
> 
> Are you aware of the fact that a Java method cannot be larger than
> 64KB of bytecode? And I'm also sure there is a limit on how many methods
> a Java class can have.
> 
> So, at the very end, you have a top-size limit on how big your 
> compiled-in-memory object can be.

Absolutely.  However this is a stepping stone.  I haven't begun to look
at compiler optimizations yet.  I am trying to get the interface the way
I like it, and the thing to merely function (which I did last night!).

>> In this instance, though, I believe that we are dealing with more than
>> just "unrolled loops".  We are dealing with file reading overhead, and
>> interpretation overhead.  Your *compressed* XML addresses the second
>> issue, but in the end I believe it will behave very similarly to my
>> solution.
> 
> 
> Good point. But you are ignoring the fact that all modern operating 
> systems have cached file systems. And, if this was not the case, it 
> would be fairly trivial to implement one underneath a source resolver.

:) And yet certain operations touch the file and incorporate a call to
blocking filesystem code.  Seriously though, once a file is read into
memory, it's all about the parsing and processing.  With my solution
there is nothing to process--it's all been done.

>> Also keep in mind that improvements in the compiler design (far future)
>> can allow for repetitive constructs to be moved into a separate method.
>> For instance, the following XML is highly repetitive:

<snip/>

>>
>> Still allowing for some level of hotspot action.
> 
> 
> I see, also to overcome the 64kb method limitation.

:)  Yep.

>> However, I believe the true power of Binary XML will be with its
>> support for XMLCallBacks and (in the mid term future) decorators.
> 
> 
> Can you elaborate more on this?

I just got this "working" in the sense that it is operational, not
in the sense that it is elegant, or where it needs to be.  Presently
I am using Processing Instructions to represent when a callback is
required.  What I want to do is allow actual XMLFragments to be converted
to callbacks in the compiler.  That would allow direct support for
conventions such as XInclude and other standards.  Unfortunately, it
proved too difficult for the short term.

For now, what I have working is this:

<test>
   <element withAttribute="true"/>
   <document>Add some text here</document>

   <?include-xml ../../build.xml?>
</test>

When this document is compiled you get the standard SAX events that
you expect, but the processing instruction is compiled as an
XMLCallBack.  This proved to be the easiest thing from an implementation
perspective--but I am open to alternatives.
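The dispatch itself is nothing exotic; conceptually it amounts to
something like this (all the names here are stand-ins, not the actual
interfaces):

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Conceptual sketch: a processing instruction <?target data?> compiles into
// a lookup-and-invoke against a registry instead of literal SAX events.
// Interface and class names are illustrative only.
interface XMLCallBack {
    void generate(String data, ContentHandler handler) throws SAXException;
}

public class CallbackRegistry {
    private final Map<String, XMLCallBack> callbacks = new HashMap<String, XMLCallBack>();

    public void register(String target, XMLCallBack callback) {
        callbacks.put(target, callback);
    }

    // Invoked where the compiled document reaches the processing instruction.
    public void invoke(String target, String data, ContentHandler handler) throws SAXException {
        XMLCallBack callback = callbacks.get(target);
        if (callback == null) {
            throw new SAXException("No callback registered for PI target: " + target);
        }
        callback.generate(data, handler);
    }
}
```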

The beauty of this approach is that CallBacks are much easier to
develop than something that works with SAX events on the fly.  I
have to add some more helper classes to make that statement true,
and your compressed XML would most likely be a key element of that.

However the concept is simple.  A document can be boiled down to
the parts that *never* change, and the elements that do change
are represented by easily developed code.  I'm thinking like a
developer, not a script kiddie.

The consequence of the design decisions is that we can never have
anything like the following [AJX]SP abuse:

<xsp:logic>
   for (int i = 0; i < 10; i++)
   {
</xsp:logic>

   <element/>

<xsp:logic>
   }
</xsp:logic>


That is valid (but *very* poorly written) XSP.  The XML can be boiled
down to things like this:

<html>
   <head><title><?doc-title theme="coco"?></title></head>
   <body>
     <table>
       <tr><td><img src="logo.png"/></td>
           <td><?doc-title theme="coco"?></td></tr>
       <tr rowspan="2"><td><?site-tabs theme="coco"?></td></tr>
     </table>
     <table>
       <tr>
         <td><?site-menu theme="coco"?></td>
         <td><?doc-content theme="coco"?></td>
         <td><?site-tools theme="coco"?></td>
       </tr>
     </table>
   </body>
</html>

Notice the embedded processing instructions?  They would be set
to call certain callback methods which could be used to provide
a common look and feel to all the docs.

The processing instruction would have the callback name (which
will be accessible via the JAR Services mechanism), and the
proper theme is preserved throughout the document.
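For what it's worth, the JAR services mechanism is the META-INF/services
lookup; in later JDKs it is exposed directly as java.util.ServiceLoader.
A rough sketch of looking a callback up by name (the service interface
and callback name below are stand-ins):

```java
import java.util.ServiceLoader;

// Rough sketch: find a provider by simple class name among implementations
// declared in META-INF/services/<interface-name> files on the classpath.
public class CallbackLookup {
    public static Object findByName(Class<?> service, String simpleName) {
        for (Object provider : ServiceLoader.load(service)) {
            if (provider.getClass().getSimpleName().equals(simpleName)) {
                return provider;
            }
        }
        return null;  // no provider with that name registered
    }
}
```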

It also means that certain things like the menu, tabs, content,
and tools can have the same logic but apply the specified
decorator (could be XSLTC, or could be something else).

The pipeline for this would be very simple:

<site:match pattern="*.html">
   <site:act name="choose-doc" source="{1}"/>
   <site:generate type="bxml" source="coco.xml"/>
   <site:serialize/>
</site:match>

It's been a while so I apologize if my sitemap logic is off.

But notice that there is no need for a transformer?

There is a lot of work to make my vision happen, but it should be
much more natural for developers to work with than trying to write
a transformer to intercept certain logic.  The code that the developer
would have to write would be much more compact and readable as well.

To make this a reality, the XMLRepository needs to be modified to
allow temporary storage of XMLFragments, and the compiler needs to
be altered to allow for different compilation strategies (e.g.
optimizing for fragments).

Anyway, hopefully you will see some advantages in the approach.


>> The decorator concept will allow us to set a series of SAX events
>> for a common object.  This will render the XSLT stage a moot point
>> as we can apply pre-styled decorators to the same set of objects.
> 
> 
> Isn't this what a translet (an xsltc-compiled XSLT stylesheet) was 
> supposed to be?

You would know better.  However what I was thinking of is something more
along the lines of this:

interface XMLDecorator
{
     void transform( Object o, ContentHandler handler );
}

In a directory renderer callback I might have code like this:

class DirectoryCallBack
{
     // exclude all the init code

     XMLFragment process( Properties props )
     {
         File dir = new File( props.getProperty("dir") );
         CompressedFragment xml = new CompressedFragment();

         m_fileDecorator.transform( dir, xml.contentHandler() );

         return xml;
     }
}

The callback code is pretty simple.  I can easily create the callback,
and delegate the actual representation of the object to a decorator.
The object can be represented as XHTML directly, and it would be
embedded in the proper location.
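As a self-contained illustration of that delegation (again, stand-in
names, not the real classes), a trivial decorator that renders a File as
a fragment of SAX events might look like:

```java
import java.io.File;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Illustrative decorator interface and implementation; not the actual library API.
interface XMLDecorator {
    void transform(Object o, ContentHandler handler) throws SAXException;
}

public class FileNameDecorator implements XMLDecorator {
    // Render the object (a File's name, or its string form) as <item>...</item>.
    public void transform(Object o, ContentHandler handler) throws SAXException {
        String name = (o instanceof File) ? ((File) o).getName() : String.valueOf(o);
        handler.startElement("", "item", "item", new AttributesImpl());
        char[] chars = name.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "item", "item");
    }
}
```

The callback stays a few lines long; the decorator owns the markup, so
swapping themes means swapping decorators, not rewriting callbacks.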

> Anyway, I'm happy to see new approaches to xml generation being researched.

I had the concept a long time ago, and I think it could fit quite well
in the Cocoon concept.  My goal is to replace XSP with a more programmer
friendly alternative--not to make Cocoon obsolete.


Re: [RT:Long] Initial Results and comments (was Re: Compiling XML, and its replacement)

Posted by Stefano Mazzocchi <st...@apache.org>.
Berin Loritsch wrote:
> Stefano Mazzocchi wrote:
> 
>> I'll also be interested to see how different the performance gets on 
>> hotspot server/client and how much it changes with several subsequent 
>> runs.
> 
> 
> Well, with HotSpot client and a 15.4 KB (15,798 bytes) test document
> (my build.xml file), I got the following results:
> 
>      [junit] Parsed 873557 times in 10005ms
>      [junit] Average of 0.011453173633775472ms per parse
> 
> Compare that to a much smaller test document of 170 bytes:
> 
>      [junit] Parsed 16064210 times in 10004ms
>      [junit] Average of 6.227508231030347E-4ms per parse
> 
> 
> The two documents are at completely different complexities,
> but the ratio of results is:
> 
>      170b      .000623ms
>   --------- = -----------
>    15,800b      .0115ms
> 
> That's a size increase of 92.9 times
> 
> compared to a time increase of 18.4 times
> 
> 
> Times were comparable to Server Hotspot for this solution--although it
> was only run for 10 seconds.
> 
> Considering we have a 5:1 size-to-time scaling ratio, it would be
> interesting to see if it carries over to a much larger XML file--
> if only I had one.  If scalability were linear, then a 1,580,000
> byte file should only take .23 ms to parse.

Are you aware of the fact that a Java method cannot be larger than
64KB of bytecode? And I'm also sure there is a limit on how many methods
a Java class can have.

So, at the very end, you have a top-size limit on how big your 
compiled-in-memory object can be.

> I also tried the test with the -Xint (interpreted mode only) option
> set, and there was no appreciable difference.  As best I can tell,
> this is largely because the code is already as optimized as it
> possibly can be.  This fits in line with your observations of unrolled
> "loops".

Yep.

> In this instance, though, I believe that we are dealing with more than
> just "unrolled loops".  We are dealing with file reading overhead, and
> interpretation overhead.  Your *compressed* XML addresses the second
> issue, but in the end I believe it will behave very similarly to my
> solution.

Good point. But you are ignoring the fact that all modern operating 
systems have cached file systems. And, if this was not the case, it 
would be fairly trivial to implement one underneath a source resolver.

> Also keep in mind that improvements in the compiler design (far future)
> can allow for repetitive constructs to be moved into a separate method.
> For instance, the following XML is highly repetitive:
> 
> <demo>
>    <entry name="foo">
>      bar
>    </entry>
>    <entry name="foo">
>      bar
>    </entry>
>    <entry name="foo">
>      bar
>    </entry>
>    <entry name="foo">
>      bar
>    </entry>
>    <entry name="foo">
>      bar
>    </entry>
>    <entry name="foo">
>      bar
>    </entry>
> </demo>
> 
> As documents become very large it becomes critical to do something
> other than my very simplistic compilation.  However there are plenty
> of opportunities to optimize the XML compiler.  For example, we could
> easily reduce the above XML to something along the lines of:
> 
> startElement("demo")
> 
> for (int i = 0; i < 6; i++)
> {
>      outputEntry()
> }
> 
> endElement("demo")
> 
> Even if the attribute and element values were different
> but the same structure remained, the compiler would
> (theoretically) be able to reduce it to a method with parameters:
> 
> startElement("demo")
> 
> outputEntry("foo", "bar");
> outputEntry("ego", "centric");
> outputEntry("gas", "bag");
> outputEntry("I", "am");
> outputEntry("just", "kidding");
> outputEntry("my", "peeps");
> 
> endElement("demo")
> 
> Still allowing for some level of hotspot action.

I see, also to overcome the 64kb method limitation.

> However, I believe the true power of Binary XML will be with its
> support for XMLCallBacks and (in the mid term future) decorators.

Can you elaborate more on this?

> The decorator concept will allow us to set a series of SAX events
> for a common object.  This will render the XSLT stage a moot point
> as we can apply pre-styled decorators to the same set of objects.

Isn't this what a translet (an xsltc-compiled XSLT stylesheet) was 
supposed to be?

Anyway, I'm happy to see new approaches to xml generation being researched.

Stefano.