Posted to dev@cocoon.apache.org by Ovidiu Predescu <ov...@cup.hp.com> on 2002/01/19 08:07:04 UTC

[UPDATE] Scheme/Cocoon progress - initial benchmarks show good speed!

Hi everybody,

This past week I've made good progress integrating the Scheme
engine into the Cocoon system. Thanks to Sylvain and Berin, I was able
to rework the hookup between the Scheme sitemap and Cocoon in a nicer,
and perhaps more extensible, way. This allows me to have XML
sitemaps, for now very simple, which are parsed and interpreted in
Scheme.

Currently the XML sitemap file is parsed using the Java Parser
component and translated into a Scheme representation called SXML
(Scheme XML, see http://okmij.org/ftp/Scheme/xml.html). This
representation is then translated by a Scheme function into Scheme
code, which becomes the Scheme runtime representation of the
sitemap. On each request, this Scheme function, acting as the sitemap,
is executed to process the request, according to the sitemap
definition.
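
To make the XML-to-SXML step concrete, here is a minimal sketch in Java
(not the actual Schecoon code). It assumes the standard JAXP DOM parser
and prints an SXML-style S-expression of the form
(element (@ (attr "value")) children...). The class name SxmlSketch, the
helper names, and the un-prefixed element names are hypothetical; real
sitemap elements are namespace-qualified, and string escaping is omitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SxmlSketch {
    /** Render a DOM element as an SXML-style S-expression (no escaping). */
    static String toSxml(Element e) {
        StringBuilder sb = new StringBuilder("(").append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        if (attrs.getLength() > 0) {
            // Attributes go into a leading (@ ...) list, as in SXML.
            sb.append(" (@");
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                sb.append(" (").append(a.getNodeName()).append(" \"")
                  .append(a.getNodeValue()).append("\")");
            }
            sb.append(")");
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k instanceof Element) {
                sb.append(" ").append(toSxml((Element) k));
            } else if (k.getNodeType() == Node.TEXT_NODE
                       && !k.getNodeValue().trim().isEmpty()) {
                sb.append(" \"").append(k.getNodeValue().trim()).append("\"");
            }
        }
        return sb.append(")").toString();
    }

    /** Parse an XML string and return its SXML-style rendering. */
    static String xmlToSxml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return toSxml(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse sitemap XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<match pattern=\"docs/*.html\">"
                   + "<generate src=\"docs/{1}.xml\"/>"
                   + "<transform src=\"doc2html.xsl\"/>"
                   + "<serialize/></match>";
        System.out.println(xmlToSxml(xml));
    }
}
```

In the approach described above, a Scheme function would then walk this
list structure and emit the Scheme code that acts as the sitemap; the
sketch stops at the SXML form.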

The only supported constructs right now are simple linear pipelines,
with a generator, one or more transformers, and a serializer, or
pipelines that have only a reader. At some point I want to write some
more generic code to allow more complex sitemap syntax to be
described. This Scheme code will allow the sitemap syntax to be
described using a BNF-like syntax, and allow semantic actions to be
attached to the BNF rules. With this code it should be very easy to
experiment with new syntaxes/semantics for the sitemap (provided
you're willing to describe them in Scheme ;-). If you think about
Cocoon's sitemap syntax as a mini-language, then the natural way of
describing, analyzing and processing it is through the usual
techniques of compiling. That's exactly what I plan to do, as
soon as I have clearer ideas on this.

The current Scheme sitemap implementation tries to do some basic
analysis of the correctness of the XML sitemap, and for now reports
the errors it encounters to standard output. This will change to
be more integrated with Cocoon's error reporting mechanisms.
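
As an illustration of what such a basic correctness check could look
like for the two supported pipeline shapes (generator, one or more
transformers, serializer; or a lone reader), here is a small Java
sketch. The class PipelineCheck, the step names, and the error messages
are all hypothetical and are not taken from the Scheme implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineCheck {
    /** Returns the problems found; an empty list means the pipeline is accepted. */
    static List<String> check(List<String> steps) {
        List<String> errors = new ArrayList<>();
        if (steps.isEmpty()) {
            errors.add("pipeline is empty");
            return errors;
        }
        // Reader pipelines may contain nothing but the reader.
        if (steps.contains("read")) {
            if (steps.size() != 1) {
                errors.add("a reader must be the only component in its pipeline");
            }
            return errors;
        }
        if (!steps.get(0).equals("generate")) {
            errors.add("pipeline must start with a generator");
        }
        if (!steps.get(steps.size() - 1).equals("serialize")) {
            errors.add("pipeline must end with a serializer");
        }
        // Everything between the first and last step must be a transformer.
        int transformers = 0;
        for (int i = 1; i < steps.size() - 1; i++) {
            if (steps.get(i).equals("transform")) {
                transformers++;
            } else {
                errors.add("unexpected '" + steps.get(i) + "' at position " + i);
            }
        }
        if (transformers == 0) {
            errors.add("expected at least one transformer");
        }
        return errors;
    }

    public static void main(String[] args) {
        // For now, just print any problems to standard output.
        List<String> steps = Arrays.asList("generate", "serialize");
        for (String error : check(steps)) {
            System.out.println("sitemap error: " + error);
        }
    }
}
```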

With the new architecture it should be easy to hook up the Scheme
sitemap implementation with another sitemap implementation, like the
current one, or with Sylvain's TreeProcessor implementation. I will
however defer this for the moment, as I'm eager to get to the meat of
the problem: playing with the continuations concept.

-- 

Benchmark

I've done a basic, very rough speed comparison between the Scheme
sitemap implementation and the compiled version. I used the Apache
'ab' program to send requests that process a very small stylebook
document through a simple pipeline (generator+XSLT+serializer). The
resulting page is about 2.5 KB.

The results are surprising: it appears the Scheme sitemap
implementation runs at the same speed as the compiled version!

The only explanation I have for this is that the Scheme implementation
uses its own URI matcher, based on Jakarta ORO, which is rumored to be
faster (at least in the simple usage I have) than Jakarta Regexp,
used by the "compiled" Cocoon. A bigger difference perhaps is the way
the parenthesized expressions in regular expression patterns are
interpreted. In the current compiled approach, a substitute() function
is called at runtime to substitute the actual values into the
pattern. In the Scheme implementation the parenthesis groups are
statically replaced at sitemap compile time, and they become function
arguments. The final expression is composed by doing a string append
of the pattern components and actual values. This trick saves
processing time at runtime, as the pattern does not have to be
traversed to find out where to place the values.
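
The difference between the two strategies can be sketched in Java as
follows. This is only an illustration of the idea, not code from either
implementation: the class SubstituteSketch, both method names, and the
{n} placeholder convention with group 0 as the whole match are
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SubstituteSketch {
    /** Runtime approach: scan the template for {n} on every request. */
    static String substitute(String template, String[] groups) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            int open = template.indexOf('{', i);
            int close = open < 0 ? -1 : template.indexOf('}', open);
            if (open < 0 || close < 0) {
                out.append(template, i, template.length());
                break;
            }
            out.append(template, i, open);
            out.append(groups[Integer.parseInt(template.substring(open + 1, close))]);
            i = close + 1;
        }
        return out.toString();
    }

    /** Compile-time approach: split the template once; each call only appends. */
    static Function<String[], String> compile(String template) {
        List<String> literals = new ArrayList<>();
        List<Integer> indexes = new ArrayList<>();
        int i = 0;
        while (true) {
            int open = template.indexOf('{', i);
            int close = open < 0 ? -1 : template.indexOf('}', open);
            if (open < 0 || close < 0) {
                literals.add(template.substring(i));
                break;
            }
            literals.add(template.substring(i, open));
            indexes.add(Integer.parseInt(template.substring(open + 1, close)));
            i = close + 1;
        }
        // literals always holds exactly one more entry than indexes.
        return groups -> {
            StringBuilder out = new StringBuilder(literals.get(0));
            for (int k = 0; k < indexes.size(); k++) {
                out.append(groups[indexes.get(k)]).append(literals.get(k + 1));
            }
            return out.toString();
        };
    }

    public static void main(String[] args) {
        String[] groups = {"docs/intro.html", "intro"};
        Function<String[], String> compiled = compile("docs/{1}.xml"); // done once
        System.out.println(substitute("docs/{1}.xml", groups));       // scans each time
        System.out.println(compiled.apply(groups));                    // appends only
    }
}
```

Both calls produce the same string; the second does the template
traversal once, ahead of time, which is the saving described above.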

Other than this, everything else in the infrastructure is the same
between the two implementations. The difference, of course, is the way
the pipelines are set up: in the compiled case each pipeline is set up
by compiled code, running as fast as possible, while in the Scheme
case, the pipeline setup is driven by the Scheme sitemap function,
which should be slower than the compiled version.

--

Future work

So far, I'm quite pleased with how things progress. Next week I'm
going to focus on implementing the infrastructure for using the
continuations from Scheme. I hope I'll be able to work out an example
in Scheme to drive the implementation. Once that's complete, I'm going
to focus on implementing a translator for the flow language to
Scheme.

For the flow language, I was thinking of naming it JWebFlow. I
originally thought of JavaFlow, or WebFlow, but they are both taken by
some other projects/commercial companies.

Any better ideas on a good name for the flow language? I'd like to
hear your comments on this one.


Best regards,
-- 
Ovidiu Predescu <ov...@cup.hp.com>
http://orion.nsr.hp.com/ (inside HP's firewall only)
http://sourceforge.net/users/ovidiu/ (my SourceForge page)
http://www.geocities.com/SiliconValley/Monitor/7464/ (GNU, Emacs, other stuff)

---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

Re: [UPDATE] Scheme/Cocoon progress - initial benchmarks show good speed!

Posted by Stefano Mazzocchi <st...@apache.org>.

Ovidiu Predescu wrote:

[cool stuff]

> Benchmark
> 
> I've done a basic, very rough speed comparison between the Scheme
> sitemap implementation and the compiled version. I used the Apache
> 'ab' program to send requests that process a very small stylebook
> document through a simple pipeline (generator+XSLT+serializer). The
> resulting page is about 2.5 KB.
> 
> The results are surprising: it appears the Scheme sitemap
> implementation runs at the same speed as the compiled version!
> 
> The only explanation I have for this is that the Scheme implementation
> uses its own URI matcher, based on Jakarta ORO, which is rumored to be
> faster (at least in the simple usage I have) than Jakarta Regexp,
> used by the "compiled" Cocoon. A bigger difference perhaps is the way
> the parenthesized expressions in regular expression patterns are
> interpreted. In the current compiled approach, a substitute() function
> is called at runtime to substitute the actual values into the
> pattern. In the Scheme implementation the parenthesis groups are
> statically replaced at sitemap compile time, and they become function
> arguments. The final expression is composed by doing a string append
> of the pattern components and actual values. This trick saves
> processing time at runtime, as the pattern does not have to be
> traversed to find out where to place the values.
> 
> Other than this, everything else in the infrastructure is the same
> between the two implementations. The difference, of course, is the way
> the pipelines are set up: in the compiled case each pipeline is set up
> by compiled code, running as fast as possible, while in the Scheme
> case, the pipeline setup is driven by the Scheme sitemap function,
> which should be slower than the compiled version.

I'm curious about the trends. 

A while ago I thought that an interpreted sitemap could be as fast as
(or even faster than) a compiled one on a HotSpot JVM.

What JVM are you using? Have you seen any difference in performance
trends between the two? (The compiled sitemap should be faster at
first, with the two converging after a few thousand calls.)

> --
> 
> Future work
> 
> So far, I'm quite pleased with how things progress. Next week I'm
> going to focus on implementing the infrastructure for using the
> continuations from Scheme. I hope I'll be able to work out an example
> in Scheme to drive the implementation. Once that's complete, I'm going
> to focus on implementing a translator for the flow language to
> Scheme.

That's cool.

> For the flow language, I was thinking of naming it JWebFlow. I
> originally thought of JavaFlow, or WebFlow, but they are both taken by
> some other projects/commercial companies.
>
> Any better ideas on a good name for the flow language? I'd like to
> hear your comments on this one.

Hmmm, I question if we need to name it.

I mean: Cocoon has the sitemap language, but we don't call it XMap or
XPL (extensible pipelining language) or equivalent. We simply say
'sitemap' and everybody understands.

Now, my personal suggestion is to avoid giving the language a name
and simply call the flow scripts 'flowmaps'.

Just as we say 'sitemap markup', we would then say 'flowmap language'.

My impression is that naming the language will give users the impression
they have to learn another programming language (yet another!) in order
to run Cocoon and get something out of it.

Hey, this is *exactly* what happens for the sitemap, but people are
*much* less scared away from markup because I think the problem is with
syntax, not with semantics.

That is the reason why a Scheme version of the sitemap or flowmap will
scare the crap out of almost every user we already have!

So, Java doesn't scare people away because it uses the good old C
syntax. JavaScript, same thing.

Now you come up with JWebFlow... ok, the user thinks: has a J on it, so
it must be similar to java, and that's good, but then what?

On the other hand, the docs say: the sitemap defines the resources of
your site and the flowmaps define the flow and interaction between them.

What could be easier and more elegant than that?

So the user looks at the examples and finds out that the sitemap is
written using XML syntax (which he'll find similar to the httpd.conf or
server.xml files, and be pleased) and that the flowmaps are written
using a JavaScript-like syntax. Cool, that's how he would have done it
in his dreams.

But if you tell him: look, in order to write your flowmaps, you have to
learn JWebFlow... oh, shit, no, no, I have a short deadline and I'm
already mixing java and PHP in my site, no please, no more languages.

See what I mean?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------

RE: [UPDATE] Scheme/Cocoon progress - initial benchmarks show good speed!

Posted by Allan Erskine <a....@cs.ucl.ac.uk>.

(a long-time lurker takes a break from his studies and is BLOWN AWAY by
C2's progress!!!  Congratulations all!)

Ovidiu,

I was trying to get Schecoon to build (great name, BTW!), but build came
unstuck trying to generate sisc.heap...I'm not a CVS expert; that's
probably why I'm suspicious of it - if sisc.heap is a serialised file,
would it not need flagged as text, so CVS could perform all the
appropriate conversions from *nix?

Best,
Allan