Posted to dev@cocoon.apache.org by Berin Loritsch <bl...@apache.org> on 2003/02/24 14:24:27 UTC

[RT] Fixing the CLI

With HTTP, we have a lovely Response object that not only provides
a way of serializing the page to the end user, but also allows us
to set the HTTP return code for the page and its mime-type, etc.

It is clear that we are not using this to our advantage with the
Command Line Interface to Cocoon.  The fact that we can't easily
turn off error page generation for static web sites (that will
integrate with stuff generated or preexisting by other tools),
proves that we are not leveraging the HTTP return code to our
advantage.  Any Response with a return code in the 200s is good,
and any response with a return code that is 300+ is bad.

If we leverage that information, we can suppress the output of
pages that are redirects or simply error conditions (resource
not found, permission denied, 500 style error, etc.).
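
A minimal sketch of that check, assuming a hypothetical helper that the
CLI would consult before writing a page to disk. The class and method
names are illustrative only and do not exist in Cocoon:

```java
/** Hypothetical helper; nothing with this name exists in Cocoon. */
final class StatusFilter {
    private StatusFilter() {}

    /**
     * Returns true only for 2xx codes. Redirects (3xx) and errors
     * (4xx, 5xx) yield false, so the caller can skip writing the page
     * and leave any pre-existing file at that location untouched.
     */
    static boolean shouldWrite(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }
}
```

With something like this, a 404 for foo.pdf would simply produce no
output file, so a link checker run over the generated site would still
report the broken link.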

Also, it is important to output the links AS IS.  The fact that
Cocoon CLI and Cocoon live produce different results is plain
wrong.  If I want my HTML file to have a .foo extension, then
I should be able to do that without any problems.  Currently the
CLI version converts it to a .foo.html.  I don't think that it
should--or at the very least, it should be an option to turn off
that behavior.

The CLI should be able to determine--in one pass--what the
error condition of a page is, the mime-type, and the actual
page itself.  That would obviate the need for much of the
multi-pass architecture now.

Lastly, things like different views should be accessible via
parameters.  The CLI sends in the Request Parameters, so it
should be able to access the results from the different views
via the Request Parameters.  That way the CLI version can
do all it needs to do in one pass.  That seriously beats the
multipass approach it has today.

What do you all think?

Re: [RT] Fixing the CLI

Posted by Jeff Turner <je...@apache.org>.

On Mon, Feb 24, 2003 at 08:24:27AM -0500, Berin Loritsch wrote:
...
> Also, it is important to output the links AS IS.  The fact that
> Cocoon CLI and Cocoon live produce different results is plain
> wrong.  If I want my HTML file to have a .foo extension, then
> I should be able to do that without any problems.  Currently the
> CLI version converts it to a .foo.html.  I don't think that it
> should--or at the very least, it should be an option to turn off
> that behavior.

Another common case is where someone links to a directory, and the CLI
translates 'foo/' to 'foo/index.html', which half the time is wrong.

> What do you all think?

+1.  Anyone who makes the CLI more flexible will earn undying gratitude
from Forresters.


--Jeff

Re: [RT] Fixing the CLI

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Upayavira wrote, On 24/02/2003 18.24:
>>>This will be possible soon. Functionality is there, command line
>>>option is missing ATM. If you need it urgent, you can fix it in 10
>>>minutes.
>>
>>Yes, it's on my TODO list, along with other CLI optimizations I 
>>discussed with Vadim :-)
> 
> 
> Seeing as this was my fault (I didn't add an option to Main.java when I split Main 
> into Main and CocoonBean, patch applied by Vadim), I have done it now. 

Not your fault. It's an additional feature :-D

> There's 
> now an option (-e), which allows you to switch off confirmation of extensions (-e 
> false will switch it off, default is true to maintain existing functionality).
> 
> I've also added an option to pre-load a class, so that the CLI can be used to 
> generate database driven sites (-L <classname>). It can be repeated to allow the 
> loading of more than one class.
> 
> As soon as I can get my Cocoon system to work (see invalid config message), I'll 
> test and post a patch to Bugzilla.

Excellent! :-D

> I'd be interested to hear about these CLI optimizations you refer to.

Well... hmmm... ok, let me see if I remember it well enough.

There are two sets of optimizations possible: traversing optimizations 
and sitemap short-circuits.

Traversing optimizations
-------------------------

As you know, the Cocoon CLI gets the content of a page 3 times.
I had refactored these three calls to Cocoon in the methods (in call order):

  1 - getLinks
       First the page is generated and the link view is used to
       get the links

  2 - getType (called in translateURI)
       Then the type of the page is needed, so we know if we need
       to add an extension or other things; basically to translate
       the URI

  3 - getPage
       Actually gets the page *and* uses the translated URIs in the links

Now, with the -e option we basically don't need step 2. If done 
correctly, this will increase the speed! :-)
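
To make the role of step 2 concrete, here is a rough sketch of the URI
translation as it is described in this thread ('foo.pdf' served as HTML
becomes 'foo.pdf.html', 'foo/' becomes 'foo/index.html'). UriTranslator,
its method and the confirmExtensions flag are hypothetical names, not the
actual code in Main or CocoonBean, and leaving directory links untouched
when the flag is off is an assumption:

```java
/** Hypothetical sketch of the translation step; not Cocoon's code. */
final class UriTranslator {
    private UriTranslator() {}

    /**
     * @param uri               link as it appears in the page
     * @param mimeExtension     extension implied by the mime-type, e.g. ".html"
     * @param confirmExtensions false corresponds to "-e false"
     */
    static String translate(String uri, String mimeExtension, boolean confirmExtensions) {
        if (!confirmExtensions) {
            return uri; // output the link as is; no type lookup needed
        }
        if (uri.isEmpty() || uri.endsWith("/")) {
            return uri + "index" + mimeExtension; // directory link
        }
        // Append the extension only when the URI does not already end with it.
        return uri.endsWith(mimeExtension) ? uri : uri + mimeExtension;
    }
}
```

Only the confirmExtensions branch needs the mime-type, which is why
switching it off removes the need for the separate getType request.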

So we have two steps left: getting links and getting the page.
If we can make them into a single step we're done.

Cocoon has the concept of pluggable pipelines, and each pipeline is 
responsible for connecting the various components. If we used a pipeline 
that simply inserts, between the source and the next component, a pipe 
that records all links 
(org.apache.cocoon.xml.xlink.ExtendedXLinkPipe.java) into the 
Environment, we could effectively get both the result and the links in a 
single pass.

NOTE: This is possible *only* if we use the -e option. If we don't, the 
URL translation needed makes it impossible to do it in a single step, 
unless we keep the documents in memory and use a recursive algorithm, 
which poses bigger problems of scalability.
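
As an illustration of the recording idea only, here is a sketch built on
the plain SAX XMLFilterImpl from the JDK rather than on Cocoon's own pipe
classes. LinkRecordingFilter is a hypothetical name, and treating href,
src and xlink:href attributes as links is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for a link-recording pipe; not ExtendedXLinkPipe. */
final class LinkRecordingFilter extends XMLFilterImpl {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Note every link-like attribute as the events stream past.
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if ("href".equals(name) || "src".equals(name) || "xlink:href".equals(name)) {
                links.add(atts.getValue(i));
            }
        }
        // Forward the event unchanged so the rest of the chain still
        // produces the page in the same pass.
        super.startElement(uri, localName, qName, atts);
    }

    /** Links seen so far, in document order. */
    List<String> getLinks() {
        return links;
    }
}
```

Because the filter forwards every event untouched, the downstream
handler still sees the original document while the links are collected
on the side. That is also why this only works when no link rewriting is
required.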


Sitemap short-circuits
-------------------------

Sometimes in the sitemap you will find things like:

     <map:match pattern="*/**">
       <map:read mime-type="text/html" src="docs/{1}/{2}.html"/>
     </map:match>

In this case the CLI fails to copy all the html files that the webapp 
version does.

We *could* pass it through the pipeline and traverse the links, but what 
if we didn't want to touch the HTML at all? Imagine also that those HTML 
files are 5MB of Javadocs... ;-)

So in this case the CLI could see that we have a match with a reader on 
the local filesystem, and locally "invert" the pipeline with an 
optimization. That is, copy all html files under the docs dir, which in 
Java can be done orders of magnitude faster than under Cocoon.
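
A sketch of what such a short-circuit could boil down to once the CLI has
decided that a matcher only reads files from a local directory.
ReaderShortCircuit and copyHtml are hypothetical names; detecting the
reader in the sitemap is not shown:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper; assumes destDir is not located inside srcDir. */
final class ReaderShortCircuit {
    private ReaderShortCircuit() {}

    /**
     * Copies every *.html file under srcDir to the same relative path
     * under destDir and returns the number of files copied.
     */
    static int copyHtml(Path srcDir, Path destDir) throws IOException {
        int copied = 0;
        try (Stream<Path> files = Files.walk(srcDir)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(src)
                        || !src.getFileName().toString().endsWith(".html")) {
                    continue; // skip directories and non-HTML files
                }
                Path target = destDir.resolve(srcDir.relativize(src).toString());
                Files.createDirectories(target.getParent());
                Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }
}
```

The files never enter a pipeline here, so they are neither parsed nor
re-serialized.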

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: [RT] Fixing the CLI

Posted by Upayavira <uv...@upaya.co.uk>.

> > This will be possible soon. Functionality is there, command line
> > option is missing ATM. If you need it urgent, you can fix it in 10
> > minutes.
> 
> Yes, it's on my TODO list, along with other CLI optimizations I 
> discussed with Vadim :-)

Seeing as this was my fault (I didn't add an option to Main.java when I split Main 
into Main and CocoonBean, patch applied by Vadim), I have done it now. There's 
now an option (-e), which allows you to switch off confirmation of extensions (-e 
false will switch it off, default is true to maintain existing functionality).

I've also added an option to pre-load a class, so that the CLI can be used to 
generate database driven sites (-L <classname>). It can be repeated to allow the 
loading of more than one class.

As soon as I can get my Cocoon system to work (see invalid config message), I'll 
test and post a patch to Bugzilla.

I'd be interested to hear about these CLI optimizations you refer to.

Regards, Upayavira

Re: [RT] Fixing the CLI

Posted by Nicola Ken Barozzi <ni...@apache.org>.


Vadim Gritsenko wrote, On 24/02/2003 14.57:
> Berin Loritsch wrote:
> 
>> Also, it is important to output the links AS IS.
> 
> 
> This will be possible soon. Functionality is there, command line option 
> is missing ATM. If you need it urgent, you can fix it in 10 minutes.

Yes, it's on my TODO list, along with other CLI optimizations I 
discussed with Vadim :-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: [RT] Fixing the CLI

Posted by Vadim Gritsenko <va...@verizon.net>.

Berin Loritsch wrote:

> Also, it is important to output the links AS IS.



This will be possible soon. Functionality is there, command line option 
is missing ATM. If you need it urgent, you can fix it in 10 minutes.

Vadim

Re: [RT] Fixing the CLI

Posted by Berin Loritsch <bl...@apache.org>.

Nicola Ken Barozzi wrote:
> 
>>>> Lastly, things like different views should be accessible via
>>>> parameters.  The CLI sends in the Request Parameters, so it
>>>> should be able to access the results from the different views
>>>> via the Request Parameters. 
> 
> 
> I don't get it. Could you please explain to me again?

All environments pass in a Request object, a Context object, and
a Response object.

The Request object would be the perfect place to store the results
of alternate views.  That way the CLI environment would be able to
query the Request object that was modified by Cocoon for the results
of a view.

All is done in one pass.  In the live web environment this feature
wouldn't be as useful, mainly because we haven't identified a need
to work with multiple views of the same resource in that environment.

I think the next thing that needs to be done to enable all this
for real is to relegate Cocoon to merely preparing the Response
objects and not committing them (by writing the response out).
That way the CLI can do its job properly, and the web will still
work.

Then again, this does present some other issues in regards to large
pages.  If the page is held in memory until it is done, then Cocoon
takes up too many resources.  One possibility is to add "Listener"
support to Cocoon.

A ResponseListener would provide a meaningful way to receive event
notification for different stages of the resource generation.  Of
course, that also means you need a way of canceling the generation
of a resource as well.  Something along these lines might work:

interface ResponseListener
{
     // returns false if the response needs to be cancelled
     boolean responseCodeChanged( int code );

     // returns the extension for the mime-type, or null if no change
     String mimeTypeChanged( String mimeType );

     // Not sure what value to pass in, but you can listen
     // for these events and interpret them as they arrive.
     void viewResults( String viewName, Object results );
}

In the web environment, the DefaultResponseListener would be used,
and in the CLI environment, a specialized one would be used.
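
For illustration, a CLI-side listener might look roughly like the
following. The interface is the one proposed above; CliResponseListener,
the "links" view name, the shape of the results object and the mime-type
table are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

/** The interface as proposed in this message. */
interface ResponseListener {
    boolean responseCodeChanged(int code);
    String mimeTypeChanged(String mimeType);
    void viewResults(String viewName, Object results);
}

/** Hypothetical CLI listener: cancels non-2xx pages and collects links. */
final class CliResponseListener implements ResponseListener {
    private final List<String> links = new ArrayList<>();
    private boolean cancelled;

    @Override
    public boolean responseCodeChanged(int code) {
        // Success or nothing: anything outside the 200s cancels generation.
        cancelled = code < 200 || code >= 300;
        return !cancelled;
    }

    @Override
    public String mimeTypeChanged(String mimeType) {
        // Tiny illustrative table; null means "leave the name alone".
        if ("text/html".equals(mimeType)) return ".html";
        if ("application/pdf".equals(mimeType)) return ".pdf";
        return null;
    }

    @Override
    public void viewResults(String viewName, Object results) {
        // Assumes the (hypothetical) "links" view delivers a List of URIs.
        if ("links".equals(viewName) && results instanceof List) {
            for (Object link : (List<?>) results) {
                links.add(String.valueOf(link));
            }
        }
    }

    boolean isCancelled() { return cancelled; }

    List<String> getLinks() { return links; }
}
```

After a single request the CLI could then ask the listener whether the
page was cancelled, which extension applies, and which links to follow
next.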

Re: [RT] Fixing the CLI

Posted by Nicola Ken Barozzi <ni...@apache.org>.


Berin Loritsch wrote, On 24/02/2003 15.02:
> Stefano Mazzocchi wrote:
> 
>> Berin Loritsch wrote:
>>
>>> If we leverage that information, we can suppress the output of
>>> pages that are redirects or simply error conditions (resource
>>> not found, permission denied, 500 style error, etc.).
>>
>> Sorry, I don't understand what you are implying here.
> 
> 
> Implying?  I am stating that if a page is not successfully
> generated, I do not want a generated error page.  I.e.  Success or
> nothing.  If the CLI has a 400+, 500+ style error code, then it
> should be able to suppress the output of that page.
> 
> That way tools that are designed to find broken links will actually
> find broken links.  Also Cocoon (and hence Forrest) won't overwrite
> existing files or files generated by other tools.  Many projects
> do have heterogeneous environments--it is a fact of life.

Yes, Berin, I know and understand.
I've been looking to fix it in the past days but it's not so easy, look 
at the code. The fact is that Cocoon itself generates it via 
handle-errors and using the Environment _writes it down_.

Cocoon does not give you back a file! So it's a black box that makes 
that file and returns a response. But the file is already committed.
I tried renaming the file but it doesn't work.

Still digging...

>>> Also, it is important to output the links AS IS.  The fact that
>>> Cocoon CLI and Cocoon live produce different results is plain
>>> wrong.
>>
>> uh? yes, it's a workaround against the stupid concept that most file 
>> systems do not have an orthogonal way to indicate the mime-type of a 
>> file. Only good old macos file system has such capabilities.

It should be switchable via a parameter. Also on my TODO.

>>> If I want my HTML file to have a .foo extension, then
>>> I should be able to do that without any problems.
>>
>> Then you burn your generated web site on a CDROM, ship that and nobody 
>> will ever be able to open that file (note, also, that IE has a bug on 
>> handling MIME-types for URL that don't terminate with the extension 
>> that windows expects)
> 
> Not necessarily.  Here is a case that happens:
> 
> The link is foo.pdf.  The PDF file is either a pre-existing file
> generated from another source, or there was an error generating it
> from Cocoon.  Cocoon mangles the link to foo.pdf.html and outputs
> an error page.  Bad.
> 
> Same thing with graphics files.
> 
> As to your last argument--trust the user to have some intelligence.
> You can output a warning if you want, but if I don't want my URLs
> mangled, I should expect to know what I am doing.  We aren't all
> neanderthals or need spoon-feeding.

We can be or not. I'll just make a switch for it.

> The above errors and problems are what I am really wanting to address.

They are being addressed. Thank you for the reminder.

>>> Lastly, things like different views should be accessible via
>>> parameters.  The CLI sends in the Request Parameters, so it
>>> should be able to access the results from the different views
>>> via the Request Parameters. 

I don't get it. Could you please explain to me again?

>> currently, we have a way to call the view directly which is even 
>> better. what difference would it make?
> 
> One pass, and I get all the information I need.  Which in this case
> is better.


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: [RT] Fixing the CLI

Posted by Stefano Mazzocchi <st...@apache.org>.

Berin Loritsch wrote:

> As to your last argument--trust the user to have some intelligence.
> You can output a warning if you want, but if I don't want my URLs
> mangled, I should expect to know what I am doing.  We aren't all
> neanderthals or need spoon-feeding.

Ok.

I'm coming to realize that I'm probably an obstacle to the evolution of 
the CLI interface of Cocoon because I feel some little sense of idea 
ownership that is hurting the process.

So I step out of the way.

You guys do whatever you feel is right. I fully trust your judgement 
on this; I'm not joking, nor do I have bad feelings.

Keep it up.

This should make Forrest people happy too.

-- 
Stefano Mazzocchi                               <st...@apache.org>
    Pluralitas non est ponenda sine necessitate [William of Ockham]
--------------------------------------------------------------------

Re: RT: Further integration with Avalon?

Posted by Peter Royal <pr...@apache.org>.

On Monday, February 24, 2003, at 12:26  PM, Bill Barnhill wrote:
> FYI: my setup may be a little different: Sevak-Jetty
> running out of Phoenix container, with Sevak hacked a
> little so that it loads all webapps in /opt/webapps
> (or whatever value of web-apps-dir is in config.xml
> file). I then set my build.webapp (I think it's the
> right property?) dir to /opt/webapps/cocoon .
>
> Is anyone else travelling this road as well? If so I'd
> like to hear about what you've found.

Yup :) I have my Phoenix application mounting a Sevak block which loads 
two web applications based on Cocoon, passing in a parent 
ComponentManager so Cocoon can access other Phoenix blocks. I recently 
committed a patch to CocoonServlet which made getParentComponentManager 
a protected method so it can be overridden. All you have to do is extend 
CocoonServlet and make it Serviceable and use that as the parent for 
Cocoon's container hierarchy.

> Eventually I would like to be able to :
> 1. Create a .sar file that packages one or more Cocoon
> blocks
> 2. Deploy that .sar file by dropping it into
> $PHOENIX/apps
> 3. On the .sar's deployment have it publish a
> CocoonBlockService
> 4. Have a Cocoon.sar (based on Sevak) look for on
> startup, and listen for during the lifecycle,
> instances of CocoonBlockService and integrate the
> represented block within Cocoon.
> 5. Add a 'sar' target that builds the sar file in the
> 'build.sar'  location (set in local.build.properties
> and possibly defaulting to $PHOENIX/apps if $PHOENIX
> is not null).

The kicker here is that there is (currently) no way to have inter-SAR 
communication in phoenix. Each SAR is its own isolated application. 
Sevak's current design is at the block level, rather than at the SAR 
level. What you could do is create a single SAR with a Sevak-hosted 
Cocoon as well as other blocks and have those other blocks be 
available to Cocoon (basically the design I outlined above).

With the new "auto-assembly" patches for Phoenix, your step 4 could be 
accomplished by declaring an Array dependency on a CocoonBlockService, 
and then the Sevak-hosting block would automatically pick up all 
CocoonBlockServices w/o having to do a listener.

> I'd love to hear thoughts from people more experienced
> with Avalon and/or Cocoon (just about everybody right
> now) on whether this
> a) has been done before

You're on the cutting edge of integration ;)

> c) is a 'good thing', with suggestions on how to go about
> it

Outlined above :) Feedback welcome!
-pete

RT: Further integration with Avalon?

Posted by Bill Barnhill <gw...@yahoo.com>.

FYI: my setup may be a little different: Sevak-Jetty
running out of Phoenix container, with Sevak hacked a
little so that it loads all webapps in /opt/webapps
(or whatever value of web-apps-dir is in config.xml
file). I then set my build.webapp (I think it's the
right property?) dir to /opt/webapps/cocoon .

Is anyone else travelling this road as well? If so I'd
like to hear about what you've found. 

Eventually I would like to be able to :
1. Create a .sar file that packages one or more Cocoon
blocks
2. Deploy that .sar file by dropping it into
$PHOENIX/apps
3. On the .sar's deployment have it publish a
CocoonBlockService
4. Have a Cocoon.sar (based on Sevak) look for on
startup, and listen for during the lifecycle,
instances of CocoonBlockService and integrate the
represented block within Cocoon.
5. Add a 'sar' target that builds the sar file in the
'build.sar'  location (set in local.build.properties
and possibly defaulting to $PHOENIX/apps if $PHOENIX
is not null).

I'd love to hear thoughts from people more experienced
with Avalon and/or Cocoon (just about everybody right
now) on whether this
a) has been done before
b) is a 'bad thing' and why
c) is a 'good thing', with suggestions on how to go about
it

Thanks,
Bill Barnhill



Re: [RT] Fixing the CLI

Posted by Berin Loritsch <bl...@apache.org>.

Stefano Mazzocchi wrote:
> Berin Loritsch wrote:
> 
>> If we leverage that information, we can suppress the output of
>> pages that are redirects or simply error conditions (resource
>> not found, permission denied, 500 style error, etc.).
> 
> 
> Sorry, I don't understand what you are implying here.

Implying?  I am stating that if a page is not successfully
generated, I do not want a generated error page.  I.e.  Success or
nothing.  If the CLI has a 400+, 500+ style error code, then it
should be able to suppress the output of that page.

That way tools that are designed to find broken links will actually
find broken links.  Also Cocoon (and hence Forrest) won't overwrite
existing files or files generated by other tools.  Many projects
do have heterogeneous environments--it is a fact of life.

> 
>> Also, it is important to output the links AS IS.  The fact that
>> Cocoon CLI and Cocoon live produce different results is plain
>> wrong.
> 
> 
> uh? yes, it's a workaround against the stupid concept that most file 
> systems do not have an orthogonal way to indicate the mime-type of a 
> file. Only good old macos file system has such capabilities.
> 
>> If I want my HTML file to have a .foo extension, then
>> I should be able to do that without any problems.
> 
> 
> Then you burn your generated web site on a CDROM, ship that and nobody 
> will ever be able to open that file (note, also, that IE has a bug on 
> handling MIME-types for URL that don't terminate with the extension that 
> windows expects)

Not necessarily.  Here is a case that happens:

The link is foo.pdf.  The PDF file is either a pre-existing file
generated from another source, or there was an error generating it
from Cocoon.  Cocoon mangles the link to foo.pdf.html and outputs
an error page.  Bad.

Same thing with graphics files.

As to your last argument--trust the user to have some intelligence.
You can output a warning if you want, but if I don't want my URLs
mangled, I should expect to know what I am doing.  We aren't all
neanderthals or need spoon-feeding.

The above errors and problems are what I am really wanting to address.

>> Lastly, things like different views should be accessible via
>> parameters.  The CLI sends in the Request Parameters, so it
>> should be able to access the results from the different views
>> via the Request Parameters. 
> 
> 
> currently, we have a way to call the view directly which is even better. 
> what difference would it make?

One pass, and I get all the information I need.  Which in this case
is better.

Re: [RT] Fixing the CLI

Posted by Stefano Mazzocchi <st...@apache.org>.

Berin Loritsch wrote:
> With HTTP, we have a lovely Response object that not only provides
> a way of serializing the page to the end user, but also allows us
> to set the HTTP return code for the page and its mime-type, etc.
> 
> It is clear that we are not using this to our advantage with the
> Command Line Interface to Cocoon.  The fact that we can't easily
> turn off error page generation for static web sites (that will
> integrate with stuff generated or preexisting by other tools),
> proves that we are not leveraging the HTTP return code to our
> advantage.  Any Response with a return code in the 200s is good,
> and any response with a return code that is 300+ is bad.
> 
> If we leverage that information, we can suppress the output of
> pages that are redirects or simply error conditions (resource
> not found, permission denied, 500 style error, etc.).

Sorry, I don't understand what you are implying here.

> Also, it is important to output the links AS IS.  The fact that
> Cocoon CLI and Cocoon live produce different results is plain
> wrong.

uh? yes, it's a workaround against the stupid concept that most file 
systems do not have an orthogonal way to indicate the mime-type of a 
file. Only good old macos file system has such capabilities.

> If I want my HTML file to have a .foo extension, then
> I should be able to do that without any problems.

Then you burn your generated web site on a CDROM, ship that and nobody 
will ever be able to open that file (note, also, that IE has a bug on 
handling MIME-types for URL that don't terminate with the extension that 
windows expects)

> Currently the
> CLI version converts it to a .foo.html. 

The CLI obtains the MIME-type first, changes the URI, then requests the 
page with the link translated accordingly.

> I don't think that it
> should--or at the very least, it should be an option to turn off
> that behavior.

I agree that this (much slower) behavior could be turned off by a 
command line setting, if not needed, but saying that this is 'plain 
wrong' is definitely an overstatement.

> The CLI should be able to determine--in one pass--what the
> error condition of a page is, the mime-type, and the actual
> page itself.  That would obviate the need for much of the
> multi-pass architecture now.

Grrr, I hate when people talk without even looking at the code! The 
multi-pass architecture is not done like this because it's fancy, but 
because there is no way (or at least, nobody could find it until now), 
to obtain information on MIME-type + having the ability to perform link 
translation before the serialization stage!

> Lastly, things like different views should be accessible via
> parameters.  The CLI sends in the Request Parameters, so it
> should be able to access the results from the different views
> via the Request Parameters. 

currently, we have a way to call the view directly which is even better. 
what difference would it make?

> That way the CLI version can
> do all it needs to do in one pass.  That seriously beats the
> multipass approach it has today.
> 
> What do you all think?

I really don't see how you can do link translation in one pass.

-- 
Stefano Mazzocchi                               <st...@apache.org>
    Pluralitas non est ponenda sine necessitate [William of Ockham]
--------------------------------------------------------------------