Posted to dev@httpd.apache.org by fabio rohrich <ro...@yahoo.it> on 2002/09/26 14:38:20 UTC

mod_blanks

I'm going to develop this topic for my thesis.
Does anybody have any suggestions for it? Something to
add to the development (like compression of the string)
or some feature to implement!

And, one last thing: what do you think about it?

Thanks a lot,
Fabio

- mod_blanks: a module for the Apache web server which would on-the-fly
remove unnecessary blank space, comments and other non-interesting
things from the served page.  Skills needed: the C language, a bit of
text parsing techniques, HTML, learn Apache API.  Complexity: low to
moderate (after learning the API).  Usefulness: moderate to low (but
maybe better than that, it's a kind of nice toy topic that could be
shown to save a lot of bandwidth on the Internet :-).



______________________________________________________________________
Mio Yahoo!: personalizza Yahoo! come piace a te 
http://it.yahoo.com/mail_it/foot/?http://it.my.yahoo.com/

Re: mod_blanks

Posted by Bojan Smojver <bo...@rexursive.com>.
Cool!

Someone actually created a tidy library, so I'm guessing it should be
possible to make direct calls into that functionality from the module.
That would make it self contained and a bit better performing.

BTW, I'm just saying this as something that might be interesting to the
original poster. I'm not planning to do anything like that. I think
checking validity of pages and removing unnecessary bits before they are
placed on the server is a better idea.

Bojan

On Fri, 2002-09-27 at 07:39, Jeff Trawick wrote:
> Bojan Smojver <bo...@rexursive.com> writes:
> 
> > This comment led me to another idea - how about plugging tidy
> > (http://tidy.sourceforge.net/) in there instead, which will not only
> > strip blanks if you tell it, but also clean the (X)HTML as well. Just a
> > thought...
> 
> This works for me (but I'm not a tidy user so I don't know how to make
> it do really fancy tricks).
> 
> ExtFilterDefine tidy-filter cmd="/dl/tidy"
> 
> <Location /manual/mod>
> SetOutputFilter tidy-filter
> ExtFilterOptions LogStderr
> </Location>
> 
> (gotta have mod_ext_filter loaded)
> 
> Since this runs tidy as an external program, this isn't a
> high-performance mechanism, but it could be useful nonetheless.



Re: mod_blanks

Posted by Jeff Trawick <tr...@attglobal.net>.
Bojan Smojver <bo...@rexursive.com> writes:

> This comment led me to another idea - how about plugging tidy
> (http://tidy.sourceforge.net/) in there instead, which will not only
> strip blanks if you tell it, but also clean the (X)HTML as well. Just a
> thought...

This works for me (but I'm not a tidy user so I don't know how to make
it do really fancy tricks).

ExtFilterDefine tidy-filter cmd="/dl/tidy"

<Location /manual/mod>
SetOutputFilter tidy-filter
ExtFilterOptions LogStderr
</Location>

(gotta have mod_ext_filter loaded)

Since this runs tidy as an external program, this isn't a
high-performance mechanism, but it could be useful nonetheless.
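
The external-filter idea (the response body goes to a program's stdin and the
filtered body comes back on its stdout) can be sketched outside of Apache as
follows. This is only an illustration of the concept, not how mod_ext_filter is
implemented; run_external_filter and STAND_IN_FILTER are hypothetical names,
and a small Python one-liner stands in for the real tidy binary. Starting a
process for every response is where the performance cost comes from.

```python
import subprocess
import sys


def run_external_filter(command, body):
    """Pipe body (bytes) through an external command and return its stdout."""
    result = subprocess.run(
        command,
        input=body,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # kept so it can be inspected if the command fails
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Stand-in for the real filter program (an assumption for this sketch):
# it strips leading/trailing whitespace from every line read on stdin.
STAND_IN_FILTER = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip())",
]

if __name__ == "__main__":
    page = b"  <p>\n     hello\n  </p>\n"
    print(run_external_filter(STAND_IN_FILTER, page))
```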

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: mod_blanks

Posted by Bojan Smojver <bo...@rexursive.com>.
On Fri, 2002-09-27 at 04:37, Ian Holsman wrote:
> fabio rohrich wrote:
> > I'm going to develop this topic for my thesis.
> > Does anybody have any suggestions for it? Something to
> > add to the development (like compression of the string)
> > or some feature to implement!
> > 
> > And, one last thing: what do you think about it?
> > 
> > Thanks a lot,
> > Fabio
> > 
> > - mod_blanks: a module for the Apache web server which would on-the-fly
> > remove unnecessary blank space, comments and other non-interesting
> > things from the served page.  Skills needed: the C language, a bit of
> > text parsing techniques, HTML, learn Apache API.  Complexity: low to
> > moderate (after learning the API).
> I would disagree on this. We have an internal module which does
> this, and we have found that HTML in general is not as easy to strip
> as you would think.
> If you do do this, please make sure you test your module on a lot of
> different HTML out there, as well as multiple browsers.

This comment led me to another idea - how about plugging tidy
(http://tidy.sourceforge.net/) in there instead, which will not only
strip blanks if you tell it, but also clean the (X)HTML as well. Just a
thought...

Bojan


Re: mod_blanks

Posted by Graham Leggett <mi...@sharp.fm>.
Ian Holsman wrote:

> I would disagree on this. We have an internal module which does
> this, and we have found that HTML in general is not as easy to strip
> as you would think.
> If you do do this, please make sure you test your module on a lot of
> different HTML out there, as well as multiple browsers.

We've found it depends on the application. Most of the developers I'm
working with now are indentation freaks who prepend each line with an
average of 40 spaces. Stripping the trailing and leading space on each
line is generally the safest way to do it; since the carriage return is
whitespace anyway, this works pretty well.
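
A minimal sketch of that per-line approach, assuming the page is already
available as one string (a real Apache 2.0 filter would have to handle the
body arriving in pieces). The function name strip_line_whitespace is
hypothetical. Line breaks are kept, so words on adjacent lines stay separated.
Note that this simple version would also alter whitespace-sensitive content
such as the inside of <pre> or <textarea>, which is the kind of case that
needs testing against real pages.

```python
def strip_line_whitespace(html):
    """Strip leading and trailing whitespace from every line, keeping line breaks."""
    stripped = "\n".join(line.strip() for line in html.splitlines())
    # Keep the final newline if the input had one.
    return stripped + ("\n" if html.endswith("\n") else "")


if __name__ == "__main__":
    page = (
        "<html>\n"
        "        <body>\n"
        "                <p>Hello</p>   \n"
        "        </body>\n"
        "</html>\n"
    )
    smaller = strip_line_whitespace(page)
    print(len(page), "->", len(smaller), "characters")
```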

> we showed a 6% potential saving on this.

Bandwidth costs money - a 6% saving for virtually no effort can be 
significant on high traffic sites.

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."


Re: mod_blanks

Posted by Ian Holsman <ia...@apache.org>.
fabio rohrich wrote:
> I'm going to develop this topic for my thesis.
> Does anybody have any suggestions for it? Something to
> add to the development (like compression of the string)
> or some feature to implement!
> 
> And, one last thing: what do you think about it?
> 
> Thanks a lot,
> Fabio
> 
> - mod_blanks: a module for the Apache web server which would on-the-fly
> remove unnecessary blank space, comments and other non-interesting
> things from the served page.  Skills needed: the C language, a bit of
> text parsing techniques, HTML, learn Apache API.  Complexity: low to
> moderate (after learning the API).

I would disagree on this. We have an internal module which does
this, and we have found that HTML in general is not as easy to strip
as you would think.
If you do do this, please make sure you test your module on a lot of
different HTML out there, as well as multiple browsers.

> Usefulness: moderate to low (but maybe better than that, it's a kind of
> nice toy topic that could be shown to save a lot of bandwidth on the
> Internet :-).

We showed a 6% potential saving on this.
A bigger saving comes from moving to CSS stylesheets.
We haven't currently got the module turned on, but it is on the
back burner (it's implemented as a non-streaming filter in 2.0).
> 
> 
Ian



Re: mod_blanks

Posted by Graham Leggett <mi...@sharp.fm>.
fabio rohrich wrote:

> - mod_blanks: a module for the Apache web server which
> would on-the-fly 
> remove unnecessary blank space, comments and other
> non-interesting 
> things from the served page.

Very cool idea. In our tomcat based apps, we have a jsp tag that goes 
through the output line by line, and strips leading and trailing 
whitespace - saves a whole lot of bandwidth. An apache v2.0 module that 
does this instead would be a very cool thing indeed! :)

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."



Re: mod_blanks

Posted by Greg Stein <gs...@lyra.org>.
Yup, I'd agree with Peter on this one, from a practical standpoint. Reducing
spaces might be an interesting thesis topic, but the result would not have
much practical use.

I'll also point out that Apache 2.0 includes mod_deflate which is a solution
similar to mod_gzip (still not quite sure on how they compare, technically).

Cheers,
-g

On Thu, Sep 26, 2002 at 01:54:11PM -0600, Peter J. Cranstone wrote:
> Fabio,
> 
> Mod_gzip for Apache is a better solution. Prior to its release both
> Kevin and I looked at what we call "poor man's compression", i.e. just
> removing the blank spaces, lines and other garbage in a served page.
> 
> Here is what we learned.
> 
> No one was interested. It didn't save much on the overall page, and
> people really don't like their HTML etc. being messed with.
> 
> Also, if you are going to spend the CPU cycles, it's easier to simply use
> gzip compression to squeeze the page by upwards of 80% and preserve all the
> formatting of the author's HTML.
> 
> Mod_gzip already saves a ton of bandwidth and with a current browser
> there is no need to install a client side decoder.
> 
> Regards,
> 
> 
> Peter J. Cranstone

-- 
Greg Stein, http://www.lyra.org/

Re: mod_blanks

Posted by Dirk-Willem van Gulik <di...@webweaving.org>.

On Thu, 26 Sep 2002, fabio rohrich wrote:

> I'm going to develop this topic for my thesis.
> Does anybody have any suggestions for it?

Apache 2.0 filtering is cool!

> Something to add to the development

Of course the 'right' place to do this is when the content is generated -
either by the editor or on the fly by an XSLT or whatnot.

Also, it is quite common that when a page is split up for specific browsers
it is measurably reduced in size.

> - mod_blanks: a module for the Apache web server which would on-the-fly
> remove unnecessary blank space, comments and other non-interesting
> things from the served page.  Skills needed: the C language, a bit of
> text parsing techniques, HTML, learn Apache API.  Complexity: low to
> moderate (after learning the API).

> Usefulness: moderate to low (but maybe better than that, it's a kind of
> nice toy topic that could be shown to save a lot of bandwidth on the
> Internet :-).

Well.. what is going to save even more is looking at Expire/Cache control
carefully, and keepalive :-)

Dw


Re: mod_blanks

Posted by Daniel Lorch <ml...@lorch.cc>.
hi,

> > Little suggestion: "Compression" statistics (just
> > like the old mod_gzip does)
> 
> Could you explain that in more detail, please? I don't understand. Can you
> suggest some links or documentation for it?
> Thanks

He suggested that you create statistics on how much bandwidth was saved by
using mod_blanks.

-daniel

Re: mod_blanks

Posted by Christian Kruse <ch...@cynapsis.de>.
Hi,

fabio rohrich <ro...@yahoo.it> wrote:
> > Little suggestion: "Compression" statistics (just
> > like the old mod_gzip does)
> 
> Could you explain that in more detail, please? I don't understand. Can you
> suggest some links or documentation for it?

  http://www.schroepl.net/projekte/mod_gzip/

mod_gzip sets three notes for statistics: mod_gzip_input_size,
mod_gzip_output_size and mod_gzip_compression_ratio.
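
As a rough illustration of what such statistics could look like for
mod_blanks, here is a small sketch. The function name and dictionary keys are
hypothetical, and the ratio is assumed to mean "percentage of bytes saved";
the exact formula mod_gzip uses is not stated in this thread.

```python
def compression_stats(input_size, output_size):
    """Return input size, output size and the percentage of bytes saved."""
    if input_size <= 0:
        ratio = 0  # nothing was sent, so nothing was saved
    else:
        ratio = round(100 * (input_size - output_size) / input_size)
    return {
        "input_size": input_size,
        "output_size": output_size,
        "compression_ratio": ratio,  # assumption: percent saved
    }


if __name__ == "__main__":
    # A 1000-byte page reduced to 940 bytes.
    print(compression_stats(1000, 940))
```

A module could record values like these per request so that they can be
written to the access log and summed up later.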

Greetings,
 CK


Re: mod_blanks

Posted by fabio rohrich <ro...@yahoo.it>.
--- "johannes m. richter" <jo...@gmx.net> wrote:
> Since this seems to be rather understandable - will you make the source for
> this available? (To try to learn..)


I'll develop it over the next three months. The source will certainly be
available!


> Little suggestion: "Compression" statistics (just
> like the old mod_gzip does)


Could you explain that in more detail, please? I don't understand. Can you
suggest some links or documentation for it?
Thanks


> Good luck :)
> johannes

Re: mod_blanks

Posted by "johannes m. richter" <jo...@gmx.net>.
Since this seems to be rather understandable - will you make the source for 
this available? (To try to learn..)
Little suggestion: "Compression" statistics (just like the old mod_gzip does)
Good luck :)
johannes
-- 
Theory is when you know everything and nothing works. Practice is when
everything works and nobody knows why. With Windows 9*, theory and
practice are united: nothing works and nobody knows why.
- http://jgcl.at/ko/ - new photos from summer camp 2002 in Moosen/Tirol


Re: mod_blanks

Posted by David Burry <db...@tagnet.org>.
I agree that mod_gzip does a much better job as far as compression goes, and
it likely doesn't even use more CPU.

However, it's still sometimes important to remove HTML and JavaScript comments
for security reasons, but I suspect this could probably be better
done as part of the publishing process, not on the fly as pages are served.
(Even gzip compression could be done this way, come to think of it.)
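
A publish-time step along those lines could look roughly like the sketch
below. The names strip_comments and publish are hypothetical, and the regular
expression is a deliberately naive assumption: it only handles HTML comments
(not JavaScript // or /* */ comments), it also removes conditional comments,
and it would delete inline scripts that are wrapped in <!-- --> for old
browsers.

```python
import gzip
import re

# Matches <!-- ... --> including comments that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Remove HTML comments from a page held in memory as a string."""
    return COMMENT_RE.sub("", html)


def publish(html):
    """Return (cleaned_html, gzipped_bytes), both ready to be written to disk."""
    cleaned = strip_comments(html)
    return cleaned, gzip.compress(cleaned.encode("utf-8"))


if __name__ == "__main__":
    page = "<p>hello</p>\n<!-- TODO: remove test account before launch -->\n"
    cleaned, compressed = publish(page)
    print(repr(cleaned), len(compressed))
```

Because this runs once per published file rather than once per request, the
CPU cost is paid only when the content changes.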

Dave

----- Original Message -----
From: "Peter J. Cranstone" <cr...@msn.com>
To: <de...@httpd.apache.org>
Sent: Thursday, September 26, 2002 12:54 PM
Subject: RE: mod_blanks


> Fabio,
>
> Mod_gzip for Apache is a better solution. Prior to its release both
> Kevin and I looked at what we call "poor man's compression", i.e. just
> removing the blank spaces, lines and other garbage in a served page.
>
> Here is what we learned.
>
> No one was interested. It didn't save much on the overall page, and
> people really don't like their HTML etc. being messed with.
>
> Also, if you are going to spend the CPU cycles, it's easier to simply use
> gzip compression to squeeze the page by upwards of 80% and preserve all the
> formatting of the author's HTML.
>
> Mod_gzip already saves a ton of bandwidth and with a current browser
> there is no need to install a client side decoder.
>
> Regards,
>
>
> Peter J. Cranstone


RE: mod_blanks

Posted by "Peter J. Cranstone" <cr...@msn.com>.
Fabio,

Mod_gzip for Apache is a better solution. Prior to its release both
Kevin and I looked at what we call "poor man's compression", i.e. just
removing the blank spaces, lines and other garbage in a served page.

Here is what we learned.

No one was interested. It didn't save much on the overall page, and
people really don't like their HTML etc. being messed with.

Also, if you are going to spend the CPU cycles, it's easier to simply use
gzip compression to squeeze the page by upwards of 80% and preserve all the
formatting of the author's HTML.

Mod_gzip already saves a ton of bandwidth and with a current browser
there is no need to install a client side decoder.
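
For anyone who wants to check this comparison on their own pages, a small
sketch follows. It uses Python's gzip module as a stand-in for what mod_gzip
does on the server, and a hypothetical strip_line_whitespace helper as a
stand-in for blank removal; the sample page is synthetic, so the numbers it
prints say nothing about real sites.

```python
import gzip


def strip_line_whitespace(html):
    """Stand-in for blank removal: strip each line's leading/trailing whitespace."""
    return "\n".join(line.strip() for line in html.splitlines())


def compare_sizes(html):
    """Return byte counts for the page as-is, stripped, gzipped, and both."""
    raw = html.encode("utf-8")
    stripped = strip_line_whitespace(html).encode("utf-8")
    return {
        "original": len(raw),
        "stripped": len(stripped),
        "gzipped": len(gzip.compress(raw)),
        "stripped_then_gzipped": len(gzip.compress(stripped)),
    }


if __name__ == "__main__":
    # Synthetic, heavily indented page; real pages will give different numbers.
    row = " " * 40 + "<tr><td>cell</td></tr>\n"
    page = "<table>\n" + row * 200 + "</table>\n"
    for name, size in compare_sizes(page).items():
        print(name, size)
```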

Regards,


Peter J. Cranstone

