Posted to modperl@perl.apache.org by Alvar Freude <af...@assoziations-blaster.de> on 2000/07/19 03:20:25 UTC

Filtering HTML files with mod_proxy and mod_perl

Hi,

I want to create a service which filters HTML files like this: 
http://www.a-blast.org/web-blast.html ==> 
http://www.a-blast.org/web-blast.plx?url=http://www.nsa.gov/programs/employ/
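For illustration, the mapping above can be written as a small Perl function. The name blast_url and the percent-encoding step are assumptions, not part of the original service. Encoding the target keeps a URL that contains its own ? or & from being split into extra parameters of web-blast.plx.

```perl
use strict;
use warnings;

# Hypothetical address of the filter script from the example above.
my $BLASTER = 'http://www.a-blast.org/web-blast.plx';

# Percent-encode every byte outside the RFC 3986 "unreserved" set.
sub escape_url {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Map a target page to the URL that fetches it through the filter.
sub blast_url {
    my ($target) = @_;
    return "$BLASTER?url=" . escape_url($target);
}

print blast_url('http://www.nsa.gov/programs/employ/'), "\n";
# prints http://www.a-blast.org/web-blast.plx?url=http%3A%2F%2Fwww.nsa.gov%2Fprograms%2Femploy%2F
```

Without the escaping, a target such as http://x.org/?a=1&b=2 would have its b=2 parsed as a second parameter of web-blast.plx rather than as part of the target URL.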


But it should go through a proxy, so that you don't need to access another site
and all filtering works transparently in the background, within the proxy.


So my idea was to do this with mod_proxy and mod_perl, but I did not
find any documentation on how to do it.

The user should enter a proxy in his browser config, e.g.
superproxy2000.here.org:7777, and after that he can surf the web
and get filtered files.

Is this possible with mod_perl and mod_proxy?


And if so, where can I find documentation for this?

Or does somebody have code snippets? ;)


Thanx and Ciao

  Alvar


-- 
Alvar C.H. Freude  |  alvar.freude@merz-akademie.de

    Demo: http://www.online-demonstration.org/   | Mach mit!
Blast-DE: http://www.assoziations-blaster.de/    | Blast-Dich-Fit
Blast-EN: http://www.a-blast.org/                | Blast/english

Re: Filtering HTML files with mod_proxy and mod_perl

Posted by Alvar Freude <al...@merz-akademie.de>.
Hi,

> This is what mod_proxy does on its own, no mod_perl needed.

including filtering?

 
> If you wanted to do it in "pure" mod_perl (no mod_proxy), write a
> TransHandler similar to the ones listed in chapter 7 of the Eagle book,
> pp 368 - 381 (pp 372 - 373, for example, is an anonymizing proxy, and
> pp 374 - 381 is an ad-blocking proxy). This chapter is available on the
> web in its entirety at http://www.modperl.com/book/chapters/ch7.html.
> 
> Pretty simple, all in all.

yes, I think this is EXACTLY what I am searching for!

thanks!


Ciao
  Alvar


Re: Filtering HTML files with mod_proxy and mod_perl

Posted by darren chamberlain <da...@boston.com>.
Alvar Freude (af_lists@assoziations-blaster.de) said something to this effect:
> Hi,
> 
> I want to create a service which filters HTML files like this: 
> http://www.a-blast.org/web-blast.html ==> 
> http://www.a-blast.org/web-blast.plx?url=http://www.nsa.gov/programs/employ/
>
> The user should enter a proxy in his browser config, e.g.
> superproxy2000.here.org:7777, and after that he can surf the web
> and get filtered files.
> 
> Is this possible with mod_perl and mod_proxy?

This is what mod_proxy does on its own, no mod_perl needed.

If you wanted to do it in "pure" mod_perl (no mod_proxy), write a 
TransHandler similar to the ones listed in chapter 7 of the Eagle book,
pp 368 - 381 (pp 372 - 373, for example, is an anonymizing proxy, and
pp 374 - 381 is an ad-blocking proxy). This chapter is available on the
web in its entirety at http://www.modperl.com/book/chapters/ch7.html.

Pretty simple, all in all.
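As a rough sketch of the Eagle-book style (fetch the whole response with LWP, rewrite it, then print it), the rewrite step on its own might look like the Perl below. rewrite_links and the blaster prefix are hypothetical; the pattern only handles double-quoted absolute href attributes, and the mod_perl handler wiring is left out.

```perl
use strict;
use warnings;

# Hypothetical prefix: links are sent back through the filter script.
my $BLASTER = 'http://www.a-blast.org/web-blast.plx?url=';

# Percent-encode every byte outside the RFC 3986 "unreserved" set.
sub escape_url {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Rewrite double-quoted absolute href attributes in a complete document.
# The whole page is held in one string, which keeps this simple but
# also makes it memory-hungry for large responses.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{(href\s*=\s*")(https?://[^"]+)"}
              {$1 . $BLASTER . escape_url($2) . '"'}gie;
    return $html;
}

my $page = '<a href="http://example.org/a?b=1">x</a> <a href="/local">y</a>';
print rewrite_links($page), "\n";
```

Relative links such as /local are left alone here; a real filter would have to resolve them against the page's base URL first.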

(darren)


-- 
Think like a man of action, act like a man of thought.

Re: Filtering HTML files with mod_proxy and mod_perl

Posted by Wim Kerkhoff <wi...@netmaster.com>.
Cool!

I'll probably use that if I write a next version of my filter and want to
manipulate the content based on the content type, the host that is requesting
it, etc.  Right now, I'm just using it as the proxy server on my LAN, so that
only those who know a password can surf.  If they aren't logged in, the handler
prints out a login form so that they can log in for 30 minutes or whatever.
Eventually, I'll hook this into a database, so that websites can be inserted,
categorized, and rated.  I'm using IPC::Cache right now to store login data and
everything is fast.  The user doesn't really notice any performance hit, but
I'm worried about what happens when I have 200 people surfing at once and every
request has to be validated against a database.
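A minimal sketch of the 30-minute login check, assuming sessions are keyed by the client's IP address (in mod_perl 1 that is available as $r->connection->remote_ip). A plain hash stands in for IPC::Cache, so unlike the real cache it is not shared between Apache children; record_login and is_logged_in are hypothetical names. With a check like this in front, a password database would only need to be consulted when someone logs in, not on every request.

```perl
use strict;
use warnings;

# Login window from the post: 30 minutes.
use constant LOGIN_TTL => 30 * 60;

# Stand-in for IPC::Cache: a per-process hash mapping a client
# address to the epoch second at which its login expires.
my %expires_at;

# Call after the user has submitted a valid password.
sub record_login {
    my ($client, $now) = @_;
    $expires_at{$client} = $now + LOGIN_TTL;
    return;
}

# True while the client's login is still valid. An expired entry is
# removed when it is next looked up.
sub is_logged_in {
    my ($client, $now) = @_;
    my $expires = $expires_at{$client};
    return 0 unless defined $expires;
    if ($now >= $expires) {
        delete $expires_at{$client};
        return 0;
    }
    return 1;
}

# In a handler one would pass time() as $now and send the login form
# whenever is_logged_in() returns false.
```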

On 20-Jul-2000 Alvar Freude wrote:
> Hi,
> 
>> If you find a way to do it with Apache::Proxy, let the list know.
> 
> I am sure it will work with the example given by Darren.
> 
> Once I've checked it, I think I'll create a small module and share it.
> 
> 
>> One of the major reasons I went this route over something like the examples in
>> the mod_perl book was speed.  Downloading big files using the book's examples
>> was slow, as Apache first gathers the content up into a variable (where you can
>> apply your regular expressions or whatever other manipulation), then sends it
>> to the browser.  You would need a lot of memory in this situation.
> 
> yes, but if you use a subroutine which handles the incoming chunks, you
> can pass the file through immediately. See
> http://theoryx5.uwinnipeg.ca/CPAN/data/libwww-perl/lwpcook.html at the
> bottom :)
  

Regards,

Wim Kerkhoff, Software Engineer
NetMaster Networking Solutions
wim@netmaster.com

Re: Filtering HTML files with mod_proxy and mod_perl

Posted by Alvar Freude <al...@merz-akademie.de>.
Hi,

> If you find a way to do it with Apache::Proxy, let the list know.

I am sure it will work with the example given by Darren.

Once I've checked it, I think I'll create a small module and share it.


> One of the major reasons I went this route over something like the examples in
> the mod_perl book was speed.  Downloading big files using the book's examples
> was slow, as Apache first gathers the content up into a variable (where you can
> apply your regular expressions or whatever other manipulation), then sends it
> to the browser.  You would need a lot of memory in this situation.

yes, but if you use a subroutine which handles the incoming chunks, you
can pass the file through immediately. See
http://theoryx5.uwinnipeg.ca/CPAN/data/libwww-perl/lwpcook.html at the
bottom :)
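A sketch of the chunk-handling idea, assuming a hypothetical link-rewriting filter and assuming that '>' only appears as the end of a tag (comments, scripts, or attribute values containing '>' would break this). Each chunk is filtered up to its last '>' and the remainder is held back, so a tag that straddles two chunks is never rewritten in halves, and memory use depends on the longest run between two '>' characters rather than on the file size.

```perl
use strict;
use warnings;

# Hypothetical prefix: links are sent back through the filter script.
my $BLASTER = 'http://www.a-blast.org/web-blast.plx?url=';

sub escape_url {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical filter applied to each complete piece of HTML.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{(href\s*=\s*")(https?://[^"]+)"}
              {$1 . $BLASTER . escape_url($2) . '"'}gie;
    return $html;
}

# Returns ($feed, $finish). $feed takes one chunk at a time, filters and
# emits everything up to the last '>' seen so far, and keeps the rest
# for the next call; $finish filters and emits whatever is left over.
sub make_stream_filter {
    my ($emit, $filter) = @_;
    my $pending = '';
    my $feed = sub {
        $pending .= shift;
        my $cut = rindex($pending, '>');
        return if $cut < 0;
        # 4-argument substr removes the ready part from $pending and returns it
        my $ready = substr($pending, 0, $cut + 1, '');
        $emit->($filter->($ready));
    };
    my $finish = sub {
        $emit->($filter->($pending)) if length $pending;
        $pending = '';
    };
    return ($feed, $finish);
}

# A tag split over three chunks is still rewritten correctly.
my $out = '';
my ($feed, $finish) = make_stream_filter(sub { $out .= shift }, \&rewrite_links);
$feed->($_) for ('<a hr', 'ef="http://exa', 'mple.org/">x</a>');
$finish->();
print "$out\n";
# prints <a href="http://www.a-blast.org/web-blast.plx?url=http%3A%2F%2Fexample.org%2F">x</a>
```

With LWP the feed closure could be passed as the content callback, e.g. $ua->request($request, sub { $feed->($_[0]) }, 4096) followed by $finish->(), with $emit calling $r->print inside the mod_perl handler. Since filtering changes the body length, the origin server's Content-Length header should not be passed through unchanged.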


Ciao
  Alvar


Re: Filtering HTML files with mod_proxy and mod_perl

Posted by Wim Kerkhoff <wi...@netmaster.com>.
On 19-Jul-2000 Alvar Freude wrote:
> Hi Wim,
> 
>> I've created something like this.
>> 
>> I've attached the script I used to build mod_proxy and mod_perl, and a short
>> Apache::MyFilter to show how to use this.  Note: I've cut down the handler from
>> my version without really testing it, so it may have a couple of syntax errors.
> 
> thanx!
> But ... I think it doesn't work in my case, because I have to change the
> HTML content itself.
> 
> Or do you get the plain HTML content of the final HTTP request somewhere?
> If so, that part is missing from the example! ;)

Nope... Apache::Proxy just passes it on, AFAIK.

If you find a way to do it with Apache::Proxy, let the list know.

One of the major reasons I went this route over something like the examples in
the mod_perl book was speed.  Downloading big files using the book's examples
was slow, as Apache first gathers the content up into a variable (where you can
apply your regular expressions or whatever other manipulation), then sends it
to the browser.  You would need a lot of memory in this situation.  With
mod_proxy, Apache starts pushing data off to the client as soon as it gets it
from the server.
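The memory difference described here can be illustrated with a plain pass-through loop: data is read in fixed-size blocks and written out immediately, so the largest buffer held is one block regardless of the size of the response. This is a generic Perl sketch (relay is a hypothetical name), not a description of how mod_proxy is implemented internally.

```perl
use strict;
use warnings;

# Forward everything from $in to $out in blocks of at most $block bytes,
# writing each block as soon as it has been read. Returns the largest
# number of bytes held in the buffer at any one time.
sub relay {
    my ($in, $out, $block) = @_;
    my $peak = 0;
    while (1) {
        my $n = read($in, my $buf, $block);
        die "read failed: $!" unless defined $n;
        last if $n == 0;
        print {$out} $buf or die "write failed: $!";
        $peak = $n if $n > $peak;
    }
    return $peak;
}

# Example: relay a 1 MB in-memory "response" through an 8 KB buffer.
my $body = 'x' x 1_000_000;
open my $in,  '<', \$body or die $!;
my $copy = '';
open my $out, '>', \$copy or die $!;
my $peak = relay($in, $out, 8192);
close $out or die $!;
print length($copy), " bytes relayed, at most $peak held at once\n";
```

Slurping the same response into one scalar before filtering, as in the buffered approach, would hold all 1,000,000 bytes at once.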

Regards,


Re: Filtering HTML files with mod_proxy and mod_perl

Posted by Alvar Freude <al...@merz-akademie.de>.
Hi Wim,

> I've created something like this.
> 
> I've attached the script I used to build mod_proxy and mod_perl, and a short
> Apache::MyFilter to show how to use this.  Note: I've cut down the handler from
> my version without really testing it, so it may have a couple of syntax errors.

thanx!
But ... I think it doesn't work in my case, because I have to change the
HTML content itself.

Or do you get the plain HTML content of the final HTTP request somewhere?
If so, that part is missing from the example! ;)


Ciao
  Alvar


RE: Filtering HTML files with mod_proxy and mod_perl

Posted by Wim Kerkhoff <wi...@netmaster.com>.
Hi Alvar,

On 19-Jul-2000 Alvar Freude wrote:
> Hi,
> 
> I want to create a service which filters HTML files like this: 
> http://www.a-blast.org/web-blast.html ==> 
> http://www.a-blast.org/web-blast.plx?url=http://www.nsa.gov/programs/employ/
> 
> 
> But it should go through a proxy, so that you don't need to access another site
> and all filtering works transparently in the background, within the proxy.
> 
> 
> So my idea was to do this with mod_proxy and mod_perl, but I did not
> find any documentation on how to do it.
> 
> The user should enter a proxy in his browser config, e.g.
> superproxy2000.here.org:7777, and after that he can surf the web
> and get filtered files.
> 
> Is this possible with mod_perl and mod_proxy?
> 
> 
> And if so, where can I find documentation for this?
> 
> Or does somebody have code snippets? ;)

I've created something like this.

I've attached the script I used to build mod_proxy and mod_perl, and a short
Apache::MyFilter to show how to use this.  Note: I've cut down the handler from
my version without really testing it, so it may have a couple of syntax errors.

I'm using it like this:

<VirtualHost _default_:8080>
        PerlTransHandler Apache::SurfLogin
</VirtualHost>

so that I can still have a normal webserver on port 80.

Hope that helps,


RE: Filtering HTML files with mod_proxy and mod_perl

Posted by Wim Kerkhoff <wi...@netmaster.com>.
On 19-Jul-2000 Alvar Freude wrote:
> Is this possible with mod_perl and mod_proxy?

Yes (see my other post).

> And if so, where can I find documentation for this?

Look at http://www.apache.org/docs/mod/mod_proxy.html, as well as perldoc
Apache::Proxy (once you've installed Apache::Proxy).

Regards,
