Posted to users@httpd.apache.org by Robert Schenck <ro...@gmail.com> on 2009/12/02 12:02:27 UTC

[users@httpd] Reverse proxying is problematic

*I know this is a long read...but I really need help, and felt the best way
for anyone to help me remotely is to explain the issues in their entirety. *

Hello,

I'm trying to set up a reverse proxy, but first, some context:

My office is subscribed to a few academic journals. These journals verify the
subscription via IP, such that anyone connected to the internet through our
connection can access the journals. However, some individuals would like to
access the journals away from the office as well. We have a VPN, but it only
connects them to our intranet. Therefore, we want to create a reverse proxy
so that users connect to the VPN, then to our intranet, then to the proxy
server, and then, ultimately, to the journal at hand. This works because the
proxy server will be within our intranet, which they have access to through
the VPN. So it will look like so:

Client --> VPN --> Our Intranet --> Reverse Proxy --> Journal

Note that I'm an intern and have had *very* little experience with Apache
and networking in general (and Linux!)...so please explain things fully.

I have attempted to follow this guide:
http://www.apachetutor.org/admin/reverseproxies

I'm running SUSE Linux Enterprise 11, and have installed Apache through
zypper. I compiled and installed the mod_proxy_html and mod_xml2enc modules
myself (mod_proxy_html is there to rewrite links). They are fully functional.

In the examples below I'm attempting to reverse proxy both http://aip.org and
http://apl.aip.org. So basically what I want is for anything that is
http://aip.org/somepage.html to be http://proxysrv1/aip/somepage.html and
anything that is http://apl.aip.org/somepage.html to be
http://proxysrv1/apl/somepage.html. All of the content on the page must go
through the proxy (note: I know that many of the links lead to other
sub-domains; I will include those as well, but later. I figured I should get
these two working first). *Please do not suggest a different server
application like Squid; I'm required to use Apache.*

So far, I have made the following modifications to the httpd.conf file:

----------------------------------------------------------------------------------------------------------------------------
Include /etc/apache2/vhosts.d/*.conf

ProxyHTMLEnable On
ProxyHTMLExtended On

ProxyHTMLLinks  a               href
ProxyHTMLLinks  area            href
ProxyHTMLLinks  link            href
ProxyHTMLLinks  img             src longdesc usemap
ProxyHTMLLinks  object          classid codebase data usemap
ProxyHTMLLinks  q               cite
ProxyHTMLLinks  blockquote      cite
ProxyHTMLLinks  ins             cite
ProxyHTMLLinks  del             cite
ProxyHTMLLinks  form            action
ProxyHTMLLinks  input           src usemap
ProxyHTMLLinks  head            profile
ProxyHTMLLinks  base            href
ProxyHTMLLinks  script          src for
ProxyHTMLLinks  iframe          src

ProxyHTMLEvents onclick ondblclick onmousedown onmouseup \
                onmouseover onmousemove onmouseout onkeypress \
                onkeydown onkeyup onfocus onblur onload \
                onunload onsubmit onreset onselect onchange

ProxyRequests Off
ProxyPass /aip/ http://aip.org/
ProxyPassReverse /aip/ http://aip.org/
ProxyHTMLURLMap http://www.aip.org http://proxysrv1/aip
ProxyPass /apl/ http://apl.aip.org/
ProxyPassReverse /apl/ http://apl.aip.org/
ProxyHTMLURLMap http://apl.aip.org http://proxysrv1/apl

<Location /aip/>
        ProxyHTMLEnable On
        ProxyHTMLExtended On
        ProxyPassReverse /
        ProxyHTMLURLMap / /
        RequestHeader unset Accept-Encoding
</Location>

<Location /apl/>
        ProxyHTMLEnable On
        ProxyHTMLExtended On
        ProxyPassreverse /
        ProxyHTMLURLMap / /
        RequestHeader unset Accept-Encoding
</Location>

ProxyHTMLLogVerbose On
LogLevel Info


----------------------------------------------------------------------------------------------------------------------------

And the following modifications to the vhost.conf file:

----------------------------------------------------------------------------------------------------------------------------

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName proxysrv1
    DocumentRoot /srv/www/htdocs
    HostnameLookups Off
    UseCanonicalName On

    ServerSignature On
    <Directory "/srv/www/htdocs">
        Options Indexes All
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

<VirtualHost *:80>
        Documentroot /srv/www/htdocs/aip
        Servername proxysrv1/aip
        HostnameLookups Off
        UseCanonicalName On
        ServerSignature On
        <Directory "/srv/www/htdocs/aip">

                Options Indexes All
                AllowOverride None
                Order allow,deny
                Allow from all
        </Directory>
</VirtualHost>


<VirtualHost *:80>

        Documentroot /srv/www/htdocs/apl
        Servername proxysrv1/apl
        HostnameLookups Off
        UseCanonicalName On
        ServerSignature On
        <Directory "/srv/www/htdocs/apl">

                Options Indexes All
                AllowOverride None
                Order allow,deny
                Allow from all
        </Directory>
</VirtualHost>

-------------------------------------------------------------------------------------------

*The mass of issues:*

1) http://proxysrv1/aip/ looks like this: http://imgur.com/n6m0L.png

The page source: http://paste.ubuntu.com/333007/

2) http://proxysrv1/apl/ looks like this: http://proxysrv1/apl/

The page source: http://paste.ubuntu.com/333009/

3) I created a virtual host & proxy at http://proxysrv1/apl/, yet links like
http://apl.aip.org/about/about_the_journal redirect to
http://proxysrv1/about/about_the_journal rather than
http://proxysrv1/apl/about/about_the_journal

4) All the pages look like crap. I had aip.org working previously, but
only if I set its directory to / (so by going to http://proxysrv1/ you
went to aip.org/), and had no virtual hosts.

5) That's actually all I can think of. But the pages are pretty darn broken.

*Please explain any fixes in a step-by-step process. Again, I'm new to this.*

Re: [users@httpd] Reverse proxying is problematic

Posted by Devraj Mukherjee <de...@gmail.com>.
Also look at mod_substitute and mod_headers
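
For illustration only, a minimal sketch of what a mod_substitute approach could look like for the /aip/ path from the original post. It assumes Apache 2.2.7 or later with mod_substitute and mod_headers loaded. The substitution pattern is an assumption about how the upstream pages write their links and is not confirmed anywhere in this thread.

```apache
<Location /aip/>
    # Ask the origin for uncompressed responses so the filter can read the HTML
    RequestHeader unset Accept-Encoding

    # Run mod_substitute over HTML responses only
    AddOutputFilterByType SUBSTITUTE text/html

    # Rewrite absolute links to the origin so they point back at the proxy.
    # The pattern is an assumption about how the upstream pages write links.
    Substitute "s|http://aip.org/|http://proxysrv1/aip/|i"
</Location>
```

Unlike mod_proxy_html, this is a plain text substitution that does not parse the HTML, so it would likely need one rule per link form the site uses.
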

On Wed, Dec 2, 2009 at 10:45 PM, Robert Schenck <ro...@gmail.com> wrote:
> Peter,
>
> I have to use Apache, I don't have a choice (says my employer).
>
> On Wed, Dec 2, 2009 at 12:13 PM, Peter Schober <pe...@univie.ac.at>
> wrote:
>>
>> * Robert Schenck <ro...@gmail.com> [2009-12-02 12:03]:
>> > My office is subscribed to few academic journals. These journals verify
>> > the
>> > subscription via IP, such that anyone connected to the internet through
>> > our
>> > connection can access the journals.
>>
>> You might also want to look at EZproxy
>> http://en.wikipedia.org/wiki/EZproxy
>> (besides getting the publisher to dump IP-addresses for authorization).
>> -peter



-- 
"The secret impresses no-one, the trick you use it for is everything"
- Alfred Borden (The Prestige)

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Reverse proxying is problematic

Posted by Robert Schenck <ro...@gmail.com>.
Peter: Well, I'm an intern so I'm supposed to be "learning"...or something
like that.

On Wed, Dec 2, 2009 at 1:00 PM, Peter Schober <pe...@univie.ac.at> wrote:

> * Robert Schenck <ro...@gmail.com> [2009-12-02 12:46]:
> > I have to use Apache, I don't have a choice (says my employer).
>
> This was just meant as a heads up: depending on the publisher you
> might have to rewrite most everything (URLs, HTML content, Cookies,
> JavaScript, etc.), and every publisher does things differently.
> If your employer really thinks reinventing this poorly is time and
> money well spent (vs. using something that is known to just work),
> then so be it.
> (Not that I actually promote the use of aforementioned product, since
> that will only prolong the misuse of IP-addresses for authorization
> purposes. SAML is the standard way of accessing publisher resources
> online, of course.)
> -peter

Re: [users@httpd] Reverse proxying is problematic

Posted by Peter Schober <pe...@univie.ac.at>.
* Robert Schenck <ro...@gmail.com> [2009-12-02 12:46]:
> I have to use Apache, I don't have a choice (says my employer).

This was just meant as a heads up: depending on the publisher you
might have to rewrite most everything (URLs, HTML content, Cookies,
JavaScript, etc.), and every publisher does things differently.
If your employer really thinks reinventing this poorly is time and
money well spent (vs. using something that is known to just work),
then so be it.
(Not that I actually promote the use of aforementioned product, since
that will only prolong the misuse of IP-addresses for authorization
purposes. SAML is the standard way of accessing publisher resources
online, of course.)
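
As one concrete instance of the cookie rewriting mentioned above, here is a hedged sketch using the mod_proxy cookie directives with the path-based layout from the original post. Whether the publisher's cookies actually carry these Domain and Path attributes, and whether browsers accept the rewritten values, is an assumption that has to be checked per site.

```apache
# Goes alongside the existing "ProxyPass /aip/ http://aip.org/" lines.

# Rewrite the Domain attribute of Set-Cookie headers sent by the origin
ProxyPassReverseCookieDomain aip.org proxysrv1

# Rewrite the Path attribute so the browser sends cookies back under /aip/
ProxyPassReverseCookiePath / /aip/
```
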
-peter


Re: [users@httpd] Reverse proxying is problematic

Posted by Robert Schenck <ro...@gmail.com>.
Peter,

I have to use Apache, I don't have a choice (says my employer).

On Wed, Dec 2, 2009 at 12:13 PM, Peter Schober <pe...@univie.ac.at> wrote:

> * Robert Schenck <ro...@gmail.com> [2009-12-02 12:03]:
> > My office is subscribed to few academic journals. These journals verify
> the
> > subscription via IP, such that anyone connected to the internet through
> our
> > connection can access the journals.
>
> You might also want to look at EZproxy
> http://en.wikipedia.org/wiki/EZproxy
> (besides getting the publisher to dump IP-addresses for authorization).
> -peter

Re: [users@httpd] Reverse proxying is problematic

Posted by Peter Schober <pe...@univie.ac.at>.
* Robert Schenck <ro...@gmail.com> [2009-12-02 12:03]:
> My office is subscribed to few academic journals. These journals verify the
> subscription via IP, such that anyone connected to the internet through our
> connection can access the journals.

You might also want to look at EZproxy
http://en.wikipedia.org/wiki/EZproxy
(besides getting the publisher to dump IP-addresses for authorization).
-peter


Re: [users@httpd] Reverse proxying is problematic

Posted by André Warnier <aw...@ice-sa.com>.
Robert Schenck wrote:
> *I know this is a long read...but I really need help, and felt the best way
> for anyone to help me remotely is to explain the issues in their entirety. *
> 
> Hello,
> 
> I'm trying to set a reverse proxy, but first, some context:
> 
> My office is subscribed to few academic journals. These journals verify the
> subscription via IP, such that anyone connected to the internet through our
> connection can access the journals. However, some individuals would like to
> access the journals away from the office as well. 

Hi.
I know that there is already a long list of answers to this, at the 
technical level. And you were right to provide some background like you 
did above.

Before solving the problem at the technical level, I would /strongly/
recommend getting in touch with the publishers of these journals and
talking to them about your idea (or your boss's idea) first.
This is just in case one of them objects and considers that by
doing this you are violating the commercial agreement your office has
with them, which would make your office a target for a copyright
infringement lawsuit.
Publishers, who live from these copyright fees, tend to not joke about 
such matters.

Background :

A publisher made a contract with your office, whereby a certain number
of people have access to a certain number of published journal articles,
against a flat fee.  That flat fee replaces, under certain
circumstances, a per-article, per-person fee which would normally have
to be paid.  The number of people to which this arrangement applies, and
the corresponding fee, is estimated by the supplier on the basis of some
reasonable number of users.  This number of users is limited,
approximately, by the number of people who the supplier roughly
calculated would be accessing these articles from within your corporate
network, and whose requests would thus appear to originate from the IP
address of your firewall/proxy.

Your scheme would basically break the assumptions of the supplier, by 
potentially providing access to an uncontrolled number of people from 
outside of the network for which these assumptions were calculated.
The supplier may get very unhappy about this.

On the other hand, a case such as you describe is not that uncommon, and 
I am sure that the suppliers of these articles have other solutions 
available, which do not contravene the commercial agreements.




Re: [users@httpd] Reverse proxying is problematic

Posted by Eric Covener <co...@gmail.com>.
On Wed, Dec 2, 2009 at 7:31 AM, Robert Schenck <ro...@gmail.com> wrote:

>> >
>> > http://paste.ubuntu.com/333080/
>> >

The operative message is:
[Wed Dec 02 13:21:43 2009] [error] [client 9.4.69.54] Directory index
forbidden by Options directive: /srv/www/htdocs/apl/

Which would have been nice to include in-line. If you're serving a
mod_autoindex directory index on purpose, allow it with Options
+Indexes in the <Directory> block that covers whatever this URL maps
to.

If you meant for this to be proxied, it isn't.

If you meant for this to show some default file, see DirectoryIndex.
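
The two non-proxy options above could be sketched roughly as follows, in the Apache 2.2 access-control syntax used elsewhere in this thread. The directory path comes from the error message; `index.html` is an assumed file name, and only one of the two options is normally needed.

```apache
<Directory "/srv/www/htdocs/apl">
    # Option 1: allow mod_autoindex to generate a listing for this directory
    Options +Indexes

    # Option 2: serve a default file instead of a listing
    # (index.html is an assumed file name)
    DirectoryIndex index.html

    Order allow,deny
    Allow from all
</Directory>
```

Neither option makes the request go to the journal site; that only happens if a ProxyPass rule matches the URL.
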


-- 
Eric Covener
covener@gmail.com


Re: [users@httpd] Reverse proxying is problematic

Posted by Robert Schenck <ro...@gmail.com>.
Here's a snippet: http://paste.ubuntu.com/333084/

On Wed, Dec 2, 2009 at 1:29 PM, Tom Evans <te...@googlemail.com> wrote:

> On Wed, Dec 2, 2009 at 12:23 PM, Robert Schenck <ro...@gmail.com>
> wrote:
> > I get "Access Forbidden" when trying to access proxysrv1/aip and
> > proxysrv1/apl
> >
> > This is my updated vhost file:
> >
> > http://paste.ubuntu.com/333080/
> >
>
> Your ServerName directives are not valid.
>
> When you get an 'Access Forbidden' message, apache will _always_
> explain why in the error log. What did it say?
>
> Cheers
>
> Tom

Re: [users@httpd] Reverse proxying is problematic

Posted by Tom Evans <te...@googlemail.com>.
On Wed, Dec 2, 2009 at 12:23 PM, Robert Schenck <ro...@gmail.com> wrote:
> I get "Access Forbidden" when trying to access proxysrv1/aip and
> proxysrv1/apl
>
> This is my updated vhost file:
>
> http://paste.ubuntu.com/333080/
>

Your ServerName directives are not valid.

When you get an 'Access Forbidden' message, apache will _always_
explain why in the error log. What did it say?
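
For context: ServerName takes a host name (optionally with a scheme and port), not a URL path, so values like `proxysrv1/aip` cannot select a vhost. A minimal sketch of a path-based layout, assuming the host name and upstream sites from the original post, would use a single vhost and let ProxyPass distinguish the paths:

```apache
<VirtualHost *:80>
    # A host name only; the /aip/ and /apl/ paths are handled below
    ServerName proxysrv1
    ProxyRequests Off

    ProxyPass        /aip/ http://aip.org/
    ProxyPassReverse /aip/ http://aip.org/

    ProxyPass        /apl/ http://apl.aip.org/
    ProxyPassReverse /apl/ http://apl.aip.org/
</VirtualHost>
```

This only covers request routing and redirect headers; links inside the returned HTML are a separate problem.
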

Cheers

Tom


Re: [users@httpd] Reverse proxying is problematic

Posted by Robert Schenck <ro...@gmail.com>.
I get "Access Forbidden" when trying to access proxysrv1/aip and
proxysrv1/apl

This is my updated vhost file:

http://paste.ubuntu.com/333080/

On Wed, Dec 2, 2009 at 1:09 PM, Tom Evans <te...@googlemail.com> wrote:

> On Wed, Dec 2, 2009 at 11:02 AM, Robert Schenck <ro...@gmail.com>
> wrote:
> > I know this is a long read...but I really need help, and felt the best
> way
> > for anyone to help me remotely is to explain the issues in their
> entirety.
>
> tl;dr
>
> >
> > Please explain any fixes in a step-by-step process. Again, I'm new to
> this.
> >
>
> Part of the problem is that you are rewriting HTML. Messy isn't it?
> Now do it again, but don't bother with rewriting the HTML.
>
> Remove all the Proxy directives from the main Apache server config; they
> make no sense there when you then define vhosts to use later.
>
> Define a vhost for each site you wish to proxy. Set it up like so:
>
> <VirtualHost *:80>
>   ServerName proxyaip
>   ProxyRequests Off
>   DocumentRoot /var/empty
>
>   <Directory /var/empty>
>     Order allow,deny
>     Allow from all
>   </Directory>
>
>   <Location />
>     ProxyPass http://aip.org/
>     ProxyPassReverse http://aip.org/
>   </Location>
>
> </VirtualHost>
>
> Accessing http://proxyaip/ should now be just like accessing
> http://aip.org/ . If you want to proxy more sites, define more vhosts.
>
> Cheers
>
> Tom

Re: [users@httpd] Reverse proxying is problematic

Posted by Tom Evans <te...@googlemail.com>.
On Wed, Dec 2, 2009 at 11:02 AM, Robert Schenck <ro...@gmail.com> wrote:
> I know this is a long read...but I really need help, and felt the best way
> for anyone to help me remotely is to explain the issues in their entirety.

tl;dr

>
> Please explain any fixes in a step-by-step process. Again, I'm new to this.
>

Part of the problem is that you are rewriting HTML. Messy isn't it?
Now do it again, but don't bother with rewriting the HTML.

Remove all the Proxy directives from the main Apache server config; they
make no sense there when you then define vhosts to use later.

Define a vhost for each site you wish to proxy. Set it up like so:

<VirtualHost *:80>
  ServerName proxyaip
  ProxyRequests Off
  DocumentRoot /var/empty

  <Directory /var/empty>
    Order allow,deny
    Allow from all
  </Directory>

  <Location />
    ProxyPass http://aip.org/
    ProxyPassReverse http://aip.org/
  </Location>

</VirtualHost>

Accessing http://proxyaip/ should now be just like accessing
http://aip.org/ . If you want to proxy more sites, define more vhosts.
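
As a sketch of the "define more vhosts" step, a second vhost for the other journal site from the original post might look like this. The name `proxyapl` is hypothetical: it has to resolve to the proxy server for clients (via DNS or hosts files), and on Apache 2.2 the `NameVirtualHost *:80` line already present in the original vhost file is assumed.

```apache
<VirtualHost *:80>
  # proxyapl is a hypothetical host name; it must resolve to this server
  ServerName proxyapl
  ProxyRequests Off
  DocumentRoot /var/empty

  <Directory /var/empty>
    Order allow,deny
    Allow from all
  </Directory>

  <Location />
    # Everything under this host name is passed to the second journal site
    ProxyPass http://apl.aip.org/
    ProxyPassReverse http://apl.aip.org/
  </Location>
</VirtualHost>
```

Because each site is mapped at /, root-relative links such as /css/... in the returned pages still land on the right vhost without any HTML rewriting.
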

Cheers

Tom


Re: [users@httpd] Reverse proxying is problematic

Posted by Eric Covener <co...@gmail.com>.
On 12/2/09, Robert Schenck <ro...@gmail.com> wrote:
> I disabled the mod_proxy_html module and the page still looked the same,
> albeit without the little boxes signifying non-existent images.
>
> However, I also looked at the error log for the virtual host, and I found
> the following:
>
> http://paste.ubuntu.com/333064/

I didn't expect removing it to help, since you don't account for the
/css/ at all.  I just couldn't tell if that mod_proxy_html magic was
translating the /css/ into something you handled.

-- 
Eric Covener
covener@gmail.com


Re: [users@httpd] Reverse proxying is problematic

Posted by Robert Schenck <ro...@gmail.com>.
I disabled the mod_proxy_html module and the page still looked the same,
albeit without the little boxes signifying non-existent images.

However, I also looked at the error log for the virtual host, and I found
the following:

http://paste.ubuntu.com/333064/


On Wed, Dec 2, 2009 at 12:55 PM, Eric Covener <co...@gmail.com> wrote:

> Is mod_proxy_html supposed to be changing those /css/ links into
> something else that would actually be handled by your ProxyPass?  You
> can tell if it is by saving the source when you're actually going
> through the proxy.
>
> Also, 404's in your access log would be a big hint about what you're
> missing, but due to the rendering issue it's likely the css.
>
> --
> Eric Covener
> covener@gmail.com

Re: [users@httpd] Reverse proxying is problematic

Posted by Eric Covener <co...@gmail.com>.
Is mod_proxy_html supposed to be changing those /css/ links into
something else that would actually be handled by your ProxyPass?  You
can tell if it is by saving the source when you're actually going
through the proxy.

Also, 404's in your access log would be a big hint about what you're
missing, but due to the rendering issue it's likely the css.
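
To make the question concrete: for root-relative links like /css/... to be handled by a `ProxyPass /aip/` rule, mod_proxy_html would need a map that adds the /aip/ prefix. A rough sketch, assuming the stylesheets are served by aip.org itself and that mod_proxy_html and mod_headers are loaded as in the original post:

```apache
<Location /aip/>
    ProxyHTMLEnable On

    # Absolute links to the origin are pointed back at the proxy prefix
    ProxyHTMLURLMap http://aip.org /aip

    # Root-relative links such as /css/... gain the /aip/ prefix
    ProxyHTMLURLMap / /aip/

    # Uncompressed responses are needed for the HTML to be rewritten
    RequestHeader unset Accept-Encoding
</Location>
```

A map of `/ /`, by contrast, leaves /css/ links unchanged, so the browser requests them from the proxy's own document root.
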

-- 
Eric Covener
covener@gmail.com
