Posted to users@httpd.apache.org by bruce <be...@earthlink.net> on 2003/12/31 21:04:52 UTC

[users@httpd] Serving remote files from Apache...

Hi...

We've been investigating/searching for an answer to a problem... We've seen
possible solutions, but nothing that's been exact!..

We're looking at using Apache and we need a way of serving remote files via
Apache. We've seen solutions that indicate (if we understood them) that
Apache can be configured to serve/redirect files from a remote Apache
server. This appears to be accomplished via the ProxyPassReverse, mod_proxy
directives...

However, we'd actually like to have the website/page files reside on the
remote PC/Harddrives and to basically have them read from the remote
machine, and served via the Apache app...

IE:

         Client Browser (http://12.13.14.15) fetches content from the
         Apache server

         Apache Server (http://12.13.14.15 is its IP address)
              |  ^
              |  |
              |  |
              V  |
            Remote PC/Server

         The files are read from the remote PC/Server and returned to the
         Apache server for display on the Client Browser. We're looking at
         an approach like this because it has to be scalable.


If this is not easily doable, and we need to utilize the ProxyPassReverse
solution, what issues are involved? In particular, how scalable would the
ProxyPassReverse approach be? We might need to serve potentially 1000's of
sites with this approach...

Thanks...



-----Original Message-----
From: Jonas Eckerman [mailto:jonas_lists@frukt.org]
Sent: Tuesday, December 30, 2003 4:53 PM
To: users@httpd.apache.org
Subject: RE: [users@httpd] apache modification project..


On Tue, 30 Dec 2003 09:51:59 -0800, bruce wrote:

>  2) Not quite sure how the Mod_Proxy function gets us to being able
>  to "fetch" files from a remote PC, and return them to the user's
>  browser issuing the command for the URL in question...

That's kind of exactly what mod_proxy is meant to do. Check the docs
for it, especially the directives ProxyPass and ProxyPassReverse.

And if you use this, turn *off* ProxyRequests.
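Just to give a rough idea, a minimal reverse-proxy setup is only a few
lines; the host name and paths here are made up, so adjust them to your
own layout:

    # Act as a reverse proxy only, never as an open forward proxy.
    ProxyRequests Off

    # Map a local URL space onto the remote server, and rewrite any
    # redirects the remote server sends back so they point at us.
    ProxyPass        /site1/ http://remote.example.com/site1/
    ProxyPassReverse /site1/ http://remote.example.com/site1/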

>  under the belief that there would have to be some mod of the
>  virtual host functions...

You only need to care about virtual hosts if you're using virtual
hosts. :-)

The proxy directives can be placed in a virtual host container if
that's what you need.

You can also use mod_proxy together with mod_cache to make a caching
proxy.
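If mod_cache is loaded as well (it's still marked experimental in 2.0),
caching what the proxy fetches is roughly one extra line per URL space;
a sketch only (the URL space is made up):

    # Cache proxied responses in the disk-backed store for this URL space.
    # (mod_disk_cache and its CacheRoot etc. need to be set up too.)
    CacheEnable disk /site1/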

As for getting a whole bunch of machines to get their configs from one
central server, it's easy to see a multitude of ways to do this
without having to create any modules for Apache or changing the
Apache source. FTP, wget and rsync come to mind, as well as any other
protocol/method that lets you transfer a file from one machine to
another.
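Any of those is pretty much a one-liner. For instance, something along
these lines (host name and paths are just placeholders):

    # Pull the current httpd.conf from the central box over ssh.
    rsync -az -e ssh config@central.example.com:/masterconf/httpd.conf \
        /etc/httpd/conf/httpd.conf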

It's possible one could use mod_perl for this as well. I haven't ever
used mod_perl myself so I might well have misunderstood this
completely, but for some reason I'm under the impression that it lets
you use Perl code in httpd.conf.

Regards
/Jonas
--
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/





---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Serving remote files from Apache...

Posted by Nick Kew <ni...@webthing.com>.
On Wed, 31 Dec 2003, bruce wrote:

> We're looking at using Apache and we need a way of serving remote files via
> Apache.

Define "remote files"?

Or, more specifically, where are they coming from?  How Apache can access
them depends on how *any* application can access them remotely.  For
example, if a remote machine runs a webserver, then Apache can
access them by HTTP (that's proxying).  Or if you have some NFS-family
filesystem export available, you can pretend the files are local.
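As a sketch of the second option (server name and paths invented): you
mount the export once and Apache never knows the difference:

    # On the Apache machine: make the remote export look local...
    mount -t nfs fileserver.example.com:/export/sites /var/www/remote

    # ...then point Apache at it as if it were a local directory.
    DocumentRoot /var/www/remote/somesite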

>	We've seen solutions that indicate (if we understood them) that
> Apache can be configured to serve/redirect files from a remote Apache
> server. This appears to be accomplished via the ProxyPassReverse, mod_proxy
> directives...

Yep.

>             Remote PC/Server
>              The files are read from the remote PC/Server and returned to
>              the Apache server for display on the Client Browser. We're
>              looking at an approach like this because it has to be scalable.

The biggest effect on scalability will be for you to run a (large)
local cache on your Apache.  The mechanism you use to contact a remote
server (http or NFS or ???) is secondary, though I think error handling
(e.g. a remote host is down) is going to be easier with a proxy.
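In 2.0 a big local cache means the (experimental) mod_cache with
mod_disk_cache behind it; a handful of directives is all it takes, and
the sizes and paths below are invented:

    CacheEnable    disk /
    CacheRoot      /var/cache/apache
    CacheSize      2097152    # soft limit on the cache, in KBytes
    CacheDirLevels 2          # shape of the on-disk directory tree
    CacheDirLength 2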

> If this is not easily doable, and we need to utilize the ProxyPassReverse
> solution, what issues are involved? In particular, how scalable would the
> ProxyPassReverse approach be? We might need to serve potentially 1000's of
> sites with this approach...

Not a problem.  Use a cache.  As for what issues are involved,
I'm writing an article on the subject, which should be available
shortly (though it focuses more on the mechanics than on scaling).

-- 
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html





Re: [users@httpd] Serving remote files from Apache...

Posted by Jeff White <jl...@earthlink.net>.

From: "Jonas Eckerman"

And the other side of the coin.  :)

> I would absolutely recommend Linux rather
> than Windows for this. For a couple of main reasons:

> 1:
> 2:
> 3:
> 4:

Perhaps the Windows built-in objects below:

Windows Scripting.
http://www.microsoft.com/technet/scriptcenter

Windows HTTP Services (WinHTTP)
http://msdn.microsoft.com/library/en-us/winhttp/http/portal.asp

WebClient.UploadFile Method in .NET
http://msdn.microsoft.com/library/en-us

Write Auto-Updating Apps with .NET and
the Background Intelligent Transfer Service API
http://msdn.microsoft.com/msdnmag/issues/03/02/bits/TOC.ASP?frame=true

> I think this kind of stuff is simply a lot easier
> to do on Linux (and other Unix-like systems)
> than on Windows.

Now this is all you had to say, plus
"since I do not fully use Windows."  :)

Jeff






RE: [users@httpd] Serving remote files from Apache...

Posted by Jonas Eckerman <jo...@frukt.org>.
On Thu, 1 Jan 2004 11:42:29 -0800, bruce wrote:

First a short thought:

After reading your mail, I feel mod_proxy should be suitable. If you decide you don't want to use HTTP or FTP when the "client" Apaches fetch data from the central system, it might still be a good idea to check out mod_proxy anyway, as it is possible to write modules that make mod_proxy support other protocols as well.

I also feel that it should be worthwhile to check out existing methods external to Apache for updating the "client" Apaches' configs. One such method could be to have the master system push the config files to the clients with rsync.

>  1.1: How often does the configuration change?
>  The configuration for the Apache clients should be able to be
>  updated in a dynamic fashion. It is critical that the modified
>  Apache clients be able to access this information from the Central
>  system.

This doesn't answer the question of how often it will change.

If it may change more than once a minute, we can rule out everything that has to do with cron jobs, for example.


>  configured... Another way might be to have the config information
>  periodically downloaded to the client machine, and then have the
>  Apache client "load" it. This would require the config information
>  be in a different format than the straight text file...

Why? If the client machine fetches it with ftp, rsync, http or whatever with a very small shell script called by cron, the file they fetch can be a standard Apache httpd.conf. If you use rsync or wget you should be able to check the file's timestamp and then have the shell script tell Apache to reload the config if the file has been changed since the last time it was reloaded.
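As a sketch of such a script, with made-up host name and paths, and
assuming the standard apachectl is available:

    #!/bin/sh
    # Fetch the current config; wget -N only downloads it if the copy
    # on the central server is newer than the one we already have.
    cd /etc/httpd/conf || exit 1
    cp httpd.conf httpd.conf.prev
    wget -q -N http://central.example.com/configs/httpd.conf

    # Reload Apache only if the file actually changed, and check the
    # config first so a broken file can't take the server down.
    if ! cmp -s httpd.conf httpd.conf.prev; then
        apachectl configtest && apachectl graceful
    fi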

What do you see that makes this type of solution unfeasible?

> There would also be security considerations...

Of course. The whole project is full of them. :-)

But: Does the multitude of Apaches have fixed IP addresses? If they do, simply restricting access to the config based on IP should be a good start. If you use rsync, you can also use passwords. Or you can tunnel through ssh (or use scp) and use certificates *and* passwords *and* IP addresses.

>  For ease of use.. it seems the "easiest" approach would be to
>  "suck" the config information into the Apache client...

I think it would be far easier to just use one of the multitude of existing ways to fetch a file regularly from one system to another, rather than inventing and implementing a way for Apache to do this work.

>  1.3: How critical is it that *all* the Apaches *always* contain
>  the latest configuration data?
>  Each Apache client may in fact contain a different "config". But
>  it's important that they be updated with the latest as soon as
>  possible.

What does "as soon as possible" mean. No more than 10 seconds? No more than 10 minutes? An hour?

> This of course will require some central app/process
>  that essentially monitors when "config" information is changed, as
>  well as the configs of each Apache client, and then which Apache
>  clients get what configs....

So push it from the server to the client using rsync. That ought to work.

And then the client can either have a cron job checking for changes, or if a gap of 1 minute is unacceptable it can have a small shell script that checks for changes every 5 seconds or so. The easiest way might be to simply use rsync to send two files, the Apache config and a flag file. The shell script can simply check for the existence of the flag file and delete it after it has reloaded the config. Use timestamps in the flag files' names if changes can sometimes happen very close to each other.
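Something roughly like this, as a sketch only (the file names are
invented; the master is assumed to rsync both httpd.conf and the flag
file into place):

    #!/bin/sh
    # Tiny watcher: look for the pushed flag file every few seconds and
    # reload the already-pushed config when it shows up.
    while true; do
        if [ -f /etc/httpd/conf/config.updated ]; then
            apachectl graceful
            rm -f /etc/httpd/conf/config.updated
        fi
        sleep 5
    done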

If you do decide to push the config from the server to the client, you could actually have the config stored any way you want to. In one text file per client, in one main text file and a number of smaller text files for the stuff that differs between clients, in a SQL database, or whatever. The script that pushes the config to the clients can also create the actual file to push from whatever the data is stored in.

> 1.4.2: Will the OS installations for all the Apache's be identical?
>      We are considering Linux/Windows variant OSes for the Apache client
> apps

I would absolutely recommend Linux rather than Windows for this. For a couple of main reasons:

1: There are a lot more applications for this kind of stuff easily available for Linux (rsync for example is easy to use on Linux, but on Windows it is just frustrating).
2: You don't have to do anything special to get a good command shell with good scripting support on Linux, and interpreted languages suitable for automation are easy to install (Perl comes to mind).
3: A stable Windows-based Apache system requires Windows 2K/XP with fixpacks. Windows 2K/XP requires a lot more computer than a Linux system without X.
4: It's easy to restrict access to files and directories on a Linux system. As long as users can't become root, they can't read/change the cache or the config. While this is possible on Windows as well, it's not as easy, and if the machine is used for ordinary apps as well there can be problems, as some ordinary apps simply won't work correctly unless the user has administrator rights.

I think this kind of stuff is simply a lot easier to do on Linux (and other Unix-like systems) than on Windows.

>  2.1: How much data are we talking about?
>  You had to ask...!!! For a given site, we believe that they will
>  average ~10-20 pages of text... We'd guess ~5-10K.. We don't
>  really know, but we do know we're not talking large quantities of
>  data for the sites...

Ok. So a cache on the "client" apaches can be used without requiring large harddisks. That's good.

>  2.2: Is this dynamic data or static data? (The usefulness of
>  mod_cache depends on this).
>  The data in the sites/pages will be static...

This is good. It means you can use mod_proxy.

>  (at least initially) want to cache any page/site content on the
>  user's PC any more than we have to..

Ok. This might lead to scaling problems if the central servers can't keep up with all the "client" apaches.

>  We have considered using temp files/directories as needed...

This is a comment I don't understand. Using for what? The only thing I can think of in this setup that requires temp files is a cache.

>  We are also open to using/considering the use of caching methods,
>  provided we also incorporate some form of encryption for the
>  content.

That should be possible, depending on the OS. If you use a file system with support for user-based encryption and an OS that also supports it, you should be able to place the cache (and config if desired) in an encrypted directory only accessible by the Apache user.

If you decide to use mod_cache, there's another way to go as well.

Just as mod_proxy uses pluggable modules for transport, mod_cache uses pluggable modules for storage. As far as I know there's currently two storage modules available for mod_cache, one that stores on disk and one that stores in RAM. It should be possible to modify the disk cache module and create a new module that lets mod_cache store data in signed, encrypted form.
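Picking between the two existing modules is just a matter of which one
you enable; a sketch, with arbitrary numbers:

    # RAM-backed storage (mod_mem_cache)...
    CacheEnable mem /
    MCacheSize  65536            # KBytes of RAM to spend on the cache

    # ...or the disk-backed storage (mod_disk_cache), which is the one
    # that could conceivably be modified to sign/encrypt what it stores.
    #CacheEnable disk /
    #CacheRoot   /var/cache/apache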

This still leaves the config data though.

>  2.4: What kind of hardware (CPU numbers and speed, disk size,
>  amount of RAM) will the Apache's be installed on?
>  This will range. For design issues, assume a minimal machine...
>  500MHz, 10GB HD, 256MB RAM.

Considering you won't execute dynamic web code or need big caches on the "client" apaches, that should be enough for an ordinary Apache with mod_proxy (and mod_cache if desired).

>  2.5: What kind of connections will the Apache's have to the remote
>  machines? The modified Apache clients will have to access the
>  Master/Central app/system through a secure process. Any
>  data/communication between the Central system and the client
>  Apache apps will have to go through a verification/validation
>  process...

And what does this mean? A "secure process"? Are you going to utilize IPC stuff (named pipes) or something else?

Are we talking about IP networks? LAN, WAN, the Internet?

>  We're still not comfortable with this approach given that it would
>  require the Master App/System to be running an Apache Based app,
>  that will be required to have its own config file.

No, it will require the master system to run one of:

1: an HTTP server (does not have to be Apache)
2: an FTP server.
3: something else for which you find or make a transport module that mod_proxy can use.

>  If you're interested in having a further discussion, or in perhaps
>  joining what we're thinking about doing, we would be interested in
>  talking with you.

I don't feel I have the time to actually join any projects, but I do have the time to check this list every now and then. :-)

Regards
/Jonas

-- 
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/




RE: [users@httpd] Serving remote files from Apache...

Posted by bruce <be...@earthlink.net>.
Nick,

Thanks for your responses... Are you interested in possibly working with us
on this project...

If you are, send us your resume/CV and your contact information and the best
time to call you. We'll get in touch with you as soon as possible.


Thanks

Bruce Douglas
bedouglas@earthlink.net
(925) 866-2790




RE: [users@httpd] Serving remote files from Apache...

Posted by Nick Kew <ni...@webthing.com>.
On Thu, 1 Jan 2004, bruce wrote:

> 2.1: How much data are we talking about?
>      You had to ask...!!! For a given site, we believe that they will
> average ~10-20 pages of text... We'd guess ~5-10K.. We don't really know,
> but we do know we're not talking large quantities of data for the sites...

Are you sure you need any complex setup, rather than just (say)
transferring all the data to the Apache server itself with a cron job?

In the early days of big hosting farms, people would host 1000 sites
that size comfortably on a state-of-the-art Pentium 90.
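For volumes like that, a single crontab entry would do it; the host name
and paths below are only placeholders:

    # Mirror all the site content from the central machine every night.
    0 3 * * *  rsync -az --delete central.example.com::sites/ /var/www/sites/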

>      This will range. For design issues, assume a minimal machine... 500MHz,
> 10GB HD, 256MB RAM.

Hey, can I have a minimal machine for my webserver? :-)

-- 
Nick Kew





RE: [users@httpd] Serving remote files from Apache...

Posted by bruce <be...@earthlink.net>.
Jonas/Nick,

Thanks for the response...

Our project would have 100s of modified Apache client apps, each of which
would have to communicate with the Master/Central app to get the required
pages/data to serve up. In keeping this approach simple, we're going to
consider static content only. There will be no processing of scripts/etc on
the client Apache PCs...


That said, the following are responses to the issues you raised...

1.1: How often does the configuration change?
     The configuration for the Apache clients should be able to be updated
in a dynamic fashion. It is critical that the modified Apache clients be
able to access this information from the Central system. This gives us
control over how the client app is configured... Another way might be to
have the config information periodically downloaded to the client machine,
and then have the Apache client "load" it. This would require the config
information be in a different format than the straight text file... There
would also be security considerations...

	For ease of use.. it seems the "easiest" approach would be to "suck" the
config information into the Apache client...


1.2: How often must the Apaches update their configuration?
	See above...

1.3: How critical is it that *all* the Apaches *always* contain the latest
configuration data?
	Each Apache client may in fact contain a different "config". But it's
important that they be updated with the latest as soon as possible. This of
course will require some central app/process that essentially monitors when
"config" information is changed, as well as the configs of each Apache
client, and then which Apache clients get what configs....


1.4: Will all the Apache installations be identical in *all* ways?
     See 1.3...


1.4.2: Will the OS installations for all the Apache's be identical?
     We are considering Linux/Windows variant OSes for the Apache client
apps

2.1: How much data are we talking about?
     You had to ask...!!! For a given site, we believe that they will
average ~10-20 pages of text... We'd guess ~5-10K.. We don't really know,
but we do know we're not talking large quantities of data for the sites...


2.2: Is this dynamic data or static data? (The usefulness of mod_cache
depends on this).
     The data in the sites/pages will be static... However, we do not (at
least initially) want to cache any page/site content on the user's PC any
more than we have to.. There are a couple of reasons for this, primarily due to
the fact that we don't want a user looking at their hard drive and seeing
20-30 Meg of data due to us!!

     We have considered using temp files/directories as needed...

     We are also open to using/considering the use of caching methods,
provided we also incorporate some form of encryption for the content. This
would ensure that the underlying site page/content data is "safe" and has
not been altered...

     This could also probably be accomplished with some computed hash.. that
is then checked against the data prior to it being served by the client
Apache, for site data that is cached...


2.3: Will there be scripts or other dynamic stuff that the Apaches are
supposed to fetch and then execute themselves?
     Nope.. For now, we are only going to deal with static content...


2.4: What kind of hardware (CPU numbers and speed, disk size, amount of RAM)
will the Apache's be installed on?
     This will range. For design issues, assume a minimal machine... 500MHz,
10GB HD, 256MB RAM.


2.5: What kind of connections will the Apache's have to the remote machines?
     The modified Apache clients will have to access the Master/Central
app/system through a secure process. Any data/communication between the
Central system and the client Apache apps will have to go through a
verification/validation process...


2.6: What OS will the Apache's machines and the remotes run? (This can be
important when considering mounting directories remotely.)
     see above...

     However, given the number of clients that we will be shooting for, if
we develop this!!, we would not like to mount directories internal to the
Master app/system environment that would be open to the outside clients...

     Our initial concern regarding bottlenecks really had to do with an
understanding of the issues surrounding the mod_proxy process. We're still
not comfortable with this approach given that it would require the Master
App/System to be running an Apache Based app, that will be required to have
its own config file.

     This config file would have to be updated quite frequently to accommodate
changes made to the client Apache config information, for the mod_proxy
sections...

     These changes appear to be required, as we may not want the same
modified Apache client to serve the same sites on a continual basis...


This seems to provide a possible working approach, assuming that
the Master Server/System apps can be created, as well as the monitoring app,
and the requisite software apps. The real issue would seem to be the ability
for the Master System to handle the quantities of requests for information
that will be coming from the client Apache apps. This could be accomplished
by a couple of cabinets of rack servers with the appropriate round robin
apps to facilitate this process...

If you're interested in having a further discussion, or in perhaps joining
what we're thinking about doing, we would be interested in talking with you.

At present this is an idea that's being refined.

Thanks for your assistance...

Regards,

Bruce Douglas
bedouglas@earthlink.net





Re: [users@httpd] Serving remote files from Apache...

Posted by Jonas Eckerman <jo...@frukt.org>.
On Wed, 31 Dec 2003 12:04:52 -0800, bruce wrote:

> We've been investigating/searching for an answer to a problem... We've seen
> possible solutions, but nothing that's been exact!..

That's the way things usually work out. When searching for a solution you need to decide which of the following is the most important:

1: To get the perfect solution that fits your preconceptions of the solution exactly. This often means you'll have to write the necessary code yourself.

2: To get a working solution that does solve the *original* problem, even if not the way you envisioned when you set out and even if it means reconsidering some things. This can often be done using existing solutions in creative ways, and is often the approach that works best.

Note my use of *original* problem above. It's very common that people first have a rather abstract problem. Then they start thinking of solutions. When they think they're on the right track they start investigating how to implement the solutions using existing means. If you stumble at this stage, it's often a very good idea to step back to the first stage and see if there are alternative ways to solve the original (often rather abstract) problem.

It's also very useful to explain the original problem when asking for help. When people know the original problem, they may sometimes come up with surprising solutions that one would never have thought about oneself.

Currently, you have explained a solution to a problem and have asked for help and tips about implementing that solution. But you have not explained what problem your solution is meant to solve.

I think you also need to separate your two different problems. You are actually looking for solutions to two completely separate problems. The solutions may therefore be completely separate as well. The problems:

1: Get the Apache machines to fetch their configuration from a central place.

2: Get the Apache machines to serve content stored on remote machines.

To me both problems seem rather easy to deal with using existing solutions, but as I don't have all the info on exactly what you're trying to do they might indeed be difficult problems. Some info that'd be very good to have about both problems:

1.1: How often does the configuration change?
1.2: How often must the Apaches update their configuration?
1.3: How critical is it that *all* the Apaches *always* contain the latest configuration data?
1.4: Will all the Apache installations be identical in *all* ways?
1.4.2: Will the OS installations for all the Apache's be identical?

2.1: How much data are we talking about?
2.2: Is this dynamic data or static data? (The usefulness of mod_cache depends on this).
2.3: Will there be scripts or other dynamic stuff that the Apaches are supposed to fetch and then execute themselves?
2.4: What kind of hardware (CPU numbers and speed, disk size, amount of RAM) will the Apache's be installed on?
2.5: What kind of connections will the Apache's have to the remote machines?
2.6: What OS will the Apache's machines and the remotes run? (This can be important when considering mounting directories remotely.)

>  However, we'd actually like to have the website/page files reside
>  on the remote PC/Harddrives and to basically have them read from
>  the remote machine, and served via the Apache app...

Which you can do with Apache's mod_proxy.

Honestly, I don't understand why you don't want to use mod_proxy for this. You need to use some kind of transport protocol to have Apache fetch the files from a remote machine. You've already stated that you do not wish to use NFS. Why not use HTTP? Do you have any other particular protocol for fetching the files that you'd prefer over HTTP? Are the Apache's supposed to fetch executable code (PHP pages, CGIs, etc) and execute it (mod_proxy won't do this)?

>         Apache Server (http://12.13.14.15) ip address
>              |  ^
>              |  |
>              |  |
>              V  |
>         Remote PC/Server

> If this is not easily doable, and we need to utilize the ProxyPassReverse
> solution, what issues are involved?

This can of course be done, but you will need some way for Apache to fetch the files from the remote server. With mod_proxy, Apache already contains the functionality that you describe (as I interpret it). If, for some as yet unexplained reason, you do not wish to use mod_proxy you can use some other method. You could mount (not necessarily through NFS) the remote machine's directories on the Apache machine, or you can implement your own module to get Apache to fetch them.

If you can accept the use of mod_proxy (which does exactly what you want) but don't want Apache to fetch the files with HTTP, you can use mod_proxy_ftp instead and have Apache fetch the files with FTP, or you can implement your own module to get Apache's mod_proxy to use some other protocol.

One thing you have not explained is why you want Apache to do the fetching of both content and configurations. Why not let the OS do this?

> In particular, how scalable would the
> ProxyPassReverse approach be? We might need to serve potentially 1000's of
> sites with this approach...

1000's of different web sites (meaning Apache will fetch from 1000's of different remote machines)? Or maybe 1000's of different Apache's that will all fetch from just a few remote machines? Or 1000's of something else?

* If you'll have 1000's of Apache installations all fetching from just a few remote machines:

You've already created scaling problems as those few remote machines can get extremely heavily loaded.

If this is the case, you should use mod_cache in the 1000's of Apache installations in order to lower the load on the central remote machines. With mod_cache this scheme should be able to scale very well.

One problem, even with using mod_proxy in this case, is that mod_proxy just fetches data and sends it on to clients. If you're using PHP, ASP, CGI or similar stuff, the remote machine is the one that'll have to execute this code, because Apache with mod_proxy just fetches it and serves it as-is. If this is what you're doing (or might be doing), mod_proxy will create problems rather than solve them. This means you will probably be better off letting the OS handle the actual fetching of the remote files, and use a more standard Apache that doesn't really know whether the files are stored locally or remotely.

OTOH, you did say you wanted small stripped Apaches, so I guess you do not want them to be able to execute CGIs, PHP, ASP or other similar stuff anyway.

* If you'll have 1000's of different virtual hosts in each Apache or 1000's of remote machines:

I'm not sure where exactly the scaling problems will be, except that you really should look at the modules for mass virtual hosting if you do this.
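mod_vhost_alias is the usual starting point for that; a sketch of the
idea, with an invented directory layout:

    # One DocumentRoot pattern instead of 1000's of <VirtualHost> blocks.
    # %0 expands to the full Host: header the client asked for.
    UseCanonicalName Off
    VirtualDocumentRoot /var/www/vhosts/%0/htdocs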

Regards
/Jonas

-- 
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/

