Posted to users@httpd.apache.org by Brian Dessent <br...@dessent.net> on 2003/12/27 21:40:24 UTC

Re: [users@httpd] standing naked at the door of infinity with only a butterknife

CUTMAN ~CW~ wrote:

> If you are willing to offer any suggestions as to what I probably SHOULD
> approach here, then I would certainly appreciate the direction and offering.
>   On a minimum level I'd like to be able to mess with cgi through c
> consoles.  Also, php really seems it should be an obvious interest, but I
> can take things one-at-a-time as needed here.

I'd say your best bet would be to get one of the O'Reilly books and sit
down with a cup of coffee and start reading.  There are tons of them, so
I can't really recommend any one in particular, but I'd say anything
from the "In A Nutshell" series would do you well (e.g. _Webmaster in a
Nutshell_).  Browse on over to <http://web.oreilly.com> and see if any of
the titles jump out at you.  These books are "the industry standard" and
so for example if you want to learn Perl then "the camel book"
(_Programming Perl_) is the definitive reference.  Also note that there
are usually several degrees of coverage on each topic, for example the
"Programming Foo" titles tend to be very thorough and in depth, while
the "Learning Foo" or "Foo Cookbook" series are more geared at getting
you off and running if you have less background or desire to really read
a long book on the subject.  Don't be overwhelmed by the sheer number of
books; the majority of them are quite specialized.  Aim for the more
broad, introductory, or general titles.

The other resource you should check out is <http://www.onlamp.com>. 
LAMP is the common acronym for Linux-Apache-MySQL-PHP which powers a lot
of websites.  The site has a lot of tutorials and background information
that's suitable to beginners.  It's worth a browse.

In terms of the "How does it all fit together" part, here's my quick
overview.  Apache is the main server which coordinates all the
activities.  Apache itself typically does not do any actual processing,
although it can with some of its add-on modules.  But usually Apache is
either there to just serve static files from disk or to act as a gateway
(CGI) for some other language or application.  You can use almost any
conceivable language out there to do this.  The most popular are Perl
and PHP, which are both interpreted scripting languages.  They are
somewhat similar in that they share a lot of familiar ideas, but they
each have their own quirks and traits.  Other scripting languages that
are seeing more use are Python and Ruby, but really anything can be
used, including any compiled language.  You can read about the CGI
specification on the NCSA site, but essentially it amounts to this: the
stdout of your application gets sent to the client (browser), and the
request information is passed in through environment variables, command
line arguments, and (for POST data) stdin.
It's really pretty straightforward, but that doesn't mean it's
necessarily best to re-invent the wheel.  By that I mean it's certainly
*possible* to write web apps in C but almost nobody would do that unless
they were absolutely forced to.  The reason is that while C is certainly
fast and efficient, it lacks a great deal of functionality that must
either be implemented by you or a library.  Furthermore, scripting
languages don't need to be compiled, so they are very simple to
distribute and install, and they require little porting to be
cross-platform.
You would also be amazed at how much functionality has already been
written for you in Perl or PHP (see for example CPAN) in terms of
modules that you just drop in for whatever task you have to perform. 
The other issue is that C code can be riddled with security flaws (such
as buffer overruns), which is especially relevant to web apps since they
accept input directly from the end user's client, so they must be
hardened to those sorts of common errors.
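
To make the CGI description above a bit more concrete, here is a minimal
sketch in Python (chosen only for brevity; the same shape applies in Perl,
PHP or C).  The function name build_response and the "name" query parameter
are made up for illustration.  It assumes the server has put the query
string in the QUERY_STRING environment variable, which is what the CGI
convention calls for.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: input arrives in environment variables,
the response leaves on stdout."""
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Return the whole CGI response (headers, blank line, body) as a string.

    `environ` is a dict-like object; pass os.environ when run by the server.
    build_response and the "name" parameter are illustrative, not standard.
    """
    # The server places everything after '?' in the URL into QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Never echo client input back unescaped.
    body = "<p>Hello, {}!</p>\n".format(html.escape(name))
    # Headers first, then one blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Run by hand with QUERY_STRING set in the shell, it prints the header, a
blank line and the HTML, which is all the web server expects from a CGI
program.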

So, that's CGI.  It's important to note that the CGI interface calls for
a new process to be created to service each request, and terminated when
that request ends.  This makes it inefficient for high-traffic sites, so
people have taken the core internals of popular scripting languages, and
implemented them as modules that run inside Apache.  Thus you have
mod_php and mod_perl, as well as mod_python, etc.  What this means is
that instead of having to spawn a new PHP process for each request, the
Apache process itself parses the code, runs it, and returns the result
to the user.  This is very efficient, and it's a crucial reason why
these scripting languages have been embraced by so many sites.  We'd
probably all be back to writing everything in C (as we did in the
mid-90s) for performance without these modules.  In fact, this is
exactly what has happened with Yahoo.  They wrote a giant infrastructure
around these custom C programs that they developed, starting way back in
the day when the web was young.  They've managed to maintain it and keep
it going, but they recently announced that they're going to ditch it all
in favor of PHP in the future, as it's just too costly for them to
maintain.
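
As a rough illustration of the process-per-request cost described above,
the sketch below times two ways of producing the same output: starting a
fresh interpreter for every "request" (CGI style) versus calling a function
in a process that is already running (loosely analogous to what the mod_*
modules do).  This is only an analogy, not how mod_php or mod_perl are
actually implemented, and all the function names are hypothetical.

```python
import subprocess
import sys
import time

# The "script" each request runs.  Hypothetical stand-in for a real handler.
HANDLER_SOURCE = "print('hello from handler')"


def handle_by_spawning():
    """CGI style: start a brand-new interpreter process for one request."""
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def handle_in_process():
    """Embedded style: the already-running process does the work itself."""
    return "hello from handler\n"


def time_requests(handler, count):
    """Return the seconds taken to serve `count` requests with `handler`."""
    start = time.perf_counter()
    for _ in range(count):
        handler()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("spawned:    %.4f s" % time_requests(handle_by_spawning, 20))
    print("in-process: %.4f s" % time_requests(handle_in_process, 20))
```

The exact numbers depend entirely on the machine, but the spawned version
pays for process startup on every single request, and that is the overhead
the embedded modules avoid.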

Finally, there's MySQL.  One aspect of web programming that differs from
'traditional' application development is that web apps are
intrinsically multi-user.  Your app doesn't know whether it's the only
one running, or if there are 100 other requests also being handled at
the same time.  So, for example, if you wanted to store a little data on
disk you might just use a simple text file or something if you were
writing a traditional single-user program.  But that doesn't work very
well at all with web apps, because you have to code around all the
possible scenarios of "Is someone else reading this file right now?" and
"What happens if I write to this file and someone else is trying to
write to it too?", etc.  So there's a general need for a data-store
that's both flexible and adept at handling the case of many processes
accessing the same data simultaneously.  MySQL gets a lot of use here
because it's free, but other databases are often used when MySQL is
inadequate.  PostgreSQL, Oracle, DB2, SAP, MSSQL, etc. are all
heavyweight alternatives to MySQL.  At the core they all do basically
the same thing -- but of course you get a lot more sophistication and
capability with the ones that cost megabucks.  Anyway, the SQL database
gets used for web stuff because it offers this flexible and robust
data-store capability.  Relational database theory is a very deep and
well-researched field so I can't really sum it up for you, but try
reading <http://eveandersson.com/arsdigita/books/sql/> for a good
background of what a database is and should do, and how that fits into
the web.

HTH,
Brian

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org