Posted to dev@httpd.apache.org by Rodent of Unusual Size <co...@Apache.Org> on 1998/11/30 05:45:16 UTC

[STATUS] (apache-1.3) Sun Nov 29 23:45:14 EST 1998

 1.3 STATUS:

Release:

    1.3.4-dev: current. There is discussion on releasing it "soon"
    (Lars volunteers as release manager)

    1.3.3: Tagged and rolled on Oct. 7.  Released on 9th, announced on 10th.
    1.3.2: Tagged and rolled on Sep. 21. Announced and released on 23rd.
    1.3.1: Tagged and rolled on July 19. Announced and released.
    1.3.0: Tagged and rolled on June 1.  Announced and released on the 6th.
           
    2.0  : In pre-alpha development, see apache-2.0 repository

RELEASE SHOWSTOPPERS:

    * Win32 device file issues (nul/aux/...)

    * How should an Apache binary release tarball look?

      1. The "old" way where it is just a source release tarball
         plus a pre-compiled src/httpd-<gnutriple>. It is created
         via the apache-devsite/binbuild.sh script which
         - creates the build tree
         - creates the src/Configuration file with standard modules
         - runs "make"
         - renames src/httpd to src/httpd-<gnutriple>
         - runs "make clean"
         - packs the build tree stuff together
         Already known discussion points:
         - should src/httpd be renamed or not because a lot
           of PRs say they cannot find the httpd :-(
         Status: Ralf -0, Ken +0

      2. The way some other projects release binary tarballs, i.e.
         a package containing the installed (binary) files.
         It can be created by a script which
         - creates the build tree
         - runs "./configure --prefix=/usr/local/apache \
                             --enable-shared=remain \
                             --disable-module=auth_db \
                             --enable-suexec ..."
         - runs "make install root=apache-root"
         - packs the stuff together from ./apache-root only!!
         Already known discussion points:
         - should there be a prefix usr/local/apache in 
           the tarball or not?  Some people think
           it's useful while others dislike it a lot.
	 - it doesn't include the source.
	 - the paths don't match the original Apache style, nor the Win32
	   paths
	 - should suexec be prebuilt in a binary tarball?
         Status: Ralf +1, Martin +1, Roy -1, Ken +1 (IFF source is included,
		 there's no usr/local/apache prefix in the tarball, AND the
		 old-style [common with Win32] paths are used)

      3. A source release tarball with three extra directories:
            lib: for the shared library object files
            bin: for the httpd and support executables
            man: for the man files (if desired)
         as if the server was installed in those directories.
         Status: Roy +1, Jim +1 (still need to define which modules
	 	 are built)
                 Ralf -0 (I dislike mixed source+binary tarballs)
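The packing step that option 2 argues about can be sketched in a few lines. This is only an illustrative model (the paths and the `strip_prefix` flag are hypothetical, not from binbuild.sh): stage the install into ./apache-root, then tar only that tree, either keeping or dropping the usr/local/apache prefix that people disagree on.

```python
# Hypothetical sketch of option 2's final packing step.  strip_prefix
# models the disputed choice: True drops the usr/local/apache prefix
# from member names, False keeps it.
import os
import tarfile
import tempfile

def pack_staged_root(staged_root, tarball, strip_prefix=True):
    """Pack everything under staged_root into tarball."""
    with tarfile.open(tarball, "w:gz") as tf:
        for dirpath, _dirs, files in os.walk(staged_root):
            for name in files:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, staged_root)
                if strip_prefix:
                    # drop a leading usr/local/apache component, if any
                    parts = rel.split(os.sep)
                    if parts[:3] == ["usr", "local", "apache"]:
                        rel = os.path.join(*parts[3:])
                tf.add(full, arcname=rel)

# tiny demonstration with a fake staged install
root = tempfile.mkdtemp()
bindir = os.path.join(root, "usr", "local", "apache", "bin")
os.makedirs(bindir)
open(os.path.join(bindir, "httpd"), "w").close()

out = os.path.join(tempfile.mkdtemp(), "apache-bin.tar.gz")
pack_staged_root(root, out, strip_prefix=True)
with tarfile.open(out) as tf:
    print(tf.getnames())   # -> ['bin/httpd']
```

With strip_prefix=False the same file would appear as usr/local/apache/bin/httpd, which is exactly the form Ken's vote objects to.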

Documentation that needs writing:

    * Need a document explaining mod_rewrite/"UseCanonicalName off" based
      virtualhosting.  (If it exists already I can't find it easily.)
      => It still doesn't exist but I've already assembled the relevant
         information and config snippets. We just have to write a
         vhost-xxx.html document out of it. -- rse

Available Patches:

    * Paul's [PATCH] adding a DefaultLanguage directive
      This patch implements a DefaultLanguage directive. It sets the
      language to apply to files with no explicit language set via
      AddLanguage.
	Message-ID: <Pi...@ecstasy.localnet>
	Status: Paul +1, Ken +1

    * Michael van Elst's patch [PR#3160] to improve mod_rewrite's
      in-core cache handling by using a hash table.
        Message-ID: <XF...@unix-ag.org>
        Status: Lars +1

    * Ralf's [PATCH] MODULE_MAGIC_COOKIE field for module structure
        Message-ID: <19...@engelschall.com>
        Status: Currently assumes 8-byte long, Ralf will fix and repost.

    * Ralf's Build outside of source tree (take 2: alternative solution)
      ("overrules"  Wilfredo Sanchez's [PATCH] Build outside of source tree)
        Message-ID: <19...@engelschall.com>
	Status: Ralf +1, Jim +1, Martin +1
                Fred says this doesn't work for him, suggests replacement in
                <19...@scv1.apple.com>

    * Marc's [PATCH] PR#3323: recursive includes
        Message-ID: <Pi...@alive.znep.com>
	Status: Marc +1, Jim +1 (concept)
	* Needs more in-depth review *

    * Ron Record's patch to port Apache to UnixWare 7 (forwarded by
      Randy).
	Message-ID: <x7...@montana.covalent.net>
	Status: 

    * Amiel Lee Yee's patch to update ap_config.h for DGUX/Intel
      and str[n]casecmp().
	Message-ID: PR#3247
	Status: 

    * Khimenko Victor's os_inline.c functions are not inlined
        Message-ID: <AB...@khim.sch57.msk.ru>
        Status: Roy +1, Dean +1

    * Juan Gallego's patch to add CSH-style modifiers (:h, :r, :t, :e)
      to mod_include's variable processing.
	Message-ID: PR#3246, also available at
		   <http://www.physics.mcgill.ca/~juan/mod_include.patch>
	Status: Ken -0 for 1.3/+0 for 2.0

    * Patches for the DSO/mod_perl problem (see below for description):

      Ralf's "[PATCH] Fix module init"
      This fixes the mod_so/mod_perl problems described under "FINAL RELEASE
      SHOWSTOPPERS" by doing a more correct init of the modules after loading
      through two new core API functions.
	Message-ID: <19...@engelschall.com>
	Status: Ralf +1, Lars +1

In progress:
 
    * Addition of "cute little icons" to Apache's main icon groups.
      See <4....@hyperreal.org>
      Status: Ralf +1, Roy +1 (in "icons/small" subdirectory)

    * Ken's IndexFormat enhancement to mod_autoindex to allow
      CustomLog-like tailoring of directory listing formats

Needs patch:

    * Ralf: mod_so doesn't correctly initialise modules. For instance
      the handlers of mod_perl are not initialised. 
      An ap_init_modules() could be done from mod_so but this is too much.

      I've already debugged this up to ap_invoke_handler() and it correctly
      sees the handlers from mod_perl ("perl-script") and actually runs them.
      But in the DSO case it returns DECLINED, while in the non-DSO case it
      runs fine. Sure, it's mod_perl's fault, because it's mod_perl code
      that returns DECLINED.  But it definitely seems to be caused by a
      missing init in mod_so in the DSO case. I've already asked Doug for
      hints but he has not had a chance to look into it.

      Currently at least mod_perl is broken in the DSO case because of
      this missing init in mod_so. But perhaps there are more modules which
      have the same problem. This should be fixed for 1.3.2, or we should
      at least find out why it is happening!

      Current status: We have two patches available (see above) but still don't
                      know the real reason. And the patches do not work on
                      all platforms :-(

    * get_path_info bug; ap_get_remote_host should be ap_vformatter instead.
      See: <Pi...@twinlark.arctic.org>

    * uri issues
	- RFC2068 requires a server to recognize its own IP addr(s) in dot
	notation, we do this fine if the user follows the dns-caveats
	documentation... we should handle it in the case the user doesn't ever
	supply a dot-notation address.

    * Problems dealing with .-rooted domain names such as "twinlark." versus
	"twinlark.arctic.org.".  See the thread containing
	Message-ID: <19...@deejai.mch.sni.de> for more details.
	In particular this affects the correctness of the proxy and the
	vhost mechanism.

    * proxy_*_canon routines use r->proxyreq incorrectly.  See
	<Pi...@twinlark.arctic.org>

    * work around a Navigator/Mozilla bug when mod_proxy is used
      (broken images).
	Message-ID: <XF...@unix-ag.org>
        Status: Lars' patch was vetoed.  Roy and Dean think that it is
                probably another buffer magic number error and should be
                tested to find out and, if so, fixed like it was in core.

    * ap_escape_html() always duplicates the string, even when there is
      no change and the caller would be happy to use the original.
      What is needed is a separate interface for "don't need a dup"
      situations, like just about everywhere we use it in bvputs and
      bputs calls.

Open issues:

    * Underscores on symbols in DSO situation is broken for NetBSD:
      Here is a private conversation between me (rse) and Charles Hannum of
      the NetBSD project:

      From: "Charles M. Hannum" <my...@netbsd.org>
      > We have a bug report at the Apache BugDB (see
      > http://bugs.apache.org/private/index/full/2462) where a user says
      > under a particular NetBSD platform (NetBSD/pmax 1.3.2) the symbols on
      > dlsym() don't need an underscore.  In FreeBSD world we always had the
      > underscore,
      > [...]                               
      This is less an issue of OS, and more an issue of a.out vs. ELF.  The
      underscores are always used for a.out, and are never used for ELF.
      Therefore, on any platform where we use ELF (that would be Alpha, MIPS,
      PowerPC and UltraSPARC currently, although there are plans to eventually
      switch on other platforms), the underscores should not be added, and on
      all other platforms they should be.
      You can differentiate by comparing the output of `uname -m' with any
      of: alpha bebox macppc newsmips ofppc pica pmax sparc64.
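Charles's rule reduces to a simple table lookup. A hedged sketch (function name is illustrative; the machine list is the one he gives, and the input is the output of `uname -m`):

```python
# On NetBSD, prepend the dlsym() underscore for a.out platforms and
# omit it for ELF platforms, per Charles Hannum's machine list.
ELF_MACHINES = {"alpha", "bebox", "macppc", "newsmips",
                "ofppc", "pica", "pmax", "sparc64"}

def dlsym_name(symbol, machine):
    """Return the name to pass to dlsym() on NetBSD <machine>."""
    if machine in ELF_MACHINES:
        return symbol            # ELF: no leading underscore
    return "_" + symbol          # a.out: prepend underscore

print(dlsym_name("ap_module", "pmax"))   # -> ap_module
print(dlsym_name("ap_module", "i386"))   # -> _ap_module
```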

    * Redefine APACHE_RELEASE. Add another 'bit' to signify whether
      it's a beta or final release. Maybe 'MMNNFFRBB' which means:
        MM: Major release #
	NN: Minor release #
	FF: "fix" level
	R:  0 if beta, 1 if final release
	BB: beta number

      See: <19...@devsys.jaguNET.com>
      Status: Jim +1, Ben +1, Martin +1, Ralf +1
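The proposed 'MMNNFFRBB' layout is just positional decimal encoding. A small sketch (function names are illustrative, not from the proposal; note that the leading zero of MM disappears in the integer form):

```python
# Encode/decode the proposed MMNNFFRBB release number:
# MM major, NN minor, FF fix level, R 0=beta/1=final, BB beta number.
def apache_release(major, minor, fix, final, beta=0):
    return (((major * 100 + minor) * 100 + fix) * 10 + final) * 100 + beta

def decode(release):
    release, beta = divmod(release, 100)
    release, final = divmod(release, 10)
    release, fix = divmod(release, 100)
    major, minor = divmod(release, 100)
    return major, minor, fix, final, beta

print(apache_release(1, 3, 3, 1))      # 1.3.3 final -> 10303100
print(apache_release(1, 3, 0, 0, 6))   # 1.3b6       -> 10300006
```

This keeps the useful property that numeric comparison orders releases correctly, with any beta sorting below the corresponding final release.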

    * Someone other than Dean has to do a security/correctness review on
      psprintf(), bprintf(), and ap_snprintf().  In particular these routines
      do lots of fun pointer manipulations and such and possibly have overflow
      errors.  The respective flush_funcs also need to be exercised.
       o Jim's looked over the ap_snprintf() stuff (the changes that Dean
         did to make thread-safe) and they look fine.
       o Laura La Gassa's looked over ap_vformatter & other related code
       o Martin did a "source review" as well.
       o Could still use 1 or 2 more sets of eyeballs.
       Status: Is this still valid??

    * Paul would like to see a 'gdbm' option because he uses
      it a lot.

    * Maybe a http_paths.h file? See
	<Pi...@valis.worldgate.com>
	+1: Brian, Paul, Ralf, Martin
	+0: Jim (not for 1.3.0)

    * Release builds: Should we provide Configuration or not?
      Should we 'make all suexec' in src/support?
	+1: Brian, Jim, Ken (possible suexec path issue, though)

    * root's environment is inherited by the Apache server. Jim & Ken
      think we should recommend using 'env' to build the
      appropriate environment. Marc and Alexei don't see any
      big deal. Martin says that not every "env" has a -u flag.

    * Marc's socket options like source routing (kill them?)
	Marc, Martin say Yes

    * Ken's PR#1053: an error when accessing a negotiated document
      explicitly names the variant selected.  Should it do so, or should
      the original URI be referenced?

    * Proposed API Changes:

	- r->content_language is for backwards compatibility... with modules
	  that may not link any longer without some minor editing.  The new
	  field is r->content_languages.  Heck it's not even mentioned in
	  apache-devsite/mmn.txt when we got content_languages (note the s!).
	  The proposal is to remove r->content_language:
	    Status: Paul +1, Ralf +1, Ken +1, Martin +1

	- child_exit() is redundant, it can be implemented via cleanups.  It is
	  not "symmetric" in the sense that there is no exit API method to go
	  along with the init() API method.  There is no need for an exit
	  method, there are already modules using cleanups to perform this (see
	  mod_mmap_static, and mod_php3 for example).  The proposal is to
	  remove the child_exit() method and document cleanups as the method of
	  handling this need.
	    Status: Rasmus +1, Paul +1, Jim +1, 
	            Martin +1, Ralf +1, Ken +1

    * Should we re-enable nagle now that we're non-buffering CGIs?  See
      various messages from Marc in March 98.
  
    * TZ should not be dealt with specially any longer now that we have
      "PassEnv".  See
      <Pi...@twinlark.arctic.org>
       Jim: IMO it's too late in the game for this... I'm
            sure this would cause some strange bug reports as
	    people's cgi-scripts no longer work correctly
	    ("It worked just fine before I upgraded to 1.3.0")
	    unless we warn people in big nasty letters to add
	    PassEnv TZ to their config files "just in case"
	    and hope they do it :)

    * In ap_bclose() there's no test that (fb->fd != -1) -- so it's
      possible that it'll do something completely bogus when it's 
      used for read-only things. - Dean Gaudet

    * Okay, so our negotiation strategy needs a bit of refinement.  See
      <Pi...@twinlark.arctic.org>.
      In general, we need to go through and clean up the negotiation
      module to make it compliant with the final HTTP/1.1 draft, and at the
      very least we should make it more copacetic to the idea of transferring
      gzipped variants of files when both variants exist on the server.

    * Roy's HTTP/1.1 Wishlist items:
        1) byte range error handling
        2) update the Accept-Encoding parser to allow q-values

    * use of spawnvp in uncompress_child in mod_mime_magic - doesn't
      use the new child_info structure, is this still safe?  Needs to be 
      looked at.

    * suexec doesn't understand argv parameters; e.g.

        <!--#exec cmd="./ls -l" -->

      fails even when "ls" is in the same directory because suexec is trying
      to stat a file called "ls -l".  A patch for this is available at

        http://www.xnet.com/~emarshal/suexec.diff

      and it's not bad, except that it doesn't handle programs with spaces in
      the filename (think Win32, or Samba-mounted filesystems).  There are
      several PRs about this, and I don't see any security reason why we can't
      accommodate it, though it does add complexity to suexec.c.
      PR #1120
      Brian: +1
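The tension above is between naive whitespace splitting (fixes "./ls -l" but breaks filenames containing spaces) and full shell-style quoting. A hedged sketch of the quoting approach, using Python's shlex as a stand-in for what suexec.c would have to do in C:

```python
# Illustrative only: split an SSI exec cmd string into argv, honouring
# quoting, so "./ls -l" becomes two arguments while a quoted filename
# with spaces stays one argument.
import shlex

def split_command(cmd):
    """Split an SSI exec cmd string into argv, honouring quoting."""
    return shlex.split(cmd)

print(split_command('./ls -l'))            # ['./ls', '-l']
print(split_command('"./my program" -l'))  # ['./my program', '-l']
```

This is the extra complexity the item mentions: once quoting exists, suexec must also decide which quoting rules to honour.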

Win32 specific issues:

 Important

    * fix O(n^2) attack in mod_isapi.c ... i.e. recopy the code from
      scan_script_headers_err_core.

 In progress:

    * Ben's ASP work... All agree it sounds cool.

    * DDA's adding a tray application to the Windoze version for ease of
      status/management.
	<01...@caravan.individual.com>
	<01...@caravan.individual.com>
	Status: Ken +1, Sameer +1, Martin +1, Ben +1 (as long as
	we get a single executable)
	Paul: doesn't like Win95-specific stuff
	Ken: What's W95-specific about it?

 Help:

    * should trap ^C when running not-as-service and do proper shutdown

    * should have a pretty little icon for Apache on Win32

    * proxy module doesn't load on Win95.  Why?  Good question.  PR#1462.

    * "Directory /", "Directory C:/" both fail to do anything, 
      while "Directory *" SEGVs.

    * chdir() for CGI scripts and mod_include #exec needs to be 
      re-implemented now that CreateProcess is being used.

    * process/thread model
	- need dynamic thread creation/destruction, similar to 
	  Unix process model
	- can't use WaitForMultipleObjects in the same way we
	  do now, since that has a limit of 64(!) objects.  Grr.
	  PR#1665

    * some errors printed by CGIs to stderr don't end up making it
      to the server log unless an extra debugging message is added
      after they run? (PR#1725 indicates this may not be just Win32)

    * handle bugs that make it pop up errors on console, i.e. segv 
      equiv?  Can we do this?  Need to make it robust.

    * install
	- make installshield work
	- config in cvs tree?
	- install docs, etc.?
	- location for install

    * the mutex should be critical-regions, since the current design
      is creating a mess of SO calls that are unnecessary

    * we don't mmap on NT.  Use TransmitFile?

    * CGIs
	- docs on how they work w/scripts
	- use registry to find interpreter?
	- WTF is the buffering coming from?
	    - we don't have a way to make non-blocking files on NT!

    * performance

    * documentation:
	- running the server without admin
	- how CGIs work
	- update README.NT
	- short/long name handling
	- better status page on current state of NT for users

    * http_main.c hell
	- split into two files?

    * who should run the service?  Who exactly is the "system account"?

      docs say:

      Localsystem is a very privileged account locally, so you shouldn't run
      any shareware applications there. However, it has no network privileges
      and cannot leave the machine via any NT-secured mechanism, including
      file system, named pipes, DCOM, or secure RPC.

      and:

      A service that runs in the context of the LocalSystem account
      inherits the security context of the SCM. It is not associated with
      any logged-on user account and does not have credentials (domain
      name, user name, and password) to be used for verification. This
      has several implications: [... removed ...]


      That _really_ sucks.  Can we recommend running Apache as some 
      other user?

    * need a crypt() of some sort.
	- sources are easy; problem is export restrictions on DES
	- if we don't do DES, can do md5

    * modules that need to be made to work on win32
        - mod_example isn't multithreaded
	- mod_unique_id (needs mt changes)
	- mod_auth_db.c  (do we want to even try this?  We should have some
          db of some sort... what else can we pick from under win32?)
	- mod_auth_dbm.c
	- mod_info.c (PR re exporting symbols for it...)
	- mod_log_agent.c
	- mod_log_referer.c
	- mod_mime_magic.c (needs access to mod_mime API stage...)

    * do something to disable bogus warnings

    * rfc1413.c has static storage which won't work multithreaded

    * mod_include --> exec cgi, exec cmd, etc. don't work right.
      Looks like a code path that isn't run anywhere else that has
      something not quite right...  A PR or two on it.

    * signal type handling
    	- how to rotate logs from command line?
	  (Point people to Andrew Ford's cronolog because it's "better"
	   than ours?)

    * Currently if you double click on the conf files or the
      log files you get a useless dialog offering the set of all
      executables, usually after a very long pause.  Ought
      to stuff .conf in the registry mapping it to text.

    * apparently either "BrowserMatch" or the "nokeepalive" variable
      cause instability - see PR#1729.

Binaries
   The goal here is to have two columns of all-Y (where applicable)
   for the two stable release versions, and nothing under Old unless
   the new version just doesn't work on that platform.

                        1.2.6   1.3.3   Old
   aix_4.1                N       N     1.2.5, 1.3.1
   alphalinux             N       N     1.3.0
   aux_3.1                N       N     1.3.0
   decalphaNT             N       N     1.3b6
   dunix_4.0              N       N     1.2.4, 1.3.0, 1.3.1
   freebsd_2.1            N       N     1.2.4
   freebsd_2.2            N       N     1.2.5
   hpux_10.20             N       N     1.2.5
   hpux_11                N       N     1.3.2
   irix_6.2               N       N     1.2.5
   linux_2.x              N       N     1.2.4, 1.3.0
   netbsd_1.2             N       N     1.2.4
   os2                    N       N     1.3.2
   reliantunix_5.4        Y       N     1.3.1
   solaris                N       N     1.2.5, 1.3.0, 1.3.1
   sparclinux             N       N     1.3.0, 1.3.1
   sunos_4.1.x            N       N     1.2.5
   ultrix_4.4             N       N     1.2.4
   win32                  -       N     1.3.2  (is symlink okay?)

Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Rodent of Unusual Size wrote:

> Is this actually true?  ISTR lots of problems with IE downloading
> gzipped files and stripping the .gz extension -- but not actually
> gunzipping them.  I don't think I'd call that doing the right
> thing..  Maybe it works properly if the uncompressed document
> is a displayable content-type, but not for others?
> 

Yes, this only works correctly for displayable content. For example, if a
request is made for NPpwc32.dll.gz and Apache is configured with
"AddEncoding gzip gz", things don't work, but not because decompression
fails. IE decompresses the file but mangles the filename that it
suggests for saving to disk (nppwc32_dll(1).gz). Its URL-to-suggested-filename
code is broken, not its decompression.

Navigator 4.5 also does not work for non-displayable content (at least
on Windows 95). In the preceding test case, it comes up with the
suggested filename NPpwc32_dll.gz AND doesn't decompress to boot.
This is probably better behavior than IE's: since it kept the .gz
extension, it shouldn't decompress. Note the (1) in the IE suggested
filename. This is IE's way of indicating that the file has been altered
(copied) during transmission.

The problem here is that on the PC platform, multiple . extensions are
not normally used. Probably because this sort of thing rarely occurs in
practice, the browser folks don't seem to have put much effort into
handling it.

To avoid problems, the gz extension should be reserved for
"content-encoding" compression. Publicly advertised or href URLs that
are compressed should typically be deflated (.zip) instead. On the
Windows platform, zip is far more common than gzip. Under UNIX both are
typically available.

The default Apache distribution seems to do the best thing possible. In
the default mime.types, only application/zip is defined. Example
AddEncodings appear in the default srm.conf only for the suffix coders
gzip (.gz) and compress (.Z). This seems like a good compromise, in that
zip does have a default suffix addition behavior.



Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Dean Gaudet wrote:
> 
> On Tue, 1 Dec 1998, Rodent of Unusual Size wrote:
> 
> > TE is 'transfer-encoding,' BTW.
> 
> No it's not... TE is a silly/confusing name for a new header.  See
> draft-ietf-http-v11-spec-rev-xx for whatever xx they're up to these days.
> 
> Dean
> 

Isn't the primary intent of transfer-encoding to handle dynamic content?
This thread of thought is about static compression. If dynamic
compression is fairly close for Apache, perhaps this whole thing is
moot.

Also, how does this TE interact with compression?



Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.
On Tue, 1 Dec 1998, Rodent of Unusual Size wrote:

> TE is 'transfer-encoding,' BTW.

No it's not... TE is a silly/confusing name for a new header.  See
draft-ietf-http-v11-spec-rev-xx for whatever xx they're up to these days.

Dean

Here's an older version:

  14.39 TE

  The TE request-header field is similar to Accept-Encoding, but restricts
  the transfer-codings (section 3.6) that are acceptable in the response.

         TE          = "TE" ":" #( t-codings )
         t-codings   = "chunked" | ( transfer-extension [ accept-params ] )
  Examples of its use are:

         TE: deflate
         TE:
         TE: chunked, deflate;q=0.5
  The TE header field only applies to the immediate connection. Therefore,
  the keyword MUST be supplied within a Connection header field (section
  14.10) whenever TE is present in an HTTP/1.1 message.

  A server tests whether a transfer-coding is acceptable, according to a
  TE field, using these rules:

    1. If the transfer-coding is one of the transfer-codings listed in the
       TE field, then it is acceptable, unless it is accompanied by a
       qvalue of 0. (As defined in section 3.9, a qvalue of 0 means "not
       acceptable.")

    2. If multiple transfer-codings are acceptable, then the acceptable
       transfer-coding with the highest non-zero qvalue is preferred.

    3. The "identity" transfer-coding is always acceptable, unless
       specifically refused because the TE field includes "identity;q=0".
       The "chunked" transfer-coding is always acceptable. The Trailer
       header field (section 14.40) can be used to indicate the set of
       header fields included in the trailer.

    4. If the TE field-value is empty, only the "identity" and the
       "chunked" transfer-codings are acceptable.

  If a TE field is present in a request, and if a server cannot send a
  response which is acceptable according to the TE header field, then the
  server SHOULD send an error response with the 406 (Not Acceptable)
  status code.

  If no TE field is present, the sender MAY assume that the recipient will
  accept the "identity" and "chunked" transfer-codings.

  A server using chunked transfer-coding in a response MUST NOT use the
  trailer for header fields other than Content-MD5 and Authentication-Info
  unless the "chunked" transfer-coding is present in the request as an
  accepted transfer-coding in the TE field.


    Note: Because of backwards compatibility considerations with RFC
    2068, neither parameter nor accept-params can be used with the
    "chunked" transfer-coding.
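The draft's acceptability rules are mechanical enough to sketch. A hedged illustration in Python (header parsing is deliberately simplified; this covers rules 1, 3 and 4, while rule 2's preference ordering among multiple acceptable codings is left out for brevity):

```python
# Simplified model of the draft's TE acceptability rules.
def parse_te(value):
    """Parse 'deflate;q=0.5, chunked' into {coding: qvalue}."""
    codings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        q = 1.0
        if ";" in item:
            item, params = item.split(";", 1)
            for p in params.split(";"):
                p = p.strip()
                if p.startswith("q="):
                    q = float(p[2:])
        codings[item.strip().lower()] = q
    return codings

def acceptable(coding, te_value):
    codings = parse_te(te_value)
    if coding == "chunked":
        return True                              # rule 3: chunked always OK
    if coding == "identity":
        return codings.get("identity", 1.0) > 0  # rule 3: unless identity;q=0
    return codings.get(coding, 0.0) > 0          # rules 1 and 4

print(acceptable("deflate", "chunked, deflate;q=0.5"))  # True
print(acceptable("deflate", ""))                        # False (rule 4)
print(acceptable("identity", "identity;q=0"))           # False
```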



Re: Compression via content negotiation

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Paul Ausbeck wrote:
> 
> I don't know what ISTR or AFAIK mean.

"I seem to recall" and "As far as I know."  Qualifiers, both of 'em.
-- 
#ken    P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Group member         <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Marc Slemko wrote:
> 
> On Tue, 1 Dec 1998, Rodent of Unusual Size wrote:
> 
> > Paul Ausbeck wrote:
> > >
> > > Both major browsers have had compression for quite some time and also do the right thing with Accept-Encoding headers.

> > Is this actually true?  ISTR lots of problems with IE downloading
> 
> No, it is not.
> 
> Even current versions of Netscape don't, AFAIK, handle it properly.

I don't know what ISTR or AFAIK mean. I do know what is going on on my
machine. IE has had gzip and deflate compression for almost 1 year and
Navigator has had gzip compression for almost 6 months.

There sure is a wide variety of opinion on this subject. I will attempt
to collate information that seems significant and attach it to PR 3447.

Paul Ausbeck


Re: Compression via content negotiation

Posted by Marc Slemko <ma...@worldgate.com>.
On Tue, 1 Dec 1998, Rodent of Unusual Size wrote:

> Paul Ausbeck wrote:
> > 
> > Both major browsers have had compression for quite some time and also do
> > the right thing with Accept-Encoding headers.
> 
> Is this actually true?  ISTR lots of problems with IE downloading

No, it is not.

Even current versions of Netscape don't, AFAIK, handle it properly.


Re: Compression via content negotiation

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Paul Ausbeck wrote:
> 
> Both major browsers have had compression for quite some time and also do
> the right thing with Accept-Encoding headers.

Is this actually true?  ISTR lots of problems with IE downloading
gzipped files and stripping the .gz extension -- but not actually
gunzipping them.  I don't think I'd call that doing the right
thing..  Maybe it works properly if the uncompressed document
is a displayable content-type, but not for others?

TE is 'transfer-encoding,' BTW.
-- 
#ken	P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Group member         <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.
Yes and no, and it'd be cool if someone else could summarize the current
state for you.

Dean

On Wed, 2 Dec 1998, Paul Ausbeck wrote:

> Dean Gaudet wrote:
> 
> > Yeah, transparent compression is one of the poster child items for apache
> > 2.0.  
> 
> Has anyone done any work on that?
> 
> Paul
> 
> 


Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Dean Gaudet wrote:

> Yeah, transparent compression is one of the poster child items for apache
> 2.0.  

Has anyone done any work on that?

Paul


Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 2 Dec 1998, Paul Ausbeck wrote:

> Dean Gaudet wrote:
> > 
> > Or, a faster alternative in most situations, would be a module which knows
> > exactly which variants to look for.  Then a simple stat("foo.html.gz") can
> > be done when "Accept-Encoding: gzip" is present.
> > 
> > Dean
> 
> A compression specific module would be nice. Is anything currently
> planned? 

Yeah, transparent compression is one of the poster child items for apache
2.0.  But that's not what I was suggesting.

Dean


Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Dean Gaudet wrote:
> 
> Or, a faster alternative in most situations, would be a module which knows
> exactly which variants to look for.  Then a simple stat("foo.html.gz") can
> be done when "Accept-Encoding: gzip" is present.
> 
> Dean

A compression specific module would be nice. Is anything currently
planned? 

Paul Ausbeck


Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 2 Dec 1998, Paul Sutton wrote:

> Yes. The issue is people who have index.html and index.html.gz. At the
> moment a request for index.html will always select index.html, without
> negotiating between index.html and index.html.gz.

I suppose if people want to throw away the performance and take a
directory scan on every hit, it's their prerogative.  (Win32 coders note
that I think you can "one up" unix here and actually do a directory scan
for "index.html*", which might be faster than what we currently do.)

Or, a faster alternative in most situations, would be a module which knows
exactly which variants to look for.  Then a simple stat("foo.html.gz") can
be done when "Accept-Encoding: gzip" is present.

Dean





Re: Compression via content negotiation

Posted by Paul Sutton <pa...@c2.net>.
On Tue, 1 Dec 1998, Paul Ausbeck wrote:
> On Tue, 1 Dec 1998, Dean Gaudet wrote:
> > Why is this a problem?  Didn't we just add the "default-handler" or
> > something?  Just name the file foo.html.def rather than foo.html, and add
> > a "AddHandler default .def" (check the code I may have this wrong).
> > There's no need to add more code.  This way you also get the advantage of
> > having some semblance of speed for those files for which there is zero
> > negotiation possible (i.e. foo.jpg, blah.zip, yeehaw.mp3, whatever).
> 
> I need a request for an ambiguous url, say "index", to negotiate between
> index.html and index.html.gz. That way I can publish a single url and
> use a single link on other pages and get a compressed file transferred
> if the client can handle it and uncompressed otherwise. I believe the
> AddHandler code would only come into play if an explicit request for
> index.html.gz was received. If I am wrong please let me know, as I am
> completely lost.

Yes. The issue is people who have index.html and index.html.gz. At the
moment a request for index.html will always select index.html, without
negotiating between index.html and index.html.gz.

Paul



Re: Compression via content negotiation

Posted by Paul Sutton <pa...@c2.net>.
On Tue, 1 Dec 1998, Paul Ausbeck wrote:
> Dean Gaudet wrote:
> > What you want is already there.  Turn on multiviews, and request "index"
> > and you'll get negotiation between "index.html" and "index.html.gz".
> > (Whether it does what you want is another question entirely... it
> > certainly works for languages.)  My suggestion was aimed at people who
> > want to use "index.html" to refer to the object.  Apache doesn't negotiate
> > if the named file exists... and I really don't think it should -- it's a
> > performance overhead that's not required for the many files that aren't
> > negotiated.
> 
> It certainly works for languages, but not for compression. The details
> are in PR 3447.

Yes. Language, charset and content-type dimensions use "q" values to
express the relative acceptability of each attribute. Encoding when using
Accept-Encoding does not (which is the root problem expressed in PR 3447).
Encoding when using TE: will use q values.

Paul
--
Paul Sutton, C2Net Europe                    http://www.eu.c2.net/~paul/
Editor, Apache Week .. the latest Apache news http://www.apacheweek.com/


Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
Dean Gaudet wrote:

> What you want is already there.  Turn on multiviews, and request "index"
> and you'll get negotiation between "index.html" and "index.html.gz".
> (Whether it does what you want is another question entirely... it
> certainly works for languages.)  My suggestion was aimed at people who
> want to use "index.html" to refer to the object.  Apache doesn't negotiate
> if the named file exists... and I really don't think it should -- it's a
> performance overhead that's not required for the many files that aren't
> negotiated.
> 

It certainly works for languages, but not for compression. The details
are in PR 3447.

Paul


Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 1 Dec 1998, Paul Ausbeck wrote:

> On Tue, 1 Dec 1998, Dean Gaudet wrote:
> 
> > Why is this a problem?  Didn't we just add the "default-handler" or
> > something?  Just name the file foo.html.def rather than foo.html, and add
> > an "AddHandler default .def" (check the code I may have this wrong).
> > There's no need to add more code.  This way you also get the advantage of
> > having some semblance of speed for those files for which there is zero
> > negotiation possible (i.e. foo.jpg, blah.zip, yeehaw.mp3, whatever).
> > 
> 
> I need a request for an ambiguous url, say "index", to negotiate between
> index.html and index.html.gz. That way I can publish a single url and
> use a single link on other pages and get a compressed file transferred
> if the client can handle it and uncompressed otherwise. I believe the
> AddHandler code would only come into play if an explicit request for
> index.html.gz was received. If I am wrong please let me know, as I am
> completely lost.

What you want is already there.  Turn on multiviews, and request "index" 
and you'll get negotiation between "index.html" and "index.html.gz". 
(Whether it does what you want is another question entirely... it
certainly works for languages.)  My suggestion was aimed at people who
want to use "index.html" to refer to the object.  Apache doesn't negotiate
if the named file exists... and I really don't think it should -- it's a
performance overhead that's not required for the many files that aren't
negotiated.
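For reference, the MultiViews setup described above would look something like this in an Apache 1.3-style configuration (the path is illustrative):

```
# Enable MultiViews so a request for "index" negotiates among
# index.html, index.html.gz, etc.
<Directory /usr/local/apache/htdocs>
    Options +MultiViews
</Directory>

# Tell Apache that .gz files are gzip-encoded variants
AddEncoding x-gzip .gz
```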

Regarding the compression issue, if I have time I'll try to find the
summary that was posted here in the summer... or you can dig through the
archives mentioned at dev.apache.org.  I want to stress that I forget all
the exact details, but there were some salient points that I don't want to
be overlooked.

Dean



Re: Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
On Tue, 1 Dec 1998, Dean Gaudet wrote:

> Why is this a problem?  Didn't we just add the "default-handler" or
> something?  Just name the file foo.html.def rather than foo.html, and add
> an "AddHandler default .def" (check the code I may have this wrong).
> There's no need to add more code.  This way you also get the advantage of
> having some semblance of speed for those files for which there is zero
> negotiation possible (i.e. foo.jpg, blah.zip, yeehaw.mp3, whatever).
> 

I need a request for an ambiguous url, say "index", to negotiate between
index.html and index.html.gz. That way I can publish a single url and
use a single link on other pages and get a compressed file transferred
if the client can handle it and uncompressed otherwise. I believe the
AddHandler code would only come into play if an explicit request for
index.html.gz was received. If I am wrong please let me know, as I am
completely lost.


> I didn't see either of you mention TE.  And I haven't seen Roy pipe up
> yet... last time this came up on the mozilla group the only conclusion
> that I saw which was obvious is that with just RFC2049 you cannot do
> transparent compression.  You need the draft update to HTTP/1.1 and the TE
> header.  Accept-Encoding is just broken when you consider bugs in existing
> clients.  Or something along those lines.  (And I'm also about 8 months
> out of date on this, so I could be wrong.)
> 
> Dean

I don't know what TE is. I have examined several of the commercially
important browsers and I do not believe that the proposed change to
mod_negotiation would break anything. All versions of IE 4.0 will handle
both gzip and deflate compression and indicate as such with
Accept-Encoding headers. I'm not sure, but I don't think that IE3 or
lower add any Accept-Encoding headers. I have tested Navigator 4.04 and
it doesn't add headers or handle compression and I am pretty sure that
older versions do not add any headers either. I have tested Navigator
v4.5 and it both adds headers and handles compression. I have heard from
Adam Costello that Navigator 4.07 also has compression, so it was added
somewhere around there.

Both major browsers have had compression for quite some time and also do
the right thing with Accept-Encoding headers. It seems to me that it is
very important to support this in apache.

I'm not sure how rfc2049 comes into play on this. I believe that the
proposed changes are compliant with rfc2068. I have also sent Paul
Sutton another suggested hack to handle vary headers properly (at least
for Accept-Encoding).

Paul Ausbeck


Re: Compression via content negotiation

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 1 Dec 1998, Paul Sutton wrote:

> > 4) It may be better to handle compression via a separate compression
> > specific mechanism similar to Microsoft's Internet Information Server or
> > the apache compression project at mozilla.org. In a compression specific
> > scheme, compression can be applied to a non-ambiguous url. For example,
> > a request for foo.html returns the equivalent of foo.html.gz. In the
> > proposal of PR3447, given a request for foo.html, if foo.html exists it
> > would be returned even if foo.html.gz also existed. Only a request for
> > foo would get foo.html.gz. Of course, this could be changed...
> 
> Yes, this is a problem with language negotiation also. There should be an
> option to force negotiation even if the directly referenced filename
> exists. However this itself may cause problems, for example, if the user
> really did want that variant rather than server-based negotiation.

Why is this a problem?  Didn't we just add the "default-handler" or
something?  Just name the file foo.html.def rather than foo.html, and add
an "AddHandler default .def" (check the code I may have this wrong).
There's no need to add more code.  This way you also get the advantage of
having some semblance of speed for those files for which there is zero
negotiation possible (i.e. foo.jpg, blah.zip, yeehaw.mp3, whatever).
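A sketch of that configuration; if memory serves, the core handler is registered as "default-handler" in the 1.3 tree, and ".def" is a hypothetical extension chosen here to mean "serve this file directly":

```
# Bypass content negotiation for files that should be served as-is.
AddHandler default-handler .def
```

The benefit is that requests for such files skip the negotiation code entirely, which matters for the many resources that have no variants.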

I didn't see either of you mention TE.  And I haven't seen Roy pipe up
yet... last time this came up on the mozilla group the only conclusion
that I saw which was obvious is that with just RFC2049 you cannot do
transparent compression.  You need the draft update to HTTP/1.1 and the TE
header.  Accept-Encoding is just broken when you consider bugs in existing
clients.  Or something along those lines.  (And I'm also about 8 months
out of date on this, so I could be wrong.) 

Dean



Re: Compression via content negotiation

Posted by Paul Sutton <pa...@c2.net>.
On Mon, 30 Nov 1998, Paul Ausbeck wrote:
> PR 3447 describes a shortcoming in how apache handles content
> negotiation and compression. What's the chance of getting the patch
> suggested there or something like it into 1.3.4?

I've already proposed a fix for the bug in PR 3447.

> I have looked at some of the issues involved and I think that the four
> most important outstanding issues are:
> 
> 1) Apache currently does not place Vary response-headers on content
> negotiated responses. This could mess up some installations where agents
> that handle compression and agents that do not both use a common proxy
> cache.

This is a surprise. Apache is designed to add valid Vary headers onto all
content-negotiated responses. There is a bug when the negotiation happens
between a resource with a particular attribute and one without any value
for that attribute, and I've proposed a fix. For all other cases (i.e.
negotiating between variants with different values for a particular
dimension of negotiation) then a valid Vary header should be added. Can
you give an example of a case where it does not?
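For illustration, a correctly negotiated response should name the dimension it varied on so shared proxy caches key on it; a hypothetical exchange:

```
GET /index HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
```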

> 2) It may be useful to add some compression specific configuration
> options.

I prefer the generic directives we have now to add a mapping for any
arbitrary compression method.

> 3) A better negotiation algorithm than proposed in PR 3447 may exist.

Negotiation algorithms are very complex to get right. The current one
implemented in mod_negotiation is quite simplistic, and basically looks at
one dimension of negotiation at a time. One aim I had when I rewrote
mod_negotiation was to allow for different negotiation algorithms. It
would not be difficult to add new ones: for example, one based on the RVSA
"network algorithm" would make sense in some situations (it basically
multiplies the q factors for all dimensions to get an overall q for a
variant, thus taking equal account of language, content-type, charset and
encoding). But even that is not valid for all situations.
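That multiplicative scheme can be sketched in a few lines; the variant representation below is invented for illustration, not mod_negotiation's actual data structure:

```python
def overall_quality(variant):
    """Multiply the per-dimension q factors (language, content-type,
    charset, encoding) into a single score, as in the multiplicative
    "network algorithm" approach. Missing dimensions default to 1.0."""
    q = 1.0
    for dim in ("language", "type", "charset", "encoding"):
        q *= variant.get(dim, 1.0)
    return q

def pick_best(variants):
    # Choose the variant with the highest combined quality.
    return max(variants, key=overall_quality)
```

With this scheme a gzip variant whose encoding q is 0.5 loses to an identical uncompressed variant, which is exactly why the weighting is not right for every situation.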

> 4) It may be better to handle compression via a separate compression
> specific mechanism similar to Microsoft's Internet Information Server or
> the apache compression project at mozilla.org. In a compression specific
> scheme, compression can be applied to a non-ambiguous url. For example,
> a request for foo.html returns the equivalent of foo.html.gz. In the
> proposal of PR3447, given a request for foo.html, if foo.html exists it
> would be returned even if foo.html.gz also existed. Only a request for
> foo would get foo.html.gz. Of course, this could be changed...

Yes, this is a problem with language negotiation also. There should be an
option to force negotiation even if the directly referenced filename
exists. However this itself may cause problems, for example, if the user
really did want that variant rather than server-based negotiation.

Paul
--
Paul Sutton, C2Net Europe                    http://www.eu.c2.net/~paul/
Editor, Apache Week .. the latest Apache news http://www.apacheweek.com/


Compression via content negotiation

Posted by Paul Ausbeck <pa...@alumni.cse.ucsc.edu>.
PR 3447 describes a shortcoming in how apache handles content
negotiation and compression. What's the chance of getting the patch
suggested there or something like it into 1.3.4?

This issue is becoming very important in that transparent decompression
has been available in both of the commercially important browsers for 6+
months.

I have looked at some of the issues involved and I think that the four
most important outstanding issues are:

1) Apache currently does not place Vary response-headers on content
negotiated responses. This could mess up some installations where agents
that handle compression and agents that do not both use a common proxy
cache.

2) It may be useful to add some compression specific configuration
options.

3) A better negotiation algorithm than proposed in PR 3447 may exist.

4) It may be better to handle compression via a separate compression
specific mechanism similar to Microsoft's Internet Information Server or
the apache compression project at mozilla.org. In a compression specific
scheme, compression can be applied to a non-ambiguous url. For example,
a request for foo.html returns the equivalent of foo.html.gz. In the
proposal of PR3447, given a request for foo.html, if foo.html exists it
would be returned even if foo.html.gz also existed. Only a request for
foo would get foo.html.gz. Of course, this could be changed...


My feeling is that a simple change to mod_negotiation would be fairly
safe, benefit far more people than not, and wouldn't complicate any
future compression efforts. Unless I'm missing something, it seems
better to get something out there sooner rather than later.

Paul Ausbeck