Posted to dev@httpd.apache.org by Florent Guillaume <gu...@clipper.ens.fr> on 1995/07/17 02:33:39 UTC

Patches to handle content-language

What follows is a patch for Apache 0.8.0 (and Shambhala) that enables
consistent handling of content-language with MultiViews.

(The new behaviour is much closer to what you can have using CERN's httpd.)

Previously, if you wanted to handle files in several languages, you
were obliged to have a .var file for each, because mod_mime.c didn't
know what Content-Language was and so didn't type on language.

I added a per-directory directive AddLanguage which is very similar to
AddEncoding : it takes a language and a suffix. For example my srm.conf has

AddLanguage fr .fr
AddLanguage en .en
AddLanguage de .de

Mod_mime.c now recognizes filenames of the form
basename.type.lang.encoding, for example chapter1.html.fr.gz is
correctly typed as text/html with language=fr and encoding=x-gzip.  The
Content-Language is stored in a new field in request_rec, exactly like
the Content-Encoding.
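
As a rough illustration of the basename.type.lang.encoding convention, here is a minimal standalone sketch in C. It is not the patched mod_mime.c code: the suffix tables, struct file_info, pop_ext and classify are hypothetical names invented for this example, and it assumes a bare file name with no directory part. It peels suffixes from the right: first an optional encoding, then an optional language, then the type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix tables standing in for AddType, AddLanguage
 * and AddEncoding entries; real values come from the config files. */
struct suffix_map { const char *ext; const char *value; };

static const struct suffix_map encodings[] = {
    { "gz", "x-gzip" }, { "Z", "x-compress" }, { NULL, NULL }
};
static const struct suffix_map languages[] = {
    { "fr", "fr" }, { "en", "en" }, { "de", "de" }, { NULL, NULL }
};
static const struct suffix_map types[] = {
    { "html", "text/html" }, { "txt", "text/plain" }, { NULL, NULL }
};

struct file_info {
    const char *type;
    const char *language;
    const char *encoding;
};

/* Return the value mapped to ext, or NULL if ext is not in the table. */
static const char *lookup(const struct suffix_map *map, const char *ext)
{
    for (; map->ext != NULL; ++map)
        if (strcmp(map->ext, ext) == 0)
            return map->value;
    return NULL;
}

/* Cut the last ".ext" off name (in place) and return a pointer to ext,
 * or NULL if name contains no dot. */
static char *pop_ext(char *name)
{
    char *dot = strrchr(name, '.');
    if (dot == NULL)
        return NULL;
    *dot = '\0';
    return dot + 1;
}

/* Classify a bare file name; name is modified in place. */
void classify(char *name, struct file_info *out)
{
    char *ext;
    const char *v;

    assert(name != NULL && out != NULL);
    out->type = out->language = out->encoding = NULL;

    ext = pop_ext(name);
    if (ext != NULL && (v = lookup(encodings, ext)) != NULL) {
        out->encoding = v;      /* rightmost suffix was an encoding */
        ext = pop_ext(name);
    }
    if (ext != NULL && (v = lookup(languages, ext)) != NULL) {
        out->language = v;      /* next suffix was a language */
        ext = pop_ext(name);
    }
    if (ext != NULL)
        out->type = lookup(types, ext);
}
```

With these tables, calling classify on a writable copy of "chapter1.html.fr.gz" fills in text/html, fr and x-gzip, while "chapter1.html" leaves the language and encoding as NULL.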

Also : when in MultiViews, if you request somefile.html and both
somefile.html.fr and somefile.html.en are available with the same
quality setting, the previous behaviour was to serve whichever was
smallest in size.  This made it impossible to have the server give by
default pages in French if the client didn't send an Accept-Language:
header.  I changed this behaviour to serve the pages with the priority
given in the config file (the first AddLanguage has highest priority).
I don't think it has any impact on existing applications.
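
To make the default ordering concrete, here is a small standalone sketch in C of the idea: when the client expresses no language preference, prefer the variant whose language was configured first. It is only an illustration, not the negotiation code from the patch; config_lang_index and pick_default_variant are hypothetical names, the configured languages are passed as a plain array, and strcasecmp is assumed to be available from the POSIX <strings.h> header.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Position of lang in the configured list (0 = highest priority),
 * or -1 if lang is NULL or not configured. */
int config_lang_index(const char *const *configured, size_t n_conf,
                      const char *lang)
{
    size_t i;

    if (lang == NULL)
        return -1;
    for (i = 0; i < n_conf; ++i)
        if (strcasecmp(configured[i], lang) == 0)
            return (int)i;
    return -1;
}

/* Index of the variant whose language appears earliest in the configured
 * list, or -1 if no variant has a configured language. */
int pick_default_variant(const char *const *configured, size_t n_conf,
                         const char *const *variant_langs, size_t n_var)
{
    int best = -1;
    int best_rank = -1;
    size_t i;

    assert(configured != NULL && variant_langs != NULL);
    for (i = 0; i < n_var; ++i) {
        int rank = config_lang_index(configured, n_conf, variant_langs[i]);
        if (rank >= 0 && (best < 0 || rank < best_rank)) {
            best = (int)i;
            best_rank = rank;
        }
    }
    return best;
}
```

With the configured order fr, en, de and two variants in en and fr, the sketch picks the fr variant, whatever the file sizes are.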

I also fixed a bug in find_lang_index in which a NULL string could be
strncmp'ed.

Regards,

	Florent Guillaume


*** ../shambhala.orig/httpd.h	Wed Jul 12 19:44:53 1995
--- httpd.h	Sun Jul 16 21:12:14 1995
***************
*** 274,279 ****
--- 274,280 ----
  
    char *content_type;		/* Break these out --- we dispatch on 'em */
    char *content_encoding;
+   char *content_language;
    
    int no_cache;
    
*** ../shambhala.orig/http_config.h	Mon Jun 26 00:42:14 1995
--- http_config.h	Sun Jul 16 21:12:57 1995
***************
*** 176,182 ****
       *                  (as a SERVER_ERROR, since the module which was
       *                  supposed to handle this was configured wrong).
       * type_checker --- Determine MIME type of the requested entity;
!      *                  sets content_type and _encoding fields.
       * logger --- log a transaction.  Not supported yet out of sheer
       *            laziness on my part.
       */
--- 176,182 ----
       *                  (as a SERVER_ERROR, since the module which was
       *                  supposed to handle this was configured wrong).
       * type_checker --- Determine MIME type of the requested entity;
!      *                  sets content_type, _encoding and _language fields.
       * logger --- log a transaction.  Not supported yet out of sheer
       *            laziness on my part.
       */
*** ../shambhala.orig/http_protocol.c	Thu Jul 13 02:28:05 1995
--- http_protocol.c	Sun Jul 16 22:44:53 1995
***************
*** 499,504 ****
--- 499,507 ----
      if (r->content_encoding)
          fprintf (fd, "Content-encoding: %s\015\012", r->content_encoding);
      
+     if (r->content_language)
+         fprintf (fd, "Content-language: %s\015\012", r->content_language);
+     
      for (i = 0; i < hdrs_arr->nelts; ++i) {
          if (!hdrs[i].key) continue;
  	fprintf (fd, "%s: %s\015\012", hdrs[i].key, hdrs[i].val);
*** ../shambhala.orig/mod_negotiation.c	Sat Jul  1 19:46:05 1995
--- mod_negotiation.c	Mon Jul 17 01:32:47 1995
***************
*** 132,138 ****
      char *type_name;
      char *file_name;
      char *content_encoding;
!     char *lang;
      float level;		/* Auxiliary to content-type... */
      float qs;
      float bytes;
--- 132,138 ----
      char *type_name;
      char *file_name;
      char *content_encoding;
!     char *content_language;
      float level;		/* Auxiliary to content-type... */
      float qs;
      float bytes;
***************
*** 172,178 ****
      mime_info->type_name = "";
      mime_info->file_name = "";
      mime_info->content_encoding = "";
!     mime_info->lang = "";
  
      mime_info->is_pseudo_html = 0.0;
      mime_info->level = 0.0;
--- 172,178 ----
      mime_info->type_name = "";
      mime_info->file_name = "";
      mime_info->content_encoding = "";
!     mime_info->content_language = "";
  
      mime_info->is_pseudo_html = 0.0;
      mime_info->level = 0.0;
***************
*** 560,567 ****
  		mime_info.bytes = atoi(body);
  	    }
  	    else if (!strncmp (buffer, "content-language:", 17)) {
! 		mime_info.lang = get_token (neg->pool, &body, 0);
! 		str_tolower (mime_info.lang);
  	    }
  	    else if (!strncmp (buffer, "content-encoding:", 17)) {
  		mime_info.content_encoding = get_token (neg->pool, &body, 0);
--- 560,567 ----
  		mime_info.bytes = atoi(body);
  	    }
  	    else if (!strncmp (buffer, "content-language:", 17)) {
! 		mime_info.content_language = get_token (neg->pool, &body, 0);
! 		str_tolower (mime_info.content_language);
  	    }
  	    else if (!strncmp (buffer, "content-encoding:", 17)) {
  		mime_info.content_encoding = get_token (neg->pool, &body, 0);
***************
*** 589,597 ****
  int read_types_multi (negotiation_state *neg)
  {
      request_rec *r = neg->r;
-     char *file_name = pstrdup (r->pool, r->filename);
      
!     char *filp = &file_name[strlen(file_name) - 1];
      int prefix_len;
      DIR *dirp;
      struct DIR_TYPE *dir_entry;
--- 589,596 ----
  int read_types_multi (negotiation_state *neg)
  {
      request_rec *r = neg->r;
      
!     char *filp;
      int prefix_len;
      DIR *dirp;
      struct DIR_TYPE *dir_entry;
***************
*** 648,653 ****
--- 647,653 ----
  	mime_info.sub_req = sub_req;
  	mime_info.file_name = dir_entry->d_name;
  	mime_info.content_encoding = sub_req->content_encoding;
+ 	mime_info.content_language = sub_req->content_language;
  	
  	get_entry (neg->pool, &accept_info, sub_req->content_type);
  	set_mime_fields (&mime_info, &accept_info);
***************
*** 759,767 ****
  
  int find_lang_index (array_header *accept_langs, char *lang)
  {
!     accept_rec *accs = (accept_rec *)accept_langs->elts;
      int i;
  
      for (i = 0; i < accept_langs->nelts; ++i)
  	if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
  	    return i;
--- 759,772 ----
  
  int find_lang_index (array_header *accept_langs, char *lang)
  {
!     accept_rec *accs;
      int i;
  
+     if (!lang)
+ 	return -1;
+ 
+     accs = (accept_rec *)accept_langs->elts;
+ 
      for (i = 0; i < accept_langs->nelts; ++i)
  	if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
  	    return i;
***************
*** 777,793 ****
  
      if (neg->accept_langs->nelts == 0) {
  	
! 	/* Client doesn't care */
  
  	for (i = 0; i < neg->avail_vars->nelts; ++i)
! 	    var_recs[i].lang_index = -1;
  
  	return;
      }
      
      for (i = 0; i < neg->avail_vars->nelts; ++i)
  	if (var_recs[i].quality > 0) {
! 	    int index = find_lang_index (neg->accept_langs, var_recs[i].lang);
  
  	    var_recs[i].lang_index = index;
  	    if (index >= 0) found_any = 1;
--- 782,802 ----
  
      if (neg->accept_langs->nelts == 0) {
  	
! 	/* Client doesn't care : use order of config file */
! 
! 	extern int mime_get_lang_index (request_rec *r, char *lang);
  
  	for (i = 0; i < neg->avail_vars->nelts; ++i)
! 	    var_recs[i].lang_index =
! 		mime_get_lang_index (neg->r, var_recs[i].content_language);
  
  	return;
      }
      
      for (i = 0; i < neg->avail_vars->nelts; ++i)
  	if (var_recs[i].quality > 0) {
! 	    int index = find_lang_index (neg->accept_langs,
! 					 var_recs[i].content_language);
  
  	    var_recs[i].lang_index = index;
  	    if (index >= 0) found_any = 1;
***************
*** 1031,1036 ****
--- 1040,1046 ----
      r->filename = sub_req->filename;
      r->content_type = sub_req->content_type;
      r->content_encoding = sub_req->content_encoding;
+     r->content_language = sub_req->content_language;
      r->finfo = sub_req->finfo;
      
      return OK;
*** ../shambhala.orig/mod_mime.c	Fri Jun 30 13:54:26 1995
--- mod_mime.c	Mon Jul 17 01:32:25 1995
***************
*** 69,74 ****
--- 69,75 ----
  typedef struct {
      table *forced_types;	/* Additional AddTyped stuff */
      table *encoding_types;	/* Added with AddEncoding... */
+     table *language_types;	/* Added with AddLanguage... */
  } mime_dir_config;
  
  module mime_module;
***************
*** 80,85 ****
--- 81,87 ----
  
      new->forced_types = make_table (p, 4);
      new->encoding_types = make_table (p, 4);
+     new->language_types = make_table (p, 4);
      
      return new;
  }
***************
*** 95,100 ****
--- 97,104 ----
  					base->forced_types);
      new->encoding_types = overlay_tables (p, add->encoding_types,
  					  base->encoding_types);
+     new->language_types = overlay_tables (p, add->language_types,
+ 					  base->language_types);
  
      return new;
  }
***************
*** 113,118 ****
--- 117,157 ----
      return NULL;
  }
  
+ char *add_language(cmd_parms *cmd, mime_dir_config *m, char *lang, char *ext)
+ {
+     if (*ext == '.') ++ext;
+     table_set (m->language_types, ext, lang);
+     return NULL;
+ }
+ 
+ 
+ /* This function is called by the negotiation module to know the index
+  * of a given language in the config files.
+  */
+ 
+ int mime_get_lang_index (request_rec *r, char *lang)
+ {
+     mime_dir_config *conf;
+     int nelts;
+     table_entry *elts;
+     int i;
+ 
+     if (!lang)
+ 	return -1;
+ 
+     conf = (mime_dir_config *)get_module_config(r->per_dir_config, &mime_module);
+     nelts = conf->language_types->nelts;
+     elts = (table_entry *) conf->language_types->elts;
+ 
+     for (i = 0; i < nelts; ++i)
+         if (!strcasecmp (elts[i].val, lang))
+ 	    return i;
+ 
+     return -1;
+ }
+ 
+ 
+ 
  /* The sole bit of server configuration that the MIME module has is
   * the name of its config file, so...
   */
***************
*** 129,134 ****
--- 168,175 ----
      "a mime type followed by a file extension" },
  { "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2, 
      "an encoding (e.g., gzip), followed by a file extension" },
+ { "AddLanguage", add_language, NULL, OR_FILEINFO, TAKE2, 
+     "a language (e.g., fr), followed by a file extension" },
  { "TypesConfig", set_types_config, NULL, RSRC_CONF, TAKE1,
      "the MIME types config file" },
  { NULL }
***************
*** 198,203 ****
--- 239,255 ----
      if ((type = table_get (conf->encoding_types, &fn[i])))
      {
          r->content_encoding = type;
+ 
+ 	/* go back to previous extension to try to use it as a language */
+ 	
+         fn[i-1] = '\0';
+ 	if((i=rind(fn,'.')) < 0) return OK;
+ 	++i;
+     }
+ 
+     if ((type = table_get (conf->language_types, &fn[i])))
+     {
+         r->content_language = type;
  
  	/* go back to previous extension to try to use it as a type */
  	

-- 
Florent.Guillaume@ens.fr

Re: Patches to handle content-language

Posted by Brian Behlendorf <br...@organic.com>.
On Mon, 17 Jul 1995, Florent Guillaume wrote:
> What follows is a patch for Apache 0.8.0 (and Shambhala) that enables
> consistent handling of content-language with MultiViews.
> 
> (The new behaviour is much closer to what you can have using CERN's httpd.)
> 
> Previously, if you wanted to handle files in several languages, you
> were obliged to have a .var file for each, because mod_mime.c didn't
> know what Content-Language was and so didn't type on language.
> 
> I added a per-directory directive AddLanguage which is very similar to
> AddEncoding : it takes a language and a suffix. For example my srm.conf has
> 
> AddLanguage fr .fr
> AddLanguage en .en
> AddLanguage de .de
> 
> Mod_mime.c now recognizes filenames of the form
> basename.type.lang.encoding, for example chapter1.html.fr.gz is
> correctly typed as text/html with language=fr and encoding=x-gzip.  The
> Content-Language is stored in a new field in request_rec, exactly like
> the Content-Encoding.

I really like this, but what resolves name collisions and missing 
info between type, lang, and encoding?  For example, if I decide to name 
all my FrameMaker documents .fr, what happens to document.fr?
document.fr.en?  document.fr.fr?  If type, lang, and encoding shared the 
same namespace, *no* problem.  In this case, we're using filename 
extensions to indicate meta-information other than content-type, which 
I'm certainly comfortable with, but the collision issue should be 
resolved somehow.

Also, it would be tremendous if I could have the flexibility to negotiate 
on file type and language and encoding by specifying only the meta-info I 
want in the filename - in other words, let's say I have documents in all
the possible variations of

basename.[html,txt,pdf].[en,fr,jp].[gz,Z,uu]

Right now with content-negotiation, if I have an index.html and an 
index.html3, then I can simply point a resource locator to "index" and 
negotiation happens, but I can also defeat negotiation by explicitly 
linking to "index.html3" if I wanted to make sure someone got the 3.0 
version.  

Let's say for the above 27 versions of the document I wanted to
be able to specify which variables are mandatory.  If I didn't care at 
all which document was fetched, I'd create a link to "basename".  If I 
wanted specifically the gzip'd French PDF, I'd make a link to
"basename.pdf.fr.gz".  Now, let's say I want to make a link to all
French variants explicit, yet let the client/server negotiate on their
own as to encoding and content-type preferences.  I'd like to then link
to "basename.fr".  Or, I specifically want the uuencoded PDFs, but I
don't care what language: "basename.pdf.uu".
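
One way to picture this kind of partial matching is the rough C sketch below. It is purely illustrative and is not existing server code: variant_matches and has_ext are hypothetical names. It assumes the basename contains no dots, that the type, language and encoding extensions never collide, and that the order of the extensions in the link does not matter. A variant qualifies when it shares the basename and carries every extension named in the link; negotiation would then choose among the qualifying variants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the extension ext (of length len) is one of the
 * dot-separated components of exts, else 0. */
static int has_ext(const char *exts, const char *ext, size_t len)
{
    while (*exts != '\0') {
        size_t n = strcspn(exts, ".");
        if (n == len && strncmp(exts, ext, len) == 0)
            return 1;
        exts += n;
        if (*exts == '.')
            ++exts;
    }
    return 0;
}

/* Return 1 if variant has the same basename as request and carries every
 * extension that request names; extensions request omits are left open. */
int variant_matches(const char *request, const char *variant)
{
    size_t base_len;
    const char *req_exts;
    const char *var_exts;

    assert(request != NULL && variant != NULL);

    /* The basename is everything before the first dot. */
    base_len = strcspn(request, ".");
    if (strncmp(request, variant, base_len) != 0)
        return 0;
    if (variant[base_len] != '.' && variant[base_len] != '\0')
        return 0;

    req_exts = request + base_len;
    var_exts = variant + base_len;
    if (*req_exts == '.')
        ++req_exts;
    if (*var_exts == '.')
        ++var_exts;

    /* Every extension in the request is mandatory for the variant. */
    while (*req_exts != '\0') {
        size_t n = strcspn(req_exts, ".");
        if (n > 0 && !has_ext(var_exts, req_exts, n))
            return 0;
        req_exts += n;
        if (*req_exts == '.')
            ++req_exts;
    }
    return 1;
}
```

Under these assumptions a link to "basename.fr" matches "basename.pdf.fr.gz" and "basename.html.fr" but not "basename.html.en.gz", and a link to plain "basename" matches all of them.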

Thoughts?  If we ensure there are no namespace collisions between MIME
type extensions, language extensions, and encoding extensions then this
is easy.  If not....

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/