Posted to dev@httpd.apache.org by Glenn <gs...@gluelogic.com> on 2003/05/09 10:11:06 UTC

server-side includes "virtual" and "exec" questions/patches

1) Why does includes "virtual" sometimes fail with
     "unable to include potential exec \"%s\" in parsed file %s"
   when Options IncludesNoEXEC is used?  Why is this check performed?
   What is the reasoning behind it?

--- httpd-2.0.45/modules/filters/mod_include.c  2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.45/modules/filters/mod_include.c.new      2003-05-07 15:38:58.000000000 -0400
@@ -1274,12 +1274,6 @@
                     error_fmt = "unable to include \"%s\" in parsed file %s";
                 }
                                                                                
-                if (!error_fmt && (ctx->flags & FLAG_NO_EXEC) &&
-                    rr->content_type &&
-                    (strncmp(rr->content_type, "text/", 5))) {
-                    error_fmt = "unable to include potential exec \"%s\" "
-                        "in parsed file %s";
-                }
                 if (error_fmt == NULL) {
                     /* try to avoid recursive includes.  We do this by walking
                      * up the r->main list of subrequests, and at each level


The documentation for IncludesNOEXEC (with the Options directive) states:
  IncludesNOEXEC
    Server-side includes are permitted, but the #exec command and
    #exec CGI are disabled. It is still possible to #include virtual
    CGI scripts from ScriptAliase'd directories.

When Options IncludesNoEXEC is used, the code for include virtual="..."
currently checks the initial content-type of the target for text/*
(before running the subrequest) and therefore disallows, for example,
  <!--#include virtual="/cgi-bin/foo.php"-->
since the initial content-type can be application/x-httpd-php.  This
fails even when /cgi-bin/ is marked Options +ExecCGI.  (A workaround
may be to use AddHandler to set it to "cgi-script" and to use AddType
to set the type to text/html.)
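
To make the behaviour concrete, here is a minimal standalone sketch of
the test that the removed lines perform.  The flag value and the function
name are assumptions made for illustration; they are not taken from
mod_include.h, and the real code works on rr->content_type inside the
include handler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mod_include's FLAG_NO_EXEC bit; the real
 * value lives in mod_include.h and may differ. */
#define SKETCH_FLAG_NO_EXEC 0x01

/* Returns 1 when the pre-subrequest check would refuse the include,
 * 0 when the include would be allowed to proceed. */
int sketch_include_refused(unsigned flags, const char *content_type)
{
    if (!(flags & SKETCH_FLAG_NO_EXEC)) {
        return 0;   /* the check only applies under IncludesNOEXEC */
    }
    if (content_type == NULL) {
        return 0;   /* no type known yet: the check is skipped */
    }
    /* strncmp() is non-zero when the first 5 bytes are not "text/" */
    return strncmp(content_type, "text/", 5) != 0;
}
```

With this sketch, a target typed application/x-httpd-php is refused under
the no-exec flag, while text/html passes, regardless of what the
subrequest would actually return.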

What is the reason that #include virtual="..." (and "file") peek into
the subrequest?  Shouldn't they just run the subrequest and include the
results?  After all, requesting the URI directly would
produce identical results, wouldn't it?  Maybe only "#include file"
should check for content-type text/*, but not "#include virtual".

Can this be backported to 2.0 and 1.3?


2) Should <!--#exec cgi="..."--> be officially deprecated?
   The following two statements are almost semantically equivalent AFAICT:
     <!--#exec cgi="/path/to/cgi"-->
     <!--#include virtual="/path/to/cgi${PATH_INFO}?${QUERY_STRING}"-->
   and #include virtual is more flexible, and allowed with IncludesNoEXEC.
   Besides, I really do not like the idea of the silent inheritance of
   path info and query string by #exec cgi because it is an inheritance of
   the path info and query string of the base document, which gets really
   messy if you have nested includes and have an #include virtual in the
   mix which uses its own path info and query string.

   Is there any reason "#exec cgi" should remain in Apache 2.1?  If people
   support removing it, I'll whip up a quick patch to mod_cgi/mod_cgid.
   Backwards compatibility is one thing, but this is cruft.  Current
   documentation says:
     "The include virtual element should be used in preference to exec cgi"
   but does not go so far as to deprecate it, indicating that it might be
   removed in the future.

   In addition to path info and query string inheritance, the only
   difference between #exec cgi and #include virtual is that if the target
   CGI returns a redirect, #exec cgi will turn it into a hyperlink.  I'm
   not sure how often this "feature" is used, but the simple workaround is
   to have the CGI script return the hyperlink as the body of its output:
     print "Content-type: text/plain\n\n",
           "<a href=\"link\">link</a>\n";
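
   As a rough illustration of the near-equivalence above, the following
   sketch builds the URI that #include virtual would need in order to
   pass along what #exec cgi inherits silently.  The function name and
   the choice to omit "?" when the query string is empty are assumptions
   made for this example, not anything mod_include does.

```c
#include <stdio.h>
#include <string.h>

/* Composes cgi_path + path_info + "?" + query_string into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
int sketch_virtual_uri(char *buf, size_t buflen, const char *cgi_path,
                       const char *path_info, const char *query_string)
{
    int n;

    if (path_info == NULL) {
        path_info = "";     /* base document had no extra path info */
    }
    if (query_string != NULL && *query_string != '\0') {
        n = snprintf(buf, buflen, "%s%s?%s",
                     cgi_path, path_info, query_string);
    }
    else {
        n = snprintf(buf, buflen, "%s%s", cgi_path, path_info);
    }
    /* snprintf() reports the length it wanted; detect truncation */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

   The point of spelling it out is that with #include virtual the page
   author decides explicitly which path info and query string reach the
   CGI.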


3) Apache2 mod_include performs lazy evaluation of a few environment
   variables: DATE_GMT, DATE_LOCAL, LAST_MODIFIED, and USER_NAME.
   These variables end up being empty strings ("") in a CGI environment
   unless they are used in the include document prior to the virtual or
   exec include, e.g. the no-op
     <!--#if expr="$DATE_LOCAL"--><!--#endif-->
   Can this be documented somewhere?  What pages should be updated?
   mod_include documentation, the SSI tutorial, others?
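
   A toy model of why the empty strings show up, assuming the lazy
   values are represented by a sentinel pointer that happens to read as
   an empty string.  This is a simplified sketch, not mod_include's code;
   every name in it is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A single NUL byte: as a C string it is "", as a pointer it is unique. */
static const char sketch_lazy_sentinel = '\0';
#define SKETCH_LAZY (&sketch_lazy_sentinel)

struct sketch_var {
    const char *name;
    const char *value;      /* SKETCH_LAZY until first resolved */
};

/* Hypothetical stand-in for the real work, e.g. formatting a date. */
static const char *sketch_compute(const char *name)
{
    (void)name;
    return "computed";
}

/* What a consumer unaware of the sentinel sees, e.g. code that copies
 * the table into a CGI environment: an empty string. */
const char *sketch_raw_get(const struct sketch_var *v)
{
    return v->value;
}

/* What an SSI-aware lookup does: resolve the sentinel on first use and
 * store the result so later raw reads see the real value. */
const char *sketch_lazy_get(struct sketch_var *v)
{
    if (v->value == SKETCH_LAZY) {
        v->value = sketch_compute(v->name);
    }
    return v->value;
}
```

   In this model the no-op #if above corresponds to one call of
   sketch_lazy_get() before the table is handed to the CGI.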


4) Minor nit.  Should USER_NAME be inherited from the base document, just
   as the LAST_MODIFIED time is?  If USER_NAME was generated from the
   lazy value before a sub-include, then it is inherited.  If it was not
   generated before a sub-include, then it will be set with the value of
   the document when it is actually used, leading to inconsistencies.

--- httpd-2.0.45/modules/filters/mod_include.c  2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.45/modules/filters/mod_include.c.new      2003-05-09 04:05:26.000000000 -0400
@@ -3386,10 +3386,12 @@
          * torquing our own last_modified date as well so that the
          * LAST_MODIFIED variable gets reset to the proper value if the
          * nested document resets <!--#config timefmt -->.
+         * Also torque info used to generate USER_NAME.
          */
         r->subprocess_env = r->main->subprocess_env;
         apr_pool_join(r->main->pool, r->pool);
         r->finfo.mtime = r->main->finfo.mtime;
+        r->finfo.user  = r->main->finfo.user;
     }
     else {
         /* we're not a nested include, so we create an initial



5) Can mod_include please export get_include_var() as an optional function?
   How about as ap_ssi_get_include_var()?

--- httpd-2.0.45/modules/filters/mod_include.c  2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.45/modules/filters/mod_include.c.new      2003-05-07 15:46:05.000000000 -0400
@@ -206,8 +206,8 @@
     return val;
 }
  
-static const char *get_include_var(request_rec *r, include_ctx_t *ctx,
-                                   const char *var)
+static const char *ap_ssi_get_include_var(request_rec *r, include_ctx_t *ctx,
+                                          const char *var)
 {
     const char *val;
     if (apr_isdigit(*var) && !var[1]) {
@@ -1099,7 +1099,7 @@
                 if (l != 0) {
                     tmp_store        = *end_of_var_name;
                     *end_of_var_name = '\0';
-                    val = get_include_var(r, ctx, start_of_var_name);
+                    val = ap_ssi_get_include_var(r, ctx, start_of_var_name);
                     *end_of_var_name = tmp_store;
  
                     if (val) {
@@ -1387,9 +1387,9 @@
             if (!strcmp(tag, "var")) {
                 conn_rec *c = r->connection;
                 const char *val =
-                    get_include_var(r, ctx,
-                                    ap_ssi_parse_string(r, ctx, tag_val, NULL,
-                                                        MAX_STRING_LEN, 0));
+                    ap_ssi_get_include_var(
+                      r, ctx, ap_ssi_parse_string(r, ctx, tag_val, NULL,
+                                                  MAX_STRING_LEN, 0));
                 if (val) {
                     switch(encode) {
                     case E_NONE:
@@ -3569,6 +3569,7 @@
  
 static void register_hooks(apr_pool_t *p)
 {
+    APR_REGISTER_OPTIONAL_FN(ap_ssi_get_include_var);
     APR_REGISTER_OPTIONAL_FN(ap_ssi_get_tag_and_value);
     APR_REGISTER_OPTIONAL_FN(ap_ssi_parse_string);
     APR_REGISTER_OPTIONAL_FN(ap_register_include_handler);


Thanks!
Glenn

Re: server-side includes "virtual" and "exec" questions/patches

Posted by Glenn <gs...@gluelogic.com>.
On Sat, Sep 06, 2003 at 12:15:44AM +0200, André Malo wrote:
> * Glenn wrote:
> 
> > 1) Why does includes "virtual" sometimes fail with
> >      "unable to include potential exec \"%s\" in parsed file %s"
> >    when Options IncludesNoEXEC is used?  Why is this check performed?
> >    What is the reasoning behind it?
> 
> Not to execute anything, as the option name may imply ;-)
> Anything may be a command, a CGI script etc.

I meant to emphasize that I don't think that a "#include virtual" should
be peeking down into the subrequest.  It is a bona fide (sub)request and
any "#include virtual" could also be requested directly; it is visible in
the url-space.  But "include"ing it is broken if the file is, for example,
/cgi-bin/foo.php.

> > -                if (!error_fmt && (ctx->flags & FLAG_NO_EXEC) &&
> > -                    rr->content_type &&
> > -                    (strncmp(rr->content_type, "text/", 5))) {
> > -                    error_fmt = "unable to include potential exec \"%s\" "
> > -                        "in parsed file %s";
> > -                }
> 
> Hmm. Removing an insufficient check doesn't look reasonable to me. We should
> rather improve it, shouldn't we?

Doesn't the subrequest already take care of the permission issue?
If a direct request to /cgi-bin/foo.php will work, why shouldn't
"#include virtual"?  The returned content should be included as-is
in all cases.

> > 5) Can mod_include please export get_include_var() as an optional function?
> >    How about as ap_ssi_get_include_var()?  This is needed by modules to
> >    access the mod_include lazily-evaluated variables.
> 
> Why isn't ap_ssi_parse_string sufficient?

I am working on a custom module to handle <!--#exec cmd="..."-->
and wish to do a reasonably proper (but limited) word expansion,
a la shell command language:
  http://www.opengroup.org/onlinepubs/007904975/utilities/xcu_chap02.html
For proper field splitting after variable expansion it is necessary to know
whether or not a variable is being expanded within a double-quoted string,
which can only be done before the variable is expanded and the quotes are
removed.
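
As a small illustration of the quoting problem, the sketch below reports
whether a given position in a command string sits inside double quotes.
It is deliberately simplified (backslash escapes and single quotes only,
no $(...) or here-documents), and the function name is made up for this
example.

```c
#include <stddef.h>

/* Returns 1 if the character at s[pos] lies inside a double-quoted
 * region, 0 otherwise.  Scans from the start of the string because the
 * quoting state depends on everything before pos. */
int sketch_in_dquotes(const char *s, size_t pos)
{
    int in_single = 0;
    int in_double = 0;
    size_t i;

    for (i = 0; s[i] != '\0' && i < pos; i++) {
        if (in_single) {
            if (s[i] == '\'') {
                in_single = 0;      /* no escapes inside '...' */
            }
        }
        else if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;                    /* skip the escaped character */
        }
        else if (s[i] == '"') {
            in_double = !in_double;
        }
        else if (s[i] == '\'' && !in_double) {
            in_single = 1;
        }
    }
    return in_double;
}
```

A caller would check the position of each "$" before expanding it, and
apply field splitting to the expanded value only when this returns 0.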

If get_include_var() is exported as an optional function, I do not need
to copy it into my own code with an extra hack to call ap_ssi_parse_string()
on a dummy string whenever I need to expand one of those lazily-evaluated
variables.


Since mod_include maintains some extra env variables, it makes sense to
me to create an interface for other modules to call back and retrieve
needed values.  Extension modules to mod_include might need access to
the variables in situations other than string expansion.

Thanks!
Glenn

Re: server-side includes "virtual" and "exec" questions/patches

Posted by André Malo <nd...@perlig.de>.
* Glenn wrote:

> 1) Why does includes "virtual" sometimes fail with
>      "unable to include potential exec \"%s\" in parsed file %s"
>    when Options IncludesNoEXEC is used?  Why is this check performed?
>    What is the reasoning behind it?

Not to execute anything, as the option name may imply ;-)
Anything may be a command, a CGI script etc.

> -                if (!error_fmt && (ctx->flags & FLAG_NO_EXEC) &&
> -                    rr->content_type &&
> -                    (strncmp(rr->content_type, "text/", 5))) {
> -                    error_fmt = "unable to include potential exec \"%s\" "
> -                        "in parsed file %s";
> -                }

Hmm. Removing an insufficient check doesn't look reasonable to me. We should
rather improve it, shouldn't we?

> The documentation for IncludesNOEXEC (with the Options directive) states:
>   IncludesNOEXEC
>     Server-side includes are permitted, but the #exec command and
>     #exec CGI are disabled. It is still possible to #include virtual
>     CGI scripts from ScriptAliase'd directories.

Yes. The docs describe the *real* behaviour of the code, not the *desired*
behaviour (Fixed a while ago).

> Can this be backported to 2.0 and 1.3?

No way. It would not be backwards compatible, besides the (so called)
"freezed" status.

> 2) Should <!--#exec cgi="..."--> be officially deprecated?

+1 from here. Other opinions?

>    Is there any reason "#exec cgi" should remain in Apache 2.1?  If people
>    support removing it, I'll whip up a quick patch to mod_cgi/mod_cgid.
>    Backwards compatibility is one thing, but this is cruft.  Current
>    documentation says:
>      "The include virtual element should be used in preference to exec cgi"
>    but does not go so far as to deprecate it, indicating that it might be
>    removed in the future.

I'd deprecate it in 2.1 and remove it in 2.3. This seems more gentle to me.

> 3) Apache2 mod_include performs lazy evaluation of a few environment
>    variables: DATE_GMT, DATE_LOCAL, LAST_MODIFIED, and USER_NAME
>    These variables end up being empty strings ("") in a CGI environment
>    unless they are used in the include document prior to the virtual or
>    exec include, e.g. the noop
>      <!--#if expr="$DATE_LOCAL"--><!--#endif-->
>    Can this be documented somewhere?  What pages should be updated?
>    mod_include documentation, the SSI tutorial, others?

Hrm. In fact this is IMHO a general design flaw of mod_include. We should not
mix include variables with the normal environment at all. If we once decide
to fix this, these variables would completely disappear for external
programs. Opinions?

> 4) Minor nit.  Should USER_NAME be inherited from the base document, just
>    as the LAST_MODIFIED time is done?  If USER_NAME was generated from the
>    lazy value before a sub-include, then it is inherited.  If it was not
>    generated before a sub-include, then it will be set with the value of
>    the document when it is actually used, leading to inconsistencies.

ehm... yes?

> 5) Can mod_include please export get_include_var() as an optional function?
>    How about as ap_ssi_get_include_var()?  This is needed by modules to
>    access the mod_include lazily-evaluated variables.

Why isn't ap_ssi_parse_string sufficient?

nd

Re: server-side includes "virtual" and "exec" questions/patches

Posted by Glenn <gs...@gluelogic.com>.
These mod_include patches have been posted numerous times to this list.
(Last posted in May!)  The patches are updated to 2.0.47 and generated
with diff -ru.

Can I get some feedback please?  Thank you.
Glenn


1) Why does includes "virtual" sometimes fail with
     "unable to include potential exec \"%s\" in parsed file %s"
   when Options IncludesNoEXEC is used?  Why is this check performed?
   What is the reasoning behind it?


diff -ru httpd-2.0.47/modules/filters/mod_include.c httpd-2.0.47.new/modules/filters/mod_include.c
--- httpd-2.0.47/modules/filters/mod_include.c	2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.47.new/modules/filters/mod_include.c	2003-09-04 17:41:51.000000000 -0400
@@ -1274,12 +1274,6 @@
                     error_fmt = "unable to include \"%s\" in parsed file %s";
                 }
 
-                if (!error_fmt && (ctx->flags & FLAG_NO_EXEC) && 
-                    rr->content_type && 
-                    (strncmp(rr->content_type, "text/", 5))) {
-                    error_fmt = "unable to include potential exec \"%s\" "
-                        "in parsed file %s";
-                }
                 if (error_fmt == NULL) {
                     /* try to avoid recursive includes.  We do this by walking
                      * up the r->main list of subrequests, and at each level


The documentation for IncludesNOEXEC (with the Options directive) states:
  IncludesNOEXEC
    Server-side includes are permitted, but the #exec command and
    #exec CGI are disabled. It is still possible to #include virtual
    CGI scripts from ScriptAliase'd directories.

When Options IncludesNoEXEC is used, the code for include virtual="..."
currently checks the initial content-type of the target for text/*
(before running the subrequest) and therefore disallows, for example,
  <!--#include virtual="/cgi-bin/foo.php"-->
since the initial content-type can be application/x-httpd-php.  This
fails even when /cgi-bin/ is marked Options +ExecCGI.  (A workaround
may be to use AddHandler to set it to "cgi-script" and to use AddType
to set the type to text/html.)

What is the reason that #include virtual="..." (and "file") peek into
the subrequest?  Shouldn't they just run the subrequest and include the
results?  After all, requesting the URI directly would
produce identical results, wouldn't it?  Maybe only "#include file"
should check for content-type text/*, but not "#include virtual".

Can this be backported to 2.0 and 1.3?


2) Should <!--#exec cgi="..."--> be officially deprecated?
   The following two statements are almost semantically equivalent AFAICT:
     <!--#exec cgi="/path/to/cgi"-->
     <!--#include virtual="/path/to/cgi${PATH_INFO}?${QUERY_STRING}"-->
   and #include virtual is more flexible, and allowed with IncludesNoEXEC.
   Besides, I really do not like the idea of the silent inheritance of
   path info and query string by #exec cgi because it is an inheritance of
   the path info and query string of the base document, which gets really
   messy if you have nested includes and have an #include virtual in the
   mix which uses its own path info and query string.

   Is there any reason "#exec cgi" should remain in Apache 2.1?  If people
   support removing it, I'll whip up a quick patch to mod_cgi/mod_cgid.
   Backwards compatibility is one thing, but this is cruft.  Current
   documentation says:
     "The include virtual element should be used in preference to exec cgi"
   but does not go so far as to deprecate it, indicating that it might be
   removed in the future.

   In addition to path info and query string inheritance, the only
   difference between #exec cgi and #include virtual is that if the target
   CGI returns a redirect, #exec cgi will turn it into a hyperlink.  I'm
   not sure how often this "feature" is used, but the simple workaround is
   to have the CGI script return the hyperlink as the body of its output:
     print "Content-type: text/plain\n\n",
           "<a href=\"link\">link</a>\n";


3) Apache2 mod_include performs lazy evaluation of a few environment
   variables: DATE_GMT, DATE_LOCAL, LAST_MODIFIED, and USER_NAME.
   These variables end up being empty strings ("") in a CGI environment
   unless they are used in the include document prior to the virtual or
   exec include, e.g. the noop
     <!--#if expr="$DATE_LOCAL"--><!--#endif-->
   Can this be documented somewhere?  What pages should be updated?
   mod_include documentation, the SSI tutorial, others?


4) Minor nit.  Should USER_NAME be inherited from the base document, just
   as the LAST_MODIFIED time is?  If USER_NAME was generated from the
   lazy value before a sub-include, then it is inherited.  If it was not
   generated before a sub-include, then it will be set with the value of
   the document when it is actually used, leading to inconsistencies.


diff -ru httpd-2.0.47/modules/filters/mod_include.c httpd-2.0.47.new/modules/filters/mod_include.c
--- httpd-2.0.47/modules/filters/mod_include.c	2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.47.new/modules/filters/mod_include.c	2003-09-04 17:48:45.000000000 -0400
@@ -3386,10 +3386,12 @@
          * torquing our own last_modified date as well so that the
          * LAST_MODIFIED variable gets reset to the proper value if the
          * nested document resets <!--#config timefmt -->.
+         * Also torque info used to generate USER_NAME.
          */
         r->subprocess_env = r->main->subprocess_env;
         apr_pool_join(r->main->pool, r->pool);
         r->finfo.mtime = r->main->finfo.mtime;
+        r->finfo.user  = r->main->finfo.user;
     }
     else {
         /* we're not a nested include, so we create an initial


5) Can mod_include please export get_include_var() as an optional function?
   How about as ap_ssi_get_include_var()?  This is needed by modules to
   access the mod_include lazily-evaluated variables.


diff -ru httpd-2.0.47/modules/filters/mod_include.c httpd-2.0.47.new/modules/filters/mod_include.c
--- httpd-2.0.47/modules/filters/mod_include.c	2003-02-03 12:31:38.000000000 -0500
+++ httpd-2.0.47.new/modules/filters/mod_include.c	2003-09-04 17:53:51.000000000 -0400
@@ -206,8 +206,8 @@
     return val;
 }
 
-static const char *get_include_var(request_rec *r, include_ctx_t *ctx, 
-                                   const char *var)
+static const char *ap_ssi_get_include_var(request_rec *r, include_ctx_t *ctx, 
+                                          const char *var)
 {
     const char *val;
     if (apr_isdigit(*var) && !var[1]) {
@@ -1099,7 +1099,7 @@
                 if (l != 0) {
                     tmp_store        = *end_of_var_name;
                     *end_of_var_name = '\0';
-                    val = get_include_var(r, ctx, start_of_var_name);
+                    val = ap_ssi_get_include_var(r, ctx, start_of_var_name);
                     *end_of_var_name = tmp_store;
 
                     if (val) {
@@ -1387,9 +1387,9 @@
             if (!strcmp(tag, "var")) {
                 conn_rec *c = r->connection;
                 const char *val =
-                    get_include_var(r, ctx,
-                                    ap_ssi_parse_string(r, ctx, tag_val, NULL,
-                                                        MAX_STRING_LEN, 0));
+                    ap_ssi_get_include_var(
+                      r, ctx, ap_ssi_parse_string(r, ctx, tag_val, NULL,
+                                                  MAX_STRING_LEN, 0));
                 if (val) {
                     switch(encode) {
                     case E_NONE:   
@@ -3569,6 +3569,7 @@
 
 static void register_hooks(apr_pool_t *p)
 {
+    APR_REGISTER_OPTIONAL_FN(ap_ssi_get_include_var);
     APR_REGISTER_OPTIONAL_FN(ap_ssi_get_tag_and_value);
     APR_REGISTER_OPTIONAL_FN(ap_ssi_parse_string);
     APR_REGISTER_OPTIONAL_FN(ap_register_include_handler);