Posted to cvs@httpd.apache.org by wr...@apache.org on 2003/02/23 23:37:35 UTC

cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

wrowe       2003/02/23 14:37:35

  Modified:    modules/arch/win32 mod_win32.c
  Log:
    Someone (I need to refer back) asked that we do the same skip for the
    utf-8 win32 prefix when testing for shebang lines.  Here's just such
    a test with some bad sizeof(buffer) v.s. bytes read assumptions fixed,
    and the code made generally a little more legible.  Please review and
    comment to consider this patch for backporting to 2.0.
  
  Revision  Changes    Path
  1.23      +31 -10    httpd-2.0/modules/arch/win32/mod_win32.c
  
  Index: mod_win32.c
  ===================================================================
  RCS file: /home/cvs/httpd-2.0/modules/arch/win32/mod_win32.c,v
  retrieving revision 1.22
  retrieving revision 1.23
  diff -u -r1.22 -r1.23
  --- mod_win32.c	3 Feb 2003 17:52:57 -0000	1.22
  +++ mod_win32.c	23 Feb 2003 22:37:35 -0000	1.23
  @@ -481,30 +481,52 @@
           apr_status_t rv;
           char buffer[1024];
           apr_size_t bytes = sizeof(buffer);
  -        int i;
  +        apr_size_t i;
   
           /* Need to peek into the file figure out what it really is... 
            * ### aught to go back and build a cache for this one of these days.
            */
  -        if (((rv = apr_file_open(&fh, *cmd, APR_READ | APR_BUFFERED,
  -                                 APR_OS_DEFAULT, r->pool)) != APR_SUCCESS) 
  -            || ((rv = apr_file_read(fh, buffer, &bytes)) != APR_SUCCESS)) {
  +        if ((rv = apr_file_open(&fh, *cmd, APR_READ | APR_BUFFERED,
  +                                 APR_OS_DEFAULT, r->pool)) != APR_SUCCESS) {
  +            ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
  +                          "Failed to open cgi file %s for testing", *cmd);
  +            return rv;
  +        }
  +        if ((rv = apr_file_read(fh, buffer, &bytes)) != APR_SUCCESS) {
               ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                             "Failed to read cgi file %s for testing", *cmd);
               return rv;
           }
           apr_file_close(fh);
   
  +        /* Some twisted character [no pun intended] at MS decided that a
  +         * zero width joiner as the lead wide character would be ideal for
  +         * describing Unicode text files.  This was further convoluted to
  +         * another MSism that the same character mapped into utf-8, EF BB BF
  +         * would signify utf-8 text files.
  +         *
  +         * Since MS configuration files are all protecting utf-8 encoded
  +         * Unicode path, file and resource names, we already have the correct 
  +         * WinNT encoding.  But at least eat the stupid three bytes up front.
  +         *
  +         * ### A more thorough check would also allow UNICODE text in buf, and
  +         * convert it to UTF-8 for invoking unicode scripts.  Those are few
  +         * and far between, so leave that code an enterprising soul with a need.
  +         */
  +        if ((bytes >= 3) && memcmp(buffer, "\xEF\xBB\xBF", 3) == 0) {
  +            memmove(buffer, buffer + 3, bytes -= 3);
  +        }
  +
           /* Script or executable, that is the question... */
  -        if ((buffer[0] == '#') && (buffer[1] == '!')) {
  +        if ((bytes >= 2) && (buffer[0] == '#') && (buffer[1] == '!')) {
               /* Assuming file is a script since it starts with a shebang */
  -            for (i = 2; i < sizeof(buffer); i++) {
  +            for (i = 2; i < bytes; i++) {
                   if ((buffer[i] == '\r') || (buffer[i] == '\n')) {
                       buffer[i] = '\0';
                       break;
                   }
               }
  -            if (i < sizeof(buffer)) {
  +            if (i < bytes) {
                   interpreter = buffer + 2;
                   while (apr_isspace(*interpreter)) {
                       ++interpreter;
  @@ -514,11 +536,10 @@
                   }
               }
           }
  -        else {
  +        else if (bytes >= sizeof(IMAGE_DOS_HEADER)) {
               /* Not a script, is it an executable? */
               IMAGE_DOS_HEADER *hdr = (IMAGE_DOS_HEADER*)buffer;    
  -            if ((bytes >= sizeof(IMAGE_DOS_HEADER))
  -                && (hdr->e_magic == IMAGE_DOS_SIGNATURE)) {
  +            if (hdr->e_magic == IMAGE_DOS_SIGNATURE) {
                   if (hdr->e_lfarlc < 0x40) {
                       /* Ought to invoke this 16 bit exe by a stub, (cmd /c?) */
                       interpreter = "";
  
  
  

Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by André Malo <nd...@perlig.de>.
* William A. Rowe, Jr. wrote:

> Agreed.  But I don't see this as a radical change; it's actually very simple
> respect for folks authoring I18N scripts.  I respect their effort, and would
> still like to backport this patch to help their efforts along.

Ok, convinced. We should support I18N in any way we know.

nd
-- 
Wnorg is always spelled wrnog on Usenet. If you do not spell
worng as wrgno, that is simply wgnor. You may, however, happily
spell rihgt as rgiht, since a wnrog spelling of rhigt is not
regarded as wonrg.                         -- Hajo Pflüger in dnq

Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by André Malo <nd...@perlig.de>.
* Jeff White wrote:

> Usage with the built-in Windows Scripting Engine by
> some Apache on Windows web servers, using the
> built-in scripting languages, is now doable via tricking
> the Apache on Windows web server. Tricking the server
> with a fake file, that uses the "pain" Unix based # way
> (or with an option that also turns on everything else at
> the same time) - ScriptInterpreterSource.

You may want to try ScriptInterpreterSource registry-strict.
This uses the ExecCGI\Command key instead of Open\Command.

> But isn't it
> possible to also check for // (two slashes for JScript
> usage) or for a ' (a single quote for VBScript usage)
> during the Apache on Windows script usage routine?

#! is not a real comment (just to be clear). However, the Windows way to 
recognize the interpreter is "query the registry". That's what 
ScriptInterpreterSource is intended for. I cannot see any reason to 
recognize other chars than (BOM)#! at the start of the script (which is 
done _only_ for some unix-compat reasons).

nd
-- 
"The basement levels of the Semper Gallery meanwhile remain
 flooded for statistical reasons." -- Spiegel Online

Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by Jeff White <jl...@earthlink.net>.
From: "William A. Rowe, Jr."

>>
>> Doesn't the Apache on Windows web server have
>> available a built-in Windows Scripting Engine and two
>> built-in scripting languages, that do not use the # as
>> a scripting comment marker?

> No; Windows has a command-line Windows Scripting Host
> (cscript, as opposed to the window-based wscript).

<quote>

The Windows Scripting Engine serves as the component
within Windows that interprets and executes script code
written in scripting languages such as JScript or VBScript.

</quote>

This Windows Scripting Engine component is used by
many types of "hosts":

Windows Script Host (WSH 5.6) - has two parts:

  Cscript.exe - CUI - command line (like Perl.exe).
      Accesses COM, the file system, StdIn, StdErr and StdOut,
      for use in, say, admin config-type scripts, or for
      accessing LDAP, ADSI, WMI, databases, Regex, etc. (lots
      more), on the desktop or server to server.
      Could run in the Apache on Windows web server.

  Wscript.exe - GUI - output in window boxes on the screen,
      much like those of MsgBox. Same usage as Cscript.exe
      above, but with the different output (MsgBox style).
      Best for desktop admin-type "config" usage.
      Should not be used in Apache on Windows.

HTML Applications (HTA) - HTML files that run on the desktop
      but not via the browser. Uses a separate DLL.
      Accesses some COM and the file system like WSH,
      but is not the Windows Script Host.
      Desktop and server, or server to server.
      Could run as Intranet usage with Apache on
      Windows. Not for the Internet.

Internet Explorer - Browser usage.
      Internet - limits on COM and file system access.
      Desktop - used for displaying list boxes, radio
      and combo boxes with WSH scripts.

OS - Used in the Windows OS.
      No need here to explain DLL usage.

ASP - Used in IIS.
      (Sort of, but close enough for here without
      explaining all the DLLs used by everything.)

Programs - Windows and third party programs.
      They call the COM interface to the engine, and
      the scripting languages are then available.

So the Windows Script Host is one of many ways
that  "the Apache on Windows web server have
available a built-in Windows Scripting Engine and two
built-in scripting languages, that do not use the # as
a scripting comment marker".

There are a lot of add-in (downloadable) languages for
usage with the Windows Scripting Engine.  But there
are only two built-in languages, VBScript and JScript.

> As André indicated, this is most trivially enabled using the
> Registry-based ScriptInterpreterSource directives.  It is actually
> recommended that you use ScriptInterpreterSource registry-strict
> which will invoke the ExecCGI verb, instead of using the Open
> verb.

Since it "is most trivially enabled using the Registry-based
ScriptInterpreterSource directives" then why is this not the
default (unneeded to turn on) for _all_ Apache on Windows?

This ScriptInterpreterSource default would allow usage of the
built-in Windows Scripting Engine as well as any other Windows
OS "known" programs, such as Perl.  There would be no need
for _any_  scripting comment marker checking routine with the
Apache on Windows web server.

But if any scripting comment marker checking routine (of the
"shebang" type), is performed, on Apache on Windows, without
first having the ScriptInterpreterSource as the default, then this
"component within Windows" that is available to every Apache
on Windows web server, the built-in Windows Scripting Engine
and its built-in default languages, should be included into this
scripting comment marker checking routine first, if not at least,
somewhere in the #! checking routine in Apache on Windows.

But if it is "compat ways" one seeks, then have the Unix type
Apache versions, only check for ' (a single quote) and then
the Unix type versions of Apache, can run two files also!  :)

> This patch is a no-op relative to VB/J Scripts.

But one added a BOM change (so changes are allowed)
and yet left the #!, that is not used with Windows Scripting
Engine's built-in default languages, available in all Apache
on Windows web servers.

There are many scripting languages, so if any scripting comment
marker checking routine is needed, on Apache on Windows,
then an option for scripting languages is needed too.

A lot is happening with the Windows Scripting Engine. And most
people only use it "properly" with the other Windows "products".
So for them, there is no need for what sounds like "odd and weird"
usage. But these "strange usage ways" work with the Windows
Scripting Engine and may only be needed with non standard
Windows "products".  :)

Just wait 'til one is using some of the .NET Framework 40+
languages in the Apache on Windows web server, for some
unusual Windows usage ways!  ;)

Jeff





Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 09:11 AM 3/22/2003, Jeff White wrote:

>>> wrowe       2003/02/23 14:37:35
>>>
>>>   Modified:    modules/arch/win32 mod_win32.c
>>>   Log:
>>>  utf-8 win32 prefix when testing for shebang lines.
>
>Doesn't the Apache on Windows web server have
>available a built-in Windows Scripting Engine and two
>built-in scripting languages, that do not use the # as
>a scripting comment marker?

No; Windows has a command-line Windows Scripting Host
(cscript, as opposed to the window-based wscript).  

As André indicated, this is most trivially enabled using the
Registry-based ScriptInterpreterSource directives.  It is actually
recommended that you use ScriptInterpreterSource registry-strict
which will invoke the ExecCGI verb, instead of using the Open
verb.  And yes - enabling that will allow you to use cscript.exe
to invoke your .wsh scripts.
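
A minimal configuration sketch of the registry-strict setup described
above. The directory path is an assumption for illustration only. With
Registry-Strict, the server consults only the Shell\ExecCGI\Command
subkey registered for the script's file type, so that key has to exist.

```apache
# httpd.conf fragment; "C:/Apache2/cgi-bin" is a hypothetical path
<Directory "C:/Apache2/cgi-bin">
    Options +ExecCGI
    ScriptInterpreterSource Registry-Strict
</Directory>
```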

Others have suggested a FileAction directive (as opposed to the
Action directive) which would allow you to associate handlers
for specific extensions with a specific interpreter file name.  If you 
are familiar with the Action directive, that directive associates 
handlers for specific extensions with a URI.

Finally, you *CAN* create .wsh files with #! shebang lines, because
.wsh files are interpreted within their [section] blocks.  The #! line is
ignored as a 'sectionless' line.  So the first line of myscript.wsh might look like:

#!c:/windows/system32/cscript.exe "%1" %*

However, it's sure easier to use the registry than fill up your server
with .wsh wrappers, or even modifying .js or .vbs files with shebang
lines.  In this case, the facilities we provide are already sufficient.

Again, you need to copy the command line from the Open2 verb to
the ExecCGI verb, because the default Open verb invokes wscript!

>> [...]  Now, perhaps some other compiled bytecode
>> uses the mark, but that should be irrelevant, because
>> we drop out of that logic if we don't encounter a #! after
>> the BOM.   No harm no foul.
>
>Has this been checked with the Apache on Windows
>access, to the built-in scripting engine, the Windows
>Scripting Engine (or WSH 5.6 usage, as some call it)
>and the scripting engine's default usage of the scripting
>languages of VBScript and JScript?

This patch is a no-op relative to VB/J Scripts.  It has no impact
on them, but it is in fact required to recognize the shebang line in
any UTF-8 text saved from Notepad or other Win32-based text tools.

The patch simply skips the BOM, and that is harmless, no
matter if you are looking at Win32 executables, or any other
text files.
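
The BOM skip described above can be sketched in portable C. This is a
minimal sketch, not the mod_win32.c code: skip_utf8_bom and has_shebang
are hypothetical helper names, and plain size_t stands in for
apr_size_t. A buffer that does not start with EF BB BF is left
untouched, which is why the skip is harmless for other file types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mod_win32.c): drop a leading UTF-8
 * BOM (EF BB BF) from buf in place and return the new count of valid
 * bytes.  Buffers without a BOM are returned unchanged. */
static size_t skip_utf8_bom(char *buf, size_t bytes)
{
    assert(buf != NULL);
    if (bytes >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) {
        bytes -= 3;
        memmove(buf, buf + 3, bytes);   /* regions overlap, so memmove */
    }
    return bytes;
}

/* Hypothetical helper: true when the first two valid bytes are "#!".
 * The length check keeps a short read from being misinterpreted. */
static int has_shebang(const char *buf, size_t bytes)
{
    return bytes >= 2 && buf[0] == '#' && buf[1] == '!';
}
```

Calling skip_utf8_bom first and has_shebang second mirrors the order of
the two tests in the patch.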

>[...] But isn't it
>possible to also check for // (two slashes for JScript
>usage) or for a ' (a single quote for VBScript usage)
>during the Apache on Windows script usage routine?
>Perhaps the Apache on Windows web server needs
>a config option to allow the server's scripting routine
>usage of the various scripting comment markers.

Again, why?  If the registry is the supported way of tracking
.js and .vbs scripts, we really don't profit from overloading.
Shebang lines (even your suggested '//' flavor) make sense
only when you have several interpreters for .js files, and you
need to mark which one.  Which bash, sh, zsh, or whichever
shell should be used to invoke somescript.sh?  The shebang
line tells you.

>Don't forget to check for security updates, for the built-in
>Windows Scripting Engine, and the scripting engine's
>languages. Both the built-in ones and the ones added later
>for usage with the built-in scripting engine in the Apache
>on Windows web server. Check even if you think no one
>uses the built-in Windows Scripting Engine (or WSH 5.6)
>on the Apache on Windows web server.

Good point and thanks for the suggestion! 


Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by Jeff White <jl...@earthlink.net>.
>> wrowe       2003/02/23 14:37:35
>>
>>   Modified:    modules/arch/win32 mod_win32.c
>>   Log:
>>  utf-8 win32 prefix when testing for shebang lines.

Doesn't the Apache on Windows web server have
available a built-in Windows Scripting Engine and two
built-in scripting languages, that do not use the # as
a scripting comment marker?

> Never.  Now, perhaps some other compiled bytecode
> uses the mark, but that should be irrelevant, because
> we drop out of that logic if we don't encounter a #! after
> the BOM.   No harm no foul.

Has this been checked with the Apache on Windows
access, to the built-in scripting engine, the Windows
Scripting Engine (or WSH 5.6 usage, as some call it)
and the scripting engine's default usage of the scripting
languages of VBScript and JScript?

Usage with the built-in Windows Scripting Engine by
some Apache on Windows web servers, using the
built-in scripting languages, is now doable via tricking
the Apache on Windows web server. Tricking the server
with a fake file, that uses the "pain" Unix based # way
(or with an option that also turns on everything else at
the same time) - ScriptInterpreterSource. But isn't it
possible to also check for // (two slashes for JScript
usage) or for a ' (a single quote for VBScript usage)
during the Apache on Windows script usage routine?

Perhaps the Apache on Windows web server needs
a config option to allow the server's scripting routine
usage of the various scripting comment markers.

Don't forget to check for security updates, for the built-in
Windows Scripting Engine, and the scripting engine's
languages. Both the built-in ones and the ones added later
for usage with the built-in scripting engine in the Apache
on Windows web server. Check even if you think no one
uses the built-in Windows Scripting Engine (or WSH 5.6)
on the Apache on Windows web server.

Jeff



Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 03:17 AM 3/19/2003, André Malo wrote:
>* wrowe@apache.org wrote:
>
>> wrowe       2003/02/23 14:37:35
>> 
>>   Modified:    modules/arch/win32 mod_win32.c
>>   Log:
>>     Someone (I need to refer back) asked that we do the same skip for the
>>     utf-8 win32 prefix when testing for shebang lines.  Here's just such
>>     a test with some bad sizeof(buffer) v.s. bytes read assumptions fixed,
>>     and the code made generally a little more legible.  Please review and
>>     comment to consider this patch for backporting to 2.0.
>
>I would leave it in 2.1, because (a) it's a minor enhancement and (b) it's 
>an irreversible change in behaviour. It assumes that (1) the script 
>language supports BOMs (e.g. perl does) 

Worst case, it doesn't support BOM, the user uses one, and the script
language 500's out.  No problem there.

>and (2) DOS executables never 
>start with the BOM sequence (I don't know).

Never.  Now, perhaps some other compiled bytecode uses the mark,
but that should be irrelevant, because we drop out of that logic if we
don't encounter a #! after the BOM.   No harm no foul.

You are right - we would mess up the DOS tests a little further down,
but they start with their own 'magic' signature.
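
To illustrate why the DOS 'magic' cannot be confused with the BOM, here
is a rough sketch of the executable test in plain C, without
<windows.h>. The names classify_image, image_kind and the two macros
are hypothetical; the layout facts assumed are that IMAGE_DOS_HEADER is
64 bytes, begins with the bytes 'M' 'Z', and keeps the little-endian
16-bit e_lfarlc field at offset 0x18. The real code continues with
further checks that are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch only. */
enum image_kind {
    IMAGE_UNKNOWN,   /* too short, or no 'MZ' signature        */
    IMAGE_DOS16,     /* 'MZ' with e_lfarlc < 0x40: 16-bit exe  */
    IMAGE_MZ_OTHER   /* 'MZ' with e_lfarlc >= 0x40             */
};

#define DOS_HEADER_SIZE  64    /* assumed sizeof(IMAGE_DOS_HEADER) */
#define E_LFARLC_OFFSET  0x18  /* assumed offset of e_lfarlc       */

static enum image_kind classify_image(const unsigned char *buf,
                                      size_t bytes)
{
    unsigned int e_lfarlc;

    assert(buf != NULL);
    /* Only trust the header if that many bytes were actually read. */
    if (bytes < DOS_HEADER_SIZE || buf[0] != 'M' || buf[1] != 'Z') {
        return IMAGE_UNKNOWN;
    }
    /* Assemble the little-endian 16-bit field byte by byte. */
    e_lfarlc = buf[E_LFARLC_OFFSET]
             | ((unsigned int)buf[E_LFARLC_OFFSET + 1] << 8);
    return (e_lfarlc < 0x40) ? IMAGE_DOS16 : IMAGE_MZ_OTHER;
}
```

A buffer that starts with EF BB BF fails the 'MZ' comparison on its
first byte, so it is never classified as an executable.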

>The sizeof(buffer) vs. bytes thing should be backported anyway.

Agreed.  But I don't see this as a radical change; it's actually very simple
respect for folks authoring I18N scripts.  I respect their effort, and would
still like to backport this patch to help their efforts along.

Bill




Re: cvs commit: httpd-2.0/modules/arch/win32 mod_win32.c

Posted by André Malo <nd...@perlig.de>.
* wrowe@apache.org wrote:

> wrowe       2003/02/23 14:37:35
> 
>   Modified:    modules/arch/win32 mod_win32.c
>   Log:
>     Someone (I need to refer back) asked that we do the same skip for the
>     utf-8 win32 prefix when testing for shebang lines.  Here's just such
>     a test with some bad sizeof(buffer) v.s. bytes read assumptions fixed,
>     and the code made generally a little more legible.  Please review and
>     comment to consider this patch for backporting to 2.0.

This is PR 16687.

>   +        if ((bytes >= 3) && memcmp(buffer, "\xEF\xBB\xBF", 3) == 0) {
>   +            memmove(buffer, buffer + 3, bytes -= 3);
>   +        }

Just a nit ;-) I'd guess you can safely assume that bytes must be > 3.

I would leave it in 2.1, because (a) it's a minor enhancement and (b) it's 
an irreversible change in behaviour. It assumes that (1) the script 
language supports BOMs (e.g. perl does) and (2) DOS executables never 
start with the BOM sequence (I don't know).

The sizeof(buffer) vs. bytes thing should be backported anyway.
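
The sizeof(buffer) vs. bytes issue can be shown with a small standalone
sketch. It is illustrative only, not the committed code:
shebang_interpreter is a hypothetical helper, and the standard C
isspace stands in for apr_isspace. The loop is bounded by the number of
bytes actually read, so stale data beyond a short read is never
examined.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the first `bytes` bytes of buf hold a
 * complete "#!" line, terminate it in place and return a pointer to
 * the interpreter command; otherwise return NULL. */
static char *shebang_interpreter(char *buf, size_t bytes)
{
    size_t i;
    char *interp;

    assert(buf != NULL);
    if (bytes < 2 || buf[0] != '#' || buf[1] != '!') {
        return NULL;
    }
    /* Scan only the bytes that were read, not the whole buffer. */
    for (i = 2; i < bytes; i++) {
        if (buf[i] == '\r' || buf[i] == '\n') {
            buf[i] = '\0';
            break;
        }
    }
    if (i == bytes) {
        return NULL;            /* no line ending within valid data */
    }
    interp = buf + 2;
    while (isspace((unsigned char)*interp)) {
        ++interp;               /* stops at the '\0' written above */
    }
    return interp;
}
```

With the old sizeof(buffer) bound, a line ending left over in the
unread tail of the buffer could be mistaken for the end of the shebang
line.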

nd
-- 
Wnorg is always spelled wrnog on Usenet. If you do not spell
worng as wrgno, that is simply wgnor. You may, however, happily
spell rihgt as rgiht, since a wnrog spelling of rhigt is not
regarded as wonrg.                         -- Hajo Pflüger in dnq