Posted to dev@spamassassin.apache.org by Colm MacCarthaigh <co...@stdlib.net> on 2005/12/02 12:33:34 UTC

spamc and apr hackathon?

A long long time ago, before SpamAssassin was an Apache project, I
contributed some stuff to the very early spamd and spamc (minor claim to
fame; I came up with the port it runs on ;-).

Anyway, at ApacheCon EU I remember someone from the SpamAssassin project
asking about spamc and apr, and the possibilities there, and from
looking at spamc in trunk, it still looks familiar and it also looks
like it should be doable without much trouble. There are some apr tricks
we use in httpd input processing which I think may even offer some
performance boosts.

I know some SpamAssassin people are attending ApacheCon US, so is there
still any interest in this? I'd be up for having a go at it during the
hackathon.

As it happens, over at APR, we're building new cleanroom build
environments, which is most of the work, and I've been looking for a
real-world project to apply this to. I'm sure some other APR/httpd
people can be roped in too, if there's interest.

Oh, am I correct in thinking that spamc doesn't support IPv6? I'll fix
that no matter what, can't be having that now :-)
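
For reference, the usual way to make a C client independent of address
family is getaddrinfo() with AF_UNSPEC. What follows is only a minimal
sketch under that assumption and says nothing about how spamc opens its
socket today; connect_any_family and resolve_family are hypothetical
names, not libspamc functions.

```c
#define _POSIX_C_SOURCE 200112L  /* ask for getaddrinfo() on strict compilers */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Hypothetical helper: report the address family of a numeric host string.
 * AI_NUMERICHOST means no DNS lookup is performed. Returns -1 on failure. */
int resolve_family(const char *numeric_host)
{
    struct addrinfo hints, *res;
    int family;

    assert(numeric_host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;
    if (getaddrinfo(numeric_host, NULL, &hints, &res) != 0)
        return -1;
    family = res->ai_family;
    freeaddrinfo(res);
    return family;
}

/* Hypothetical helper: try every address returned for host/port, in order,
 * and return a connected socket, or -1 if none could be reached. */
int connect_any_family(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                 /* family not usable here, try next */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A caller might use connect_any_family("localhost", "783"), 783 being
spamd's default port; the loop never needs to know whether the address
it ends up with is IPv4 or IPv6.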

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

Re: spamc and apr hackathon?

Posted by John Madden <ma...@skynet.ie>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On (02/12/05 14:18), Colm MacCarthaigh didst pronounce:
> Oh excellent, looks good, I'll give it some review and testing and
> hopefully that may influence a commit :)
> 
I think a more important commit-influencer might be IPV6 support in
spamd!! I started this at some point aswell, but don't think I put a
patch into bugzilla for it. I may still have the tree I worked on at
home though, so I'll take a look and open a bug if it's there.

- -- 
Chat ya later,

John.
- --
BOFH excuse #149: Dew on the telephone lines.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDkHJKQBw+ZtKOvTIRAmyRAKCBaOgcXmSrRGLR4lE0PejBtzbZhgCdGqtr
8zwGE7xOwqRwJtsBGFVEjsc=
=HUIm
-----END PGP SIGNATURE-----

Re: spamc and apr hackathon?

Posted by Sidney Markowitz <si...@sidney.com>.
Nix wrote:
> Um, `reserved for use as {X}...' means `you cannot use them as {X}
> because they are reserved, but you can use them for other things'.
> 
> Note the phrasing of the previous clause, which reserves
> e.g. identifiers beginning with underscores and uppercase letters `for
> any use'; that doesn't mean you can use them for anything at all, it
> means you *can't*.

No, the common meaning of "reserved for X" means that it is kept for the
exclusive use of X. When a table at a restaurant is reserved for the
Smith party at 11, only members of the Smith party get to use it at 11.
It does not mean that Smith will not be allowed to use that table at 11
because it is reserved.

But you don't have to rely on common meanings; the Standard goes on to
state explicitly what it means by "reserved for X".

After listing what the various classes of identifiers are "reserved for"
it then defines what to do with all of them in the following paragraph
that explains what the lists are all about:

"Applications shall not declare or define identifiers with the same name
as an identifier reserved in the same context."

The meaning of the "reserved for" in the previous paragraphs is to
define the "context" for those identifiers.

So if the C compiler implementation defines __FOO as an external
function (which they are allowed to do because such identifiers are
reserved for any use) then an application is not allowed to define __FOO
because there will be a conflict.

Since you have no way of knowing which function names a compiler
implementation will use, that means that you as an application developer
had better not name anything __FOO, so only compiler implementors get to
use names like that. They can use those names without fear of collision
with application developers.

If you as an application developer want to use a name like _foo you are
allowed to use it as a file local identifier as long as it is not
already being used as a file local identifier, for example in a system
header that you include. Since there would be no way for you to know
about arbitrary macros and function names with file scope defined by
compiler implementors in their system header files, in practice that
means that compiler implementors cannot use names such as _foo as file
local identifiers. That is so you can choose _foo as the name for a
static function and not be concerned about a name collision.

The examples you gave from glibc are all declared external. That is a
different "context" than the local file scope for which those
identifiers are reserved. Thus an application can define identifiers
with the same name as long as they are file local.

That even makes sense: You can define a static function named
_dl_mcount_wrapper_check in a file and not worry about a name collision
unless you are writing something that actually explicitly uses that
external function. If the glibc developers had used
_dl_mcount_wrapper_check as a file local identifier, then you would have
to worry about it. The effect of the rules is that the glibc developers
know to stay away from using _foo as a file local name in system headers
and application developers know to use _foo for that purpose.

 -- sidney

Re: spamc and apr hackathon?

Posted by Nix <ni...@esperi.org.uk>.
On Sun, 04 Dec 2005, Sidney Markowitz uttered the following:
> Nix wrote:
>>     - All identifiers that begin with an underscore are
>>       always reserved for use as macros and as identifiers
>>       with file scope in both the ordinary and tag name
>>       spaces.
>> 
>> libspamc falls foul of the second clause; its identifiers should indeed
>> probably lose the underscore.
> 
> No, when the Standard says they are reserved for use as macros and as
> identifiers with file scope, it means that the compiler implementors
> cannot use them for other things because they are reserved for uses such
> as you see in libspamc.c, in which an ordinary user of C uses them as
> the names of static functions which are visible only in the file (i.e.,
> have file scope).

Um, `reserved for use as {X}...' means `you cannot use them as {X}
because they are reserved, but you can use them for other things'.

Note the phrasing of the previous clause, which reserves
e.g. identifiers beginning with underscores and uppercase letters `for
any use'; that doesn't mean you can use them for anything at all, it
means you *can't*.


The implementation on glibc-using platforms, for instance, uses
identifiers such as _GLOBAL_OFFSET_TABLE_ and _dl_start and _setjmp and
_dl_mcount_wrapper_check and _IO_new_* and _nl_current_default_domain
and, oh, *lots* of others. These identifiers are treated as reserved by
a group of implementors known to be totally paranoid about namespace
correctness; I think it's wise to assume that they *are*, therefore,
reserved.

You can name a local variable _dl_start on a glibc platform and
everything will work fine. Call a global function that and suddenly you
can't statically link your program anymore because you've stamped on a
reserved identifier in use by that implementation. Call a static
function that and... interesting things will happen if you try to
dlopen() anything. Weird random results, yep, looks like undefined
behaviour to me.
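
The local-variable case can be sketched as follows. This is a minimal
illustration, assuming a C99 compiler; sum_values is a hypothetical
helper. The identifier _dl_start appears only at block scope, so it
falls outside the file-scope reservation. Nothing here reproduces the
static-linking or dlopen() failures, which depend on the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums an array. The function name itself has file
 * scope and external linkage, so it deliberately has no leading underscore. */
int sum_values(const int *values, size_t count)
{
    /* Block scope: underscore + lowercase letter, not a file-scope name. */
    int _dl_start = 0;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++)
        _dl_start += values[i];
    return _dl_start;
}
```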

> libspamc is using those identifiers exactly as they are supposed to be
> used. Unless I overlooked any, all of them are declared as static
> functions with file scope.

Yes. Thus they are reserved, static functions with file scope being
`identifiers... with file scope'.

> The Standard gives us the ability to name things that are not visible
> outside the file with no fear that the name is being used for an
> incompatible purpose in any included system header.
> 
> See Section 7.1.3 page 112 "Rationale for International Standard—
> Programming Languages— C"
> http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

Woo, 7.1.3 line 20 of the Rationale disagrees with the Standard,
brilliant. Either `reserved for... use' is used to mean two things with
opposite meaning in two consecutive paragraphs of the Standard, or the
Rationale is wrong when it doesn't mention the second clause of 7.1.3.1.

Thankfully the Rationale is not normative so we can ignore it and follow
the Standard instead.

-- 
`Y'know, London's nice at this time of year. If you like your cities
 freezing cold and full of surly gits.' --- David Damerell


Re: spamc and apr hackathon?

Posted by Sidney Markowitz <si...@sidney.com>.
Nix wrote:
>     - All identifiers that begin with an underscore are
>       always reserved for use as macros and as identifiers
>       with file scope in both the ordinary and tag name
>       spaces.
> 
> libspamc falls foul of the second clause; its identifiers should indeed
> probably lose the underscore.

No, when the Standard says they are reserved for use as macros and as
identifiers with file scope, it means that the compiler implementors
cannot use them for other things because they are reserved for uses such
as you see in libspamc.c, in which an ordinary user of C uses them as
the names of static functions which are visible only in the file (i.e.,
have file scope).

libspamc is using those identifiers exactly as they are supposed to be
used. Unless I overlooked any, all of them are declared as static
functions with file scope.

The Standard gives us the ability to name things that are not visible
outside the file with no fear that the name is being used for an
incompatible purpose in any included system header.

See Section 7.1.3 page 112 "Rationale for International Standard—
Programming Languages— C"
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

 -- Sidney Markowitz
    http://www.sidney.com



Re: spamc and apr hackathon?

Posted by Colm MacCarthaigh <co...@stdlib.net>.
On Fri, Dec 02, 2005 at 11:27:46PM +0000, Nix wrote:
>     - All identifiers that begin with an underscore are
>       always reserved for use as macros and as identifiers
>       with file scope in both the ordinary and tag name
>       spaces.

Well, I would have thought those identifiers are reserved for future
use, but let's not argue :)

> (Oh, and as a heavy user of spamc on odd systems, allow me to plead:
> *please* don't add APR as a mandatory external dependency unless there's
> absolutely no option.

There's certainly an option: if people don't want it or it's a step
backwards, it won't go in. And this works both ways: we want people
to be happy with APR, and if they're not, we try to fix whatever people
think is broken.

> One of the things that's valuable about spamc is that it has no
> external dependencies at all, so I can run it on all sorts of insane
> and horrible ancient Unix boxes without significant trouble because
> all it needs is POSIX and not much of that. I'm even running it on a
> SunOS 4.1.3 box at work. Is APR that portable?  

Yep! 

> Are you sure?

People manage to run httpd 2.x on SunOS 4.1.3 so yep :)

> To get APR working you need a system that's a hell of a lot more
> capable than what you need to get spamc working.)

Not particularly, the C should be o.k. because non-portable features are
all conditional, the main dependency would be the configure script, but
that should work too :)

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

Re: spamc and apr hackathon?

Posted by Nix <ni...@esperi.org.uk>.
On Fri, 2 Dec 2005, Colm MacCarthaigh whispered secretively:
> It's not a problem with your patch, but libspamc has functions that
> start with an underscore. This is bad, and invalid C. The C spec
> reserves all tokens starting with an underscore for future use.

While what you say about libspamc is correct, what you say about
the Standard is not. No reservations are placed on tokens at all:
some identifiers starting with underscores are reserved, but by
no means all.

C99 says:

    - All identifiers that begin with an underscore and either an
      uppercase letter or another underscore are always reserved for
      any use.

    - All identifiers that begin with an underscore are
      always reserved for use as macros and as identifiers
      with file scope in both the ordinary and tag name
      spaces.

libspamc falls foul of the second clause; its identifiers should indeed
probably lose the underscore.
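
The two clauses quoted above can be written as a small classifier. This
is a sketch for illustration only: the enum and function names are
hypothetical, and it covers just these two clauses, not the other
reserved categories (library names, macros in headers, and so on).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum reservation {
    NOT_RESERVED,         /* no leading underscore */
    RESERVED_FILE_SCOPE,  /* second clause: any other leading underscore */
    RESERVED_ANY_USE      /* first clause: "__" or "_" + uppercase letter */
};

/* Classify an identifier by the two clauses quoted above. */
enum reservation classify_identifier(const char *name)
{
    assert(name != NULL);
    if (name[0] != '_')
        return NOT_RESERVED;
    if (name[1] == '_' || isupper((unsigned char)name[1]))
        return RESERVED_ANY_USE;
    return RESERVED_FILE_SCOPE;
}
```

Under this reading, a libspamc helper whose name is an underscore
followed by a lowercase letter lands in the second category; dropping
the underscore moves it to the first enum value.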


(Oh, and as a heavy user of spamc on odd systems, allow me to plead:
*please* don't add APR as a mandatory external dependency unless there's
absolutely no option. One of the things that's valuable about spamc is
that it has no external dependencies at all, so I can run it on all
sorts of insane and horrible ancient Unix boxes without significant
trouble because all it needs is POSIX and not much of that. I'm even
running it on a SunOS 4.1.3 box at work. Is APR that portable?  Are you
sure? To get APR working you need a system that's a hell of a lot more
capable than what you need to get spamc working.)

-- 
`Y'know, London's nice at this time of year. If you like your cities
 freezing cold and full of surly gits.' --- David Damerell


Re: spamc and apr hackathon?

Posted by Colm MacCarthaigh <co...@stdlib.net>.
On Fri, Dec 02, 2005 at 12:20:40PM +0000, John Madden wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On (02/12/05 11:33), Colm MacCarthaigh didst pronounce:
> > Oh, am I correct in thinking that spamc doesn't support IPv6? I'll fix
> > that no matter what, can't be having that now :-)
> > 
> I wrote the code for this a while back, but haven't looked at it since.
> Bug 4477 in bugzilla
> (http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4477)

Oh excellent, looks good, I'll give it some review and testing and
hopefully that may influence a commit :)

It's not a problem with your patch, but libspamc has functions that
start with an underscore. This is bad, and invalid C. The C spec
reserves all tokens starting with an underscore for future use.

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

Re: spamc and apr hackathon?

Posted by John Madden <ma...@skynet.ie>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On (02/12/05 11:33), Colm MacCarthaigh didst pronounce:
> Oh, am I correct in thinking that spamc doesn't support IPv6? I'll fix
> that no matter what, can't be having that now :-)
> 
I wrote the code for this a while back, but haven't looked at it since.
Bug 4477 in bugzilla
(http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4477)

- -- 
Chat ya later,

John.
- --
BOFH excuse #187: Reformatting Page. Wait...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDkDwXQBw+ZtKOvTIRArHQAJ4uNqCa124c0ExHCqf/9IU68xCRqQCdHftZ
dAJ0/dCHJsi0nUBUwB5QGEg=
=IYQt
-----END PGP SIGNATURE-----