Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/01/09 05:28:11 UTC

Re: review of SpamAssassin incubator status file

BTW, thanks for the pointer to http://www.apache.org/dev/ , hadn't seen
that.  Clears up a lot of procedural detail I didn't know yet ;)

Sander Striker writes:
>> > Identify the project to be incubated
>> > date 	item
>> > ....-..-.. 	Make sure that the requested project name does not already exist
>> > and check www.nameprotect.com to be sure that the name is not already
>> > trademarked for an existing software product.
>> 
>> DONE: Done, in that we are aware of the existing trademark.
>
>And what is the status of that trademark?  Is it in the process of being
>signed over to the ASF? 

I'll send a ping and see what's new.

>> > ....-..-.. 	If request from anywhere to become a stand-alone PMC, then assess
>> > the fit with the ASF, and create the lists and modules under the incubator
>> > address/module names if accepted.
>>
>> Partially complete.
>
>I'd mark this as DONE.  The assessment of a fit was previously made.
>And resources have and will be created under the incubator namespace.

Great ;)

BTW -- question -- do we maintain the list at
http://incubator.apache.org/projects/spamassassin.html , or someone else?
I'd presume the latter, since I don't have committer access on the
incubator site... as far as I know, but I may be wrong there. ;)

>> We now have:
>> spamassassin-users: to take over from SpamAssassin-talk on sf.net;
>> spamassassin-dev: to take over from SpamAssassin-devel on sf.net;
>> spamassassin-cvs: to take over from SpamAssassin-commits on sf.net.
>> There's a few more -- -announce, -dev-de, and -dev-br. request in separate
>> mail.
>
>Just add them to the list below* when completed.

will do.

> [..........]

>> > ....-..-.. 	Check and make sure that the papers that transfer rights to the ASF have been received. It
>> > is only necessary to transfer rights for the package, the core code, and any new code produced by
>> > the project.
>> 
>> DONE: as far as I know, this is complete.
>
>Before marking this as DONE, I want to be 100% sure.  I still need to
>see at least one CLA be marked as received for instance.

OK -- good point, I'd forgotten about the "Matt issue" ;)

>> > ....-..-.. 	Check and make sure that the files that have been donated have been updated to
>> > reflect the new ASF copyright. 
>> 
>> Not yet complete.   Assuming this will also tag the files with the ASL
>> text, this also raises the question -- which version of the ASL are we to
>> use, 1.1 or 2.0?   if I recall correctly, this is an open question.
>
>Use 1.1.  2.0 is still pending approval.

OK, all source and rule files, and most of the scripts, in the
SpamAssassin 2.70 repo are now tagged with ASL 1.1.   I haven't tagged
*everything*, of course, but that covers the main core engine code and
rules.
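To find whatever is still untagged, something along these lines could work. It is only a rough sketch: the marker string, the file suffixes, and the `untagged_files` helper are assumptions for illustration, not a script we actually run.

```python
from itertools import islice
from pathlib import Path

# Assumed marker text; adjust to whatever header line the tagged files carry.
MARKER = "Apache Software License, Version 1.1"


def untagged_files(root, suffixes=(".pm", ".pl", ".cf"), marker=MARKER, head_lines=40):
    """Return files under root whose first head_lines lines lack the marker."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the (assumed) suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            head = "".join(islice(fh, head_lines))
        if marker not in head:
            missing.append(path)
    return missing
```

Running `untagged_files(".")` from the top of a checkout would list the candidates left to tag.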

>> > Verify distribution rights
>> > date 	item
>> > ....-..-.. 	Check and make sure that for all code included with the distribution that is not under
>> > the Apache license, we have the right to combine with Apache-licensed code and redistribute.
>> 
>> DONE: Not applicable; there is no longer any code in the distribution
>> package that is not Apache licensed.   All distribution code in "trunk" is
>> covered by CLAs, and code in the older branches is not intended to be
>> released under the Apache.org aegis.
>
>The latter is indeed important.  For code history purposes, I'd say that
>the code may be there, but releases may absolutely not be made under
>the ASF banner from that code.

Makes perfect sense to me.   I'll add a README file to each branch noting
this.

>[...............]

>> > ....-..-.. 	Decide about and then ask infrastructure to setup an issuetracking system (Bugzilla, Scarab, Jira).
>> 
>> DONE	Will be sticking with our existing one due to enhancements.
>
>Hmmm... :)  Need to think about that one.  We pretty much discourage
>using non-ASF controlled resources.  That said, we'd prolly be
>able to get the customized things in our BZ install, since what
>you have done is not limited to SA (the CLA field).  The bug you
>encountered (and disabled server push for) seems to be fixed in our
>install, IIRC.

The CLA field is *very* handy, FWIW, so this would be useful.
But other than that, I have no objections to moving from our
BZ to an apache.org one ;)

>> > Project specific 
>> > Add project specific tasks here.
>> 
>> OK, here's some more components of our infrastructure we need to think
>> about migrating to Apache.org:
>> 
>> - The nightly-build packaging and signing scripts.
>
>We have a mechanism in place for nightly code snapshots which might be
>something to look into.  Automatic signing?  Sounds like something that
>could be fragile, or at least be viewed as less secure than one would
>like.

Yes, we could safely lose the automatic signing of nightly builds
if necessary.  Signing is only *really* essential for releases.

What is the mechanism for nightly builds?    I think we could start
making snapshots at the incubator instead of at
http://SpamAssassin.org/devel/ straight away.

BTW automated test builds -- that's gump, right?   How do we set this up?
I'd be interested in getting this working before we leave the incubator,
if possible.  I presume I need to go bother the gump people ;)

>> - The script to update a tag nightly (used by people running nightly
>>   "mass-check" tests for rule QA).
>
>I need some more details on this one to understand.

Basically, we perform rule QA like so: at a well-defined time every night,
we tag the HEAD rev in CVS (now trunk in SVN).   A set of contributors
then do a "cvs update" from that tag and start off a check of all their
mail spool, then upload the results to a central rsync server.

On an hourly basis, there's a cron job running on another server which
rsyncs down these results, combines them, and generates a webpage
summarising the rule accuracy across everyone's mail spools.

The release-time process operates in a similar way, except we set the tag
manually, and it typically includes a lot more of each person's mail
corpus.

So there's actually three components there -- I forgot two! argh:

  - the cron job to tag nightly

  - the rsync server accepting those log files

  - a cron job to combine the log files and generate a report text file
    visible via HTTP
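
To make the third component a bit more concrete, here is a minimal sketch of the combining step. The log format is an assumption made up for illustration (one line per message, "<spam|ham> <RULE_A,RULE_B,...>"); the real mass-check logs carry more fields, and `combine_logs` and `report` are hypothetical names.

```python
from collections import Counter


def combine_logs(logs):
    """Merge per-contributor logs into message totals and per-rule hit counts.

    Each log is an iterable of lines in the assumed simplified format
    "<spam|ham> <RULE_A,RULE_B,...>". Blank lines and "#" comments are skipped.
    """
    totals = Counter()
    hits = {"spam": Counter(), "ham": Counter()}
    for log in logs:
        for line in log:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            kind, _, rules = line.partition(" ")
            if kind not in hits:
                continue  # ignore lines that are not in the assumed format
            totals[kind] += 1
            for rule in rules.split(","):
                rule = rule.strip()
                if rule:
                    hits[kind][rule] += 1
    return totals, hits


def report(totals, hits):
    """Render one text line per rule: name, % of spam hit, % of ham hit."""
    def pct(count, total):
        return 100.0 * count / total if total else 0.0

    rows = []
    for rule in sorted(set(hits["spam"]) | set(hits["ham"])):
        spam_pct = pct(hits["spam"][rule], totals["spam"])
        ham_pct = pct(hits["ham"][rule], totals["ham"])
        rows.append(f"{rule:<20} {spam_pct:6.2f} {ham_pct:6.2f}")
    return "\n".join(rows)
```

A rule that hits a lot of spam and almost no ham is a good one; the report just puts those two percentages side by side for every rule, across everyone's uploaded logs.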

One important thing to note is that the second item, the rsync server,
holds these log files, and is also used to store the log files generated
from the periodic rescoring process.

We only keep one set of the latter, since they are *VERY* large -- the
current rsync server is holding 2.7Gb of logfiles right now, and there's
no reason to assume that future releases will require any less (quite the
opposite, in fact).

This is one big component; but, if the ASF can't supply this, we can stick
with our donated server we're using at the moment.


>> - The release packaging and signing scripts.
>
>*nod*
>
>> - The website-building scripts.   These use WebMake to build the website
>>   -- http://webmake.taint.org/ .  Do we have to migrate this to Forrest
>>   (which would be quite a lot of work, I think)?
>
>You certainly are not required to use Forrest.  The requirement is that
>the site source and generated content have to be kept under version
>control (to keep things easy for infrastructure).  And of course the
>generated content has to be reproducible from source.  Adding ASF tools
>is a bonus, but not a requirement.

Good news -- rewriting the webmake stuff would have been quite a job.

>There are some things with respect to content.  The home page should
>reflect that this is an ASF project.  There should be links to the
>main http://www.apache.org/ page.  Furthermore, while in incubation,
>it has to be noted on the site that the project is in incubation.
>I believe someone was working on an incubation logo to make this
>a bit easier (by just including the image on the page).
>Anyway, just browse around the ASF site and see what other projects have
>done.

No problem -- updating the content now; see http://spamassassin.taint.org/
for the current rev that'll be percolating out tonight.

The incubation logo would be nice; as far as I can see there's no std
on the other incubated-project pages yet.

>> - news.SpamAssassin.org, a Slash-type website for us and third parties to
>>   announce products, releases, services, news about SpamAssassin, and
>>   tools that work with SpamAssassin.   This is running on a donated
>>   server.
>
>Where is this server located?  Is the hardware a solid donation, or is
>it a 'lend-for-use' box?  Is the bandwidth donated?  Do you have full
>control over the machine?

It's lend-for-use -- in fact, we have little control; just administration
via the web, not even any shell access.

>>   Unsure what to do about this one, but I would suggest keeping it
>>   separate -- it provides a good forum for third parties to announce their
>>   plugins etc., so is useful, but I don't see anything like it in the
>>   existing Apache infrastructure.
>
>That's because we don't let third parties announce on our sites.
>
>>   One option: we could always "downgrade" its status so it becomes more
>>   of a "news about products that support SpamAssassin" third-party site,
>>   instead of an "official" news source as it is now.
>
>What I think is that we should just bring this up for discussion.  Let's
>see how we can fit things in.

OK.

>> - the spamtraps server; this is a dedicated server which receives about
>>   100Mbytes of spam per day, and reports it to various blocklists and
>>   network services.   I don't think this should be brought over in any
>>   way, as it eats a killer amount of CPU time and bandwidth, is not
>>   required when building SpamAssassin releases, and is a donation from our
>>   friendly ISP pals at sonic.net. ;)
>
>Ok, we're gonna need some details here.  Basically the same questions
>as above.  And add: How much bandwidth? to that list.

This is one that almost definitely cannot come along.

It's a dedicated server, because the jobs it runs and the volume of mail
involved are massive.  Bandwidth usage alone is over 1Gbit per day, I'd
reckon.

It receives mail from a wide range of spamtraps, and feeds each spam into
a script which:

  - archives a copy
  - sends copies to several DNS blocklist submission addresses
  - reports to network services like Razor, Pyzor, and DCC
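
In outline, the per-message handling looks roughly like the sketch below. This is not the actual script: `process_spam` is a hypothetical name, and the handlers are stand-in callables for the blocklist submissions and the Razor/Pyzor/DCC reports, whose real interfaces are not shown here.

```python
import hashlib
from pathlib import Path


def process_spam(raw, archive_dir, handlers):
    """Archive one trapped message, then pass it to each reporting handler.

    raw is the message as bytes; handlers is a list of (name, callable)
    pairs standing in for the submission and reporting steps.
    """
    # Archive a copy first, named by content hash so duplicates collapse.
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / (hashlib.sha1(raw).hexdigest() + ".eml")
    path.write_bytes(raw)

    # One failing service should not stop the remaining reports.
    failures = []
    for name, handler in handlers:
        try:
            handler(raw)
        except Exception as exc:
            failures.append((name, str(exc)))
    return path, failures
```

Each handler means more network traffic per message, which is a large part of why the volume adds up so quickly.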

This is bandwidth- and CPU-heavy, hence it's dedicated to the task.  This
regularly gets nearly knocked out by load, so there's no way any other
service can be run on this machine due to its impact on reliability.

>> So those are the bits and pieces that need a bit of thought.
>
>Yep :).  Thanks for taking the time to do this.
>
>I'll add:
>
> - transfer of spamassassin.org to ASF
> - transfer of DNS to ASF Infrastructure
> - ...
>
>There's probably more to think about, but I think we have enough to
>keep us busy for a bit.

--j.