Posted to dev@spamassassin.apache.org by Michael Parker <pa...@pobox.com> on 2009/03/23 16:33:16 UTC

Google Summer of Code

Howdy,

It's that time of year again: Google Summer of Code is starting to
kick off and the ASF is participating.  I've signed up as a mentor
again this year in case we have some SpamAssassin projects in
the mix; we didn't end up with any last year.

In the past it's always worked best if we as devs proposed a set of
projects that someone might want to pick up and work on.  So does
anyone have anything they would like to see done over the summer?

Also this year, before accepting anyone into the program they are  
going to need to show a real willingness to work with the community  
and speak up on the dev list, etc.  We might even request a small code  
patch to make sure they have the tools necessary to get the job done  
over the summer.

So, any projects old or new you'd like to see worked on?  We can talk  
about them in this thread and I'll spend a few minutes adding them to  
the ASF-wide wiki page: http://wiki.apache.org/general/SummerOfCode2009

Michael


Re: Google Summer of Code

Posted by Justin Mason <jm...@jmason.org>.
On Mon, Mar 23, 2009 at 18:47, Mark Martinec <Ma...@ijs.si> wrote:
>> I think there may still be a meta bug in the bugzilla... worth
>> checking it for ideas.
>
> All I could find was:
>  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4917
> but it is empty and closed.

Found it.  Follow the "depends on" links from
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4560


Re: Google Summer of Code

Posted by Michael Parker <pa...@pobox.com>.
How about an XS-based message parser?

Maybe something pluggable so you could easily swap in an XS-based one
or different Perl ones depending on the need or available
infrastructure (i.e. no compiler available).
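
A rough sketch of the "pluggable" part of this idea, written in Python rather than Perl/XS purely for illustration. It assumes each backend exposes a parse function under an agreed name, and it tries the backends in order of preference, falling back when the compiled one is not installed. The module name hypothetical_xs_parser is made up for the example; only the standard-library email module is real.

```python
import importlib

def load_parser(candidates):
    """Return the parse function of the first backend that can be imported.

    candidates is a preference-ordered list of (module_name, function_name).
    """
    for module_name, function_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # backend not installed (e.g. no compiler available)
        return getattr(module, function_name)
    raise RuntimeError("no message parser backend available")

parse = load_parser([
    # Hypothetical compiled backend; stands in for an XS-based parser.
    ("hypothetical_xs_parser", "message_from_string"),
    # Pure-Python fallback from the standard library.
    ("email", "message_from_string"),
])

msg = parse("Subject: hi\n\nbody\n")
```

Callers only ever see parse, so swapping backends would not touch the rest of the code, provided all backends return objects with the same interface.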

Michael


Re: Google Summer of Code

Posted by Mark Martinec <Ma...@ijs.si>.
> I think there may still be a meta bug in the bugzilla... worth
> checking it for ideas.

All I could find was:
  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4917
but it is empty and closed.

Some ideas can be found as enhancement requests in the bugzilla.


Here are some others that come to mind:

- 'a bugathlon': there are many bugs open, and some of these are
rather small things to fix. Some may even be just forgotten and
already fixed. It would be nice to go systematically through the
list, doing some triage, and fix the more straightforward ones.

- the M::SA::Message::Metadata::Received::parse_received_line
looks like one big ad-hoc mess of exceptions. I'd dreamed that
a general (but permissive) parser of the syntax prescribed in
RFC 2821 could cover 2/3 of the cases, with the remaining
exceptions dealt with afterwards.

- there is basic IPv6 support in SA, but it seems there are
several corner cases where IPv6 addresses are not recognized or
supported. Likely (just guessing) in RBL lookups, in Received header
field parsing, some DNS lookups in plugins, querying for AAAA in
addition to A, and in .ip6.arpa for reverse queries, maybe in
spamc/spamd. It would be nice to go systematically across features,
checking or fixing their IPv6 support.

- my personal pet peeve: cleanly separating checking of a message
from score generation and from reporting. This would make it possible
(when using SA at an MTA level) to run a multi-recipient message
through checks once, then produce a per-recipient score and/or
per-recipient report individually for each recipient without having
to re-run the rules. Most rules are already compatible with this:
checking could just collect the set of rule names that fire, and
assigning and summing up scores could be done as a separate step.
Missing details are excluding rules which have zero score for all
recipients of a message, short-circuiting, per-recipient bayes.
Some stats indicate that a message has 1.5 recipients on average,
which means about a third less rule-checking time (one pass instead
of 1.5) almost for free when running in the MTA integration mode,
while still preserving many per-recipient features.

- dealing with arbitrary size mail messages: the rules and plugins
which need it could have access to a complete message kept in a file
(like checking DKIM signatures, processing of large attached pictures
or documents, ...), while the rest can continue to work with an
in-memory copy, but truncated to a manageable size if necessary.
The spamc could for example pass a file name to spamd (when both
are running on the same host), instead of having to feed mail contents
through a pipe/socket.
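
For the IPv6 item above, a small sketch of what "reverse queries in .ip6.arpa" and IPv6-aware RBL lookups involve at the name-building level. It is Python for illustration only, not SpamAssassin code; the function names and the zone dnsbl.example.org are made up, and the nibble-reversed form for IPv6 blocklist queries is assumed to follow the usual DNSBL convention.

```python
import ipaddress

def reversed_labels(addr):
    """Reverse an address into DNS labels: octets for IPv4, hex nibbles for IPv6."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ".".join(reversed(str(ip).split(".")))
    # Expand to all 32 hex digits so elided zero runs ("::") are not lost.
    nibbles = ip.exploded.replace(":", "")
    return ".".join(reversed(nibbles))

def ptr_name(addr):
    """Name to query for a PTR record (reverse DNS)."""
    ip = ipaddress.ip_address(addr)
    suffix = "in-addr.arpa" if ip.version == 4 else "ip6.arpa"
    return f"{reversed_labels(addr)}.{suffix}"

def dnsbl_name(addr, zone):
    """Name to query in a DNS blocklist zone (zone is a placeholder)."""
    return f"{reversed_labels(addr)}.{zone}"

v4_ptr = ptr_name("192.0.2.1")
v6_ptr = ptr_name("2001:db8::1")
v4_rbl = dnsbl_name("192.0.2.1", "dnsbl.example.org")
```

Code that only handles the dotted-quad case would need a branch like the IPv6 one here in every place it builds such names, which is why a systematic pass across features seems worthwhile.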

  Mark
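
The check-once, score-per-recipient idea above can be sketched roughly as follows. This is Python for illustration only, not SpamAssassin's Perl API: every function name, rule name, score, and address is invented, and the missing details mentioned in the list (skipping rules that score zero for every recipient, short-circuiting, per-recipient Bayes) are not handled.

```python
def fired_rules(message, rules):
    """Expensive step, run once per message: names of the rules that hit."""
    return {name for name, test in rules.items() if test(message)}

def score_for(hits, overrides, default_scores):
    """Cheap step, run once per recipient: sum that recipient's scores."""
    return sum(overrides.get(name, default_scores[name]) for name in hits)

# Invented rules: each maps a rule name to a test on the raw message text.
RULES = {
    "HAS_VIAGRA": lambda msg: "viagra" in msg.lower(),
    "ALL_CAPS_SUBJECT": lambda msg: msg.splitlines()[0].isupper(),
    "MENTIONS_INVOICE": lambda msg: "invoice" in msg.lower(),
}
DEFAULT_SCORES = {
    "HAS_VIAGRA": 3.0,
    "ALL_CAPS_SUBJECT": 1.5,
    "MENTIONS_INVOICE": 0.5,
}
# Per-recipient overrides; a score of 0 disables a rule for that recipient.
RECIPIENT_SCORES = {
    "alice@example.com": {},
    "bob@example.com": {"HAS_VIAGRA": 0.0, "ALL_CAPS_SUBJECT": 2.0},
}

message = "SUBJECT: CHEAP VIAGRA\nBuy now."
hits = fired_rules(message, RULES)          # rules run exactly once
per_recipient = {
    rcpt: score_for(hits, overrides, DEFAULT_SCORES)
    for rcpt, overrides in RECIPIENT_SCORES.items()
}
```

With these made-up numbers the same set of hits gives alice@example.com a score of 4.5 and bob@example.com a score of 2.0, without re-running any rule.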

Re: Google Summer of Code

Posted by Justin Mason <jm...@jmason.org>.
I think there may still be a meta bug in the bugzilla... worth
checking it for ideas.

--j.
