Posted to users@spamassassin.apache.org by "Shreyansh Shrivastava." <sh...@nitk.edu.in> on 2019/06/10 02:56:48 UTC

GSoC blog series

I am a Google Summer of Code student working on a statistical
classifier plugin for SpamAssassin. I will be writing a series of blogs
about my technical and non-technical experiences during the program,
and since it's open source I think it's a win-win.

The first part is simply about how I came across SpamAssassin and how I got
selected. I will touch on specifics later on. Blog 1
<https://medium.com/@shreyansh25.shrivastava/spamassassin-singa-and-me-7927dbaa16d0>

Regards,
Shreyansh Shrivastava

Re: GSoC blog series

Posted by "Shreyansh Shrivastava." <sh...@nitk.edu.in>.
Hey everyone,

This is the fourth and final entry in the series of blogs that I have been
writing under the GSoC program. It can be used as a walkthrough for
adding a basic, minimal plugin to the existing SpamAssassin framework. I
learned the code and APIs during the program itself, so I am
expecting a lot of errors :)

Blog 4 Spamassassin and plugins
<https://medium.com/@shreyansh25.shrivastava/spamassassin-and-plugins-c60731523680>

Also as I am almost done with the program, I'll be putting up the plugin
code for review very soon.

Regards,
Shreyansh Shrivastava



Re: GSoC blog series

Posted by "Shreyansh Shrivastava." <sh...@nitk.edu.in>.
Hey Stefan,
Thanks for the pointer. I'll keep that in mind for next time and will add
it to the blog too :)

Regards,
Shreyansh Shrivastava


On Mon, Jul 1, 2019 at 2:37 PM Stefan Hornburg (Racke) <ra...@linuxia.de>
wrote:

> On 7/1/19 10:59 AM, Shreyansh Shrivastava. wrote:
> > This is the third entry of the blog series. I cover the OOPs concepts of
> > Perl and the related syntax. This is a bit off track when we think about
> > SINGA (although inline with Spamassassin), but I'll be covering SINGA in
> > the subsequent blogs which I am using for developing a neural net
> > classifier plugin for Spamassassin.
> > Blog 3 Perl-OOPs
> > <https://medium.com/@shreyansh25.shrivastava/perl-oops-13d7c017e69f?postPublishedType=initial>
> >
> > Regards,
> > Shreyansh Shrivastava
> >
>
> I strongly recommend to utilize one of the established OO systems (Moo,
> Moose) instead of creating classes in the old way.
>
> Regards
>           Racke

Re: GSoC blog series

Posted by "Stefan Hornburg (Racke)" <ra...@linuxia.de>.
On 7/1/19 10:59 AM, Shreyansh Shrivastava. wrote:
> This is the third entry of the blog series. I cover the OOPs concepts of Perl and the related syntax. This is a bit off
> track when we think about SINGA (although inline with Spamassassin), but I'll be covering SINGA in the subsequent blogs
> which I am using for developing a neural net classifier plugin for Spamassassin.
> Blog 3 Perl-OOPs <https://medium.com/@shreyansh25.shrivastava/perl-oops-13d7c017e69f?postPublishedType=initial>
> 
> Regards,
> Shreyansh Shrivastava
> 

I strongly recommend using one of the established OO systems (Moo, Moose) instead of creating classes the old way.

Regards
          Racke



-- 
Ecommerce and Linux consulting + Perl and web application programming.
Debian and Sympa administration. Provisioning with Ansible.


Re: GSoC blog series

Posted by "Shreyansh Shrivastava." <sh...@nitk.edu.in>.
This is the third entry of the blog series. I cover the OOP concepts of
Perl and the related syntax. This is a bit off track when we think about
SINGA (although in line with SpamAssassin), but in subsequent blogs I'll
cover SINGA, which I am using to develop a neural net classifier plugin
for SpamAssassin.
Blog 3 Perl-OOPs
<https://medium.com/@shreyansh25.shrivastava/perl-oops-13d7c017e69f?postPublishedType=initial>

Regards,
Shreyansh Shrivastava



Re: GSoC blog series

Posted by Henrik K <he...@hege.li>.
All I see on GitHub is an SA plugin that is supposed to run a message
through a "Python script"?  Where is the actual Python script?

If this whole concept is based on an external script in another language,
please stop wasting time learning and developing the SpamAssassin plugin
glue.  Such an external script can never be efficiently integrated into SA
using the existing Bayes framework.

To me it seems you are developing a standalone classifier, similar to DSpam
or CRM114.  SpamAssassin can then use it in standalone fashion, feeding
the complete message to it and reading the results.  Such a classifier
must maintain its own token database etc., which has nothing to do with
SpamAssassin.

If and when you have this standalone Python classifier in a working state,
it will be trivial for an existing SpamAssassin developer to write the
simple plugin glue.
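
For illustration only, here is a rough Python sketch of the standalone
arrangement described above: the classifier takes a complete message as
text, keeps its own token counts, and returns a spam probability that
plugin glue could read. Everything in it is an assumption made for the
example and not code from the GSoC repository: the names tokenize and
TokenStore are hypothetical, the token database is only held in memory,
and a toy naive Bayes scorer stands in for whatever model the real
classifier uses.

```python
import email
import math
import re
from collections import Counter
from email import policy

# Hypothetical tokenizer rule: runs of 3+ lowercase letters, digits, $ or '.
TOKEN_RE = re.compile(r"[a-z0-9$']{3,}")


def tokenize(raw_message):
    """Parse a complete message and return lowercase tokens taken from
    its Subject header and plain-text body."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = (msg["Subject"] or "") + "\n" + (body.get_content() if body else "")
    return TOKEN_RE.findall(text.lower())


class TokenStore:
    """The classifier's own token database (in memory for this sketch;
    a real one would persist it outside SpamAssassin)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, raw_message, label):
        """Record which tokens occur in one message labelled 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(raw_message)))
        self.messages[label] += 1

    def spam_probability(self, raw_message):
        """Toy naive Bayes over the tokens present in the message,
        with add-one smoothing, computed in log space."""
        log_odds = math.log(
            (self.messages["spam"] + 1) / (self.messages["ham"] + 1)
        )
        for token in set(tokenize(raw_message)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that math.exp cannot overflow on extreme scores.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A wrapper script could then read one message from standard input, call
spam_probability on it, and print the number, which is all the plugin
side would need to parse.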

Cheers,
Henrik



On Fri, Jun 14, 2019 at 05:08:22PM +0530, Shreyansh Shrivastava. wrote:
> On Fri, Jun 14, 2019 at 4:31 PM Axb <[1...@gmail.com> wrote:
> 
>     While blogs may be interesting to read and keep track of stuff...
>     Is there any code to look at and start testing?
> 
> Here's the Git repo of the project,  [2]GSoC Spamassassin. I am still working
> on the Perl plugin code.Any suggestions/help is appreciated.
> 
>     The summer is short...
> 
> The winter has ended, not coming again this time :) 
> 
> 
>     Axb
> 
>     On 6/14/19 10:56 AM, Shreyansh Shrivastava. wrote:
>     > This is the second blog of the series. It covers data cleaning, feature
>     > extraction, model training, and metric selection of a SVM spam
>     classifier.
>     > All of the things mentioned along with a few extra are implemented in the
>     > SVM plugin that I am trying to develop. Will be covering ASF SINGA when I
>     > will work on neural nets.
>     > Blog 2 - SVM and Spamassassin
>     > <[3]https://medium.com/@shreyansh25.shrivastava/
>     svm-and-spamassassin-9da5c2d3fe33>
>     >
>     > Regards,
>     > Shreyansh Shrivastava
>     >
>     > On Mon, Jun 10, 2019 at 8:26 AM Shreyansh Shrivastava. <
>     > [4]shreyansh.171co244@nitk.edu.in> wrote:
>     >
>     >> I am a Google summer of code student working to develop a statistical
>     >> classifier plugin for spamassassin. I will be writing a series of blogs
>     >> regarding my technical and non-technical experiences during the program,
>     >> and since it's open source I think it's a win-win.
>     >>
>     >> The first part is simply about how I came across Spamassassin and how I
>     >> got selected. Will be touching on specifics later on. Blog 1
>     >> <[5]https://medium.com/@shreyansh25.shrivastava/
>     spamassassin-singa-and-me-7927dbaa16d0>
>     >>
>     >> Regards,
>     >> Shreyansh Shrivastava
>     >>
>     >
> 
> 
> 
> References:
> 
> [1] mailto:axb.lists@gmail.com
> [2] https://github.com/sjs253/Spamassassin-GSoC
> [3] https://medium.com/@shreyansh25.shrivastava/svm-and-spamassassin-9da5c2d3fe33
> [4] mailto:shreyansh.171co244@nitk.edu.in
> [5] https://medium.com/@shreyansh25.shrivastava/spamassassin-singa-and-me-7927dbaa16d0

Re: GSoC blog series

Posted by "Shreyansh Shrivastava." <sh...@nitk.edu.in>.
On Fri, Jun 14, 2019 at 4:31 PM Axb <ax...@gmail.com> wrote:

> While blogs may be interesting to read and keep track of stuff...
> Is there any code to look at and start testing?
>
Here's the Git repo of the project, GSoC Spamassassin
<https://github.com/sjs253/Spamassassin-GSoC>. I am still working on the
Perl plugin code. Any suggestions or help would be appreciated.

> The summer is short...
>
The winter has ended, not coming again this time :)

>
> Axb
>

Re: GSoC blog series

Posted by Axb <ax...@gmail.com>.
While blogs may be interesting to read and keep track of stuff...
Is there any code to look at and start testing?
The summer is short...

Axb

On 6/14/19 10:56 AM, Shreyansh Shrivastava. wrote:
> This is the second blog of the series. It covers data cleaning, feature
> extraction, model training, and metric selection of a SVM spam classifier.
> All of the things mentioned along with a few extra are implemented in the
> SVM plugin that I am trying to develop. Will be covering ASF SINGA when I
> will work on neural nets.
> Blog 2 - SVM and Spamassassin
> <https://medium.com/@shreyansh25.shrivastava/svm-and-spamassassin-9da5c2d3fe33>
> 
> Regards,
> Shreyansh Shrivastava
> 


Re: GSoC blog series

Posted by "Shreyansh Shrivastava." <sh...@nitk.edu.in>.
This is the second blog of the series. It covers data cleaning, feature
extraction, model training, and metric selection for an SVM spam classifier.
Everything mentioned, along with a few extras, is implemented in the
SVM plugin that I am developing. I will cover ASF SINGA when I
work on neural nets.
Blog 2 - SVM and Spamassassin
<https://medium.com/@shreyansh25.shrivastava/svm-and-spamassassin-9da5c2d3fe33>

Regards,
Shreyansh Shrivastava
