
Posted to dev@spamassassin.apache.org by Sarang Shrivastava <sa...@gmail.com> on 2015/04/03 19:06:19 UTC

LOOKING OUT FOR A MENTOR FOR GSOC 2015

Hello all,

I am Sarang Shrivastava, an open source enthusiast from MNNIT,
Allahabad, India.

While applying for this year's GSOC I made a blunder. In the initial
phase I was interested in working with the Rspamd organisation (basically
a spam filter) and was working on the idea of "IMPLEMENTING META-STATISTIC
ALGORITHMS".
But while submitting the proposal, I accidentally submitted it to the
Apache Software Foundation.

I asked the mentors of both Rspamd and Apache to somehow transfer my
proposal to Rspamd, but this is no longer possible.

The thing is, my proposal is not organisation specific. Any open source spam
filtering project that does not already have this idea can take advantage of
it. I went through the SpamAssassin wiki page and found that it only has
Bayesian filtering as a statistical classification technique, but the other
machine learning methods that I have listed in my proposal
could substantially increase the efficiency of the spam filtering process.

So, I would really appreciate it if anyone could mentor me throughout
the GSOC period. I want to work on this proposal, but unless and until one
of you signs up as a mentor and accepts my proposal in Melange before the 12th
of April, I cannot work on it further.

I kindly request that anyone among you who is interested in my idea
please be my mentor. I am sure that, given a chance to prove myself, I would
not disappoint you.

The link to my proposal is:
https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120

I have also enclosed a copy of my proposal as an attachment.
PS: In my attached proposal, wherever I wrote Rspamd, I have replaced it
with SpamAssassin.

Cheers,
Sarang

--
*Sarang Shrivastava*
*Computer Science & Engineering*
*MNNIT Allahabad*

Re: Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

Sarang,

I have changed the proposal to be editable so that you can modify it for
the new SA focus.  I will provide discussions in the GSoC forum but
invite you to discuss code and project questions on users@ and
dev@spamassassin.apache.org.

The ASF is very much built on transparency so open discussion is 
paramount to your success.

Regards,
KAM

On 4/6/2015 5:37 AM, Sarang Shrivastava wrote:
> Hello Kevin,
>
> Sorry for the late reply; I was out of town. Here are my answers to
> your questions (I have tried to answer them, but I am not sure they
> are all correct):
>
> At present SA uses a Bayesian classifier together with some additional
> DNS filters to check for spam.
> Firstly, SA does not currently use any of the neural net models that I
> have listed in my proposal. Secondly, SA assigns a very high probability
> to new words that are not present in the Bayes database. But a message
> containing such garbage may still carry a meaningful message along
> with it.
>
> Regarding the plugins, my idea is to have multiple plugins within a
> module named "statistical classifier". This module will contain the
> plugins for all the models that I have listed. By default, the model
> that gives the best result among them will be switched on, but the user
> will have the flexibility to choose from a range of statistical
> plugins, so that if in the future an additional method, alone or
> together with the present ones, gives a better result, that plugin can
> be switched on.
>
> The neural nets that I have listed in my proposal seem to be
> better on paper for the following reasons:
>
> 1) NB (naive Bayes) does not scale well to a large number of emails
> in which most of the words are random nouns.
>
> 2) NB is a good approach to classifying emails, but it does not do
> well at all compared with good unsupervised learning, because the size
> of an email does not really determine its importance; even short
> emails can be spam-free.
>
> For example,
>
> a) we are considering you for a job ..
>
> b) and an urgent job is posted for you ..
>
> Sentence (a) might come from a prospective employer you have applied
> to, whereas sentence (b) is 90% likely to come from spammers who send
> random job offers to people.
>
> The second one falls more into the spam category, so if the user
> classifies it as spam, the combination "urgent job" should carry more
> weight than the rest of the features. In Bayesian filtering, however,
> every feature is considered independent of the others. Grammatical and
> linguistic features are not used by Bayesian filtering, but with
> neural nets a lot more of them can come into play.
>
> A thought you may like to consider:
>
> I did not list SVM in my proposal because, while I was involved with
> Rspamd, there were two separate ideas: one for implementing supervised
> neural nets, the other for SVM. The SA project has neither of them, so
> I think considering SVM as well is not a bad option, because in cases
> where word frequency matters, SVM tends to give the best results.
>
>
>
>
>
>
>
> On Sun, Apr 5, 2015 at 2:16 AM, Kevin A. McGrail <KMcGrail@pccc.com>
> wrote:
>
>     Thanks Sarang.  I got your email to my address as well but it's a
>     holiday weekend for me in the States. (Happy Easter to all those
>     who celebrate!)
>
>     It looks to me like you understand that programming is a state of
>     mind, not a language, which is good, and that you are capable of
>     switching gears.
>
>     I will sign up as a mentor on Monday and let you know when that is
>     done.
>
>     From there, you can look at the SA code and answer the basic
>     questions below because to me your proposal needs clarification to
>     switch to this project.  There is a lot of information in the
>     proposal I don't grok so I think the basic high level questions
>     for you are:
>
>     - What does SA have now related to your proposal?
>     - What do you propose?  A plugin?  Multiple plugins?
>     - Why is this anticipated to be better than what exists now?
>
>     Regards,
>     KAM
>
>
>
>
>
>     On 4/4/2015 4:17 PM, Sarang Shrivastava wrote:
>>
>>     ---------- Forwarded message ----------
>>     From: *Sarang Shrivastava* <sarang24s@gmail.com>
>>     Date: Sat, Apr 4, 2015 at 11:15 AM
>>     Subject: Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015
>>     To: "Kevin A. McGrail" <KMcGrail@pccc.com>
>>
>>
>>     Hi Kevin,
>>
>>     Before I came in contact with Rspamd I did not know Lua at all,
>>     but within a week I was proficient enough to at least understand
>>     the parts of the Rspamd source code written in Lua. As you know,
>>     necessity is the mother of invention, so learning Perl and Redis
>>     would not be a hurdle.
>>
>>     I was just worried that I first needed to find a mentor. Now that
>>     I have one (you seem to be interested, hopefully), starting today
>>     I will dig further into the SA source code and brush up on my
>>     Perl and Redis skills.
>>
>>     Regarding the dataset, this is what I plan:
>>
>>     Firstly, I could directly use the well-known Enron spam dataset:
>>     http://www.aueb.gr/users/ion/data/enron-spam/
>>
>>     Secondly, I could take the spam dataset from:
>>     http://untroubled.org/spam/
>>     which has a collection of spam from 1998-2011, and take the ham
>>     dataset from my own mail account (or, for that matter, anyone's)
>>     by importing the mail from the Gmail server:
>>     https://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/
>>
>>     I'll set up my development environment today. I did not
>>     understand one of your questions: "Additionally, what resources do
>>     you have to develop and test this code on?". If you meant where I
>>     would test my code: initially I would just work on the test data,
>>     reading it directly from the dataset in a Perl script (which I
>>     would write). If SA has a testing framework, I could use that to
>>     test my script instead. Otherwise I could write the unit tests
>>     myself, but it would be better to have a framework I could use.
>>
>>     Just a thought:
>>     while going through the SA source code I came across a script
>>     whose comments say "This is the general class used to train a
>>     learning classifier with new samples of spam and ham mail, and
>>     classify based on prior training."
>>     But I guess this is primarily for Bayesian filtering.
>>     If so, I can design a similar script for my testing purposes.
>>
>>     One more thing: once I am done with the coding part, I can switch
>>     off the other rules that SA uses to filter spam and switch on only
>>     the filter for my code. This would confirm that everything is
>>     working, and then I would just have to focus on improving the
>>     performance of the filtering process.
>>
>>     So my plan for the upcoming week is to take a deeper look into
>>     the SA source code (the part where Bayesian filtering is
>>     implemented) while learning Perl and Redis side by side.
>>
>>     What else would you like me to do? Your suggestions are most
>>     welcome and would help me better understand the SA project and
>>     how to get things done.
>>
>>     Cheers,
>>     Sarang
>>
>>     On Fri, Apr 3, 2015 at 11:47 PM, Kevin A. McGrail
>>     <KMcGrail@pccc.com> wrote:
>>
>>         Hi Sarang,
>>
>>         I've mentored in past GSOCs so I'm interested in helping you
>>         but I am concerned about your proposal and the SpamAssassin
>>         project.  So I can't sign off on it as-is but I'd like to see
>>         if we can fix that.
>>
>>         The SA project is built on plugins primarily in perl.  I
>>         didn't see perl or Redis in your proficiencies which I have
>>         no doubt you can learn but I'd like to know more about your
>>         plans with that.
>>
>>         You also mentioned a data set and I'm not sure what data set
>>         you plan to use for testing. Additionally, what resources do
>>         you have to develop and test this code on?  These may be
>>         simple or difficult hurdles but they merit attention.
>>
>>         Just replacing spamassassin where rspamd exists doesn't
>>         really mean the Project Proposal is ready to go because of
>>         things like the plugin language (not Lua), etc.
>>
>>         Can you look at SA and delve a bit more into the end goal
>>         with your proposal for SA?  I understand completely if this
>>         isn't a fit so don't hesitate to bow out.
>>
>>         regards,
>>         KAM
>>
>>     -- 
>>     *Sarang Shrivastava*
>>     *Computer Science & Engineering*
>>     *MNNIT Allahabad*
>
>
>     -- 
>     *Kevin A. McGrail*
>     President
>
>     Peregrine Computer Consultants Corporation
>     3927 Old Lee Highway, Suite 102-C
>     Fairfax, VA 22030-2422
>
>     http://www.pccc.com/
>
>     703-359-9700 x50 / 800-823-8402 (Toll-Free)
>     703-798-0171 (wireless)
>     KMcGrail@PCCC.com

Re: Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 4/7/2015 12:58 PM, Sarang Shrivastava wrote:
> Hello Kevin,
>
> Looking forward to hearing from you soon!
> Is there any other mode of communication that we can use? What do
> you suggest? I guess Skype or Google Hangouts should solve the issue.

Hi Sarang,

This is a holiday week for many in the US so please expect some delays 
as I am up to my ears at work with people on vacation, etc.

That said, I have just completed the process to officially become a mentor.

Next steps are:

- Tomorrow I will read your email about the proposal modifications and
give my feedback.  I will also solicit community feedback.
- From there, we then need to have you update the proposal on the GSOC
system (assuming it can be done).
- And we'll make sure we agree with the proposal and then figure out how
we accept you on behalf of the org to mentor you on that project.
- In the meantime, you should join the users@ and dev@ spamassassin lists.

After those steps, we can discuss a set schedule offlist for you and me
to schedule meetings every few days for the mentor check-ins.

Regards,
KAM

Re: Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by Sarang Shrivastava <sa...@gmail.com>.

Hello Kevin,

Looking forward to hearing from you soon!
Is there any other mode of communication that we can use? What do you
suggest? I guess Skype or Google Hangouts should solve the issue.

Cheers,
Sarang


Re: Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by Sarang Shrivastava <sa...@gmail.com>.

Hello Kevin,

Sorry for the late reply; I was out of town. Here are my answers to your
questions (I have tried to answer them, but I am not sure they are all
correct):

At present SA uses a Bayesian classifier together with some additional DNS
filters to check for spam.
Firstly, SA does not currently use any of the neural net models that I have
listed in my proposal. Secondly, SA assigns a very high probability to new
words that are not present in the Bayes database. But a message containing
such garbage may still carry a meaningful message along with it.

Regarding the plugins, my idea is to have multiple plugins within a module
named "statistical classifier". This module will contain the plugins for all
the models that I have listed. By default, the model that gives the best
result among them will be switched on, but the user will have the
flexibility to choose from a range of statistical plugins, so that if in the
future an additional method, alone or together with the present ones, gives
a better result, that plugin can be switched on.
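A minimal sketch of the plugin arrangement described above, written in Python purely for illustration. SpamAssassin plugins themselves are written in Perl, and none of the names below exist in SpamAssassin. The class StatisticalClassifierModule, its register/enable/set_default methods, and the choice to combine several enabled plugins by a plain average are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical plugin interface: a list of message tokens -> spam probability.
Classifier = Callable[[List[str]], float]


class StatisticalClassifierModule:
    """Hypothetical container for several statistical classifier plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Classifier] = {}
        self._enabled: List[str] = []

    def register(self, name: str, classifier: Classifier) -> None:
        """Make a plugin available; registering does not switch it on."""
        self._plugins[name] = classifier

    def enable(self, *names: str) -> None:
        """User configuration: switch on exactly the named plugins."""
        unknown = [n for n in names if n not in self._plugins]
        if unknown:
            raise KeyError(f"unknown classifier plugin(s): {unknown}")
        self._enabled = list(names)

    def set_default(self, eval_scores: Dict[str, float]) -> str:
        """Switch on only the plugin with the best evaluation score."""
        best = max(eval_scores, key=lambda name: eval_scores[name])
        self.enable(best)
        return best

    def enabled(self) -> List[str]:
        return list(self._enabled)

    def spam_probability(self, tokens: List[str]) -> float:
        """Combine the enabled plugins by a plain average (an assumption)."""
        if not self._enabled:
            raise RuntimeError("no classifier plugin is enabled")
        total = sum(self._plugins[n](tokens) for n in self._enabled)
        return total / len(self._enabled)
```

The evaluation scores passed to set_default would come from testing each plugin on a labelled corpus; this sketch leaves open how they are measured. Averaging is only one way to combine plugins. A real module might instead weight each plugin by its measured accuracy.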

The neural nets that I have listed in my proposal seem to be better on
paper for the following reasons:

1) NB (naive Bayes) does not scale well to a large number of emails in
which most of the words are random nouns.

2) NB is a good approach to classifying emails, but it does not do well at
all compared with good unsupervised learning, because the size of an email
does not really determine its importance; even short emails can be
spam-free.

For example,

a) we are considering you for a job ..

b) and an urgent job is posted for you ..

Sentence (a) might come from a prospective employer you have applied to,
whereas sentence (b) is 90% likely to come from spammers who send random
job offers to people.
The second one falls more into the spam category, so if the user classifies
it as spam, the combination "urgent job" should carry more weight than the
rest of the features. In Bayesian filtering, however, every feature is
considered independent of the others. Grammatical and linguistic features
are not used by Bayesian filtering, but with neural nets a lot more of them
can come into play.
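To make the independence point above concrete, here is a small, hypothetical Python sketch of a multinomial naive Bayes filter with add-one smoothing. It is not SpamAssassin's Bayes implementation, and the four training messages in the usage check are made up.

With single-word tokens, "urgent" and "job" each contribute their own term to the score, independently of each other. The sketch also adds adjacent word pairs as extra tokens. That is a simple alternative to a neural net for capturing such combinations, assumed here and not specified in the proposal, and it lets the pair "urgent job" carry its own weight.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str, bigrams: bool = False) -> List[str]:
    """Lower-cased words, optionally followed by 'a_b' tokens for adjacent pairs."""
    words = text.lower().split()
    tokens = list(words)
    if bigrams:
        tokens += [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    The score is the prior log-odds plus one log-likelihood ratio per token,
    i.e. every token is treated as independent of the others given the class.
    """

    def __init__(self, bigrams: bool = False) -> None:
        self.bigrams = bigrams
        self.counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.docs: Counter = Counter()
        self.totals: Dict[str, int] = {"spam": 0, "ham": 0}
        self.vocab_size = 0

    def train(self, samples: Iterable[Tuple[str, str]]) -> None:
        """samples: (text, label) pairs, label being "spam" or "ham"."""
        for text, label in samples:
            self.docs[label] += 1
            self.counts[label].update(tokenize(text, self.bigrams))
        self.totals = {c: sum(n.values()) for c, n in self.counts.items()}
        self.vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))

    def token_log_ratio(self, token: str) -> float:
        """log P(token | spam) - log P(token | ham), with add-one smoothing."""
        p = {c: (self.counts[c][token] + 1) / (self.totals[c] + self.vocab_size)
             for c in self.counts}
        return math.log(p["spam"] / p["ham"])

    def spam_log_odds(self, text: str) -> float:
        """Positive values lean spam; each token adds its own independent term."""
        prior = math.log(self.docs["spam"] / self.docs["ham"])
        return prior + sum(self.token_log_ratio(t)
                           for t in tokenize(text, self.bigrams))
```

Consider a toy corpus where "urgent" and "job" also appear separately in ham. There each word alone is only mildly spammy. The pair token "urgent_job", seen only in spam, gets a noticeably larger log ratio than either word on its own.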

A thought you may like to consider:

I did not list SVM in my proposal because, while I was involved with
Rspamd, there were two separate ideas: one for implementing supervised
neural nets, the other for SVM. The SA project has neither of them, so I
think considering SVM as well is not a bad option, because in cases where
word frequency matters, SVM tends to give the best results.
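For comparison, here is a minimal sketch of a linear SVM over word-frequency features. It is trained with Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss. That is one standard way to train a linear SVM, not necessarily what the proposal has in mind. Several things are simplifying assumptions for illustration:

- the Python code and the function names;
- the regularisation parameter lam;
- the absence of a bias term;
- the fixed, non-random order in which examples are visited.

```python
from collections import Counter
from typing import Dict, List, Tuple


def term_frequencies(text: str) -> Dict[str, float]:
    """Bag-of-words feature vector: word -> number of occurrences."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def svm_score(w: Dict[str, float], text: str) -> float:
    """Linear score <w, x>; positive leans spam, negative leans ham."""
    return sum(w.get(f, 0.0) * v for f, v in term_frequencies(text).items())


def train_linear_svm(samples: List[Tuple[str, int]], lam: float = 0.1,
                     epochs: int = 20) -> Dict[str, float]:
    """Pegasos-style training of a linear SVM (labels: +1 spam, -1 ham).

    Simplifications: no bias term, and examples are visited in a fixed
    order instead of being sampled at random.
    """
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for text, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            margin = y * svm_score(w, text)
            # Regularisation: shrink all weights by (1 - eta * lambda).
            w = {f: v * (1.0 - eta * lam) for f, v in w.items()}
            # Hinge loss: only examples with margin < 1 move the weights.
            if margin < 1.0:
                for f, v in term_frequencies(text).items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w
```

Because the features are raw counts, a word that appears twice contributes twice as much to the score. That is where word frequency enters this model.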







On Sun, Apr 5, 2015 at 2:16 AM, Kevin A. McGrail <KM...@pccc.com> wrote:

>  Thanks Sarang.  I got your email to my address as well but it's a
> holiday weekend for me in the states. (Happy Easter to all those who
> celebrate!)
>
> It looks to me like you understand that programming is a state of mind not
> a language which is good and you are capable of switching gears.
>
> I will sign up as a mentor on Monday and let you know when that is done.
>
> From there, you can look at the SA code and answer the basic questions
> below because to me your proposal needs clarification to switch to this
> project.  There is a lot of information in the proposal I don't grok so I
> think the basic high level questions for you are:
>
> - What does SA have now related to your proposal?
> - What you propose?  A plugin?  Multiple plugins?
> - Why is this anticipated to be better than what exists now.
>
> Regards,
> KAM
>
>
>
>
>
> On 4/4/2015 4:17 PM, Sarang Shrivastava wrote:
>
>
> ---------- Forwarded message ----------
> From: Sarang Shrivastava <sa...@gmail.com>
> Date: Sat, Apr 4, 2015 at 11:15 AM
> Subject: Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015
> To: "Kevin A. McGrail" <KM...@pccc.com>
>
>
>  Hi Kevin,
>
> Before I came in contact with Rspamd I didn't know Lua at all, but within
> a week I was proficient enough to at least understand the parts of the
> Rspamd source code written in Lua. As you know, necessity is the mother of
> invention, so learning Perl and Redis would not be a hurdle.
>
> My main worry was finding a mentor first. Now that I hopefully have one in
> you (you do seem to be interested), I will start digging into the SA source
> code today and brush up on my Perl and Redis skills.
>
> Regarding the dataset, this is what I plan:
>
> Firstly, I could directly use the well-known Enron dataset for spam filters:
> http://www.aueb.gr/users/ion/data/enron-spam/
>
> Secondly, I could take the spam dataset from:
> http://untroubled.org/spam/
> which has a collection of spam from 1998-2011, and take the ham dataset from
> my own mail account (or, for that matter, anyone's) by importing the mail
> from the Gmail server:
> https://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/
>
> I'll set up my development environment today. I didn't understand one of
> your questions: "Additionally, what resources do you have to develop and
> test this code on?" Did you mean where I would test my code? Initially I
> would just work on the test data and take input directly from the dataset
> in my Perl script (which I would be writing). Alternatively, if SA has a
> testing framework, I could use that to test my script. If I need to write
> the unit tests myself, that can be done, but it would be better if there is
> a framework I could use.
>
> Just a thought:
> While going through the SA source code I came across a script whose
> comments said "This is the general class used to train a learning
> classifier with new samples of spam and ham mail, and classify based on
> prior training."
> But I guess this is primarily for Bayesian filtering.
> If that is the case, I can design a similar script for my testing purposes.
>
> One more thing: once I am done with the coding part, I can turn off the
> other rules that SA uses to filter spam and turn on only the filter for my
> code. This would let me check that everything is working correctly, and
> then I could focus on improving the performance of the filtering process.
>
> So my plan for the upcoming week is to take a deeper look into the SA
> source code (the part where Bayesian filtering is implemented) while
> learning Perl and Redis side by side.
>
> What else would you like me to do? Your suggestions are most welcome and
> would help me better understand the SA project and how to get things done.
>
>  Cheers,
>  Sarang
>
> On Fri, Apr 3, 2015 at 11:47 PM, Kevin A. McGrail <KM...@pccc.com>
> wrote:
>
>>  Hi Sarang,
>>
>> I've mentored in past GSOCs, so I'm interested in helping you, but I am
>> concerned about your proposal and the SpamAssassin project.  So I can't
>> sign off on it as-is, but I'd like to see if we can fix that.
>>
>> The SA project is built on plugins, primarily in Perl.  I didn't see Perl
>> or Redis in your proficiencies; I have no doubt you can learn them, but I'd
>> like to know more about your plans with that.
>>
>> You also mentioned a data set, and I'm not sure what data set you plan to
>> use for testing.  Additionally, what resources do you have to develop and
>> test this code on?  These may be simple or difficult hurdles, but they
>> merit attention.
>>
>> Just replacing SpamAssassin where Rspamd appears doesn't really mean the
>> project proposal is ready to go, because of things like the plugin language
>> (not Lua), etc.
>>
>> Can you look at SA and delve a bit more into the end goal with your
>> proposal for SA?  I understand completely if this isn't a fit so don't
>> hesitate to bow out.
>>
>> regards,
>> KAM
>>
>>
>> On 4/3/2015 1:06 PM, Sarang Shrivastava wrote:
>>
>>  Hello all,
>>
>> I am Sarang Shrivastava, an open source enthusiast from MNNIT,
>> Allahabad, India.
>>
>> While applying for this year's GSOC I made a blunder. In the initial
>> phase I was interested in working with the Rspamd organisation (basically
>> a spam filter) and was working on the idea of "IMPLEMENTING META-STATISTIC
>> ALGORITHMS".
>> But when submitting the proposal I accidentally submitted it to the
>> Apache Software Foundation.
>>
>> I asked the mentors of both Rspamd and Apache to somehow transfer my
>> proposal to Rspamd, but this can't happen now.
>>
>> The thing is, my proposal is not organisation-specific. Any open source
>> spam filtering project that does not have this idea can take advantage of
>> it. I went through the SpamAssassin wiki page and found that its only
>> statistical classification technique is Bayesian filtering, but the other
>> machine learning methods that I have listed in my proposal could
>> significantly improve the effectiveness of the spam filtering process.
>>
>> So, I would really appreciate it if anyone could mentor me throughout
>> the GSOC period. I want to work on this proposal, but unless and until one
>> of you signs up as a mentor and accepts my proposal in Melange before the
>> 12th of April, I cannot work on it further.
>>
>> I kindly request that anyone among you who is interested in my idea
>> please be my mentor. I am sure that, given a chance to prove myself, I
>> would not disappoint you.
>>
>> The link to my proposal is:
>> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120
>>
>> I have also enclosed a copy of my proposal as an attachment.
>> PS: In my attached proposal, wherever I wrote Rspamd, I have replaced it
>> with SpamAssassin.
>>
>>  Cheers,
>> Sarang
>>
>>
>>
>>  --
>> *Sarang Shrivastava*
>> *Computer Science & Engineering*
>> *MNNIT Allahabad*
>>
>>
>>
>>
>
>
> --
> *Sarang Shrivastava*
> *Computer Science & Engineering*
> *MNNIT Allahabad*
>
>
>
>
> --
> *Kevin A. McGrail*
> President
>
> Peregrine Computer Consultants Corporation
> 3927 Old Lee Highway, Suite 102-C
> Fairfax, VA 22030-2422
>
> http://www.pccc.com/
>
> 703-359-9700 x50 / 800-823-8402 (Toll-Free)
> 703-798-0171 (wireless)
> KMcGrail@PCCC.com
>
>


-- 
*Sarang Shrivastava*
*Computer Science & Engineering*
*MNNIT Allahabad*

Re: Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

Thanks, Sarang.  I got your email to my address as well, but it's a
holiday weekend for me in the States. (Happy Easter to all those who
celebrate!)

It looks to me like you understand that programming is a state of mind,
not a language, which is good, and that you are capable of switching gears.

I will sign up as a mentor on Monday and let you know when that is done.

From there, you can look at the SA code and answer the basic questions
below, because to me your proposal needs clarification to switch to this
project.  There is a lot of information in the proposal I don't grok, so
I think the basic high-level questions for you are:

- What does SA have now related to your proposal?
- What do you propose?  A plugin?  Multiple plugins?
- Why is this anticipated to be better than what exists now?

Regards,
KAM




On 4/4/2015 4:17 PM, Sarang Shrivastava wrote:
>
> ---------- Forwarded message ----------
> From: *Sarang Shrivastava* <sarang24s@gmail.com>
> Date: Sat, Apr 4, 2015 at 11:15 AM
> Subject: Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015
> To: "Kevin A. McGrail" <KMcGrail@pccc.com>
>
>
> Hi Kevin,
>
> Before I came in contact with Rspamd I didn't know Lua at all, but within
> a week I was proficient enough to at least understand the parts of the
> Rspamd source code written in Lua. As you know, necessity is the mother of
> invention, so learning Perl and Redis would not be a hurdle.
>
> My main worry was finding a mentor first. Now that I hopefully have one in
> you (you do seem to be interested), I will start digging into the SA source
> code today and brush up on my Perl and Redis skills.
>
> Regarding the dataset, this is what I plan:
>
> Firstly, I could directly use the well-known Enron dataset for spam filters:
> http://www.aueb.gr/users/ion/data/enron-spam/
>
> Secondly, I could take the spam dataset from:
> http://untroubled.org/spam/
> which has a collection of spam from 1998-2011, and take the ham dataset from
> my own mail account (or, for that matter, anyone's) by importing the mail
> from the Gmail server:
> https://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/
>
> I'll set up my development environment today. I didn't understand one of
> your questions: "Additionally, what resources do you have to develop and
> test this code on?" Did you mean where I would test my code? Initially I
> would just work on the test data and take input directly from the dataset
> in my Perl script (which I would be writing). Alternatively, if SA has a
> testing framework, I could use that to test my script. If I need to write
> the unit tests myself, that can be done, but it would be better if there is
> a framework I could use.
>
> Just a thought:
> While going through the SA source code I came across a script whose
> comments said "This is the general class used to train a learning
> classifier with new samples of spam and ham mail, and classify based on
> prior training."
> But I guess this is primarily for Bayesian filtering.
> If that is the case, I can design a similar script for my testing purposes.
>
> One more thing: once I am done with the coding part, I can turn off the
> other rules that SA uses to filter spam and turn on only the filter for my
> code. This would let me check that everything is working correctly, and
> then I could focus on improving the performance of the filtering process.
>
> So my plan for the upcoming week is to take a deeper look into the SA
> source code (the part where Bayesian filtering is implemented) while
> learning Perl and Redis side by side.
>
> What else would you like me to do? Your suggestions are most welcome and
> would help me better understand the SA project and how to get things done.
>
> Cheers,
> Sarang
>
> On Fri, Apr 3, 2015 at 11:47 PM, Kevin A. McGrail <KMcGrail@pccc.com>
> wrote:
>
>     Hi Sarang,
>
>     I've mentored in past GSOCs, so I'm interested in helping you, but I
>     am concerned about your proposal and the SpamAssassin project.  So
>     I can't sign off on it as-is, but I'd like to see if we can fix that.
>
>     The SA project is built on plugins, primarily in Perl.  I didn't
>     see Perl or Redis in your proficiencies; I have no doubt you can
>     learn them, but I'd like to know more about your plans with that.
>
>     You also mentioned a data set, and I'm not sure what data set you
>     plan to use for testing. Additionally, what resources do you have
>     to develop and test this code on?  These may be simple or
>     difficult hurdles, but they merit attention.
>
>     Just replacing SpamAssassin where Rspamd appears doesn't really
>     mean the project proposal is ready to go, because of things like
>     the plugin language (not Lua), etc.
>
>     Can you look at SA and delve a bit more into the end goal with
>     your proposal for SA?  I understand completely if this isn't a fit
>     so don't hesitate to bow out.
>
>     regards,
>     KAM
>
>
>     On 4/3/2015 1:06 PM, Sarang Shrivastava wrote:
>>     Hello all,
>>
>>     I am Sarang Shrivastava, an open source enthusiast from MNNIT,
>>     Allahabad, India.
>>
>>     While applying for this year's GSOC I made a blunder. In the
>>     initial phase I was interested in working with the Rspamd
>>     organisation (basically a spam filter) and was working on the
>>     idea of "IMPLEMENTING META-STATISTIC ALGORITHMS".
>>     But when submitting the proposal I accidentally submitted it
>>     to the Apache Software Foundation.
>>
>>     I asked the mentors of both Rspamd and Apache to somehow transfer
>>     my proposal to Rspamd, but this can't happen now.
>>
>>     The thing is, my proposal is not organisation-specific. Any open
>>     source spam filtering project that does not have this idea can
>>     take advantage of it. I went through the SpamAssassin wiki page
>>     and found that its only statistical classification technique is
>>     Bayesian filtering, but the other machine learning methods that
>>     I have listed in my proposal could significantly improve the
>>     effectiveness of the spam filtering process.
>>
>>     So, I would really appreciate it if anyone could mentor me
>>     throughout the GSOC period. I want to work on this proposal, but
>>     unless and until one of you signs up as a mentor and accepts my
>>     proposal in Melange before the 12th of April, I cannot work on it
>>     further.
>>
>>     I kindly request that anyone among you who is interested in my
>>     idea please be my mentor. I am sure that, given a chance to prove
>>     myself, I would not disappoint you.
>>
>>     The link to my proposal is:
>>     https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120
>>
>>     I have also enclosed a copy of my proposal as an attachment.
>>     PS: In my attached proposal, wherever I wrote Rspamd, I have
>>     replaced it with SpamAssassin.
>>
>>     Cheers,
>>     Sarang
>>
>>
>>
>>     -- 
>>     *Sarang Shrivastava*
>>     *Computer Science & Engineering*
>>     *MNNIT Allahabad*
>
>
>
>
>
> -- 
> *Sarang Shrivastava*
> *Computer Science & Engineering*
> *MNNIT Allahabad*
>


-- 
*Kevin A. McGrail*
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-798-0171 (wireless)
KMcGrail@PCCC.com

Fwd: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by Sarang Shrivastava <sa...@gmail.com>.

---------- Forwarded message ----------
From: Sarang Shrivastava <sa...@gmail.com>
Date: Sat, Apr 4, 2015 at 11:15 AM
Subject: Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015
To: "Kevin A. McGrail" <KM...@pccc.com>

Hi Kevin,

Before I came in contact with Rspamd I didn't know Lua at all, but within a
week I was proficient enough to at least understand the parts of the Rspamd
source code written in Lua. As you know, necessity is the mother of
invention, so learning Perl and Redis would not be a hurdle.

My main worry was finding a mentor first. Now that I hopefully have one in
you (you do seem to be interested), I will start digging into the SA source
code today and brush up on my Perl and Redis skills.

Regarding the dataset, this is what I plan:

Firstly, I could directly use the well-known Enron dataset for spam filters:
http://www.aueb.gr/users/ion/data/enron-spam/

Secondly, I could take the spam dataset from:
http://untroubled.org/spam/
which has a collection of spam from 1998-2011, and take the ham dataset from
my own mail account (or, for that matter, anyone's) by importing the mail
from the Gmail server:
https://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/
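
A minimal sketch of how the two dataset options above could be read into one
labelled corpus, written in Python for brevity (an actual SpamAssassin plugin
would be Perl). It assumes each source has been unpacked or copied into a
directory with `ham/` and `spam/` subdirectories holding one message per
file; that layout and the name `load_corpus` are assumptions for
illustration, so check how each archive is really organised.

```python
from pathlib import Path


def load_corpus(root):
    """Return (label, path) pairs read from root/ham and root/spam.

    Assumed layout: one message per file; label is "ham" or "spam".
    Files are returned in sorted order so runs are reproducible.
    """
    root = Path(root)
    corpus = []
    for label in ("ham", "spam"):
        folder = root / label
        if not folder.is_dir():
            continue  # tolerate a source that only provides one class
        for path in sorted(folder.iterdir()):
            if path.is_file():
                corpus.append((label, path))
    return corpus
```

Mixing sources (untroubled.org spam with ham exported from a Gmail account)
then amounts to copying the files into the same two folders.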

I'll set up my development environment today. I didn't understand one of
your questions: "Additionally, what resources do you have to develop and
test this code on?" Did you mean where I would test my code? Initially I
would just work on the test data and take input directly from the dataset
in my Perl script (which I would be writing). Alternatively, if SA has a
testing framework, I could use that to test my script. If I need to write
the unit tests myself, that can be done, but it would be better if there is
a framework I could use.
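
Whichever framework ends up being used, testing against the dataset comes
down to holding part of it out and counting outcomes. A minimal sketch of
that, assuming labels are the strings "spam" and "ham"; `split_corpus` and
`evaluate` are hypothetical helper names, not part of SpamAssassin. For a
spam filter the false-positive rate (ham marked as spam) usually matters at
least as much as recall.

```python
import random


def split_corpus(corpus, test_fraction=0.2, seed=0):
    """Shuffle a copy of `corpus` reproducibly and return (train, test)."""
    items = list(corpus)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    cut = len(items) - n_test
    return items[:cut], items[cut:]


def evaluate(pairs):
    """Score (true_label, predicted_label) pairs; labels are "spam"/"ham"."""
    tp = fp = fn = tn = 0
    for truth, predicted in pairs:
        if truth == "spam" and predicted == "spam":
            tp += 1  # spam caught
        elif truth == "spam":
            fn += 1  # spam missed
        elif predicted == "spam":
            fp += 1  # ham wrongly flagged
        else:
            tn += 1  # ham passed through
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```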

Just a thought:
While going through the SA source code I came across a script whose
comments said "This is the general class used to train a learning
classifier with new samples of spam and ham mail, and classify based on
prior training."
But I guess this is primarily for Bayesian filtering.
If that is the case, I can design a similar script for my testing purposes.
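
The comment quoted above describes a two-step interface: learn from
labelled spam and ham samples, then classify new mail based on that
training. A minimal sketch of the same interface, assuming a plain
multinomial naive Bayes with add-one smoothing as the classifier. That is a
swapped-in textbook technique for illustration only; it is not how
SpamAssassin's Bayes code tokenizes or combines probabilities, and the class
and method names are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesLearner:
    """Learns from labelled spam/ham samples, then classifies new text."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, is_spam, text):
        """Add one training message to the counts for its class."""
        label = "spam" if is_spam else "ham"
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        """Return log P(spam, text) - log P(ham, text); positive means spam."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_odds = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            counts = self.token_counts[label]
            # Smoothed class prior, so an untrained class never gives log(0).
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            # Add-one smoothing; the extra +1 leaves room for unseen tokens.
            denominator = sum(counts.values()) + len(vocab) + 1
            for token in tokenize(text):
                log_p += math.log((counts[token] + 1) / denominator)
            log_odds += sign * log_p
        return log_odds

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0
```

Any of the other statistical methods in the proposal could expose the same
`learn`/`is_spam` pair, which would let one test harness drive all of them.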

One more thing: once I am done with the coding part, I can turn off the
other rules that SA uses to filter spam and turn on only the filter for my
code. This would let me check that everything is working correctly, and
then I could focus on improving the performance of the filtering process.
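
Turning rules off can be modelled simply: SpamAssassin adds up the scores of
the rules a message hits and marks it as spam when the total reaches
`required_score` (5.0 by default), and a rule given a score of 0 contributes
nothing. A simplified sketch of that comparison; the rule names and scores
are hypothetical, and real scoring has further details (such as separate
score sets depending on whether Bayes and network tests are enabled) that
are ignored here.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits, scores, enabled=None):
    """Sum the scores of the rules a message hit.

    If `enabled` is given, only those rule names count; every other rule is
    treated as if its score had been set to 0 (i.e. switched off).
    """
    total = 0.0
    for rule in hits:
        if enabled is not None and rule not in enabled:
            continue
        total += scores.get(rule, 0.0)
    return total


def is_spam(hits, scores, enabled=None, threshold=REQUIRED_SCORE):
    """Spam when the (possibly restricted) total reaches the threshold."""
    return total_score(hits, scores, enabled) >= threshold


# Hypothetical rule names and scores, for illustration only.
scores = {"NEW_CLASSIFIER_SPAM": 5.0, "OTHER_RULE_A": 2.0, "OTHER_RULE_B": 1.5}
```

Evaluating once with `enabled` restricted to the new rule and once with
`enabled=None` would show the new classifier both on its own and alongside
the existing rules.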

So my plan for the upcoming week is to take a deeper look into the SA
source code (the part where Bayesian filtering is implemented) while
learning Perl and Redis side by side.

What else would you like me to do? Your suggestions are most welcome and
would help me better understand the SA project and how to get things done.

Cheers,
Sarang

On Fri, Apr 3, 2015 at 11:47 PM, Kevin A. McGrail <KM...@pccc.com> wrote:

>  Hi Sarang,
>
> I've mentored in past GSOCs, so I'm interested in helping you, but I am
> concerned about your proposal and the SpamAssassin project.  So I can't
> sign off on it as-is, but I'd like to see if we can fix that.
>
> The SA project is built on plugins, primarily in Perl.  I didn't see Perl
> or Redis in your proficiencies; I have no doubt you can learn them, but I'd
> like to know more about your plans with that.
>
> You also mentioned a data set, and I'm not sure what data set you plan to
> use for testing.  Additionally, what resources do you have to develop and
> test this code on?  These may be simple or difficult hurdles, but they
> merit attention.
>
> Just replacing SpamAssassin where Rspamd appears doesn't really mean the
> project proposal is ready to go, because of things like the plugin language
> (not Lua), etc.
>
> Can you look at SA and delve a bit more into the end goal with your
> proposal for SA?  I understand completely if this isn't a fit so don't
> hesitate to bow out.
>
> regards,
> KAM
>
>
> On 4/3/2015 1:06 PM, Sarang Shrivastava wrote:
>
>  Hello all,
>
> I am Sarang Shrivastava, an open source enthusiast from MNNIT,
> Allahabad, India.
>
> While applying for this year's GSOC I made a blunder. In the initial
> phase I was interested in working with the Rspamd organisation (basically
> a spam filter) and was working on the idea of "IMPLEMENTING META-STATISTIC
> ALGORITHMS".
> But when submitting the proposal I accidentally submitted it to the
> Apache Software Foundation.
>
> I asked the mentors of both Rspamd and Apache to somehow transfer my
> proposal to Rspamd, but this can't happen now.
>
> The thing is, my proposal is not organisation-specific. Any open source
> spam filtering project that does not have this idea can take advantage of
> it. I went through the SpamAssassin wiki page and found that its only
> statistical classification technique is Bayesian filtering, but the other
> machine learning methods that I have listed in my proposal could
> significantly improve the effectiveness of the spam filtering process.
>
> So, I would really appreciate it if anyone could mentor me throughout
> the GSOC period. I want to work on this proposal, but unless and until one
> of you signs up as a mentor and accepts my proposal in Melange before the
> 12th of April, I cannot work on it further.
>
> I kindly request that anyone among you who is interested in my idea
> please be my mentor. I am sure that, given a chance to prove myself, I
> would not disappoint you.
>
> The link to my proposal is:
> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120
>
> I have also enclosed a copy of my proposal as an attachment.
> PS: In my attached proposal, wherever I wrote Rspamd, I have replaced it
> with SpamAssassin.
>
>  Cheers,
> Sarang
>
>
>
>  --
> *Sarang Shrivastava*
> *Computer Science & Engineering*
> *MNNIT Allahabad*
>
>
>
>

-- 
*Sarang Shrivastava*
*Computer Science & Engineering*
*MNNIT Allahabad*

Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

Hi Sarang,

I've mentored in past GSOCs, so I'm interested in helping you, but I am
concerned about your proposal and the SpamAssassin project.  So I can't
sign off on it as-is, but I'd like to see if we can fix that.

The SA project is built on plugins, primarily in Perl.  I didn't see Perl
or Redis in your proficiencies; I have no doubt you can learn them, but
I'd like to know more about your plans with that.

You also mentioned a data set, and I'm not sure what data set you plan to
use for testing.  Additionally, what resources do you have to develop
and test this code on?  These may be simple or difficult hurdles, but
they merit attention.

Just replacing SpamAssassin where Rspamd appears doesn't really mean the
project proposal is ready to go, because of things like the plugin
language (not Lua), etc.

Can you look at SA and delve a bit more into the end goal with your 
proposal for SA?  I understand completely if this isn't a fit so don't 
hesitate to bow out.

regards,
KAM

On 4/3/2015 1:06 PM, Sarang Shrivastava wrote:
> Hello all,
>
> I am Sarang Shrivastava, an open source enthusiast from MNNIT,
> Allahabad, India.
>
> While applying for this year's GSOC I made a blunder. In the initial
> phase I was interested in working with the Rspamd organisation (basically
> a spam filter) and was working on the idea of "IMPLEMENTING META-STATISTIC
> ALGORITHMS".
> But when submitting the proposal I accidentally submitted it to the
> Apache Software Foundation.
>
> I asked the mentors of both Rspamd and Apache to somehow transfer my
> proposal to Rspamd, but this can't happen now.
>
> The thing is, my proposal is not organisation-specific. Any open source
> spam filtering project that does not have this idea can take advantage of
> it. I went through the SpamAssassin wiki page and found that its only
> statistical classification technique is Bayesian filtering, but the other
> machine learning methods that I have listed in my proposal could
> significantly improve the effectiveness of the spam filtering process.
>
> So, I would really appreciate it if anyone could mentor me throughout
> the GSOC period. I want to work on this proposal, but unless and until one
> of you signs up as a mentor and accepts my proposal in Melange before the
> 12th of April, I cannot work on it further.
>
> I kindly request that anyone among you who is interested in my idea
> please be my mentor. I am sure that, given a chance to prove myself, I
> would not disappoint you.
>
> The link to my proposal is:
> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120
>
> I have also enclosed a copy of my proposal as an attachment.
> PS: In my attached proposal, wherever I wrote Rspamd, I have replaced it
> with SpamAssassin.
>
> Cheers,
> Sarang
>
>
>
> -- 
> *Sarang Shrivastava*
> *Computer Science & Engineering*
> *MNNIT Allahabad*