Posted to users@spamassassin.apache.org by "Kevin A. McGrail" <km...@apache.org> on 2018/03/20 11:47:59 UTC

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

+users

All we give is feedback.  The submission to GSoC is what matters.  So if
you mention Perl only here, that is not going to carry over to the reviewers.

Can someone with fresh eyes take a look at this?  I read it too recently, so
I would gloss over too much of it.

Here are some posts the mentors list thought might be helpful.  The first, I
believe, covers the point of view of someone who did not get selected.

https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334

https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/

Regards, KAM

On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs...@iith.ac.in> wrote:

> Hi Kevin and Apache SpamAssassin Dev Community,
>
> I have resolved all the changes you suggested in the previous draft.
> 1) I mentioned learning Perl in the week before the community bonding
> period. It will not take much time; I can assure you that language is not
> going to be an issue.
> 2) I updated the biography section a bit.
> 3) I made significant changes to the timeline.
> 4) I'm planning to use CMake/Travis CI for automated testing. If there is
> a better alternative, please do suggest it.
> 5) I added links to the research papers that I will be reading to the
> timeline.
> 6) I updated the timeline to include gaining more advanced knowledge about
> email traffic and spam, and listed some links for that purpose.
> 7) I updated the credits.
> 8) I made other changes in various parts of the proposal.
>
> Thanks for your previous detailed feedback.
>
> Here is the link to the updated proposal:
> GSoC 2018 proposal
> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
> Please rigorously review it and suggest any changes that I should make.
>
> Awaiting a favorable response.
>
>
> Thanks...
> Saahil Sirowa
> B. Tech Computer Science and Engineering
> Indian Institute of Technology, Hyderabad
>
> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <km...@apache.org>
> wrote:
>
>> Hi Saahil
>>
>> re: Perl. As the project is primarily in Perl and you do not list it, or
>> any similar language like PHP, in your Proficiencies, I would address
>> that.  The word Perl does not appear a single time.
>>
>> Your Biography is a little light on why this is something you feel you
>> can implement.  The mentors will likely NOT be able to help you with the
>> science; they will focus on the community, processes, and open source in
>> general.
>>
>> re: Email and Spam: do you have any experience with email traffic or
>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>
>> Re: Deliverables, I think you'll need to propose the first draft of
>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>> can be installed and configured to provide multiple configurable
>> statistical analysis algorithms to better identify ham (good email) and/or
>> spam (bad email).
>>
>> Please use Apache SpamAssassin to properly brand the title.
>>
>> Re: Scheduling/timelines, I have no input except that past proposals
>> I have read included more phases and did not add "optional" items.  I'd
>> prefer to see small increments to make sure you stay on schedule and don't
>> get overwhelmed and find yourself far behind as time progresses.
>>
>> Re: Testing Methodology, this is likely the most critical missing part.
>> I am a fan of test-driven development, where you set up tests that should
>> pass and fail and use continuous testing as you add code to confirm your
>> development is progressing well.
>>
>> This is especially important because spam analysis often doesn't work the
>> way people expect, and tests with statistics can help identify issues.
>>
>> For example, it is only a hypothesis that these statistical algorithms will
>> be better than Bayes, so you'll need a baseline for comparison.
>>
>> Additionally, even experts in the field are surprised when they think
>> something will prove the hamminess of an email but it in fact shows the
>> opposite.  A real-world example: when SPF was introduced, it was supposed
>> to provide an automated mechanism for saying "this is an email from a
>> legitimate mail server for my domain".
>>
>> However, the FIRST wave of people to adopt it were all spammers, so it
>> became a spam indicator more than a ham indicator.  It was a very
>> interesting outcome.
>>
>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>> spam.  Have you thought about how you'll get that?  I *might* be able to
>> help, but it's 50/50.
>>
>> Re: You mention reading research papers on statistical algorithms from a
>> previous proposal.  You'll want to list them to show which ones you plan to
>> study.
>>
>> re: "Discussions with the SA community regarding the various types of
>> spams that the present SA can handle." is unclear.  What is a "type of
>> spam" to you?  Do you have a list of types of spam?
>>
>> re: "Brainstorming with the mentors and SA community about the various
>> input features and parameters that can have a huge impact on the overall
>> performance of the listed neural nets models." I think this is flawed.
>> There won't be many people who can discuss this with you.  You'll likely
>> need to use the scientific process to show what has a performance impact.
>> This is not busy work or school work.  This is an experiment that has not
>> been tried at the SA project.
>>
>> re: "actively involved with the community." is a stretch.  A few emails
>> do not active involvement make.
>>
>> re: Bonding, you might consider raising that to 1-2 major bugs and 10-20
>> minor bugs.
>>
>> Re: Credits/references, I would add more clarity about where each of
>> those references is used.
>>
>> Regards,
>> KAM
>>
>
>
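The Testing Methodology and baseline points above can be made concrete with a small harness. What follows is a minimal Python sketch, not SpamAssassin code (the project itself is Perl): it assumes each message is a plain string with a hand-assigned "ham" or "spam" label, and that each classifier is a function returning True for spam. The names `evaluate`, `spam_share`, `baseline` and `candidate`, and the toy corpus, are hypothetical; `baseline` merely stands in for the existing Bayes verdict. `spam_share` measures which way a single feature actually points, which is the kind of check that exposed the SPF surprise.

```python
# Sketch only: compare a candidate spam classifier with a baseline on a
# hand-sorted corpus, test-driven style. All names here are hypothetical
# and are NOT SpamAssassin APIs; a real plugin would be written in Perl.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Labeled = Tuple[str, str]  # (message text, "ham" or "spam")


@dataclass
class Rates:
    false_positive_rate: float  # share of ham wrongly flagged as spam
    false_negative_rate: float  # share of spam that was not flagged


def evaluate(classify: Callable[[str], bool], corpus: List[Labeled]) -> Rates:
    """Run a classifier over a labeled corpus and return its error rates."""
    fp = fn = n_ham = n_spam = 0
    for text, label in corpus:
        flagged = classify(text)
        if label == "ham":
            n_ham += 1
            fp += int(flagged)
        else:
            n_spam += 1
            fn += int(not flagged)
    return Rates(fp / max(n_ham, 1), fn / max(n_spam, 1))


def spam_share(feature: Callable[[str], bool], corpus: List[Labeled]) -> float:
    """Fraction of messages showing `feature` that are labeled spam.

    Measuring this, rather than assuming it, is what reveals cases where
    an intended ham signal turns out to point at spam.
    """
    labels = [label for text, label in corpus if feature(text)]
    return labels.count("spam") / len(labels) if labels else 0.0


# Tiny toy corpus; a real evaluation needs a large hand-sorted one.
TOY_CORPUS: List[Labeled] = [
    ("meeting notes attached", "ham"),
    ("lunch tomorrow?", "ham"),
    ("buy tickets for the team event", "ham"),
    ("cheap meds, buy now", "spam"),
    ("you won a prize, click here", "spam"),
]


def baseline(text: str) -> bool:
    # Hypothetical stand-in for the existing Bayes verdict.
    return "buy" in text


def candidate(text: str) -> bool:
    # Hypothetical stand-in for the new statistical classifier.
    return any(word in text for word in ("cheap", "prize", "click"))


if __name__ == "__main__":
    base = evaluate(baseline, TOY_CORPUS)
    new = evaluate(candidate, TOY_CORPUS)
    print("baseline:", base)
    print("candidate:", new)
    # Tests that fail if the candidate is worse than the baseline on
    # either error rate.
    assert new.false_positive_rate <= base.false_positive_rate
    assert new.false_negative_rate <= base.false_negative_rate
    print("share of 'buy' messages that are spam:",
          spam_share(lambda t: "buy" in t, TOY_CORPUS))
```

Expressing the comparison as assertions means a continuous-testing job fails as soon as a change makes the new classifier worse than the baseline. The toy corpus only exercises the plumbing; it says nothing about real-world accuracy.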

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
Please ignore my last e-mail

On Mon 26 Mar, 2018, 10:00 Saahil Sirowa, <cs...@iith.ac.in> wrote:

> Hi Kevin and SpamAssassin Dev Community,
> Which would be better for testing: Travis CI or CMake?
>
> Thanks...
> Saahil Sirowa
> Indian Institute of Technology Hyderabad
> B. Tech Computer Science and Engineering
>
> On Mon 26 Mar, 2018, 09:58 Saahil Sirowa, <cs...@iith.ac.in>
> wrote:
>
>>
>> On Mon 26 Mar, 2018, 09:57 Saahil Sirowa, <cs...@iith.ac.in>
>> wrote:
>>
>>> Hi Kevin and SpamAssassin Dev Community,
>>> Which would be better for testing: Travis CI or CMake?
>>>
>>> Thanks...
>>> Saahil Sirowa
>>> Indian Institute of Technology Hyderabad
>>> B. Tech Computer Science and Engineering
>>>
>>> On Mon 26 Mar, 2018, 07:29 Saahil Sirowa, <cs...@iith.ac.in>
>>> wrote:
>>>
>>>> Hi Kevin,
>>>> I know you have already gone through the proposal once, but I would
>>>> still ask you to go through it again. Your suggestions in this final
>>>> phase will prove valuable.
>>>>
>>>> Awaiting a favorable response.
>>>>
>>>> I intentionally didn't send this mail to the dev mailing list.
>>>>
>>>> Thanks...
>>>> Saahil Sirowa
>>>> B. Tech Computer Science and Engineering
>>>> Indian Institute of Technology, Hyderabad
>>>>
>>>> On Mon, Mar 26, 2018 at 7:24 AM, Saahil Sirowa <
>>>> cs16btech11030@iith.ac.in> wrote:
>>>>
>>>>> Hi Kevin and SpamAssassin Dev Community,
>>>>> I have made some changes in the draft.
>>>>> GSoC 2018 Proposal
>>>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit?usp=sharing>
>>>>>
>>>>> I request you all to rigorously review it and suggest appropriate
>>>>> edits. As this is the final phase of the application period (deadline 27th
>>>>> March, 16:00 UTC), I would really appreciate it if you could respond before
>>>>> then. This will help me incorporate the suggested changes in time.
>>>>>
>>>>> Thanks...
>>>>> Saahil Sirowa
>>>>> B. Tech Computer Science and Engineering
>>>>> Indian Institute of Technology, Hyderabad
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 7:55 PM, Saahil Sirowa <
>>>>> cs16btech11030@iith.ac.in> wrote:
>>>>>
>>>>>> I had some in the last 2-3 days. I will update the proposal draft with
>>>>>> the required changes by tomorrow night (Sat night).
>>>>>>
>>>>>> Thanks...
>>>>>> Saahil Sirowa
>>>>>> B. Tech Computer Science and Engineering
>>>>>> Indian Institute of Technology, Hyderabad
>>>>>>
>>>>>> On Fri 23 Mar, 2018, 18:01 Kevin A. McGrail, <km...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Wanted to check in and see how you are doing.  This blog post has
>>>>>>> gotten some praise:
>>>>>>>
>>>>>>> https://medium.com/@owtf/google-summer-of-code-writing-a-good-proposal-141b1376f076
>>>>>>>
>>>>>>> --
>>>>>>> Kevin A. McGrail
>>>>>>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>>>>>>> Chair Emeritus Apache SpamAssassin Project
>>>>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>>>>>
>>>>>>> On Wed, Mar 21, 2018 at 7:52 AM, Kevin A. McGrail <
>>>>>>> kmcgrail@apache.org> wrote:
>>>>>>>
>>>>>>>> Comments allowed might be helpful though :-)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 21, 2018 at 12:36 AM, Rajkiran Rajkumar <
>>>>>>>> rajkiran2507@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Saahil, kindly make your doc view-only for people with a link to
>>>>>>>>> it. Giving edit permissions to the world is a bad idea.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rajkiran
>>>>>>>>>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
Hi Kevin and SpamAssassin Dev Community,
Which would be better for testing: Travis CI or CMake?

Thanks...
Saahil Sirowa
Indian Institute of Technology Hyderabad
B. Tech Computer Science and Engineering


Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
On Mon 26 Mar, 2018, 09:57 Saahil Sirowa, <cs...@iith.ac.in> wrote:

> Hi Kevin and SpamAssassin Dev Community,
> Which one would be better for testing mechanisms; Travis CI or Cmake.
>
> Thanks...
> Saahil Sirowa
> Indian Institute of Technology Hyderabad
> B. Tech Computer Science and Engineering
>
> On Mon 26 Mar, 2018, 07:29 Saahil Sirowa, <cs...@iith.ac.in>
> wrote:
>
>> Hi Kevin,
>> I know you have already gone through the proposal once. But, I still
>> request you to go through it. Your suggestions in this final phase will
>> prove valuable.
>>
>> Awaiting for a favorable response.
>>
>> I intentionally didn't sent this mail in dev mailing list.
>>
>> Thanks...
>> Saahil Sirowa
>> B. Tech Computer Science and Engineering
>> Indian Institute of Technology, Hyderabad
>>
>> On Mon, Mar 26, 2018 at 7:24 AM, Saahil Sirowa <cs16btech11030@iith.ac.in
>> > wrote:
>>
>>> Hi Kevin and Spam Assassin Dev Community,
>>> I have made some changes in the draft.
>>> GSoC 2018 Proposal
>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit?usp=sharing>
>>>
>>> I request you all to rigorously review it and suggest appropriate edits.
>>> As, this is the final phase of the application period(Deadline 27th March
>>> 16:00 UTC), I would really appreciate it If you respond before this. This
>>> will help me in incorporating the suggested changes in time.
>>>
>>> Thanks...
>>> Saahil Sirowa
>>> B. Tech Computer Science and Engineering
>>> Indian Institute of Technology, Hyderabad
>>>
>>>
>>> On Fri, Mar 23, 2018 at 7:55 PM, Saahil Sirowa <
>>> cs16btech11030@iith.ac.in> wrote:
>>>
>>>> I had some in last 2-3 days. I will update the proposal draft  with
>>>> required changes by tomorrow night(Sat night).
>>>>
>>>> Thanks...
>>>> Saahil Sirowa
>>>> B. Tech Computer Science and Engineering
>>>> Indi@n Institute of Technology, Hyderabad
>>>>
>>>> On Fri 23 Mar, 2018, 18:01 Kevin A. McGrail, <km...@apache.org>
>>>> wrote:
>>>>
>>>>> Wanted to check in and see how you are doing.  THis blog post has
>>>>> gotten some praise
>>>>>
>>>>>
>>>>> https://medium.com/@owtf/google-summer-of-code-writing-a-good-proposal-141b1376f076
>>>>> .
>>>>>
>>>>> --
>>>>> Kevin A. McGrail
>>>>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>>>>> Chair Emeritus Apache SpamAssassin Project
>>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>>>
>>>>> On Wed, Mar 21, 2018 at 7:52 AM, Kevin A. McGrail <kmcgrail@apache.org
>>>>> > wrote:
>>>>>
>>>>>> Comments allowed might be helpful though :-)
>>>>>>
>>>>>> --
>>>>>> Kevin A. McGrail
>>>>>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>>>>>> Chair Emeritus Apache SpamAssassin Project
>>>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>>>> <(703)%20798-0171>
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 12:36 AM, Rajkiran Rajkumar <
>>>>>> rajkiran2507@gmail.com> wrote:
>>>>>>
>>>>>>> @Saahil, kindly make your doc view-only for people with a link to
>>>>>>> it. Giving edit permissions to the world is a bad idea.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rajkiran
>>>>>>>
>>>>>>> On Tue, Mar 20, 2018 at 5:17 PM, Kevin A. McGrail <
>>>>>>> kmcgrail@apache.org> wrote:
>>>>>>>
>>>>>>>> +users
>>>>>>>>
>>>>>>>> All we give is feedback.  The submission to GSoC is what matters.
>>>>>>>> So if you mentioned perl here that's not going to carryover to the
>>>>>>>> reviewers.
>>>>>>>>
>>>>>>>> Can someone with fresh eyes take a look at this?  I read it too
>>>>>>>> recently so I will gloss over it too much.
>>>>>>>>
>>>>>>>> Here are some posts the mentors list thought might be helpful.  The
>>>>>>>> first I believe covers someone's pov who did not get selected.
>>>>>>>>
>>>>>>>>
>>>>>>>> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>>>>>>>>
>>>>>>>> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>>>>>>>>
>>>>>>>> Regards, KAM
>>>>>>>>
>>>>>>>> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <
>>>>>>>> cs16btech11030@iith.ac.in> wrote:
>>>>>>>>
>>>>>>>>> Hi Kevin and Apache SpamAssassin Dev Community,
>>>>>>>>>
>>>>>>>>> I have resolved all the changes you suggested in the previous
>>>>>>>>> draft.
>>>>>>>>> 1) I mentioned learning Perl a week before the community bonding
>>>>>>>>> period. It will not take much time. I can assure you that language
>>>>>>>>> is not going to be an issue.
>>>>>>>>> 2) I updated the biography part a bit.
>>>>>>>>> 3) Significant changes have been made in the Timeline.
>>>>>>>>> 4) I'm planning to use CMake/Travis CI for automated testing. If
>>>>>>>>> there is a better alternative, please do suggest it.
>>>>>>>>> 5) I gave links to the research papers that I will be reading in the
>>>>>>>>> timeline.
>>>>>>>>> 6) I updated the timeline to include gaining advanced information
>>>>>>>>> about email traffic and spam. I listed some links for the purpose.
>>>>>>>>> 7) I updated the credits.
>>>>>>>>> 8) Other changes have been made in various parts of the proposal.
>>>>>>>>>
>>>>>>>>> Thanks for your previous detailed feedback.
>>>>>>>>>
>>>>>>>>> Here is the link to the updated proposal:
>>>>>>>>> GSoC 2018 proposal
>>>>>>>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>>>>>>>>> Please rigorously review it and suggest any changes that I should
>>>>>>>>> make.
>>>>>>>>>
>>>>>>>>> Awaiting a favorable response.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks...
>>>>>>>>> Saahil Sirowa
>>>>>>>>> B. Tech Computer Science and Engineering
>>>>>>>>> Indian Institute of Technology, Hyderabad
>>>>>>>>>
>>>>>>>>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <
>>>>>>>>> kmcgrail@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Saahil
>>>>>>>>>>
>>>>>>>>>> re: Perl. As the project is primarily in Perl and you do not list
>>>>>>>>>> that in your Proficiencies or any similar languages like PHP, I would
>>>>>>>>>> address that.  The word Perl does not appear a single time.
>>>>>>>>>>
>>>>>>>>>> Your Biography is a little light on why this is something you
>>>>>>>>>> feel you can implement.  The mentors will likely NOT be able to help you
>>>>>>>>>> with the science; they will focus instead on the community, processes,
>>>>>>>>>> and open source in general.
>>>>>>>>>>
>>>>>>>>>> re: Email and Spam, do you have any experience with email traffic
>>>>>>>>>> or spam?  If so, add it.  If not, explain what you plan to do to address
>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> Re: Deliverables, I think you'll need to propose the first draft
>>>>>>>>>> of that.  But your goal will likely be a plugin for Apache SpamAssassin
>>>>>>>>>> that can be installed and configured to provide multiple configurable
>>>>>>>>>> statistical analysis algorithms to better identify ham (good email) and/or
>>>>>>>>>> spam (bad email).
>>>>>>>>>>
>>>>>>>>>> Please use Apache SpamAssassin to properly brand the title.
>>>>>>>>>>
>>>>>>>>>> Re: I have no input on the scheduling/timelines except that past
>>>>>>>>>> proposals I have read have included more phases and did not add "optional"
>>>>>>>>>> items.  I'd prefer to see small increments to make sure you stay on
>>>>>>>>>> schedule and don't get overwhelmed and find yourself way behind as the time
>>>>>>>>>> progresses.
>>>>>>>>>>
>>>>>>>>>> Re: Testing Methodology, this is likely the most critical missing
>>>>>>>>>> part.  I am a fan of test driven development where you set up tests that
>>>>>>>>>> should pass and fail and use continuous testing as you add code to confirm
>>>>>>>>>> your development is progressing well.
>>>>>>>>>>
>>>>>>>>>> This is especially important because spam analysis often doesn't
>>>>>>>>>> work the way people expect and tests w/statistics can help identify issues.
>>>>>>>>>>
>>>>>>>>>> For example, it is a hypothesis that these statistical
>>>>>>>>>> algorithms will be better than Bayes.  So you'll need a baseline for
>>>>>>>>>> comparison.
>>>>>>>>>>
>>>>>>>>>> Additionally, even experts in the field are surprised when they
>>>>>>>>>> think something will prove the hamminess of an email but in fact it shows
>>>>>>>>>> the opposite.  Real-world example: SPF is a policy that, when introduced,
>>>>>>>>>> was supposed to provide an automated mechanism that says "this is an email
>>>>>>>>>> from a legitimate mail server for my domain".
>>>>>>>>>>
>>>>>>>>>> However, the FIRST wave of people to adopt it were all spammers.
>>>>>>>>>> So it became a spam indicator more than a ham indicator.  It was a very
>>>>>>>>>> interesting outcome.
>>>>>>>>>>
>>>>>>>>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham
>>>>>>>>>> and spam.  Have you thought about how you'll get that?  I *might* be able
>>>>>>>>>> to help but it's 50/50.
>>>>>>>>>>
>>>>>>>>>> Re: You mention reading research papers on statistical algorithms
>>>>>>>>>> from a previous proposal.  You'll want to list them to show which ones you
>>>>>>>>>> plan to study.
>>>>>>>>>>
>>>>>>>>>> re: "Discussions with the SA community regarding the various
>>>>>>>>>> types of spams that the present SA can handle." is unclear.  What is a
>>>>>>>>>> "type of spam" to you?  Do you have a list of types of spam?
>>>>>>>>>>
>>>>>>>>>> re: "Brainstorming with the mentors and SA community about the
>>>>>>>>>> various input features and parameters that can have a huge impact on the
>>>>>>>>>> overall performance of the listed neural nets models." I think this is
>>>>>>>>>> flawed.  There won't be a ton of people who can discuss this with you.
>>>>>>>>>> You'll likely need to use a scientific process to show what has a performance
>>>>>>>>>> impact.  This is not busy work or school work.  This is an experiment that
>>>>>>>>>> has not been tried at the SA project.
>>>>>>>>>>
>>>>>>>>>> re: "actively involved with the community." is a stretch.  A few
>>>>>>>>>> emails do not active involvement make.
>>>>>>>>>>
>>>>>>>>>> re: Bonding, you might consider raising that to 1-2 major bugs
>>>>>>>>>> and 10-20 minor bugs.
>>>>>>>>>>
>>>>>>>>>> Re: Credits/references, I would add more clarity about where each
>>>>>>>>>> of those references is used.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> KAM
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
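KAM's Testing Methodology notes above (a baseline to compare against Bayes, a hand-sorted ham/spam corpus, and measuring whether a signal such as an SPF pass really indicates ham) can be sketched as a small evaluation harness. This is a minimal, language-neutral illustration in Python, not SpamAssassin code: a real plugin would be a Perl module, and the `Message`/`Scores` types, the feature names `has_url` and `spf_pass`, and the two toy classifiers are all hypothetical. The six messages are invented solely to exercise the code; they are not measurements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Message:
    """One hand-sorted corpus entry (hypothetical structure)."""
    features: Dict[str, bool]
    spam: bool  # True = spam, False = ham, as labeled by a human


@dataclass
class Scores:
    """Confusion-matrix counts for one classifier over one corpus."""
    tp: int = 0  # spam flagged as spam
    fp: int = 0  # ham flagged as spam (the costly mistake)
    tn: int = 0  # ham passed as ham
    fn: int = 0  # spam passed as ham

    @property
    def false_positive_rate(self) -> float:
        ham = self.fp + self.tn
        return self.fp / ham if ham else 0.0

    @property
    def recall(self) -> float:
        spam = self.tp + self.fn
        return self.tp / spam if spam else 0.0


def evaluate(classify: Callable[[Message], bool],
             corpus: Sequence[Message]) -> Scores:
    """Run a classifier over a labeled corpus and tally the outcomes."""
    s = Scores()
    for msg in corpus:
        flagged = classify(msg)
        if flagged and msg.spam:
            s.tp += 1
        elif flagged:
            s.fp += 1
        elif msg.spam:
            s.fn += 1
        else:
            s.tn += 1
    return s


def spam_ratio(feature: str, corpus: Sequence[Message]) -> float:
    """Fraction of messages carrying `feature` that are spam.

    Measure this rather than assume it: a feature expected to signal
    ham can turn out to signal spam, as with SPF in KAM's example.
    """
    hits = [m for m in corpus if m.features.get(feature, False)]
    return sum(m.spam for m in hits) / len(hits) if hits else 0.0


# Invented toy corpus, only to exercise the code.
corpus = [
    Message({"spf_pass": True, "has_url": True}, spam=True),
    Message({"spf_pass": True, "has_url": True}, spam=True),
    Message({"spf_pass": True, "has_url": False}, spam=False),
    Message({"spf_pass": False, "has_url": True}, spam=True),
    Message({"spf_pass": False, "has_url": False}, spam=False),
    Message({"spf_pass": False, "has_url": True}, spam=False),
]


def baseline(msg: Message) -> bool:
    # Stand-in for the existing classifier (NOT SpamAssassin's Bayes).
    return msg.features.get("has_url", False)


def candidate(msg: Message) -> bool:
    # Hypothetical new algorithm under test.
    return msg.features.get("has_url", False) and msg.features.get("spf_pass", False)


for name, clf in (("baseline", baseline), ("candidate", candidate)):
    s = evaluate(clf, corpus)
    print(f"{name}: FPR={s.false_positive_rate:.2f} recall={s.recall:.2f}")
print(f"P(spam | spf_pass) = {spam_ratio('spf_pass', corpus):.2f}")
```

On this toy data the baseline prints FPR=0.33 recall=1.00 and the candidate prints FPR=0.00 recall=0.67, so the candidate trades one missed spam for zero false positives. The false positive rate is reported separately from recall because, in mail filtering, a misclassified ham is usually costlier than a missed spam. Wrapping `evaluate()` in a test that fails when a candidate's false positive rate exceeds the baseline's is one way to get the "tests that should pass and fail" KAM describes.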

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
Hi Kevin and SpamAssassin Dev Community,
I have made some changes in the draft.
GSoC 2018 Proposal
<https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit?usp=sharing>

I request you all to review it rigorously and suggest appropriate edits.
As this is the final phase of the application period (deadline: 27th March,
16:00 UTC), I would really appreciate it if you respond before then. This
will help me incorporate the suggested changes in time.

Thanks...
Saahil Sirowa
B. Tech Computer Science and Engineering
Indian Institute of Technology, Hyderabad


On Fri, Mar 23, 2018 at 7:55 PM, Saahil Sirowa <cs...@iith.ac.in>
wrote:

> I had some in the last 2-3 days. I will update the proposal draft with
> the required changes by tomorrow night (Saturday night).
>
> Thanks...
> Saahil Sirowa
> B. Tech Computer Science and Engineering
> Indian Institute of Technology, Hyderabad
>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
I had some in the last 2-3 days. I will update the proposal draft with
the required changes by tomorrow night (Saturday night).

Thanks...
Saahil Sirowa
B. Tech Computer Science and Engineering
Indian Institute of Technology, Hyderabad

On Fri 23 Mar, 2018, 18:01 Kevin A. McGrail, <km...@apache.org> wrote:

> Wanted to check in and see how you are doing.  This blog post has gotten
> some praise:
>
> https://medium.com/@owtf/google-summer-of-code-writing-a-good-proposal-141b1376f076
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by "Kevin A. McGrail" <km...@apache.org>.
Wanted to check in and see how you are doing.  This blog post has gotten
some praise:

https://medium.com/@owtf/google-summer-of-code-writing-a-good-proposal-141b1376f076

--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Wed, Mar 21, 2018 at 7:52 AM, Kevin A. McGrail <km...@apache.org>
wrote:

> Comments allowed might be helpful though :-)
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by "Kevin A. McGrail" <km...@apache.org>.
Comments allowed might be helpful though :-)

--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Wed, Mar 21, 2018 at 12:36 AM, Rajkiran Rajkumar <ra...@gmail.com>
wrote:

> @Saahil, kindly make your doc view-only for people with a link to it.
> Giving edit permissions to the world is a bad idea.
>
> Thanks,
> Rajkiran
>
> On Tue, Mar 20, 2018 at 5:17 PM, Kevin A. McGrail <km...@apache.org>
> wrote:
>
>> +users
>>
>> All we give is feedback.  The submission to GSoC is what matters.  So if
>> you mentioned Perl here, that's not going to carry over to the reviewers.
>>
>> Can someone with fresh eyes take a look at this?  I read it too recently,
>> so I will gloss over it too much.
>>
>> Here are some posts the mentors list thought might be helpful.  The first,
>> I believe, covers the point of view of someone who did not get selected.
>>
>> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>>
>> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>>
>> Regards, KAM
>>
>> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs...@iith.ac.in>
>> wrote:
>>
>>> Hi Kevin and Apache SpamAssassin Dev Community,
>>>
>>> I have addressed all the changes you suggested in the previous draft.
>>> 1) I mentioned learning Perl in the week before the community bonding
>>> period. It will not take much time. I can assure you that language is not
>>> going to be an issue.
>>> 2) I updated the biography part a bit
>>> 3) Significant changes have been made in the Timeline.
>>> 4) I'm planning to use CMake/Travis CI for automated testing. If there
>>> is a better alternative, please do suggest it.
>>> 5) I gave links to research papers that I will be reading in the
>>> timeline.
>>> 6) I updated the timeline to mention gaining more advanced knowledge
>>> about email traffic and spam. I listed some links for the purpose.
>>> 7) I updated the credits.
>>> 8) There are other changes in various parts of the proposal.
>>>
>>> Thanks for your previous detailed feedback.
>>>
>>> Here is a link to the updated proposal
>>> GSoC 2018 proposal
>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>>> Please rigorously review it and suggest any changes that I should make.
>>>
>>> Awaiting a favorable response.
>>>
>>>
>>> Thanks...
>>> Saahil Sirowa
>>> B. Tech Computer Science and Engineering
>>> Indian Institute of Technology, Hyderabad
>>>
>>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <km...@apache.org>
>>> wrote:
>>>
>>>> Hi Saahil
>>>>
>>>> re: Perl. As the project is primarily in Perl and you do not list that
>>>> in your Proficiencies or any similar languages like PHP, I would address
>>>> that.  The word Perl does not appear a single time.
>>>>
>>>> Your Biography is a little light on why this is something you feel you
>>>> can implement.  The mentors will likely NOT be able to help you with the
>>>> science, instead focusing on the community, processes, and open source in
>>>> general.
>>>>
>>>> re: Email and Spam, do you have any experience with email traffic or
>>>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>>>
>>>> Re: Deliverables, I think you'll need to propose the first draft of
>>>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>>>> can be installed and configured to provide multiple configurable
>>>> statistical analysis algorithms to better identify ham (good email) and/or
>>>> spam (bad email).
>>>>
>>>> Please use Apache SpamAssassin to properly brand the title.
>>>>
>>>> Re: I have no input on the scheduling/timelines except that past
>>>> proposals I have read have included more phases and do not add "optional"
>>>> items.  I'd prefer to see small increments to make sure you stay on
>>>> schedule and don't get overwhelmed and find yourself way behind as the time
>>>> progresses.
>>>>
>>>> Re: Testing Methodology, this is likely the most critical missing
>>>> part.  I am a fan of test driven development where you set up tests that
>>>> should pass and fail and use continuous testing as you add code to confirm
>>>> your development is progressing well.
>>>>
>>>> This is especially important because spam analysis often doesn't work
>>>> the way people expect and tests w/statistics can help identify issues.
>>>>
>>>> For example, it is a hypothesis that these statistical algorithms will
>>>> be better than Bayes.  So you'll need a baseline for comparison.
>>>>
>>>> Additionally, even experts in the field are surprised when they think
>>>> something will prove the hamminess of an email but in fact shows the
>>>> opposite.  Real-world example: SPF is a policy that, when introduced, was supposed
>>>> to allow an automated mechanism that says "this is an email from a
>>>> legitimate mail server for my domain".
>>>>
>>>> However, the FIRST wave of people to adopt it were all spammers.  So it
>>>> became a spam indicator more than a ham indicator.  It was a very
>>>> interesting outcome.
>>>>
>>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>>>> spam.  Have you thought about how you'll get that?  I *might* be able to
>>>> help but it's 50/50.
>>>>
>>>> Re: You mention reading research papers on statistical algorithms from a
>>>> previous proposal.  You'll want to list them to show which ones you plan to
>>>> study.
>>>>
>>>> re: "Discussions with the SA community regarding the various types of
>>>> spams that the present SA can handle." is unclear.  What is a "type of
>>>> spam" to you?  Do you have a list of types of spam?
>>>>
>>>> re: "Brainstorming with the mentors and SA community about the various
>>>> input features and parameters that can have a huge impact on the overall
>>>> performance of the listed neural nets models." I think this is flawed.
>>>> There won't be a ton of people who can discuss this with you.  You'll need
>>>> to likely use scientific process to show what has a performance impact.
>>>> This is not busy work or school work.  This is an experiment that has not
>>>> been tried at the SA project.
>>>>
>>>> re: "actively involved with the community." is a stretch.  A few emails
>>>> do not active involvement make.
>>>>
>>>> re: Bonding, you might consider raising that to 1-2 major bugs and
>>>> 10-20 minor bugs.
>>>>
>>>> Re: Credits/references, I would add more clarity about where each of
>>>> those references are used.
>>>>
>>>> Regards,
>>>> KAM
>>>>
>>>
>>>
>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Rajkiran Rajkumar <ra...@gmail.com>.
@Saahil, kindly make your doc view-only for people with a link to it.
Giving edit permissions to the world is a bad idea.

Thanks,
Rajkiran

On Tue, Mar 20, 2018 at 5:17 PM, Kevin A. McGrail <km...@apache.org>
wrote:

> +users
>
> All we give is feedback.  The submission to GSoC is what matters.  So if
> you mentioned Perl here, that's not going to carry over to the reviewers.
>
> Can someone with fresh eyes take a look at this?  I read it too recently
> so I will gloss over it too much.
>
> Here are some posts the mentors list thought might be helpful.  The first
> I believe covers someone's pov who did not get selected.
>
> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>
> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>
> Regards, KAM
>
> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs...@iith.ac.in>
> wrote:
>
>> Hi Kevin and Apache SpamAssassin Dev Community,
>>
>> I have addressed all the changes you suggested in the previous draft.
>> 1) I mentioned learning Perl in the week before the community bonding
>> period. It will not take much time. I can assure you that language is not
>> going to be an issue.
>> 2) I updated the biography part a bit
>> 3) Significant changes have been made in the Timeline.
>> 4) I'm planning to use CMake/Travis CI for automated testing. If there
>> is a better alternative, please do suggest it.
>> 5) I gave links to research papers that I will be reading in the timeline.
>> 6) I updated the timeline to mention gaining more advanced knowledge
>> about email traffic and spam. I listed some links for the purpose.
>> 7) I updated the credits.
>> 8) There are other changes in various parts of the proposal.
>>
>> Thanks for your previous detailed feedback.
>>
>> Here is a link to the updated proposal
>> GSoC 2018 proposal
>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>> Please rigorously review it and suggest any changes that I should make.
>>
>> Awaiting a favorable response.
>>
>>
>> Thanks...
>> Saahil Sirowa
>> B. Tech Computer Science and Engineering
>> Indian Institute of Technology, Hyderabad
>>
>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <km...@apache.org>
>> wrote:
>>
>>> Hi Saahil
>>>
>>> re: Perl. As the project is primarily in Perl and you do not list that
>>> in your Proficiencies or any similar languages like PHP, I would address
>>> that.  The word Perl does not appear a single time.
>>>
>>> Your Biography is a little light on why this is something you feel you
>>> can implement.  The mentors will likely NOT be able to help you with the
>>> science, instead focusing on the community, processes, and open source in
>>> general.
>>>
>>> re: Email and Spam, do you have any experience with email traffic or
>>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>>
>>> Re: Deliverables, I think you'll need to propose the first draft of
>>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>>> can be installed and configured to provide multiple configurable
>>> statistical analysis algorithms to better identify ham (good email) and/or
>>> spam (bad email).
>>>
>>> Please use Apache SpamAssassin to properly brand the title.
>>>
>>> Re: I have no input on the scheduling/timelines except that past
>>> proposals I have read have included more phases and do not add "optional"
>>> items.  I'd prefer to see small increments to make sure you stay on
>>> schedule and don't get overwhelmed and find yourself way behind as the time
>>> progresses.
>>>
>>> Re: Testing Methodology, this is likely the most critical missing part.
>>> I am a fan of test driven development where you set up tests that should
>>> pass and fail and use continuous testing as you add code to confirm your
>>> development is progressing well.
>>>
>>> This is especially important because spam analysis often doesn't work
>>> the way people expect and tests w/statistics can help identify issues.
>>>
>>> For example, it is a hypothesis that these statistical algorithms will
>>> be better than Bayes.  So you'll need a baseline for comparison.
>>>
>>> Additionally, even experts in the field are surprised when they think
>>> something will prove the hamminess of an email but in fact shows the
>>> opposite.  Real-world example: SPF is a policy that, when introduced, was supposed
>>> to allow an automated mechanism that says "this is an email from a
>>> legitimate mail server for my domain".
>>>
>>> However, the FIRST wave of people to adopt it were all spammers.  So it
>>> became a spam indicator more than a ham indicator.  It was a very
>>> interesting outcome.
>>>
>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>>> spam.  Have you thought about how you'll get that?  I *might* be able to
>>> help but it's 50/50.
>>>
>>> Re: You mention reading research papers on statistical algorithms from a
>>> previous proposal.  You'll want to list them to show which ones you plan to
>>> study.
>>>
>>> re: "Discussions with the SA community regarding the various types of
>>> spams that the present SA can handle." is unclear.  What is a "type of
>>> spam" to you?  Do you have a list of types of spam?
>>>
>>> re: "Brainstorming with the mentors and SA community about the various
>>> input features and parameters that can have a huge impact on the overall
>>> performance of the listed neural nets models." I think this is flawed.
>>> There won't be a ton of people who can discuss this with you.  You'll need
>>> to likely use scientific process to show what has a performance impact.
>>> This is not busy work or school work.  This is an experiment that has not
>>> been tried at the SA project.
>>>
>>> re: "actively involved with the community." is a stretch.  A few emails
>>> do not active involvement make.
>>>
>>> re: Bonding, you might consider raising that to 1-2 major bugs and 10-20
>>> minor bugs.
>>>
>>> Re: Credits/references, I would add more clarity about where each of
>>> those references are used.
>>>
>>> Regards,
>>> KAM
>>>
>>
>>

Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin

Posted by Saahil Sirowa <cs...@iith.ac.in>.
Yes, I have experience with PHP (medium). I updated this in the language
proficiencies.

On 20-Mar-2018 22:59, "Kevin A. McGrail" <km...@apache.org> wrote:

> P.S. Ask this on the list, because a big thing at Apache is that everything
> is on-list.  I'll repeat my answer there.
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> On Tue, Mar 20, 2018 at 1:28 PM, Kevin A. McGrail <km...@apache.org>
> wrote:
>
>> I do not know.  Do you have any experience with similar languages like
>> PHP?
>>
>> I usually say "programming is a state of mind not a language" and work in
>> dozens of languages professionally with working code and a syntax book.  So
>> it's likely ok.
>>
>> --
>> Kevin A. McGrail
>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>
>> On Tue, Mar 20, 2018 at 1:10 PM, Saahil Sirowa <cs16btech11030@iith.ac.in> wrote:
>>
>>> I added a one-week period (17th April - 23rd April) before community
>>> bonding starts in the timeline for learning Perl. Is this okay?
>>>
>>> On 20-Mar-2018 17:18, "Kevin A. McGrail" <km...@apache.org> wrote:
>>>
>>>> +users
>>>>
>>>> All we give is feedback.  The submission to GSoC is what matters.  So
>>>> if you mentioned Perl here, that's not going to carry over to the reviewers.
>>>>
>>>> Can someone with fresh eyes take a look at this?  I read it too
>>>> recently so I will gloss over it too much.
>>>>
>>>> Here are some posts the mentors list thought might be helpful.  The
>>>> first I believe covers someone's pov who did not get selected.
>>>>
>>>> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>>>>
>>>> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>>>>
>>>> Regards, KAM
>>>>
>>>> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs...@iith.ac.in>
>>>> wrote:
>>>>
>>>>> Hi Kevin and Apache SpamAssassin Dev Community,
>>>>>
>>>>> I have addressed all the changes you suggested in the previous draft.
>>>>> 1) I mentioned learning Perl in the week before the community bonding
>>>>> period. It will not take much time. I can assure you that language is not
>>>>> going to be an issue.
>>>>> 2) I updated the biography part a bit
>>>>> 3) Significant changes have been made in the Timeline.
>>>>> 4) I'm planning to use CMake/Travis CI for automated testing. If
>>>>> there is a better alternative, please do suggest it.
>>>>> 5) I gave links to research papers that I will be reading in the
>>>>> timeline.
>>>>> 6) I updated the timeline to mention gaining more advanced knowledge
>>>>> about email traffic and spam. I listed some links for the purpose.
>>>>> 7) I updated the credits.
>>>>> 8) There are other changes in various parts of the proposal.
>>>>>
>>>>> Thanks for your previous detailed feedback.
>>>>>
>>>>> Here is a link to the updated proposal
>>>>> GSoC 2018 proposal
>>>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>>>>> Please rigorously review it and suggest any changes that I should
>>>>> make.
>>>>>
>>>>> Awaiting a favorable response.
>>>>>
>>>>>
>>>>> Thanks...
>>>>> Saahil Sirowa
>>>>> B. Tech Computer Science and Engineering
>>>>> Indian Institute of Technology, Hyderabad
>>>>>
>>>>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <kmcgrail@apache.org> wrote:
>>>>>
>>>>>> Hi Saahil
>>>>>>
>>>>>> re: Perl. As the project is primarily in Perl and you do not list
>>>>>> that in your Proficiencies or any similar languages like PHP, I would
>>>>>> address that.  The word Perl does not appear a single time.
>>>>>>
>>>>>> Your Biography is a little light on why this is something you feel
>>>>>> you can implement.  The mentors will likely NOT be able to help you with
>>>>>> the science, instead focusing on the community, processes, and open source in
>>>>>> general.
>>>>>>
>>>>>> re: Email and Spam, do you have any experience with email traffic or
>>>>>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>>>>>
>>>>>> Re: Deliverables, I think you'll need to propose the first draft of
>>>>>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>>>>>> can be installed and configured to provide multiple configurable
>>>>>> statistical analysis algorithms to better identify ham (good email) and/or
>>>>>> spam (bad email).
>>>>>>
>>>>>> Please use Apache SpamAssassin to properly brand the title.
>>>>>>
>>>>>> Re: I have no input on the scheduling/timelines except that past
>>>>>> proposals I have read have included more phases and do not add "optional"
>>>>>> items.  I'd prefer to see small increments to make sure you stay on
>>>>>> schedule and don't get overwhelmed and find yourself way behind as the time
>>>>>> progresses.
>>>>>>
>>>>>> Re: Testing Methodology, this is likely the most critical missing
>>>>>> part.  I am a fan of test driven development where you set up tests that
>>>>>> should pass and fail and use continuous testing as you add code to confirm
>>>>>> your development is progressing well.
>>>>>>
>>>>>> This is especially important because spam analysis often doesn't work
>>>>>> the way people expect and tests w/statistics can help identify issues.
>>>>>>
>>>>>> For example, it is a hypothesis that these statistical algorithms
>>>>>> will be better than Bayes.  So you'll need a baseline for comparison.
>>>>>>
>>>>>> Additionally, even experts in the field are surprised when they think
>>>>>> something will prove the hamminess of an email but in fact shows the
>>>>>> opposite.  Real-world example: SPF is a policy that, when introduced, was supposed
>>>>>> to allow an automated mechanism that says "this is an email from a
>>>>>> legitimate mail server for my domain".
>>>>>>
>>>>>> However, the FIRST wave of people to adopt it were all spammers.  So
>>>>>> it became a spam indicator more than a ham indicator.  It was a very
>>>>>> interesting outcome.
>>>>>>
>>>>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>>>>>> spam.  Have you thought about how you'll get that?  I *might* be able to
>>>>>> help but it's 50/50.
>>>>>>
>>>>>> Re: You mention reading research papers on statistical algorithms from
>>>>>> a previous proposal.  You'll want to list them to show which ones you plan
>>>>>> to study.
>>>>>>
>>>>>> re: "Discussions with the SA community regarding the various types of
>>>>>> spams that the present SA can handle." is unclear.  What is a "type of
>>>>>> spam" to you?  Do you have a list of types of spam?
>>>>>>
>>>>>> re: "Brainstorming with the mentors and SA community about the
>>>>>> various input features and parameters that can have a huge impact on the
>>>>>> overall performance of the listed neural nets models." I think this is
>>>>>> flawed.  There won't be a ton of people who can discuss this with you.
>>>>>> You'll need to likely use scientific process to show what has a performance
>>>>>> impact.  This is not busy work or school work.  This is an experiment that
>>>>>> has not been tried at the SA project.
>>>>>>
>>>>>> re: "actively involved with the community." is a stretch.  A few
>>>>>> emails do not active involvement make.
>>>>>>
>>>>>> re: Bonding, you might consider raising that to 1-2 major bugs and
>>>>>> 10-20 minor bugs.
>>>>>>
>>>>>> Re: Credits/references, I would add more clarity about where each of
>>>>>> those references are used.
>>>>>>
>>>>>> Regards,
>>>>>> KAM
>>>>>>
>>>>>
>>>>>
>>
>
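The testing advice in the quoted review (tests that should pass and fail, plus a baseline so the new statistical algorithms can be compared against Bayes) can be illustrated with a minimal evaluation harness. This is a hypothetical sketch, written in Python only for brevity, and is not SpamAssassin code (the project itself is Perl). The `Scores`, `evaluate`, `baseline` and `candidate` names, the keyword rules, and the four-message corpus are all assumptions made up for illustration; a real comparison would run the existing Bayes classifier and the proposed plugin over a hand-sorted ham/spam corpus.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# (message text, is_spam). A real run would load a hand-sorted ham/spam
# corpus; these four messages are made up purely for illustration.
Labeled = Tuple[str, bool]


@dataclass
class Scores:
    tp: int = 0  # spam correctly flagged as spam
    fp: int = 0  # ham wrongly flagged as spam
    tn: int = 0  # ham correctly passed as ham
    fn: int = 0  # spam that slipped through as ham

    @property
    def false_positive_rate(self) -> float:
        ham = self.fp + self.tn
        return self.fp / ham if ham else 0.0

    @property
    def false_negative_rate(self) -> float:
        spam = self.tp + self.fn
        return self.fn / spam if spam else 0.0


def evaluate(classify: Callable[[str], bool], corpus: Iterable[Labeled]) -> Scores:
    """Run a classifier over a labeled corpus and tally the confusion matrix."""
    scores = Scores()
    for text, is_spam in corpus:
        predicted_spam = classify(text)
        if predicted_spam and is_spam:
            scores.tp += 1
        elif predicted_spam:
            scores.fp += 1
        elif not is_spam:
            scores.tn += 1
        else:
            scores.fn += 1
    return scores


# Hypothetical stand-ins: "baseline" plays the role of the existing Bayes
# classifier and "candidate" the new algorithm. Neither is real SA code.
def baseline(text: str) -> bool:
    return "viagra" in text.lower()


def candidate(text: str) -> bool:
    lowered = text.lower()
    return "viagra" in lowered or "winner" in lowered


corpus = [
    ("Cheap VIAGRA now", True),
    ("You are a WINNER, claim your prize", True),
    ("Lunch at noon?", False),
    ("Meeting notes attached", False),
]


def test_candidate_not_worse_than_baseline() -> None:
    base = evaluate(baseline, corpus)
    cand = evaluate(candidate, corpus)
    # Misfiling ham as spam is usually the costlier mistake, so any rise in
    # false positives fails the test outright.
    assert cand.false_positive_rate <= base.false_positive_rate
    assert cand.false_negative_rate <= base.false_negative_rate


if __name__ == "__main__":
    test_candidate_not_worse_than_baseline()
    for name, clf in (("baseline", baseline), ("candidate", candidate)):
        s = evaluate(clf, corpus)
        print(f"{name}: FPR={s.false_positive_rate:.2f} FNR={s.false_negative_rate:.2f}")
    # baseline: FPR=0.00 FNR=0.50
    # candidate: FPR=0.00 FNR=0.00
```

The comparison is made per error type rather than on a single accuracy number, since a classifier can raise overall accuracy while still misfiling more ham.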