Posted to dev@mahout.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2016/04/27 01:17:46 UTC

RE: Mahout contributions

Hello,

Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark:

1) Matrix Factorization with ALS
2) Naive Bayes
3) Weighted Matrix Factorization, SVD++
4) Sparse TF-IDF Vectors from Text
5) Lucene integration

I had a few questions:

1) Which of these should I start with, and where is the greatest need?
2) Should I fork the repo and create branches for each of the above implementations?
3) Should I go ahead and create some JIRAs for these?

I would love some pointers to get started.

Regards

From: sxk1969@hotmail.com
To: dev@mahout.apache.org
Subject: Mahout contributions
Date: Wed, 30 Mar 2016 10:23:45 -0700

Hello Committers,

I was looking through the current JIRA tickets and was wondering whether there's a particular area of Mahout that needs more help than others. Should I focus on contributing algorithms using the DSL, or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.

Regards

Re: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Andrew/Khurrum,
To be clear, this project involves building some algorithms that are not yet implemented in Spark according to the wiki (namely the clustering algorithms) and then integrating them into Elasticsearch and Kibana through a REST API.  Mahout will remain as is; I will look at Prediction.io for the integration efforts, which include visualization.

Let me know if I am missing anything.

Sent from my iPhone

> On Apr 28, 2016, at 2:48 PM, Khurrum Nasim <kh...@useitc.com> wrote:
> 
> I agree with Andrew.   Mahout should remain indigenous.  
> 
> 
> Prakash - you may want to create your own project on github using the mahout library.   
> 
> 
>> On Apr 28, 2016, at 5:43 PM, Andrew Palumbo <ap...@outlook.com> wrote:
>> 
>> I don't think that this sort of integration work would be a good fit directly in the Mahout project.  Mahout is more about math, algorithms, and an environment to develop algorithms.  We stay away from direct platform integration.  In the past we did have some Elasticsearch/Mahout integration work that is not in the code base for this exact reason.  I would suggest that better places to contribute something like this may be PIO (https://prediction.io/), or even directly as a package for Spark: http://spark-packages.org/ .
>> 
>> Projects integrating Mahout have recently been added to PIO: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation .
>> 
>> I think that the project that you are proposing would be a better fit there.
>> 
>> Thanks,
>> 
>> Andy
>> 
>> 
>> ________________________________________
>> From: Saikat Kanjilal <sx...@hotmail.com>
>> Sent: Thursday, April 28, 2016 1:50 PM
>> To: dev@mahout.apache.org
>> Subject: Re: Mahout contributions
>> 
>> I want to start with social data as an example, such as data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested.
>> 
>> Sent from my iPhone
>> 
>>> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>> 
>>> 
>>> What JSON payload size are we talking about here?
>>> 
>>>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>> 
>>>> Because EL gives you visualization and non-Lucene query constructs as well, and it already has a REST API that I plan on tying into Mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service, which can then make calls into other services (namely Elasticsearch and the Neo4j graph service).
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>>>> 
>>>>> @Saikat - why use EL instead of Lucene directly?
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>>>> 
>>>>>> This is great information, thank you. Based on this recommendation I won't create a JIRA yet, but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
>>>>>> Thank you again
>>>>>> 
>>>>>>> From: ap.dev@outlook.com
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: Re: Mahout contributions
>>>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>>>> 
>>>>>>> Saikat,
>>>>>>> 
>>>>>>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
>>>>>>> 
>>>>>>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>>>>>>> 
>>>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>> 
>>>>>>> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review; this way you can explain in more detail.
>>>>>>> 
>>>>>>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>>>>>>> 
>>>>>>> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>>>>>>> 
>>>>>>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage someone else with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
>>>>>>> 
>>>>>>> Andy
>>>>>>> 
>>>>>>> ________________________________________
>>>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: RE: Mahout contributions
>>>>>>> 
>>>>>>> Andrew,
>>>>>>> 
>>>>>>> Thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
>>>>>>> 
>>>>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
>>>>>>> 
>>>>>>> Looking forward to your and other committers' input.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>>> From: ap.dev@outlook.com
>>>>>>>> To: dev@mahout.apache.org
>>>>>>>> Subject: Re: Mahout contributions
>>>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>>>> 
>>>>>>>> Hello Saikat,
>>>>>>>> 
>>>>>>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not recommend it without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this.)  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>>>>>>>> 
>>>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>>> 
>>>>>>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list; contacting committers directly is discouraged.
>>>>>>>> 
>>>>>>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the Mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and a tangible amount of satisfactory work, since once you've marked a JIRA as something you're working on, others will pass on it.
>>>>>>>> 
>>>>>>>> Another good way to contribute would be to look for enhancements that you could make to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>>>> 
>>>>>>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review, you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>>>>>>>> 
>>>>>>>> Thank You,
>>>>>>>> 
>>>>>>>> Andy

Re: Mahout contributions

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

there might be a concept of "contrib" sub project with totally separate
code tree, some asf projects do that. that way it is easy to keep it around
if it turns out to be useful, and easy to strip off if it becomes
unsupported (sorry for pragmatic cynicism)

On Thu, Apr 28, 2016 at 2:48 PM, Khurrum Nasim <kh...@useitc.com>
wrote:

> I agree with Andrew.   Mahout should remain indigenous.
>
>
> Prakash - you may want to create your own project on github using the
> mahout library.

Re: Mahout contributions

Posted by Khurrum Nasim <kh...@useitc.com>.

I agree with Andrew.   Mahout should remain indigenous.  


Prakash - you may want to create your own project on github using the mahout library.   



Re: Mahout contributions

Posted by Khurrum Nasim <kh...@useitc.com>.

@Saikat - One thing I will say is that REST can be slow: serialization and deserialization add latency.  For very large datasets, REST is probably not a good fit.


> On Apr 30, 2016, at 2:35 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> 
> Andrew et al, I wanted to ask about a few items while I'm researching my dev proposal. What I'm looking to build is a streaming analytics platform to do things like collaborative filtering and anomaly detection on large amounts of streaming data, generated either from events (Kafka) or through a firehose like Amazon Kinesis. My initial thinking is that this pipe of events/data would be connected to a REST API that sits on top of Mahout, and the backend underneath Mahout would use a hybrid of Spark and Spark Streaming. I'm wondering whether Samsara was designed from the ground up to deal with large amounts of streaming data, or whether this use case is not targeted yet.  My goal is to build a platform with several data sources/sinks that produces intermediate checkpoints where transformations are applied to the data before it is sent on to another set of sinks/sources.  The potential integration points into and out of Mahout therefore include:
> 1) A REST API that leverages Spray and Akka and invokes one or more algorithms in Mahout
> 2) A runtime environment with Scala actors that allows one to either ingest data or perform transformations on data through the use of various classification and clustering algorithms; the runtime environment would consume algorithms using Mahout as a library
> 3) A rich set of actors dealing with various NoSQL and graph-based datastores (Cassandra/Neo4j/Titan/Mongo)
> 
> Some insight into Samsara would be great as I'm trying to understand the entry points into mahout.
> Thanks in advance.
> 
>> From: ap.dev@outlook.com
>> To: dev@mahout.apache.org
>> Subject: Re: Mahout contributions
>> Date: Thu, 28 Apr 2016 21:43:19 +0000
>> 
>> I don't think that this sort of integration work would be a good fit directly in the Mahout project.  Mahout is more about math, algorithms and an environment to develop algorithms.  We stay away from direct platform integration.  In the past we did have some elasticsearch/mahout integration work that is not in the code base for this exact reason.  I would suggest that better places to contribute something like this may be PIO (https://prediction.io/), or even directly as a package for Spark: http://spark-packages.org/ .
>> 
>> Projects integrating Mahout have recently been added to PIO: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
>> 
>> I think that the project that you are proposing would be a better fit there.
>> 
>> Thanks,
>> 
>> Andy
>> 
>> 
>> ________________________________________
>> From: Saikat Kanjilal <sx...@hotmail.com>
>> Sent: Thursday, April 28, 2016 1:50 PM
>> To: dev@mahout.apache.org
>> Subject: Re: Mahout contributions
>> 
>> I want to start with social data as an example, such as data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested.
>> 
>> Sent from my iPhone
>> 
>>> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>> 
>>> 
>>> What type of JSON payload size are we talking about here ?
>>> 
>>>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>> 
>>>> Because EL gives you the visualization and non Lucene type query constructs as well and also that it already has a rest API that I plan on tying into mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service which can then make calls into other services (namely elasticsearch and neo4j graph service).
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>>>> 
>>>>> @Saikat- why use EL instead of Lucene directly.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>>>> 
>>>>>> This is great information, thank you. Based on this recommendation I won't create a JIRA yet, but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
>>>>>> Thank you again
>>>>>> 
>>>>>>> From: ap.dev@outlook.com
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: Re: Mahout contributions
>>>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>>>> 
>>>>>>> Saikat,
>>>>>>> 
>>>>>>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
>>>>>>> 
>>>>>>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>>>>>>> 
>>>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>> 
>>>>>>> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail.
>>>>>>> 
>>>>>>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>>>>>>> 
>>>>>>> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>>>>>>> 
>>>>>>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage another contributor with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
>>>>>>> 
>>>>>>> Andy
>>>>>>> 
>>>>>>> ________________________________________
>>>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: RE: Mahout contributions
>>>>>>> 
>>>>>>> Andrew, Thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
>>>>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
>>>>>>> Looking forward to your and other committers' input. Thanks
>>>>>>> 
>>>>>>>> From: ap.dev@outlook.com
>>>>>>>> To: dev@mahout.apache.org
>>>>>>>> Subject: Re: Mahout contributions
>>>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>>>> 
>>>>>>>> Hello Saikat,
>>>>>>>> 
>>>>>>>> #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>>>>>>>> 
>>>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>>> 
>>>>>>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
>>>>>>>> 
>>>>>>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
>>>>>>>> 
>>>>>>>> Another good way to contribute would be to look for enhancements that could be made to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>>>> 
>>>>>>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>>>>>>>> 
>>>>>>>> Thank You,
>>>>>>>> 
>>>>>>>> Andy
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ________________________________________
>>>>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>>>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>>>>>>> To: dev@mahout.apache.org
>>>>>>>> Subject: RE: Mahout contributions
>>>>>>>> 
>>>>>>>> Hello, Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
>>>>>>>> I had a few questions: 1) Which of these should I start with, and where is the greatest need? 2) Should I fork the repo and create branches for each of the above implementations? 3) Should I go ahead and create some JIRAs for these?
>>>>>>>> I would love some pointers to get started. Regards
>>>>>>>> 
>>>>>>>> From: sxk1969@hotmail.com
>>>>>>>> To: dev@mahout.apache.org
>>>>>>>> Subject: Mahout contributions
>>>>>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hello Committers, I was looking through the current JIRA tickets and was wondering if there's a particular area of Mahout that needs more help than others. Should I focus on contributing some algorithms using the DSL, or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets. Regards
>>> 
>

RE: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Andrew et al, I wanted to ask about a few items while I'm researching my dev proposal. What I'm looking to build is a streaming analytics platform to do things like collaborative filtering and anomaly detection on large amounts of streaming data, generated either from events (Kafka) or through a firehose like Amazon Kinesis. My initial thinking is that this pipe of events/data would be connected to a REST API that sits on top of Mahout, and the backend underneath Mahout would use a hybrid of Spark and Spark Streaming. I'm wondering whether Samsara was designed from the ground up to deal with large amounts of streaming data, or whether this use case is not targeted yet.  My goal is to build a platform with several data sources/sinks that produces intermediate checkpoints where transformations are applied to the data before it is sent on to another set of sinks/sources.  The potential integration points into and out of Mahout therefore include:
1) A REST API that leverages Spray and Akka and invokes one or more algorithms in Mahout
2) A runtime environment with Scala actors that allows one to either ingest data or perform transformations on data through the use of various classification and clustering algorithms; the runtime environment would consume algorithms using Mahout as a library
3) A rich set of actors dealing with various NoSQL and graph-based datastores (Cassandra/Neo4j/Titan/Mongo)

Some insight into Samsara would be great as I'm trying to understand the entry points into mahout.
Thanks in advance.

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Thu, 28 Apr 2016 21:43:19 +0000
> 
> I don't think that this sort of integration work would be a good fit directly in the Mahout project.  Mahout is more about math, algorithms and an environment to develop algorithms.  We stay away from direct platform integration.  In the past we did have some elasticsearch/mahout integration work that is not in the code base for this exact reason.  I would suggest that better places to contribute something like this may be PIO (https://prediction.io/), or even directly as a package for Spark: http://spark-packages.org/ .
> 
> Projects integrating Mahout have recently been added to PIO: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
> 
> I think that the project that you are proposing would be a better fit there.
> 
> Thanks,
> 
> Andy
>  
> 
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Thursday, April 28, 2016 1:50 PM
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> 
> I want to start with social data as an example, such as data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested.
> 
> Sent from my iPhone
> 
> > On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <kh...@useitc.com> wrote:
> >
> >
> > What type of JSON payload size are we talking about here ?
> >
> >> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >>
> >> Because EL gives you the visualization and non Lucene type query constructs as well and also that it already has a rest API that I plan on tying into mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service which can then make calls into other services (namely elasticsearch and neo4j graph service).
> >>
> >> Sent from my iPhone
> >>
> >>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
> >>>
> >>> @Saikat- why use EL instead of Lucene directly.
> >>>
> >>>
> >>>
> >>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> >>>>
> >>>> This is great information, thank you. Based on this recommendation I won't create a JIRA yet, but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
> >>>> Thank you again
> >>>>
> >>>>> From: ap.dev@outlook.com
> >>>>> To: dev@mahout.apache.org
> >>>>> Subject: Re: Mahout contributions
> >>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
> >>>>>
> >>>>> Saikat,
> >>>>>
> >>>>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
> >>>>>
> >>>>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
> >>>>>
> >>>>> http://mahout.apache.org/developers/how-to-contribute.html
> >>>>>
> >>>>> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail.
> >>>>>
> >>>>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
> >>>>>
> >>>>> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
> >>>>>
> >>>>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage another contributor with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> ________________________________________
> >>>>> From: Saikat Kanjilal <sx...@hotmail.com>
> >>>>> Sent: Wednesday, April 27, 2016 8:24 PM
> >>>>> To: dev@mahout.apache.org
> >>>>> Subject: RE: Mahout contributions
> >>>>>
> >>>>> Andrew, Thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
> >>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
> >>>>> Looking forward to your and other committers' input. Thanks
> >>>>>
> >>>>>> From: ap.dev@outlook.com
> >>>>>> To: dev@mahout.apache.org
> >>>>>> Subject: Re: Mahout contributions
> >>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
> >>>>>>
> >>>>>> Hello Saikat,
> >>>>>>
> >>>>>> #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
> >>>>>>
> >>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
> >>>>>>
> >>>>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
> >>>>>>
> >>>>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
> >>>>>>
> >>>>>> Another good way to contribute would be to look for enhancements that could be made to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
> >>>>>>
> >>>>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
> >>>>>>
> >>>>>> Thank You,
> >>>>>>
> >>>>>> Andy
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ________________________________________
> >>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
> >>>>>> Sent: Tuesday, April 26, 2016 7:17 PM
> >>>>>> To: dev@mahout.apache.org
> >>>>>> Subject: RE: Mahout contributions
> >>>>>>
> >>>>>> Hello, Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
> >>>>>> I had a few questions: 1) Which of these should I start with, and where is the greatest need? 2) Should I fork the repo and create branches for each of the above implementations? 3) Should I go ahead and create some JIRAs for these?
> >>>>>> I would love some pointers to get started. Regards
> >>>>>>
> >>>>>> From: sxk1969@hotmail.com
> >>>>>> To: dev@mahout.apache.org
> >>>>>> Subject: Mahout contributions
> >>>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Hello Committers, I was looking through the current JIRA tickets and was wondering if there's a particular area of Mahout that needs more help than others. Should I focus on contributing some algorithms using the DSL, or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets. Regards
> >

Re: Mahout contributions

Posted by Andrew Palumbo <ap...@outlook.com>.

I don't think that this sort of integration work would be a good fit directly in the Mahout project.  Mahout is more about math, algorithms and an environment to develop algorithms.  We stay away from direct platform integration.  In the past we did have some elasticsearch/mahout integration work that is not in the code base for this exact reason.  I would suggest that better places to contribute something like this may be PIO (https://prediction.io/), or even directly as a package for Spark: http://spark-packages.org/ .

Projects integrating Mahout have recently been added to PIO: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.

I think that the project that you are proposing would be a better fit there.

Thanks,

Andy
 

________________________________________
From: Saikat Kanjilal <sx...@hotmail.com>
Sent: Thursday, April 28, 2016 1:50 PM
To: dev@mahout.apache.org
Subject: Re: Mahout contributions

I want to start with social data as an example, such as data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested.

Sent from my iPhone

> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>
>
> What type of JSON payload size are we talking about here ?
>
>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>
>> Because EL gives you the visualization and non Lucene type query constructs as well and also that it already has a rest API that I plan on tying into mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service which can then make calls into other services (namely elasticsearch and neo4j graph service).
>>
>> Sent from my iPhone
>>
>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>>
>>> @Saikat- why use EL instead of Lucene directly.
>>>
>>>
>>>
>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>>
>>>> This is great information, thank you. Based on this recommendation I won't create a JIRA yet, but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
>>>> Thank you again
>>>>
>>>>> From: ap.dev@outlook.com
>>>>> To: dev@mahout.apache.org
>>>>> Subject: Re: Mahout contributions
>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>>
>>>>> Saikat,
>>>>>
>>>>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
>>>>>
>>>>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>>>>>
>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>
>>>>> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail.
>>>>>
>>>>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>>>>>
>>>>> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>>>>>
>>>>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage another contributor with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
>>>>>
>>>>> Andy
>>>>>
>>>>> ________________________________________
>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>> To: dev@mahout.apache.org
>>>>> Subject: RE: Mahout contributions
>>>>>
>>>>> Andrew, Thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
>>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
>>>>> Looking forward to your and other committers' input. Thanks
>>>>>
>>>>>> From: ap.dev@outlook.com
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: Re: Mahout contributions
>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>>
>>>>>> Hello Saikat,
>>>>>>
>>>>>> #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>>>>>>
>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>
>>>>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
>>>>>>
>>>>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
>>>>>>
>>>>>> Another good way to contribute would be to look for enhancements that could be made to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>>
>>>>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>>>>>>
>>>>>> Thank You,
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: RE: Mahout contributions
>>>>>>
>>>>>> Hello, Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
>>>>>> I had a few questions: 1) Which of these should I start with, and where is the greatest need? 2) Should I fork the repo and create branches for each of the above implementations? 3) Should I go ahead and create some JIRAs for these?
>>>>>> I would love some pointers to get started. Regards
>>>>>>
>>>>>> From: sxk1969@hotmail.com
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: Mahout contributions
>>>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hello Committers,I was looking through the current jira tickets and was wondering if there's a particular area of Mahout that needs some more help than others, should I focus on contributing some algorithms usign DSL or Samsara related efforts, I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.Regards
>

Re: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

I want to start with social data as an example, such as data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested.

Sent from my iPhone

> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <kh...@useitc.com> wrote:
> 
> 
> What type of JSON payload size are we talking about here?
> 
>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>> 
>> Because EL gives you visualization and non-Lucene query constructs as well, and it already has a REST API that I plan on tying into Mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service, which can then make calls into other services (namely Elasticsearch and the Neo4j graph service).
>> 
>> Sent from my iPhone
>> 
>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
>>> 
>>> @Saikat - why use EL instead of Lucene directly?
>>> 
>>> 
>>> 
>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
>>>> 
>>>> This is great information, thank you. Based on this recommendation I won't create a JIRA but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
>>>> Thank you again
>>>> 
>>>>> From: ap.dev@outlook.com
>>>>> To: dev@mahout.apache.org
>>>>> Subject: Re: Mahout contributions
>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>> 
>>>>> Saikat, 
>>>>> 
>>>>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
>>>>>
>>>>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>>>>>
>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>
>>>>> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review; this way you can explain in more detail.
>>>>>
>>>>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>>>>>
>>>>> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>>>>>
>>>>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage someone else with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
>>>>> 
>>>>> Andy
>>>>> 
>>>>> ________________________________________
>>>>> From: Saikat Kanjilal <sx...@hotmail.com>
>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>> To: dev@mahout.apache.org
>>>>> Subject: RE: Mahout contributions
>>>>> 
>>>>> Andrew, thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
>>>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
>>>>> Looking forward to your and other committers' input. Thanks
>>>>> 
>>>>>> From: ap.dev@outlook.com
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: Re: Mahout contributions
>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>> 
>>>>>> Hello Saikat,
>>>>>> 
>>>>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not recommend it without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this.)  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>>>>>>
>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>
>>>>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list; contacting committers directly is discouraged.
>>>>>>
>>>>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the Mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability, and a tangible amount of satisfactory work, since once you've marked a JIRA as something you're working on, others will pass on it.
>>>>>>
>>>>>> Another good way to contribute would be to look for enhancements that could be made to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>> 
>>>>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>>>>>> 
>>>>>> Thank You,
>>>>>> 
>>>>>> Andy
>

Re: Mahout contributions

Posted by Khurrum Nasim <kh...@useitc.com>.

What type of JSON payload size are we talking about here?

> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> 
> Because EL gives you visualization and non-Lucene query constructs as well, and it already has a REST API that I plan on tying into Mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service, which can then make calls into other services (namely Elasticsearch and the Neo4j graph service).
> 
> Sent from my iPhone

Re: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Because EL gives you visualization and non-Lucene query constructs as well, and it already has a REST API that I plan on tying into Mahout.  I plan on wrapping some of the clustering algorithms that I implement using Mahout and Spark as a service, which can then make calls into other services (namely Elasticsearch and the Neo4j graph service).

Sent from my iPhone

> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <kh...@useitc.com> wrote:
> 
> @Saikat - why use EL instead of Lucene directly?

Re: Mahout contributions

Posted by Khurrum Nasim <kh...@useitc.com>.

@Saikat - why use EL instead of Lucene directly?



> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:
> 
> This is great information, thank you. Based on this recommendation I won't create a JIRA but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
> Thank you again

RE: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Thanks, this helps. I hope to have a proposal to dev@ outlining some use cases in the next few weeks.

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Fri, 29 Apr 2016 00:03:41 +0000
> 
> One last thing, Saikat, in answer to your question below.  To clarify, for proposed smaller scale mahout contributions (not on the roadmap or in currently open Jiras):
> a good workflow would be as follows:
> 
> 1. Investigate your idea independently 
> 2. Float the proposal to dev@, 
> 3. Allow some time for feedback.
> 4. Sketch out the problem independently
> 5. If you decide to go on with your work Create a JIRA
> 6. Begin work.
> 7. When you're 70%-80% (or even 100%) finished with your work, open a PR for review.
> 
> I only mention this as it seems better to open the JIRA  _Before_  you begin your work rather than after as you mention below.  As well It would probably be best not to open multiple Jiras. 
> 
> Also you might want to take a look at: http://www.apache.org/foundation/voting.html
> 
> These are ways that people can vote and give feedback. As well as rules for commiters voting in finished code.
> 
> I think that should cover it.
> 
> Andy 
> 
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Thursday, April 28, 2016 12:08 PM
> To: dev@mahout.apache.org
> Subject: RE: Mahout contributions
> 
> This is great information thank you, based on this recommendation I won't create a JIRA but start work on my project and when the code approaches the percentages you are describing I will create the appropriate JIRA's and put together a proposal to send to the list, sound ok?  Based on your latest updates to the wiki i will work on a handful of the clustering algorithms since I see that the Spark implementations for these are not yet complete.
> Thank you again
> 
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout contributions
> > Date: Thu, 28 Apr 2016 01:31:09 +0000
> >
> > Saikat,
> >
> > One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write proposal as you've done, and if you don't see any "+1"s or responses from the community at whole with in a few days, you may want to explain in more detail, give examples and use cases.  If you are still not seeing +1s or any responses from others then I think you can assume that there may not be interest; this is usually how things work.
> >
> > However if its something that your passionate about and you feel like you can deliver this should not to stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
> >
> > http://mahout.apache.org/developers/how-to-contribute.html
> >
> > and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail.
> >
> > But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
> >
> > For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
> >
> > My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, rather to let you know that when you open an JIRA for a new issue, It tells others that your are working on it, and thus may discourage another with a similar idea to contribute this feature.  So it is best to open it once you've begun your work and are committed to it.
> >
> > Andy
> >
> > ________________________________________
> > From: Saikat Kanjilal <sx...@hotmail.com>
> > Sent: Wednesday, April 27, 2016 8:24 PM
> > To: dev@mahout.apache.org
> > Subject: RE: Mahout contributions
> >
> > Andrew,Thank you very much for your input, I actually want to start a new set of JIRAs, here's what I want to work on, I want to build a framework that ties together search/visualization capability with some machine learning algorithms, so essentially think of it as tying in elasticsearch and kibana  into mahout , the user can search for their data with elasticsearch and for deeper analysis on that data they can feed that data into one or more mahout backends for analysis.  Another interesting tie in might be to hack kibana to render ggplot like graphics based on the output of mahout algorithms (assuming this can be a kibana plugin).
> > Before I go hog wild to create a bunch of JIRA's I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
> > Looking forward to your and other committers input.Thanks
> >
> > > From: ap.dev@outlook.com
> > > To: dev@mahout.apache.org
> > > Subject: Re: Mahout contributions
> > > Date: Wed, 27 Apr 2016 20:16:38 +0000
> > >
> > > Hello Saikat,
> > >
> > > #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
> > >
> > > Please see: http://mahout.apache.org/developers/how-to-contribute.html
> > >
> > > And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
> > >
> > > The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
> > >
> > > Another good way to contribute would be to look for enhancements that could make to existing code not necessarily open JIRAs that need to be assigned to you.  For example please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
> > >
> > > If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
> > >
> > > Thank You,
> > >
> > > Andy
> > >
> > >
> > >
> > > ________________________________________
> > > From: Saikat Kanjilal <sx...@hotmail.com>
> > > Sent: Tuesday, April 26, 2016 7:17 PM
> > > To: dev@mahout.apache.org
> > > Subject: RE: Mahout contributions
> > >
> > > Hello,Following up on my last email with more specifics,  I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing the one or more of the following algorithms with Mahout using spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
> > > Had a few questions:1) Which of these should I start with and where is there the greatest need?2) Should I fork the repo and create branches for the each of the above implementations?3) Should I go ahead and create some JIRAs for these?
> > > Would love to have some pointers to get started?Regards
> > >
> > > From: sxk1969@hotmail.com
> > > To: dev@mahout.apache.org
> > > Subject: Mahout contributions
> > > Date: Wed, 30 Mar 2016 10:23:45 -0700
> > >
> > >
> > >
> > >
> > > Hello Committers,I was looking through the current jira tickets and was wondering if there's a particular area of Mahout that needs some more help than others, should I focus on contributing some algorithms usign DSL or Samsara related efforts, I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.Regards

Re: Mahout contributions

Posted by Andrew Palumbo <ap...@outlook.com>.

One last thing, Saikat, in answer to your question below.  To clarify, for proposed smaller-scale Mahout contributions (not on the roadmap or in currently open JIRAs), a good workflow would be as follows:

1. Investigate your idea independently.
2. Float the proposal to dev@.
3. Allow some time for feedback.
4. Sketch out the problem independently.
5. If you decide to go on with your work, create a JIRA.
6. Begin work.
7. When you're 70%-80% (or even 100%) finished with your work, open a PR for review.

I only mention this as it seems better to open the JIRA _before_ you begin your work rather than after, as you mention below.  Also, it would probably be best not to open multiple JIRAs.

Also you might want to take a look at: http://www.apache.org/foundation/voting.html

That page describes the ways people can vote and give feedback, as well as the rules for committers voting in finished code.

I think that should cover it.

Andy 

________________________________________
From: Saikat Kanjilal <sx...@hotmail.com>
Sent: Thursday, April 28, 2016 12:08 PM
To: dev@mahout.apache.org
Subject: RE: Mahout contributions

This is great information thank you, based on this recommendation I won't create a JIRA but start work on my project and when the code approaches the percentages you are describing I will create the appropriate JIRA's and put together a proposal to send to the list, sound ok?  Based on your latest updates to the wiki i will work on a handful of the clustering algorithms since I see that the Spark implementations for these are not yet complete.
Thank you again

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Thu, 28 Apr 2016 01:31:09 +0000
>
> Saikat,
>
> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write proposal as you've done, and if you don't see any "+1"s or responses from the community at whole with in a few days, you may want to explain in more detail, give examples and use cases.  If you are still not seeing +1s or any responses from others then I think you can assume that there may not be interest; this is usually how things work.
>
> However if its something that your passionate about and you feel like you can deliver this should not to stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>
> http://mahout.apache.org/developers/how-to-contribute.html
>
> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail.
>
> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>
> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>
> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, rather to let you know that when you open an JIRA for a new issue, It tells others that your are working on it, and thus may discourage another with a similar idea to contribute this feature.  So it is best to open it once you've begun your work and are committed to it.
>
> Andy
>
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Wednesday, April 27, 2016 8:24 PM
> To: dev@mahout.apache.org
> Subject: RE: Mahout contributions
>
> Andrew,Thank you very much for your input, I actually want to start a new set of JIRAs, here's what I want to work on, I want to build a framework that ties together search/visualization capability with some machine learning algorithms, so essentially think of it as tying in elasticsearch and kibana  into mahout , the user can search for their data with elasticsearch and for deeper analysis on that data they can feed that data into one or more mahout backends for analysis.  Another interesting tie in might be to hack kibana to render ggplot like graphics based on the output of mahout algorithms (assuming this can be a kibana plugin).
> Before I go hog wild to create a bunch of JIRA's I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
> Looking forward to your and other committers input.Thanks
>
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout contributions
> > Date: Wed, 27 Apr 2016 20:16:38 +0000
> >
> > Hello Saikat,
> >
> > #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
> >
> > Please see: http://mahout.apache.org/developers/how-to-contribute.html
> >
> > And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
> >
> > The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
> >
> > Another good way to contribute would be to look for enhancements that could make to existing code not necessarily open JIRAs that need to be assigned to you.  For example please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
> >
> > If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
> >
> > Thank You,
> >
> > Andy
> >
> >
> >
> > ________________________________________
> > From: Saikat Kanjilal <sx...@hotmail.com>
> > Sent: Tuesday, April 26, 2016 7:17 PM
> > To: dev@mahout.apache.org
> > Subject: RE: Mahout contributions
> >
> > Hello,Following up on my last email with more specifics,  I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing the one or more of the following algorithms with Mahout using spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
> > Had a few questions:1) Which of these should I start with and where is there the greatest need?2) Should I fork the repo and create branches for the each of the above implementations?3) Should I go ahead and create some JIRAs for these?
> > Would love to have some pointers to get started?Regards
> >
> > From: sxk1969@hotmail.com
> > To: dev@mahout.apache.org
> > Subject: Mahout contributions
> > Date: Wed, 30 Mar 2016 10:23:45 -0700
> >
> >
> >
> >
> > Hello Committers,I was looking through the current jira tickets and was wondering if there's a particular area of Mahout that needs some more help than others, should I focus on contributing some algorithms usign DSL or Samsara related efforts, I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.Regards

RE: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

This is great information, thank you. Based on this recommendation I won't create a JIRA but will start work on my project, and when the code approaches the percentages you describe I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK?  Based on your latest updates to the wiki, I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
Thank you again

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Thu, 28 Apr 2016 01:31:09 +0000
> 
> Saikat, 
> 
> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write proposal as you've done, and if you don't see any "+1"s or responses from the community at whole with in a few days, you may want to explain in more detail, give examples and use cases.  If you are still not seeing +1s or any responses from others then I think you can assume that there may not be interest; this is usually how things work.  
> 
> However if its something that your passionate about and you feel like you can deliver this should not to stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>  
> http://mahout.apache.org/developers/how-to-contribute.html
> 
> and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review, this way you can explain in more detail. 
> 
> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it. 
> 
> For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.    
> 
> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, rather to let you know that when you open an JIRA for a new issue, It tells others that your are working on it, and thus may discourage another with a similar idea to contribute this feature.  So it is best to open it once you've begun your work and are committed to it.
>   
> Andy
> 
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Wednesday, April 27, 2016 8:24 PM
> To: dev@mahout.apache.org
> Subject: RE: Mahout contributions
> 
> Andrew,Thank you very much for your input, I actually want to start a new set of JIRAs, here's what I want to work on, I want to build a framework that ties together search/visualization capability with some machine learning algorithms, so essentially think of it as tying in elasticsearch and kibana  into mahout , the user can search for their data with elasticsearch and for deeper analysis on that data they can feed that data into one or more mahout backends for analysis.  Another interesting tie in might be to hack kibana to render ggplot like graphics based on the output of mahout algorithms (assuming this can be a kibana plugin).
> Before I go hog wild to create a bunch of JIRA's I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
> Looking forward to your and other committers input.Thanks
> 
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout contributions
> > Date: Wed, 27 Apr 2016 20:16:38 +0000
> >
> > Hello Saikat,
> >
> > #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
> >
> > Please see: http://mahout.apache.org/developers/how-to-contribute.html
> >
> > And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
> >
> > The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
> >
> > Another good way to contribute would be to look for enhancements that could make to existing code not necessarily open JIRAs that need to be assigned to you.  For example please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
> >
> > If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
> >
> > Thank You,
> >
> > Andy
> >
> >
> >
> > ________________________________________
> > From: Saikat Kanjilal <sx...@hotmail.com>
> > Sent: Tuesday, April 26, 2016 7:17 PM
> > To: dev@mahout.apache.org
> > Subject: RE: Mahout contributions
> >
> > Hello,Following up on my last email with more specifics,  I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing the one or more of the following algorithms with Mahout using spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
> > Had a few questions:1) Which of these should I start with and where is there the greatest need?2) Should I fork the repo and create branches for the each of the above implementations?3) Should I go ahead and create some JIRAs for these?
> > Would love to have some pointers to get started?Regards
> >
> > From: sxk1969@hotmail.com
> > To: dev@mahout.apache.org
> > Subject: Mahout contributions
> > Date: Wed, 30 Mar 2016 10:23:45 -0700
> >
> >
> >
> >
> > Hello Committers,I was looking through the current jira tickets and was wondering if there's a particular area of Mahout that needs some more help than others, should I focus on contributing some algorithms usign DSL or Samsara related efforts, I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.Regards

Re: Mahout contributions

Posted by Andrew Palumbo <ap...@outlook.com>.

Saikat, 

One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases.  If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.

However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you.  People do not always read the dev@ emails or have time to respond.  You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
 
http://mahout.apache.org/developers/how-to-contribute.html

and create a JIRA.  When you have reached a significant amount of completion (around 70-80%), open a PR for review; this way you can explain in more detail.

But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it. 

For example, I am currently investigating some new plotting features.  I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.    

My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage another person with a similar idea from contributing this feature.  So it is best to open it once you've begun your work and are committed to it.
  
Andy

________________________________________
From: Saikat Kanjilal <sx...@hotmail.com>
Sent: Wednesday, April 27, 2016 8:24 PM
To: dev@mahout.apache.org
Subject: RE: Mahout contributions

Andrew,Thank you very much for your input, I actually want to start a new set of JIRAs, here's what I want to work on, I want to build a framework that ties together search/visualization capability with some machine learning algorithms, so essentially think of it as tying in elasticsearch and kibana  into mahout , the user can search for their data with elasticsearch and for deeper analysis on that data they can feed that data into one or more mahout backends for analysis.  Another interesting tie in might be to hack kibana to render ggplot like graphics based on the output of mahout algorithms (assuming this can be a kibana plugin).
Before I go hog wild to create a bunch of JIRA's I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
Looking forward to your and other committers input.Thanks

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Wed, 27 Apr 2016 20:16:38 +0000
>
> Hello Saikat,
>
> #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>
> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>
> And please note that per that documentation, it is in everybody's best interest to keep messages on list, contacting committers directly is discouraged.
>
> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and tangible amount of satisfactory work since once you've marked a JIRA as something you're working on, others will pass on it.
>
> Another good way to contribute would be to look for enhancements that could make to existing code not necessarily open JIRAs that need to be assigned to you.  For example please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>
> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>
> Thank You,
>
> Andy
>
>
>
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Tuesday, April 26, 2016 7:17 PM
> To: dev@mahout.apache.org
> Subject: RE: Mahout contributions
>
> Hello,Following up on my last email with more specifics,  I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing the one or more of the following algorithms with Mahout using spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
> Had a few questions:1) Which of these should I start with and where is there the greatest need?2) Should I fork the repo and create branches for the each of the above implementations?3) Should I go ahead and create some JIRAs for these?
> Would love to have some pointers to get started?Regards
>
> From: sxk1969@hotmail.com
> To: dev@mahout.apache.org
> Subject: Mahout contributions
> Date: Wed, 30 Mar 2016 10:23:45 -0700
>
>
>
>
> Hello Committers,I was looking through the current jira tickets and was wondering if there's a particular area of Mahout that needs some more help than others, should I focus on contributing some algorithms usign DSL or Samsara related efforts, I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.Regards

RE: Mahout contributions

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Andrew, thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends.  Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative.  The tool will bring together the ELK stack with dynamic machine learning algorithms.  I can go into a lot more detail around use cases if there's enough interest.
Looking forward to your and other committers' input. Thanks

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
> Date: Wed, 27 Apr 2016 20:16:38 +0000
> 
> Hello Saikat,
> 
> #1 and #2 above are already implemented.  #4 is tricky so i would not recommend without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this).  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.  
> 
> Please see: http://mahout.apache.org/developers/how-to-contribute.html
> 
> And please note that, per that documentation, it is in everybody's best interest to keep messages on list; contacting committers directly is discouraged.
> 
> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the Mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and a tangible amount of satisfactory work, since once you've marked a JIRA as something you're working on, others will pass on it.
> 
> Another good way to contribute would be to look for enhancements that you could make to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
> 
> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review,  you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
> 
> Thank You,
> 
> Andy
> 
> 
> 
> ________________________________________
> From: Saikat Kanjilal <sx...@hotmail.com>
> Sent: Tuesday, April 26, 2016 7:17 PM
> To: dev@mahout.apache.org
> Subject: RE: Mahout contributions
> 
> Hello,
>
> Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
>
> I had a few questions:
> 1) Which of these should I start with, and where is the greatest need?
> 2) Should I fork the repo and create branches for each of the above implementations?
> 3) Should I go ahead and create some JIRAs for these?
>
> I would love some pointers to get started.
>
> Regards
> 
> From: sxk1969@hotmail.com
> To: dev@mahout.apache.org
> Subject: Mahout contributions
> Date: Wed, 30 Mar 2016 10:23:45 -0700
> 
> 
> 
> 
> Hello Committers,
>
> I was looking through the current JIRA tickets and was wondering whether there's a particular area of Mahout that needs more help than others. Should I focus on contributing some algorithms using the DSL or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.
>
> Regards

Re: Mahout contributions

Posted by Andrew Palumbo <ap...@outlook.com>.

Hello Saikat,

#1 and #2 above are already implemented.  #4 is tricky, so I would not recommend it without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated the algorithms grid to reflect this.)  The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.

Please see: http://mahout.apache.org/developers/how-to-contribute.html

And please note that, per that documentation, it is in everybody's best interest to keep messages on list; contacting committers directly is discouraged.

The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the Mahout JIRA which is not already assigned, and start work on it.  When your work is ready for review, just open up a PR and the committers will review it.  Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and a tangible amount of satisfactory work, since once you've marked a JIRA as something you're working on, others will pass on it.

Another good way to contribute would be to look for enhancements that you could make to existing code, not necessarily open JIRAs that need to be assigned to you.  For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .

If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it.  In this case, when you have some code that is ready for review, you can simply open up a PR for it and committers will review it.  For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.

Thank You,

Andy



________________________________________
From: Saikat Kanjilal <sx...@hotmail.com>
Sent: Tuesday, April 26, 2016 7:17 PM
To: dev@mahout.apache.org
Subject: RE: Mahout contributions

Hello,

Following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.

I had a few questions:
1) Which of these should I start with, and where is the greatest need?
2) Should I fork the repo and create branches for each of the above implementations?
3) Should I go ahead and create some JIRAs for these?

I would love some pointers to get started.

Regards

From: sxk1969@hotmail.com
To: dev@mahout.apache.org
Subject: Mahout contributions
Date: Wed, 30 Mar 2016 10:23:45 -0700




Hello Committers,

I was looking through the current JIRA tickets and was wondering whether there's a particular area of Mahout that needs more help than others. Should I focus on contributing some algorithms using the DSL or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets.

Regards