Posted to dev@spark.apache.org by Xinh Huynh <xi...@gmail.com> on 2016/06/17 16:46:13 UTC

Re: Hello

Here are some guidelines about contributing to Spark:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

There is also a section specific to MLlib:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuidelines

-Xinh

On Thu, Jun 16, 2016 at 9:30 AM, <my...@gmail.com> wrote:

> Dear All,
>
>
> Looking for guidance.
>
> I am interested in contributing to Spark MLlib. Could you please take
> a few minutes to guide me as to what you would consider an ideal path /
> skill set an individual should possess?
>
> I know R / Python / Java / C and C++.
>
> I have a firm understanding of algorithms and machine learning. I know
> Spark at a "workable knowledge level".
>
> Where should I start, and what should I try to do first (at the Spark
> internals level) before picking up items on JIRA or new specifications
> for Spark?
>
> R has a great set of packages. Would it be difficult to migrate them to
> SparkR? I could try it with your support, if it's desired.
>
>
> I wouldn't mind testing some defects, etc., as an initial learning
> exercise if that would assist the community.
>
> Please guide.
>
> Regards,
> Harmeet
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: Hello

Posted by Ted Yu <yu...@gmail.com>.

You can use a JIRA filter to find JIRAs of the component(s) you're
interested in.
Then sort by Priority.

Maybe comment on the JIRA if you want to work on it.
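
The filter described above can also be written as a JQL query. The following is a minimal sketch, not an official Spark tool: it only builds a search URL and makes no network call. The function names (build_jql, search_url) are hypothetical, and it assumes the ASF JIRA instance exposes the standard JIRA REST search endpoint and that "MLlib" is the component name you want.

```python
from urllib.parse import urlencode

# Assumption: the ASF JIRA instance serves the standard JIRA REST API
# under this path. Adjust if the instance is configured differently.
JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"


def build_jql(project, component):
    """Return a JQL string: unresolved issues in one component, highest priority first."""
    return (
        f'project = {project} AND component = "{component}" '
        "AND resolution = Unresolved ORDER BY priority DESC"
    )


def search_url(project="SPARK", component="MLlib", max_results=20):
    """Build (but do not fetch) a search URL for the query above."""
    params = {
        "jql": build_jql(project, component),
        "maxResults": max_results,
        "fields": "summary,priority",
    }
    return f"{JIRA_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Paste the printed URL into a browser, or the JQL into JIRA's
    # advanced search box, to see the sorted list.
    print(build_jql("SPARK", "MLlib"))
    print(search_url())
```

The same JQL string can be pasted directly into JIRA's advanced search, which is probably the simpler route for a one-off look.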

On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> What is the best way to determine what the library maintainers believe is
> important work to be done?
>
> I have looked through JIRA, and it's unclear which items are priorities one
> could work on. I am guessing this is in part because things are a little
> hectic with final work for 2.0, but it would be helpful to know what to
> look for, or whether it's better to ask library maintainers directly.
>
> Thanks,
> Pedro Rodriguez
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>

Re: Hello

Posted by Joseph Bradley <jo...@databricks.com>.

Hi Harmeet,

I'll add one more item to the other advice: The community is in the process
of putting together a roadmap JIRA for 2.1 for ML:
https://issues.apache.org/jira/browse/SPARK-15581

This JIRA lists some of the major items and links to a few umbrella JIRAs
with subtasks.  I'd expect this roadmap to change a little more as it is
still being formed, but I hope it provides some guidance.  Feel free to
ping on specific JIRAs to ask about their current importance and to see who
else is working on them.

Thanks!
Joseph

On Fri, Jun 17, 2016 at 3:32 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> Another good signal is the "target version" (which by convention is only
> set by committers).  When I set this for the upcoming version, it means I
> think it's important enough that I will prioritize reviewing a patch for it.
>

Re: Hello

Posted by Michael Armbrust <mi...@databricks.com>.

Another good signal is the "target version" (which by convention is only
set by committers).  When I set this for the upcoming version, it means I
think it's important enough that I will prioritize reviewing a patch for it.

On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez <sk...@gmail.com>
wrote:

> What is the best way to determine what the library maintainers believe is
> important work to be done?
>
> I have looked through JIRA, and it's unclear which items are priorities one
> could work on. I am guessing this is in part because things are a little
> hectic with final work for 2.0, but it would be helpful to know what to
> look for, or whether it's better to ask library maintainers directly.
>
> Thanks,
> Pedro Rodriguez
>

Re: Hello

Posted by Pedro Rodriguez <sk...@gmail.com>.

What is the best way to determine what the library maintainers believe is
important work to be done?

I have looked through JIRA, and it's unclear which items are priorities one
could work on. I am guessing this is in part because things are a little
hectic with final work for 2.0, but it would be helpful to know what to
look for, or whether it's better to ask library maintainers directly.

Thanks,
Pedro Rodriguez

On Fri, Jun 17, 2016 at 10:46 AM, Xinh Huynh <xi...@gmail.com> wrote:

> Here are some guidelines about contributing to Spark:
>
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>
> There is also a section specific to MLlib:
>
>
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuidelines
>
> -Xinh


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience