Posted to dev@mahout.apache.org by dustin vanstee <du...@gmail.com> on 2017/03/15 01:12:11 UTC

looking to contribute to the project

Hi, I have been looking into Mahout and think it has some very nice
ML/linear algebra capabilities.  I would like to contribute to the project, and
I was hoping someone on the mailing list might be able to give me a few
ideas about where I could start.  Thanks!

Re: looking to contribute to the project

Posted by Trevor Grant <tr...@gmail.com>.
Hey Manuel,

(I think I accidentally dropped dev@m.a.o when replying, so I've added them
back.)
Let me open a WIP PR and then we can discuss on there.

In general though, the current form creates a Docker image with Hadoop
and/or Spark, and mounts the project directory in the container at
/opt/mahout (which is also Mahout Home).
Also, a script runs on startup that runs a few of the examples/CLI
drivers.

We want:
- A script which runs through an exhaustive list of tests (CLI
drivers/examples/etc.)
- A way to tell whether those tests passed or failed (checking the output?)
- A way to fail the build if the examples/etc. fail. (No idea how this
works; I've always tried to make a build succeed, never tried to fail one.)
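
A minimal sketch of how the three items above could fit together, in Python.
This is not what the WIP branch does. The function names run_checks and
exit_code and the (name, argv) command format are made up for illustration.
The idea is only that each test command's exit status decides pass or fail,
and the wrapper's own exit status is what a build step can react to.

```python
import subprocess


def run_checks(commands):
    """Run each (name, argv) pair and return the names that exited non-zero.

    `commands` is a hypothetical list format; the real entries would be
    the CLI drivers and examples to exercise inside the container.
    """
    failed = []
    for name, argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {name}")
        if result.returncode != 0:
            # Echo the captured output so the build log explains the failure.
            print(result.stdout + result.stderr)
            failed.append(name)
    return failed


def exit_code(failed):
    """Map the list of failed check names to a process exit status."""
    return 1 if failed else 0
```

A wrapper script would end with sys.exit(exit_code(run_checks(CHECKS))).
Whatever launches it can then treat a non-zero exit status as a failed
build; the Exec Maven Plugin's exec goal, for example, does so by default.
Checking the output text instead of the exit status is also possible, but
it only seems needed for drivers that return 0 even when they fail.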




Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sat, Mar 18, 2017 at 6:55 PM, Manuel Sequino <ma...@gmail.com>
wrote:

> Hi Trevor,
> let's start with this task.
> I did some experiments with Maven and Docker, but I am still not
> comfortable with them.
>
> It looks clear now; if I have any doubts, I'll get back to you.
>
> Just one problem: I don't know how to write a JIRA or what to put in it.
> Could you direct me?
>
> Best regards,
> Manuel
>
> 2017-03-16 15:48 GMT+01:00 Trevor Grant <tr...@gmail.com>:
>
>> Hey Manuel,
>>
>> Awesome!!  I don't think I even started a JIRA yet.  I was literally just
>> toying- I saw some cool stuff when building Apache Streams-Incubating, and
>> copied it.  Having maven kick off docker images is a strange thing.
>>
>> https://github.com/rawkintrevo/mahout/tree/docker-based-its/dockerITs
>>
>> At this point I 1) recognize it is a thing we should do to streamline our
>> testing, and 2) don't know enough to intelligently write a JIRA.
>> The idea is, there should be a maven phase where we fire up pseudo spark
>> and hadoop clusters, and then run all of the examples, cli drivers, and
>> shell tests.  And fail loudly should any of those tests fail.
>>
>> As I was telling Saikat, I'm also kind of busy with 100 other things.  If
>> you want to take point on this, feel free to write a JIRA- copy or fork
>> what I've done so far and go.
>>
>> Again, also check out Apache Streams-incubating since I am admittedly
>> copying them.
>>
>> tg

Re: looking to contribute to the project

Posted by Manuel Sequino <ma...@gmail.com>.
Hi Trevor,
I'd like to contribute to Mahout, especially by working on something
Docker-related. I am pretty new, but I think I could help.

What about this bullet?

"I have been toying with some docker based integration tests if you happen
to be familiar with Dockers and using them for maven IT (or want to learn)"

Where can I get more info? Jira doesn't contain the "docker" keyword.

Best regards,

---------------------------------------
Manuel Sequino

Email: mansequino@gmail.com
Skype: manuel.sequino
+39 320 4869904

Linkedin page <https://it.linkedin.com/pub/manuel-sequino/96/261/494>
--------------------------------------


Re: looking to contribute to the project

Posted by Trevor Grant <tr...@gmail.com>.
Hey Dustin!

Welcome to the community.

At the moment, we are in the middle of a release.  The most immediate thing
you could help with would be to help us test the release candidate.  See
Andrew's email.

Moving forward though, there are lots of opportunities-
Some things that have been kicked around on here over the last few months
include:
- Migrating the website to a git-based setup so that non-committers can edit
and contribute to the docs.
- Expanding the algorithms section (are there any algorithms you are
familiar with? Implementing one in Mahout would be a good start)
- I have been toying with some Docker-based integration tests, if you happen
to be familiar with Docker and using it for maven IT (or want to learn)
- Beginner issues- at the moment there aren't many on the JIRA board because
we fixed most in preparation for the release.

Testing the release would be a good starting point, however, because it will
get you familiar with building Mahout (a necessary first step).

Items 1 and 3 are a bit advanced for someone just starting out- so unless
you have some specific familiarity- I would direct you toward number 2.

In that case- check out:
https://github.com/apache/mahout/tree/master/math-scala/src/main/scala/org/apache/mahout/math/algorithms

That is the algorithm framework- look through it.  If there is an
algorithm you have in mind (try to start with an easy one), let us know and
open a JIRA ticket!

Best,

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*

