Posted to dev@mahout.apache.org by "shashi bushan dongur (JIRA)" <ji...@apache.org> on 2016/03/30 18:05:25 UTC

[jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

    [ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216 ] 

shashi bushan dongur commented on MAHOUT-1788:
----------------------------------------------

Hello. I would like to start contributing to Mahout. Can I work on this issue?

> spark-itemsimilarity integration test script cleanup
> ----------------------------------------------------
>
>                 Key: MAHOUT-1788
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1788
>             Project: Mahout
>          Issue Type: Improvement
>          Components: cooccurrence
>    Affects Versions: 0.11.0
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>            Priority: Trivial
>             Fix For: 1.0.0
>
>
> The binary release does not contain data for the itemsimilarity tests, and neither the binary nor the source version will run on a cluster unless the data is hand-copied to HDFS.
> Clean this up so that the script copies the data if needed and the data is in both versions.
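
A minimal sketch of the "copies data if needed" step, assuming the standard `hadoop fs` shell is available. The function name, the HADOOP_CMD override, and any paths passed in are hypothetical and are not taken from the actual Mahout test script.

```shell
# Hypothetical helper: upload a local test-data file to HDFS only if it is missing.
# HADOOP_CMD is an assumption for illustration; it defaults to the real "hadoop" CLI.
copy_to_hdfs_if_missing() {
  src="$1"    # local file, e.g. a test input shipped with the release
  dest="$2"   # target path on HDFS
  hadoop_cmd="${HADOOP_CMD:-hadoop}"

  # "hadoop fs -test -e" exits with 0 when the path already exists.
  if "$hadoop_cmd" fs -test -e "$dest"; then
    echo "already on HDFS: $dest"
  else
    # Create the parent directory first, then upload the file.
    "$hadoop_cmd" fs -mkdir -p "$(dirname "$dest")" &&
      "$hadoop_cmd" fs -put "$src" "$dest"
  fi
}
```

Checking for the file before uploading keeps the step safe to repeat, so the same script could run unchanged on a fresh cluster and on one where the data was already copied by hand.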



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Suneel Marthi <sm...@apache.org>.
On Tue, Apr 19, 2016 at 11:08 AM, Khurrum Nasim <kh...@useitc.com>
wrote:

> Thank you, Dmitriy.
>
> So is there an architectural blueprint for Mahout? What I mean is, how
> can I get the 1,000-foot overview, or the bird's-eye view, of the project?
> I do see that Mahout is very modularized; however, I’m still trying to make
> heads or tails of it :)
>
> @Dmitriy -
> "my investigation points that there are architectural problems in spark
> that are hard to overcome at this point for high IO algorithms." - Can you
> share some more details about this? I’m just curious.
>
>

Long story short - "Distributed != Scalable"

>
>
> > On Apr 18, 2016, at 8:18 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> >
> > Khurrum,
> >
> > mahout is so much  a library at this point.
> >
> > if you mean if it can be used to build networks with 2d inputs, yes i did
> > some of that. multi-epoch SGD based systems should be easy enough to
> build,
> > and will probably have a reasonable performance -- although I think
> > dedicated CNN systems like Caffe would still run faster at this point.
> Full
> > batch trainers are somewhat slow for larger problems though, my
> > investigation points that  there are architectural problems in spark that
> > are hard to overcome at this point for high IO algorithms.
> >
> > On Mon, Apr 18, 2016 at 11:49 AM, Khurrum Nasim <
> khurrum.nasim@useitc.com>
> > wrote:
> >
> >> Hi Guys,
> >>
> >> Can Mahout be used for things like face detection ?    Also which unit
> >> tests or integration tests do you recommend I should run just to get a
> >> better feel of the execution flow.
> >>
> >> I’m still slowly acclimating to the project.  But hopefully should come
> up
> >> to speed soon.
> >>
> >>
> >> Many Thanks,
> >>
> >> Khurrum
> >>
> >>
> >>
> >>
> >>> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <sm...@apache.org> wrote:
> >>>
> >>> Thanks Khurrum for stepping up.
> >>>
> >>> You just need basic programming skills - Java/Scala to be able to
> >>> contribute. We can help you with the algorithms and linear algebra
> stuff.
> >>>
> >>>
> >>> Welcome aboard !!
> >>>
> >>>
> >>> On Wed, Mar 30, 2016 at 3:05 PM, Khurrum Nasim <
> khurrum.nasim@useitc.com
> >>>
> >>> wrote:
> >>>
> >>>> Thanks for the advice Dimitry.  I’m already signed up on ASF jira.
> My
> >>>> handle is “nasimk”
> >>>>
> >>>> Do I need to be a linear algebra expert and or math phd  to
> contribute ?
> >>>> I have 10 plus years of computer programming experience.  my
> background
> >> is
> >>>> comp sci.
> >>>>
> >>>> Khurrum
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Mar 30, 2016, at 2:57 PM, Dmitriy Lyubimov <dl...@gmail.com>
> >> wrote:
> >>>>>
> >>>>> PS You may also want to sign up with ASF Jira so we can assign issues
> >> to
> >>>>> yourself.
> >>>>>
> >>>>> On Wed, Mar 30, 2016 at 11:52 AM, Dmitriy Lyubimov <
> dlieu.7@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <
> >>>> khurrum.nasim@useitc.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Dimirtry.
> >>>>>>>
> >>>>>>> I take a look at see where I can start pitching in.  Do I need
> >>>>>>> contributor access ? how  would I create feature branch of my work
> ?
> >>>>>>>
> >>>>>>
> >>>>>> Khurrum,
> >>>>>>
> >>>>>> you only need github account. What you need is to create mahout's
> >> master
> >>>>>> fork in your github space and keep it in sync, as possible, with
> >> master
> >>>> as
> >>>>>> you go (by doing regular pulls). That way you have the most chance
> of
> >>>>>> having least conflicts possible.
> >>>>>>
> >>>>>> At any point in time (I recommend at perhaps when you feel you are
> >> about
> >>>>>> 50 to 70% done or just need a code advice), you can create a github
> >> pull
> >>>>>> request to the apache/mahout master. Make sure to include MAHOUT-XXX
> >>>> issue
> >>>>>> in the head of the pull request, that way ASF will automatically
> >>>> propagate
> >>>>>> code comments to jira, and so all discussion can be done entirely on
> >>>> github.
> >>>>>>
> >>>>>> Again, if you take on a signficant contribution (such as a new
> >> numerical
> >>>>>> method contribution), I recommend to discuss the proposal on the
> @dev
> >>>> list
> >>>>>>
> >>>>>> thanks.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Khurrum
> >>>>>>>
> >>>>>>>> On Mar 30, 2016, at 1:12 PM, Dmitriy Lyubimov <dl...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Oh but of course! please do!
> >>>>>>>>
> >>>>>>>> You may work on any issue, this or any other of your choice, or
> even
> >>>> on
> >>>>>>> any
> >>>>>>>> new issue you can think of (for sizeable contributions it is
> >>>>>>> recommended to
> >>>>>>>> start discussion on the @dev list first though, to make sure to
> >>>> benefit
> >>>>>>>> from experience of others. Please file any new issue first to
> jira).
> >>>>>>>>
> >>>>>>>> On Wed, Mar 30, 2016 at 9:05 AM, shashi bushan dongur (JIRA) <
> >>>>>>>> jira@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [
> >>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216
> >>>>>>>>> ]
> >>>>>>>>>
> >>>>>>>>> shashi bushan dongur commented on MAHOUT-1788:
> >>>>>>>>> ----------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Hello. I would like to start contributing to mahout. Can I work
> on
> >>>> this
> >>>>>>>>> issue?
> >>>>>>>>>
> >>>>>>>>>> spark-itemsimilarity integration test script cleanup
> >>>>>>>>>> ----------------------------------------------------
> >>>>>>>>>>
> >>>>>>>>>>             Key: MAHOUT-1788
> >>>>>>>>>>             URL:
> >>>> https://issues.apache.org/jira/browse/MAHOUT-1788
> >>>>>>>>>>         Project: Mahout
> >>>>>>>>>>      Issue Type: Improvement
> >>>>>>>>>>      Components: cooccurrence
> >>>>>>>>>> Affects Versions: 0.11.0
> >>>>>>>>>>        Reporter: Pat Ferrel
> >>>>>>>>>>        Assignee: Pat Ferrel
> >>>>>>>>>>        Priority: Trivial
> >>>>>>>>>>         Fix For: 1.0.0
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> binary release does not contain data for itemsimilarity tests,
> >> neith
> >>>>>>>>> binary nor source versions will run on a cluster unless data is
> >> hand
> >>>>>>> copied
> >>>>>>>>> to hdfs.
> >>>>>>>>>> Clean this up so it copies data if needed and the data is in
> both
> >>>>>>>>> versions.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> This message was sent by Atlassian JIRA
> >>>>>>>>> (v6.3.4#6332)
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Thank you, Dmitriy.

So is there an architectural blueprint for Mahout? What I mean is, how can I get the 1,000-foot overview, or the bird's-eye view, of the project?
I do see that Mahout is very modularized; however, I’m still trying to make heads or tails of it :)

@Dmitriy -
"my investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms." - Can you share some more details about this? I’m just curious.


> On Apr 18, 2016, at 8:18 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
> Khurrum,
> 
> mahout is so much  a library at this point.
> 
> if you mean if it can be used to build networks with 2d inputs, yes i did
> some of that. multi-epoch SGD based systems should be easy enough to build,
> and will probably have a reasonable performance -- although I think
> dedicated CNN systems like Caffe would still run faster at this point. Full
> batch trainers are somewhat slow for larger problems though, my
> investigation points that  there are architectural problems in spark that
> are hard to overcome at this point for high IO algorithms.


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Okay, thanks - I’ll run those tests. I actually ran a few others as well, like the MatrixWritableTest.

> On Apr 18, 2016, at 8:22 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
> I am not sure of your question about tests...
> 
> there are in-memory tests which you can by 'mvn test' in /math-scala
> module; distributed tests are done per engine under 'spark', 'h2o' or
> 'flink' modules.


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I am not sure of your question about tests...

There are in-memory tests, which you can run with 'mvn test' in the /math-scala
module; distributed tests are run per engine under the 'spark', 'h2o', or
'flink' modules.
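
As a rough sketch of the test layout described above, assuming a Mahout source checkout with Maven installed: the helper name and the MVN override are hypothetical, added only so the command can be tried as a dry run.

```shell
# Hypothetical helper: run the Maven tests of one module from the source root.
# MVN is an assumption for illustration; it defaults to the real "mvn" command.
run_module_tests() {
  module="$1"   # e.g. math-scala, spark, h2o, or flink
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$module" && "${MVN:-mvn}" test )
}

# In-memory tests:      run_module_tests math-scala
# Distributed tests:    run_module_tests spark   (likewise h2o or flink)
```

Running one module at a time keeps the feedback loop short; the in-memory tests in math-scala are probably the quickest way to follow the execution flow before moving on to an engine module.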



Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
i meant "not so much a library"


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Khurrum,

Mahout is very much a library at this point.

If you are asking whether it can be used to build networks with 2D inputs,
then yes, I did some of that. Multi-epoch SGD-based systems should be easy
enough to build and will probably have reasonable performance, although I
think dedicated CNN systems like Caffe would still run faster at this point.
Full-batch trainers are somewhat slow for larger problems, though; my
investigation suggests that there are architectural problems in Spark that
are hard to overcome at this point for high-IO algorithms.

On Mon, Apr 18, 2016 at 11:49 AM, Khurrum Nasim <kh...@useitc.com>
wrote:

> Hi Guys,
>
> Can Mahout be used for things like face detection? Also, which unit
> tests or integration tests do you recommend I run to get a
> better feel for the execution flow?
>
> I’m still slowly acclimating to the project, but hopefully I should come up
> to speed soon.
>
>
> Many Thanks,
>
> Khurrum

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Hi Guys,

Can Mahout be used for things like face detection? Also, which unit tests or integration tests do you recommend I run to get a better feel for the execution flow?

I’m still slowly acclimating to the project, but hopefully I should come up to speed soon.


Many Thanks,

Khurrum




> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <sm...@apache.org> wrote:
> 
> Thanks Khurrum for stepping up.
> 
> You just need basic programming skills - Java/Scala to be able to
> contribute. We can help you with the algorithms and linear algebra stuff.
> 
> 
> Welcome aboard !!


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Thanks everyone - I’m glad to be a part of this.  

Khurrum


> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <sm...@apache.org> wrote:
> 
> Thanks Khurrum for stepping up.
> 
> You just need basic programming skills - Java/Scala to be able to
> contribute. We can help you with the algorithms and linear algebra stuff.
> 
> 
> Welcome aboard !!


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Suneel Marthi <sm...@apache.org>.
Thanks Khurrum for stepping up.

You just need basic programming skills (Java/Scala) to be able to
contribute. We can help you with the algorithms and linear algebra stuff.


Welcome aboard!!


On Wed, Mar 30, 2016 at 3:05 PM, Khurrum Nasim <kh...@useitc.com>
wrote:

> Thanks for the advice, Dmitriy. I’m already signed up on ASF JIRA. My
> handle is “nasimk”.
>
> Do I need to be a linear algebra expert and/or a math PhD to contribute?
> I have 10-plus years of computer programming experience. My background is
> in comp sci.
>
> Khurrum

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Thanks for the advice, Dmitriy. I’m already signed up on ASF JIRA. My handle is “nasimk”.

Do I need to be a linear algebra expert and/or a math PhD to contribute?
I have 10-plus years of computer programming experience. My background is in comp sci.

Khurrum
 




> On Mar 30, 2016, at 2:57 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
> P.S. You may also want to sign up with ASF JIRA so we can assign issues to
> you.


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
P.S. You may also want to sign up with ASF JIRA so we can assign issues to
you.

On Wed, Mar 30, 2016 at 11:52 AM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

>
>
> On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <kh...@useitc.com>
> wrote:
>
>> Thanks, Dmitriy.
>>
>> I’ll take a look and see where I can start pitching in. Do I need
>> contributor access? How would I create a feature branch for my work?
>>
>
> Khurrum,
>
> You only need a GitHub account. What you need to do is fork Mahout's master
> into your GitHub space and keep it in sync with master, as far as possible,
> as you go (by doing regular pulls). That gives you the best chance of
> having the fewest conflicts.
>
> At any point in time (I recommend when you feel you are about 50 to 70%
> done, or when you just need code advice), you can create a GitHub pull
> request against apache/mahout master. Make sure to include the MAHOUT-XXX
> issue in the title of the pull request; that way ASF will automatically
> propagate code comments to JIRA, so all discussion can be done entirely on
> GitHub.
>
> Again, if you take on a significant contribution (such as a new numerical
> method), I recommend discussing the proposal on the @dev list.
>
> Thanks.
>
>
>>
>> Khurrum

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <kh...@useitc.com>
wrote:

> Thanks, Dmitriy.
>
> I’ll take a look and see where I can start pitching in. Do I need contributor
> access? How would I create a feature branch for my work?
>

Khurrum,

You only need a GitHub account. What you need to do is fork Mahout's master
into your GitHub space and keep it in sync with master, as far as possible,
as you go (by doing regular pulls). That gives you the best chance of
having the fewest conflicts.

At any point in time (I recommend when you feel you are about 50 to 70%
done, or when you just need code advice), you can create a GitHub pull
request against apache/mahout master. Make sure to include the MAHOUT-XXX
issue in the title of the pull request; that way ASF will automatically
propagate code comments to JIRA, so all discussion can be done entirely on
GitHub.

Again, if you take on a significant contribution (such as a new numerical
method), I recommend discussing the proposal on the @dev list.

Thanks.
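
The fork, sync, and pull-request steps above can be sketched as a small
helper. This is only an illustrative sketch, not an official Mahout or ASF
tool: the remote name "upstream", the function names, and the issue-key
pattern are assumptions made for the example, and the upstream URL is passed
in rather than assumed. Only standard git subcommands are listed.

```python
import re

# Assumed shape of a JIRA issue key such as MAHOUT-1788 (hypothetical pattern).
ISSUE_KEY = re.compile(r"\bMAHOUT-\d+\b")


def sync_commands(upstream_url, branch="master"):
    """Return git commands that keep a fork's branch in sync with upstream.

    The first command is one-time setup; the rest are the "regular pulls".
    The commands are returned rather than executed so they can be reviewed.
    """
    return [
        ["git", "remote", "add", "upstream", upstream_url],  # one-time setup
        ["git", "checkout", branch],
        ["git", "pull", "upstream", branch],   # bring in upstream changes
        ["git", "push", "origin", branch],     # update the fork on GitHub
    ]


def has_issue_key(pr_title):
    """True if the pull request title mentions a concrete MAHOUT-<number> key."""
    return ISSUE_KEY.search(pr_title) is not None


if __name__ == "__main__":
    for cmd in sync_commands("https://github.com/apache/mahout.git"):
        print(" ".join(cmd))
    print(has_issue_key("MAHOUT-1788: clean up spark-itemsimilarity test script"))
```

Checking the title before opening the pull request is a cheap way to make
sure the issue key is present, since the comment propagation to JIRA
described above depends on it.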


>
> Khurrum

Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Khurrum Nasim <kh...@useitc.com>.
Thanks, Dmitriy.

I’ll take a look and see where I can start pitching in. Do I need contributor access? How would I create a feature branch for my work?

Khurrum

> On Mar 30, 2016, at 1:12 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> 
> Oh, but of course! Please do!
>
> You may work on any issue, this or any other of your choice, or even on any
> new issue you can think of. (For sizeable contributions it is recommended to
> start a discussion on the @dev list first, though, to make sure you benefit
> from the experience of others. Please file any new issue in JIRA first.)


Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Oh, but of course! Please do!

You may work on any issue, this or any other of your choice, or even on any
new issue you can think of. (For sizeable contributions it is recommended to
start a discussion on the @dev list first, though, to make sure you benefit
from the experience of others. Please file any new issue in JIRA first.)

On Wed, Mar 30, 2016 at 9:05 AM, shashi bushan dongur (JIRA) <
jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216
> ]
>
> shashi bushan dongur commented on MAHOUT-1788:
> ----------------------------------------------
>
> Hello. I would like to start contributing to mahout. Can I work on this
> issue?
>
> > spark-itemsimilarity integration test script cleanup
> > ----------------------------------------------------
> >
> >                 Key: MAHOUT-1788
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1788
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: cooccurrence
> >    Affects Versions: 0.11.0
> >            Reporter: Pat Ferrel
> >            Assignee: Pat Ferrel
> >            Priority: Trivial
> >             Fix For: 1.0.0
> >
> >
> > The binary release does not contain data for the itemsimilarity tests, and
> > neither the binary nor the source version will run on a cluster unless the
> > data is hand-copied to HDFS.
> > Clean this up so it copies the data if needed and the data is in both
> > versions.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
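
The cleanup requested in MAHOUT-1788 (copy the test data to HDFS only when it
is not already there) could look roughly like the sketch below. It is a
sketch under stated assumptions, not the actual Mahout script: the function
name, the paths in the usage example, and the injectable `run` parameter are
hypothetical. The only external commands used are `hdfs dfs -test -e`,
`hdfs dfs -mkdir -p`, and `hdfs dfs -put`.

```python
import subprocess


def ensure_on_hdfs(local_path, hdfs_dir, run=subprocess.call):
    """Copy local_path into hdfs_dir unless a file of that name already exists.

    `run` takes an argv list and returns the exit status. It is injectable
    (an assumption of this sketch) so the logic can be tried without a cluster.
    Returns True if a copy was performed, False if the data was already there.
    """
    name = local_path.rstrip("/").split("/")[-1]
    target = hdfs_dir.rstrip("/") + "/" + name

    # `hdfs dfs -test -e PATH` exits with status 0 when PATH exists.
    if run(["hdfs", "dfs", "-test", "-e", target]) == 0:
        return False

    # Create the destination directory if needed, then upload the file.
    if run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]) != 0:
        raise RuntimeError("could not create " + hdfs_dir)
    if run(["hdfs", "dfs", "-put", local_path, hdfs_dir]) != 0:
        raise RuntimeError("could not copy " + local_path)
    return True
```

A test script could call, for example, `ensure_on_hdfs("data/input.csv",
"/tmp/itemsimilarity")` (hypothetical paths) before launching the job, so the
same script works whether or not the data was copied by hand beforehand.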