Posted to dev@systemml.apache.org by Krishna Kalyan <kr...@gmail.com> on 2017/03/18 15:18:46 UTC

Re: GSoc 2017

Hello All,
A gentle ping. Student applications open in a couple of days. I would like to
work on 'Support for Python DSLs'.
However, for now I am not sure how to proceed.

Thank you,
Krishna

On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:

> Yeah helping to build out our Python DSL into a full-out replacement for
> the current "DML" language would be great, and we'd be quite supportive!
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
> >
> > Hi Krishna,
> >
> > cool to see that you're interested in SystemML!
> >
> > From your list, I personally think that a) and d) would be well suited
> > for projects; a good Python DSL in particular is a high priority.
> >
> > We will apply as an organization to GSoC once organization applications
> > are open (Jan. 19th), and I think we will find mentors for at least a) and
> > d). If you already want to take a look at what is currently there, I
> > suggest looking at our Python APIs and documentation. If you want to take
> > on the DSL project, it might also be a good idea to look into the DML
> > documentation and related papers to see what we need to support.
> >
> > The proposals will probably circulate on the mailing list, too, so keep
> > an eye on that :)
> >
> > -Felix
> >
> > On 12.01.2017 23:13, Krishna Kalyan wrote:
> >> Hello All,
> >> Thank you for your wonderful replies.
> >> Tasks that I am interested in:
> >> a) Support for Python DSLs
> >> b) Python wrappers for all existing algorithms
> >> c) GPU support
> >> d) Perftest : automated performance tests of algorithms
> >> I am also willing to work on the tasks that the SystemML community thinks
> >> are important.
> >> Regards,
> >> Krishna
> >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com>
> >> wrote:
> >>> Hi Krishna!  Welcome, and thanks for your interest!
> >>> We would definitely be excited to collaborate with you on a GSOC project.
> >>> We've started another thread to discuss possible new proposals, and we
> >>> would also be quite interested in any particular proposal that you might
> >>> like to generate tailored towards your interests.  Copied from the other
> >>> thread, some possible ideas could include: building out a full ML demo to
> >>> solve a real, large-scale problem that would benefit from a distributed
> >>> approach; overall performance improvements that address a full class, or
> >>> wider area, of ML algorithms, rather than a single, specific script;
> >>> infrastructure for [performance] testing, and identification of wide areas
> >>> of improvement; helping with building out fully-featured, clean,
> >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
> >>> continue stressing them -- we could even aim to replace DML with the DSLs);
> >>> etc.  Overall, we want to improve the ability of the user to work on a wide
> >>> range of large-scale, distributed ML problems in a simple and easy manner
> >>> on top of Spark.
> >>> In the meantime, you could explore our recent open issues [1] and even
> >>> begin discussions or contributions on any of the items.  You could also
> >>> view our recent roadmap discussion thread on the mailing list, starting
> >>> with the first email [2]:
> >>> [1]:
> >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> >>> [2]:
> >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
> >>> - Mike
> >>> --
> >>> Michael W. Dusenberry
> >>> GitHub: github.com/dusenberrymw
> >>> LinkedIn: linkedin.com/in/mikedusenberry
> >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
> >>> wrote:
> >>> > As some folks have described on this thread, it would be great to get you
> >>> > familiarized with SystemML.
> >>> >
> >>> > In parallel, I would look for a mentor from the active committer list and
> >>> > start working on a project proposal which could be based on the recent
> >>> > Roadmap discussion [1].
> >>> >
> >>> > If you are looking for some guidance on how Apache participates in
> >>> > GSoC, take a look at the following resources [2] and [3], and don't
> >>> > hesitate to ask questions here.
> >>> >
> >>> >
> >>> > [1]
> >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
> >>> > [2] http://community.apache.org/gsoc.html
> >>> > [3]
> >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
> >>> >
> >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Hello Developers,
> >>> > > I am Krishna, currently a 2nd-year Master's student (MSc in Data
> >>> > > Mining) studying in Barcelona at Université Polytechnique de
> >>> > > Catalogne.
> >>> > > I am interested in contributing to SystemML this year under the GSoC
> >>> > > program.
> >>> > > Could anyone please guide me on how to go about it? (I understand
> >>> > > that I need to write a proposal.)
> >>> > >
> >>> > > Related Experience:
> >>> > > My master's is mostly focussed on data mining techniques. Before my
> >>> > > master's, I was a data engineer with IBM (India). I was responsible
> >>> > > for managing a 50-node Hadoop cluster for more than a year. Most of
> >>> > > my time was spent optimising and writing ETL (Apache Pig) jobs.
> >>> > >
> >>> > > I am the most comfortable with Python followed by R and Scala.
> >>> > >
> >>> > > My Webpage
> >>> > > kkalyan.in
> >>> > >
> >>> > > My Spark Pull Requests
> >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
> >>> > >
> >>> > > Thank you so much,
> >>> > > Krishna
> >>> > >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Luciano Resende
> >>> > http://twitter.com/lresende1975
> >>> > http://lresende.blogspot.com/
> >>> >
>

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello Nakul
Thank you so much for your feedback, especially during the weekend. I have
submitted the proposal, and the final version is attached below.

Cheers,
Krishna

On Mon, Apr 3, 2017 at 4:37 AM, Nakul Jindal <na...@gmail.com> wrote:

> Your project proposal looks great. Be sure to submit a final project
> proposal wherever it is you need to.
>
> Thanks,
> Nakul
>
> On Apr 2, 2017, at 4:08 PM, Krishna Kalyan <kr...@gmail.com>
> wrote:
>
> Hello All,
> I have updated the proposal. I hope this one is better. Please share your
> feedback.
>
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#
>
> FYI : Student Application Deadline April 3 16:00 UTC.
>
>
> Regards,
> Krishna
>
> On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <kr...@gmail.com>
> wrote:
>
>> Hello Nakul,
>> My comments are in *italics* below.
>>
>> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <na...@gmail.com> wrote:
>>
>>> Hi Krishna,
>>>
>>> Here are some questions/remarks i have about parts of your proposal:
>>>
>>> In the section titled Summary -
>>>
>>> "The systematic evaluation of performance can be measured with
>>> performance tests and micro-benchmarks"
>>> We currently do not have any micro benchmarks. Do you plan on adding
>>> any? (It would be awesome, but remember to keep the number of tasks
>>> reasonable given the time frame and your familiarity with the project)
>>>
>> *- Removed micro-benchmarks from the proposal.*
>>
>>>
>>> Your summary section feels like it's generally applicable for performance
>>> testing on any project, which is good. However, when it comes to talking
>>> about what you'd actually be doing, I see - " build a benchmark
>>> infrastructure and conduct experiments, that compare different choices in
>>> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>>>
>> *-  I agree and have made these changes.*
>>
>> Going over each point:
>>>
>>> 1. "build a benchmark infrastructure" - ok, i guess this subsumes pretty
>>> much all the tasks involved
>>> 2. "conduct experiments" - sure, although I think you mean testing your
>>> benchmarking infrastructure, please correct me if this is not what you meant
>>>
>>>
>> 3. "that compare different choices in critical parts"
>>> a. "sparsity thresholds" - awesome. You'd need to figure out what
>>> SystemML already does and what to add.
>>> b. "optimization decisions" - could you provide an example or two of
>>> what exactly you mean by this. Do you mean to enable and/or disable certain
>>> optimizations and run the perf suite and also automate the process? or
>>> something else?
>>> c. "etc" - more detail would be nice here. It would be nice to know what
>>> exactly you are committing to.
>>> *- will add more details in this section *
>>>
>>> In the section titled Deliverables -
>>>
>>> You mention
>>> - "automation for all performance tests" - awesome! this is the primary
>>> task
>>> - "automatic scripts to test performance on a cloud provider" - this is
>>> great
>>> - "web dashboard" - awesome! this is a nice-to-have
>>>
>>> But before the "cloud provider" and "web dashboard" tasks, we'd like to
>>> robustly check for errors, record performance numbers, and generate
>>> reports (Tasks 2 - 6 on
>>> https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've
>>> mentioned some of these tasks in your "Project milestones" section as
>>> "Understand metrics to be captured like time, memory, errors". It'd be
>>> good to put them here as well.
>>>
>> *- Will add this information under Deliverables*
>>
>>>
>>> Remember, you might also need to change the way SystemML reports errors
>>> and performance numbers to complete your tasks. You, along with the
>>> currently active members of SystemML might need to change the algorithms
>>> being tested as well.
>>>
>> *- Sure will keep this in mind and will account for this in proposal. *
>>
>>>
>>> In the section titled "Project Milestones" -
>>> Your project timeline looks good, both the initial set of things to do
>>> before May 30 and the fact that you've set aside the final week for
>>> buffer. You have dug down into a week-by-week schedule, which is good. I
>>> have some suggestions, though:
>>>
>>> You need to
>>> T1. Understand what is happening now, try it out for yourself
>>>
>> *- Yes, I am following the documentation to simulate benchmarks on my
>> local system. *
>>
>> T2. You need to automate this process
>>> T3. You need to test that this automated process works as expected (and
>>> make it robust)
>>> T4. You need to add additional capabilities (like micro-benchmarks
>>> and/or parameterizing the tests and/or running it with sparse and dense
>>> sets)
>>>
>> *- I will account for T3 and T4 more explicitly in my proposal.*
>>
>>
>>> For each of the tasks that you mention in your deliverables, could you
>>> please think about how you'd spend each week doing either T1-3 for a
>>> deliverable that is now being done manually and T4 for one that is not
>>> being done at all right now?
>>> Please revisit some of the tasks on your timeline with this in mind.
>>>
>>> I'd also ask that you set some deliverable(s) for phase 1 (due on June
>>> 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>>>
>>> A suggestion for the deliverables, if you wanted to be really ambitious
>>> and complete every task possible :
>>> Phase 1 - implement infrastructure to launch perf suite and to detect
>>> errors & report performance numbers in a plain text file
>>> Phase 2 - implement scripts to compare performance against older
>>> versions of SystemML and other packages (Spark MLLib) and implement
>>> mechanism to generate report(s) with errors and performance information in
>>> a spreadsheet or pdf or on a web interface
>>> Phase 3 - add additional perf tests for more algorithms, different
>>> sparsity thresholds and optimization levels and include them in the
>>> reports. Also implement and test scripts to run the perf suite on a cloud
>>> provider; doing this through a web UI.
>>>
>>> Something very conservative could be:
>>> Phase 1 - automate perf suite and report perf numbers
>>> Phase 2 - make error reporting and handling robust, compare against
>>> previous versions of SystemML
>>> Phase 3 - add additional algorithms to the test suite
>>>
>> *- I would prefer taking the conservative approach here.*
>>
>>>
>>> These are just suggestions; tweak them as you see fit.
>>> Having a deliverable attached to the end of a phase is a good thing.
>>>
>>> Hope I am not being too critical and hopefully this helps
>>>
>> *- Not at all, I appreciate your detailed feedback.*
>>
>> *- Could you also let me know the co-mentors for this project? I am
>> working on the proposal and will share an updated version soon.*
>>
>>
>>> -Nakul
>>>
>>>
>>>
>>>
>>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <
>>> krishnakalyan3@gmail.com> wrote:
>>>
>>>> Hello All,
>>>> Based on "SYSTEMML-1451" and  relevant SystemML source code, I have
>>>> updated the draft proposal. Please have a look and share your valuable
>>>> feedback.
>>>>
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>
>>>> Regards,
>>>> Krishna
>>>>
>>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <
>>>> krishnakalyan3@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>> I have created a proposal for
>>>>>
>>>>> d) Perftest : automated performance tests of algorithms
>>>>> (I am most comfortable with bash scripting and Python)
>>>>>
>>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>>
>>>>> Please share your feedback on the proposal. If someone from the
>>>>> community could mentor, it would be great.
>>>>>
>>>>> Regards,
>>>>> Krishna
>>>>>
>>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <
>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>
>>>>>> Thanks Nakul,
>>>>>> Replied to the JIRA thread.
>>>>>>
>>>>>> Cheers,
>>>>>> Krishna
>>>>>>
>>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Krishna,
>>>>>>>
>>>>>>> We have 2 proposals up :
>>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>>
>>>>>>> Would you be interested in any of these?
>>>>>>> If you are specifically interested in the Python DSL project, we can
>>>>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>>>>
>>>>>>> -Nakul
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Krishna,
>>>>>>>>
>>>>>>>> We are working on putting together some proposals. I created one for
>>>>>>>> a GPU-based project.
>>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>>> Be on the lookout for more.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nakul
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello Adina and Arvind, thank you for your replies.
>>>>>>>>> I am open to writing a proposal with a mentor and would appreciate it
>>>>>>>>> if we could take action quickly on this.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Krishna
>>>>>>>>>
>>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>>>>> > SystemML could still participate as part of ASF if interested (record your
>>>>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>>>>> > The following page also has useful info, even if it is not updated for this
>>>>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to register very
>>>>>>>>> > soon.
>>>>>>>>> >
>>>>>>>>> > Best regards,
>>>>>>>>> > Adina
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <
>>>>>>>>> acs_s@yahoo.com.invalid>
>>>>>>>>> > wrote:
>>>>>>>>> >
>>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>>> > > Unfortunately, we could not submit a topic to GSoC on time.
>>>>>>>>> > > However, please feel free to leverage SystemML for your use cases
>>>>>>>>> > > and to contribute to SystemML where possible.
>>>>>>>>> > > Please let us know if you have any questions.
>>>>>>>>> > >
>>>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>>>> > >
>>>>>>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>>>> > >
>>>>>>>>> > > Hello All,
>>>>>>>>> > > A gentle ping. Student applications open in a couple of days.
>>>>>>>>> > > I would like to work on 'Support for Python DSLs'.
>>>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>>>> > >
>>>>>>>>> > > Thank you,
>>>>>>>>> > > Krishna
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>>> > United States Naval Academy
>>>>>>>>> > 410-293-6822
>>>>>>>>> > adina@usna.edu
>>>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: GSoc 2017

Posted by Nakul Jindal <na...@gmail.com>.
Your project proposal looks great. Be sure to submit a final project proposal wherever it is you need to. 

Thanks,
Nakul

> On Apr 2, 2017, at 4:08 PM, Krishna Kalyan <kr...@gmail.com> wrote:
> 
> Hello All,
> I have updated the proposal. I hope this one is better. Please share your feedback.
> 
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#
> 
> FYI : Student Application Deadline April 3 16:00 UTC. 
> 
> 
> Regards,
> Krishna
> 
>> On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <kr...@gmail.com> wrote:
>> Hello Nakul,
>> My comments are in italics below.
>> 
>>> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <na...@gmail.com> wrote:
>>> Hi Krishna,
>>> 
>>> Here are some questions/remarks i have about parts of your proposal:
>>> 
>>> In the section titled Summary -
>>> 
>>> "The systematic evaluation of performance can be measured with performance tests and micro-benchmarks"
>>> We currently do not have any micro benchmarks. Do you plan on adding any? (It would be awesome, but remember to keep the number of tasks reasonable given the time frame and your familiarity with the project)
>> - Removed micro-benchmarks from the proposal.
>>> 
>>> Your summary section feels like it's generally applicable for performance testing on any project, which is good. However, when it comes to talking about what you'd actually be doing, I see - " build a benchmark infrastructure and conduct experiments, that compare different choices in critical parts (sparsity thresholds, optimisation decisions, etc..)".
>> 
>> -  I agree and have made these changes.
>> 
>>> Going over each point:
>>> 
>>> 1. "build a benchmark infrastructure" - ok, i guess this subsumes pretty much all the tasks involved 
>>> 2. "conduct experiments" - sure, although I think you mean testing your benchmarking infrastructure, please correct me if this is not what you meant 
>>> 3. "that compare different choices in critical parts"
>>>   a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML already does and what to add. 
>>>   b. "optimization decisions" - could you provide an example or two of what exactly you mean by this. Do you mean to enable and/or disable certain optimizations and run the perf suite and also automate the process? or something else?
>>>   c. "etc" - more detail would be nice here. It would be nice to know what exactly you are committing to.
>>> - will add more details in this section 
>>> 
>>> In the section titled Deliverables - 
>>> 
>>> You mention
>>> - "automation for all performance tests" - awesome! this is the primary task
>>> - "automatic scripts to test performance on a cloud provider" - this is great
>>> - "web dashboard" - awesome! this is a nice-to-have
>>> 
>>> But before the "cloud provider" and "web dashboard" tasks, we'd like to robustly check for errors, record performance numbers, and generate reports (Tasks 2 - 6 on https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've mentioned some of these tasks in your "Project milestones" section as "Understand metrics to be captured like time, memory, errors". It'd be good to put them here as well.
>> - Will add this information under Deliverables
>>> 
>>> Remember, you might also need to change the way SystemML reports errors and performance numbers to complete your tasks. You, along with the currently active members of SystemML might need to change the algorithms being tested as well.
>> 
>> - Sure will keep this in mind and will account for this in proposal. 
>>> 
>>> In the section titled "Project Milestones" - 
>>> Your project timeline looks good, both the initial set of things to do before May 30 and the fact that you've set aside the final week for buffer. You have dug down into a week-by-week schedule, which is good. I have some suggestions, though:
>>> 
>>> You need to 
>>> T1. Understand what is happening now, try it out for yourself
>> 
>> - Yes, I am following the documentation to simulate benchmarks on my local system. 
>> 
>>> T2. You need to automate this process
>>> T3. You need to test that this automated process works as expected (and make it robust)
>>> T4. You need to add additional capabilities (like micro-benchmarks and/or parameterizing the tests and/or running it with sparse and dense sets)
>> 
>> - I will account for T3 and T4 more explicitly in my proposal.
>>  
>>> For each of the tasks that you mention in your deliverables, could you please think about how you'd spend each week doing either T1-3 for a deliverable that is now being done manually and T4 for one that is not being done at all right now?
>>> Please revisit some of the tasks on your timeline with this in mind.
>>> 
>>> I'd also ask that you set some deliverable(s) for phase 1 (due on June 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>>> 
>>> A suggestion for the deliverables, if you wanted to be really ambitious and complete every task possible :
>>> Phase 1 - implement infrastructure to launch perf suite and to detect errors & report performance numbers in a plain text file
>>> Phase 2 - implement scripts to compare performance against older versions of SystemML and other packages (Spark MLLib) and implement mechanism to generate report(s) with errors and performance information in a spreadsheet or pdf or on a web interface
>>> Phase 3 - add additional perf tests for more algorithms, different sparsity thresholds and optimization levels and include them in the reports. Also implement and test scripts to run the perf suite on a cloud provider; doing this through a web UI.  
>>> 
>>> Something very conservative could be:
>>> Phase 1 - automate perf suite and report perf numbers
>>> Phase 2 - make error reporting and handling robust, compare against previous versions of systemml
>>> Phase 3 - add additional algorithms to the test suite
>> 
>> - I would prefer taking the conservative approach here.
>>> 
>>> These are just suggestions; tweak them as you see fit.
>>> Having a deliverable attached to the end of a phase is a good thing. 
>>> 
>>> Hope I am not being too critical and hopefully this helps
>> 
>> - Not at all, I appreciate your feedback and detailed reply.
>> 
>> - Could you also let me know the co-mentors for this project? I am working on the proposal and will share an updated version soon.
>>  
>>> -Nakul
>>> 
>>> 
>>> 
>>> 
>>>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <kr...@gmail.com> wrote:
>>>> Hello All,
>>>> Based on "SYSTEMML-1451" and the relevant SystemML source code, I have updated the draft proposal. Please have a look and share your valuable feedback.
>>>> 
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>> 
>>>> Regards,
>>>> Krishna
>>>> 
>>>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <kr...@gmail.com> wrote:
>>>>> Hello All,
>>>>> I have created a proposal for 
>>>>> 
>>>>> d) Perftest : automated performance tests of algorithms
>>>>> (I am most comfortable with bash scripting and Python)
>>>>> 
>>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>> 
>>>>> Please share your feedback on the proposal. If someone from the community could mentor, it would be great.
>>>>> 
>>>>> Regards,
>>>>> Krishna
>>>>> 
>>>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <kr...@gmail.com> wrote:
>>>>>> Thanks Nakul,
>>>>>> Replied to the JIRA thread.
>>>>>> 
>>>>>> Cheers,
>>>>>> Krishna
>>>>>> 
>>>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com> wrote:
>>>>>>> Hi Krishna,
>>>>>>> 
>>>>>>> We have 2 proposals up :
>>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>> 
>>>>>>> Would you be interested in any of these?
>>>>>>> If you are specifically interested in the Python DSL project, we can look for more volunteers or I could just volunteer to mentor it.
>>>>>>> 
>>>>>>> -Nakul
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com> wrote:
>>>>>>>> Hi Krishna, 
>>>>>>>> 
>>>>>>>> We are working on putting together some proposals. I created one for a GPU-based project.
>>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>>> Be on the lookout for more.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Nakul
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <kr...@gmail.com> wrote:
>>>>>>>>> Hello Adina and Arvind, thank you for your replies,
>>>>>>>>> I am open to writing a proposal with a mentor and would appreciate if we
>>>>>>>>> could take action quickly on this.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Krishna
>>>>>>>>> 
>>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu> wrote:
>>>>>>>>> 
>>>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>>>>> > SystemML could still participate as part of ASF if interested (record your
>>>>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>>>>> > The following page also has useful info, even if it is not updated for this
>>>>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to register
>>>>>>>>> > very
>>>>>>>>> > soon.
>>>>>>>>> >
>>>>>>>>> > Best regards,
>>>>>>>>> > Adina
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
>>>>>>>>> > wrote:
>>>>>>>>> >
>>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>>> > > Unfortunately we could not submit a topic to GSoc on time. However, please
>>>>>>>>> > > feel free to leverage SystemML for your use cases and do possible
>>>>>>>>> > > contribution to SystemML.
>>>>>>>>> > > Please let us know if you have any question.
>>>>>>>>> > >
>>>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>>>> > >
>>>>>>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>>>> > >
>>>>>>>>> > > Hello All,
>>>>>>>>> > > A Gentle ping. Student applications open in a couple of days. I would like to
>>>>>>>>> > > work on 'Support for Python DSLs'.
>>>>>>>>> > > However for now I am not sure on how to proceed.
>>>>>>>>> > >
>>>>>>>>> > > Thank you,
>>>>>>>>> > > Krishna
>>>>>>>>> > >
>>>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:
>>>>>>>>> > >
>>>>>>>>> > > > Yeah helping to build out our Python DSL into a full-out replacement
>>>>>>>>> > for
>>>>>>>>> > > > the current "DML" language would be great, and we'd be quite
>>>>>>>>> > supportive!
>>>>>>>>> > > >
>>>>>>>>> > > > -Mike
>>>>>>>>> > > >
>>>>>>>>> > > > --
>>>>>>>>> > > >
>>>>>>>>> > > > Mike Dusenberry
>>>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>>> > > >
>>>>>>>>> > > > Sent from my iPhone.
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>>>>> > > > >
>>>>>>>>> > > > > Hi Krishna,
>>>>>>>>> > > > >
>>>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>>>> > > > >
>>>>>>>>> > > > > From your list I personally think that a) and d) would be well suited
>>>>>>>>> > > > for projects, especially a good python DSL is a high priority.
>>>>>>>>> > > > >
>>>>>>>>> > > > > We will apply as an organization to GSoC once organization
>>>>>>>>> > applications
>>>>>>>>> > > > are open (Jan. 19th) and I think we will find mentors for at least a)
>>>>>>>>> > and
>>>>>>>>> > > > d). If you already want to take a look at what is currently there, I
>>>>>>>>> > > > suggest to look at our python APIs and documentation. If you want to
>>>>>>>>> > take
>>>>>>>>> > > > on the DSL project it might also be a good idea to look into the DML
>>>>>>>>> > > > documentation and related papers to see what we need to support.
>>>>>>>>> > > > >
>>>>>>>>> > > > > The proposals will probably circulate on the mailinglist, too, so
>>>>>>>>> > keep
>>>>>>>>> > > > an eye on that :)
>>>>>>>>> > > > >
>>>>>>>>> > > > > -Felix
>>>>>>>>> > > > >
>>>>>>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>>>>>>> > > > >> Hello All,
>>>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>>>> > > > >> c) GPU support
>>>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>>>> > > > >> I am also willing to work on the tasks that SystemML community think
>>>>>>>>> > > are
>>>>>>>>> > > > >> important.
>>>>>>>>> > > > >> Regards,
>>>>>>>>> > > > >> Krishna
>>>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <
>>>>>>>>> > > > dusenberrymw@gmail.com>
>>>>>>>>> > > > >> wrote:
>>>>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC
>>>>>>>>> > > > project.
>>>>>>>>> > > > >>> We've started another thread to discuss possible new proposals, and
>>>>>>>>> > > we
>>>>>>>>> > > > >>> would also be quite interested in any particular proposal that you
>>>>>>>>> > > > might
>>>>>>>>> > > > >>> like to generate tailored towards your interests.  Copied from the
>>>>>>>>> > > > other
>>>>>>>>> > > > >>> thread, some possible ideas could include: building out a full ML
>>>>>>>>> > > demo
>>>>>>>>> > > > to
>>>>>>>>> > > > >>> solve a real, large-scale problem that would benefit from a
>>>>>>>>> > > distributed
>>>>>>>>> > > > >>> approach; overall performance improvements that address a full
>>>>>>>>> > class,
>>>>>>>>> > > > or
>>>>>>>>> > > > >>> wider area, of ML algorithms, rather than a single, specific
>>>>>>>>> > script;
>>>>>>>>> > > > >>> infrastructure for [performance] testing, and identification of
>>>>>>>>> > wide
>>>>>>>>> > > > areas
>>>>>>>>> > > > >>> of improvement; helping with building out fully-featured, clean,
>>>>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it would be
>>>>>>>>> > > > good to
>>>>>>>>> > > > >>> continue stressing them -- we could even aim to replace DML with
>>>>>>>>> > the
>>>>>>>>> > > > DSLs);
>>>>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user to work
>>>>>>>>> > on
>>>>>>>>> > > a
>>>>>>>>> > > > wide
>>>>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple and easy
>>>>>>>>> > > > manner
>>>>>>>>> > > > >>> on top of Spark.
>>>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and
>>>>>>>>> > > even
>>>>>>>>> > > > >>> begin discussions or contributions on any of the items.  You could
>>>>>>>>> > > also
>>>>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing list,
>>>>>>>>> > > starting
>>>>>>>>> > > > >>> with the first email [2]:
>>>>>>>>> > > > >>> [1]:
>>>>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>>>> > > > >>> [2]:
>>>>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>>>>> > > > >>> - Mike
>>>>>>>>> > > > >>> --
>>>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <
>>>>>>>>> > > luckbr1975@gmail.com
>>>>>>>>> > > > >
>>>>>>>>> > > > >>> wrote:
>>>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to
>>>>>>>>> > > > get you
>>>>>>>>> > > > >>> > familiarized with SystemML.
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer
>>>>>>>>> > > > list and
>>>>>>>>> > > > >>> > start working on a project proposal which could be based on the
>>>>>>>>> > > > recent
>>>>>>>>> > > > >>> > Roadmap discussion [1].
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participate on
>>>>>>>>> > > > GSOC,
>>>>>>>>> > > > >>> > take a look at the following resources [2] and [3], and don't
>>>>>>>>> > > > hesitate to
>>>>>>>>> > > > >>> > ask questions here.
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > [1]
>>>>>>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>>>> > > > >>> > [3]
>>>>>>>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <
>>>>>>>>> > > > krishnakalyan3@gmail.com
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > wrote:
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > > Hello Developers,
>>>>>>>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student in (MSc. in
>>>>>>>>> > > Data
>>>>>>>>> > > > >>> > Mining)
>>>>>>>>> > > > >>> > > currently in Barcelona studying at Université Polytechnique de
>>>>>>>>> > > > >>> Catalogne.
>>>>>>>>> > > > >>> > > I was interested in contributing to SystemML this year under
>>>>>>>>> > GSoc
>>>>>>>>> > > > >>> > program.
>>>>>>>>> > > > >>> > > Could anyone please guide on how to go about it?. (I understand
>>>>>>>>> > > > the I
>>>>>>>>> > > > >>> > need
>>>>>>>>> > > > >>> > > to write a proposal)
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > Related Experience:
>>>>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before
>>>>>>>>> > > my
>>>>>>>>> > > > >>> > masters,
>>>>>>>>> > > > >>> > > I was a  data engineer with IBM (India). I was responsible for
>>>>>>>>> > > > managing
>>>>>>>>> > > > >>> > 50
>>>>>>>>> > > > >>> > > node Hadoop Cluster for more than a year. Most of my time was
>>>>>>>>> > > spent
>>>>>>>>> > > > >>> > > optimising and writing ETL (Apache Pig) jobs.
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > My Webpage
>>>>>>>>> > > > >>> > > kkalyan.in
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > Thank you so much,
>>>>>>>>> > > > >>> > > Krishna
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > --
>>>>>>>>> > > > >>> > Luciano Resende
>>>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>>>> > > > >>> >
>>>>>>>>> > > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>>> > United States Naval Academy
>>>>>>>>> > 410-293-6822
>>>>>>>>> > adina@usna.edu
>>>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>>>> >
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello All,
I have updated the proposal. I hope this one is better. Please share your
feedback.

https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#

FYI: the student application deadline is April 3, 16:00 UTC.


Regards,
Krishna

On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <kr...@gmail.com>
wrote:

> Hello Nakul,
> My comments in *Italics* below.
>
> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <na...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> Here are some questions/remarks I have about parts of your proposal:
>>
>> In the section titled Summary -
>>
>> "The systematic evaluation of performance can be measured with
>> performance tests and micro-benchmarks"
>> We currently do not have any micro benchmarks. Do you plan on adding any?
>> (It would be awesome, but remember to keep the number of tasks reasonable
>> given the time frame and your familiarity with the project)
>>
> *- Removed micro-benchmarks from the proposal.*
>
>>
>> Your summary section feels like it's generally applicable for performance
>> testing on any project, which is good. However, when it comes to talking
>> about what you'd actually be doing, I see - " build a benchmark
>> infrastructure and conduct experiments, that compare different choices in
>> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>>
> *-  I agree and have made these changes.*
>
>> Going over each point:
>>
>> 1. "build a benchmark infrastructure" - OK, I guess this subsumes pretty
>> much all the tasks involved
>> 2. "conduct experiments" - sure, although I think you mean testing your
>> benchmarking infrastructure, please correct me if this is not what you meant
>>
>> 3. "that compare different choices in critical parts"
>> a. "sparsity thresholds" - awesome. You'd need to figure out what
>> SystemML already does and what to add.
>> b. "optimization decisions" - could you provide an example or two of
>> what exactly you mean by this. Do you mean to enable and/or disable certain
>> optimizations and run the perf suite and also automate the process? or
>> something else?
>> c. "etc" - more detail would be nice here. It would be nice to know what
>> exactly you are committing to.
> *- Will add more details in this section.*
>>
>> In the section titled Deliverables -
>>
>> You mention
>> - "automation for all performance tests" - awesome! this is the primary
>> task
>> - "automatic scripts to test performance on a cloud provider" - this is
>> great
>> - "web dashboard" - awesome! this is a nice-to-have
>>
>> But before the "cloud provider" and "web dashboard" task, we'd like to
>> robustly check for errors and record performance numbers and generate
>> reports. (Tasks 2 - 6 on
>> https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've
>> mentioned some of these tasks in your "Project milestones" section as
>> "Understand metrics to be captured like time, memory, errors". It'd be
>> good to put them here as well.
>>
> *- Will add this information under Deliverables*
>
>>
>> Remember, you might also need to change the way SystemML reports errors
>> and performance numbers to complete your tasks. You, along with the
>> currently active members of SystemML might need to change the algorithms
>> being tested as well.
>>
> *- Sure will keep this in mind and will account for this in proposal. *
>
>>
>> In the section titled "Project Milestones" -
>> Your project timeline looks good: the initial set of things to do before
>> May 30 and the fact that you've set aside the final week for buffer. You
>> have dug down into a week by week schedule, which is good. I have some
>> suggestions though:
>>
>> You need to
>> T1. Understand what is happening now, try it out for yourself
>>
> *- Yes, I am following the documentation to simulate benchmarks on my
> local system. *
>
>> T2. You need to automate this process
>> T3. You need to test that this automated process works as expected (and
>> make it robust)
>> T4. You need to add additional capabilities (like micro-benchmarks and/or
>> parameterizing the tests and/or running it with sparse and dense sets)
>>
> *- I will account for T3 and T4 more explicitly in my proposal.*
>
>
>> For each of the tasks that you mention in your deliverables, could you
>> please think about how you'd spend each week doing either T1-3 for a
>> deliverable that is now being done manually and T4 for one that is not
>> being done at all right now?
>> Please revisit some of the tasks on your timeline with this in mind.
>>
>> I'd also ask that you set some deliverable(s) for phase 1 (due on June
>> 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>>
>> A suggestion for the deliverables, if you wanted to be really ambitious
>> and complete every task possible :
>> Phase 1 - implement infrastructure to launch perf suite and to detect
>> errors & report performance numbers in a plain text file
>> Phase 2 - implement scripts to compare performance against older versions
>> of SystemML and other packages (Spark MLLib) and implement mechanism to
>> generate report(s) with errors and performance information in a spreadsheet
>> or pdf or on a web interface
>> Phase 3 - add additional perf tests for more algorithms, different
>> sparsity thresholds and optimization levels and include them in the
>> reports. Also implement and test scripts to run the perf suite on a cloud
>> provider; doing this through a web UI.
>>
>> Something very conservative could be:
>> Phase 1 - automate perf suite and report perf numbers
>> Phase 2 - make error reporting and handling robust, compare against
>> previous versions of systemml
>> Phase 3 - add additional algorithms to the test suite
>>
> *- I would prefer taking the conservative approach here.*
>
>>
>> These are just suggestions; tweak them as you see fit.
>> Having a deliverable attached to the end of a phase is a good thing.
>>
>> Hope I am not being too critical and hopefully this helps
>>
> *- Not at all, I appreciate your feedback and detailed reply.*
>
> *- Could you also let me know the co-mentors for this project? I am
> working on the proposal and will share an updated version soon.*
>
>
>> -Nakul
>>
>>
>>
>>
>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>
>>> Hello All,
>>> Based on "SYSTEMML-1451" and the relevant SystemML source code, I have
>>> updated the draft proposal. Please have a look and share your valuable
>>> feedback.
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <
>>> krishnakalyan3@gmail.com> wrote:
>>>
>>>> Hello All,
>>>> I have created a proposal for
>>>>
>>>> d) Perftest : automated performance tests of algorithms
>>>> (I am most comfortable with bash scripting and Python)
>>>>
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>
>>>> Please share your feedback on the proposal. If someone from the
>>>> community could mentor, it would be great.
>>>>
>>>> Regards,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <
>>>> krishnakalyan3@gmail.com> wrote:
>>>>
>>>>> Thanks Nakul,
>>>>> Replied to the JIRA thread.
>>>>>
>>>>> Cheers,
>>>>> Krishna
>>>>>
>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We have 2 proposals up :
>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>
>>>>>> Would you be interested in any of these?
>>>>>> If you are specifically interested in the Python DSL project, we can
>>>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>>>
>>>>>> -Nakul
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Krishna,
>>>>>>>
>>>>>>> We are working on putting together some proposals. I created one for
>>>>>>> a GPU-based project.
>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>> Be on the lookout for more.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nakul
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Adina and Arvind, thank you for your replies,
>>>>>>>> I am open to writing a proposal with a mentor and would appreciate
>>>>>>>> if we
>>>>>>>> could take action quickly on this.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Krishna
>>>>>>>>
>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I
>>>>>>>> believe
>>>>>>>> > SystemML could still participate as part of ASF if interested
>>>>>>>> (record your
>>>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this
>>>>>>>> subject on
>>>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>>>> > The following page also has useful info, even if it is not
>>>>>>>> updated for this
>>>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to
>>>>>>>> register
>>>>>>>> > very
>>>>>>>> > soon.
>>>>>>>> >
>>>>>>>> > Best regards,
>>>>>>>> > Adina
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve
>>>>>>>> <ac...@yahoo.com.invalid>
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>> > > Unfortunately we could not submit a topic to GSoc on time. However,
>>>>>>>> please
>>>>>>>> > > feel free to leverage SystemML for your use cases and do
>>>>>>>> possible
>>>>>>>> > > contribution to SystemML.
>>>>>>>> > > Please let us know if you have any question.
>>>>>>>> > >
>>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>>> > >
>>>>>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>>> > >
>>>>>>>> > > Hello All,
>>>>>>>> > > A Gentle ping. Student applications open in a couple of days. I would
>>>>>>>> > > like to
>>>>>>>> > > work on 'Support for Python DSLs'.
>>>>>>>> > > However for now I am not sure on how to proceed.
>>>>>>>> > >
>>>>>>>> > > Thank you,
>>>>>>>> > > Krishna
>>>>>>>> > >
>>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > >
>>>>>>>> > > > Yeah helping to build out our Python DSL into a full-out
>>>>>>>> replacement
>>>>>>>> > for
>>>>>>>> > > > the current "DML" language would be great, and we'd be quite
>>>>>>>> > supportive!
>>>>>>>> > > >
>>>>>>>> > > > -Mike
>>>>>>>> > > >
>>>>>>>> > > > --
>>>>>>>> > > >
>>>>>>>> > > > Mike Dusenberry
>>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > >
>>>>>>>> > > > Sent from my iPhone.
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>>>> > > > >
>>>>>>>> > > > > Hi Krishna,
>>>>>>>> > > > >
>>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>>> > > > >
>>>>>>>> > > > > From your list I personally think that a) and d) would be
>>>>>>>> well suited
>>>>>>>> > > > for projects, especially a good python DSL is a high priority.
>>>>>>>> > > > >
>>>>>>>> > > > > We will apply as an organization to GSoC once organization
>>>>>>>> > applications
>>>>>>>> > > > are open (Jan. 19th) and I think we will find mentors for at
>>>>>>>> least a)
>>>>>>>> > and
>>>>>>>> > > > d). If you already want to take a look at what is currently
>>>>>>>> there, I
>>>>>>>> > > > suggest to look at our python APIs and documentation. If you
>>>>>>>> want to
>>>>>>>> > take
>>>>>>>> > > > on the DSL project it might also be a good idea to look into
>>>>>>>> the DML
>>>>>>>> > > > documentation and related papers to see what we need to
>>>>>>>> support.
>>>>>>>> > > > >
>>>>>>>> > > > > The proposals will probably circulate on the mailinglist,
>>>>>>>> too, so
>>>>>>>> > keep
>>>>>>>> > > > an eye on that :)
>>>>>>>> > > > >
>>>>>>>> > > > > -Felix
>>>>>>>> > > > >
>>>>>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>>>>>> > > > >> Hello All,
>>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>>> > > > >> c) GPU support
>>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>>> > > > >> I am also willing to work on the tasks that SystemML
>>>>>>>> community think
>>>>>>>> > > are
>>>>>>>> > > > >> important.
>>>>>>>> > > > >> Regards,
>>>>>>>> > > > >> Krishna
>>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <
>>>>>>>> > > > dusenberrymw@gmail.com>
>>>>>>>> > > > >> wrote:
>>>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>>>> > > > >>> We would definitely be excited to collaborate with you on
>>>>>>>> a GSOC
>>>>>>>> > > > project.
>>>>>>>> > > > >>> We've started another thread to discuss possible new
>>>>>>>> proposals, and
>>>>>>>> > > we
>>>>>>>> > > > >>> would also be quite interested in any particular proposal
>>>>>>>> that you
>>>>>>>> > > > might
>>>>>>>> > > > >>> like to generate tailored towards your interests.  Copied
>>>>>>>> from the
>>>>>>>> > > > other
>>>>>>>> > > > >>> thread, some possible ideas could include: building out a
>>>>>>>> full ML
>>>>>>>> > > demo
>>>>>>>> > > > to
>>>>>>>> > > > >>> solve a real, large-scale problem that would benefit from
>>>>>>>> a
>>>>>>>> > > distributed
>>>>>>>> > > > >>> approach; overall performance improvements that address a
>>>>>>>> full
>>>>>>>> > class,
>>>>>>>> > > > or
>>>>>>>> > > > >>> wider area, of ML algorithms, rather than a single,
>>>>>>>> specific
>>>>>>>> > script;
>>>>>>>> > > > >>> infrastructure for [performance] testing, and
>>>>>>>> identification of
>>>>>>>> > wide
>>>>>>>> > > > areas
>>>>>>>> > > > >>> of improvement; helping with building out fully-featured,
>>>>>>>> clean,
>>>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it
>>>>>>>> would be
>>>>>>>> > > > good to
>>>>>>>> > > > >>> continue stressing them -- we could even aim to replace
>>>>>>>> DML with
>>>>>>>> > the
>>>>>>>> > > > DSLs);
>>>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user
>>>>>>>> to work
>>>>>>>> > on
>>>>>>>> > > a
>>>>>>>> > > > wide
>>>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple
>>>>>>>> and easy
>>>>>>>> > > > manner
>>>>>>>> > > > >>> on top of Spark.
>>>>>>>> > > > >>> In the meantime, you could explore our recent open issues
>>>>>>>> [1] and
>>>>>>>> > > even
>>>>>>>> > > > >>> begin discussions or contributions on any of the items.
>>>>>>>> You could
>>>>>>>> > > also
>>>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing
>>>>>>>> list,
>>>>>>>> > > starting
>>>>>>>> > > > >>> with the first email [2]:
>>>>>>>> > > > >>> [1]:
>>>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>>> > > > >>> [2]:
>>>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>>>> > > > >>> - Mike
>>>>>>>> > > > >>> --
>>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <
>>>>>>>> > > luckbr1975@gmail.com
>>>>>>>> > > > >
>>>>>>>> > > > >>> wrote:
>>>>>>>> > > > >>> > As some folks have described on this thread, it would
>>>>>>>> be great to
>>>>>>>> > > > get you
>>>>>>>> > > > >>> > familiarized with SystemML.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active
>>>>>>>> committer
>>>>>>>> > > > list and
>>>>>>>> > > > >>> > start working on a project proposal which could be
>>>>>>>> based on the
>>>>>>>> > > > recent
>>>>>>>> > > > >>> > Roadmap discussion [1].
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > If you are looking for some guidance on how Apache
>>>>>>>> participate on
>>>>>>>> > > > GSOC,
>>>>>>>> > > > >>> > take a look at the following resources [2] and [3], and
>>>>>>>> don't
>>>>>>>> > > > hesitate to
>>>>>>>> > > > >>> > ask questions here.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > [1]
>>>>>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>>> > > > >>> > [3]
>>>>>>>> > > > >>> > http://www.slideshare.net/luck
>>>>>>>> br1975/how-mentoring-can-help-
>>>>>>>> > > > >>> > you-start-contributing-to-open-source
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>>>> > > > >>> > wrote:
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > > Hello Developers,
>>>>>>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student (MSc. in Data
>>>>>>>> > > > >>> > > Mining) in Barcelona, studying at Université Polytechnique de
>>>>>>>> > > > >>> > > Catalogne. I am interested in contributing to SystemML this year
>>>>>>>> > > > >>> > > under the GSoC program. Could anyone please guide me on how to go
>>>>>>>> > > > >>> > > about it? (I understand that I need to write a proposal.)
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Related Experience:
>>>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my
>>>>>>>> > > > >>> > > masters, I was a data engineer with IBM (India). I was responsible
>>>>>>>> > > > >>> > > for managing a 50 node Hadoop cluster for more than a year. Most of
>>>>>>>> > > > >>> > > my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Webpage
>>>>>>>> > > > >>> > > kkalyan.in
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Thank you so much,
>>>>>>>> > > > >>> > > Krishna
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > --
>>>>>>>> > > > >>> > Luciano Resende
>>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>>> > > > >>> >
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>> > United States Naval Academy
>>>>>>>> > 410-293-6822
>>>>>>>> > adina@usna.edu
>>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello Nakul,
My comments in *Italics* below.

On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <na...@gmail.com> wrote:

> Hi Krishna,
>
> Here are some questions/remarks I have about parts of your proposal:
>
> In the section titled Summary -
>
> "The systematic evaluation of performance can be measured with
> performance tests and micro-benchmarks"
> We currently do not have any micro benchmarks. Do you plan on adding any?
> (It would be awesome, but remember to keep the number of tasks reasonable
> given the time frame and your familiarity with the project)
>
*- Removed micro-benchmarks from the proposal.*

>
> Your summary section feels like it's generally applicable for performance
> testing on any project, which is good. However, when it comes to talking
> about what you'd actually be doing, I see - " build a benchmark
> infrastructure and conduct experiments, that compare different choices in
> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>
*- I agree and have made these changes.*

> Going over each point:
>
> 1. "build a benchmark infrastructure" - OK, I guess this subsumes pretty
> much all the tasks involved
> 2. "conduct experiments" - sure, although I think you mean testing your
> benchmarking infrastructure; please correct me if this is not what you meant
> 3. "that compare different choices in critical parts"
> a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML
> already does and what to add.
> b. "optimization decisions" - could you provide an example or two of what
> exactly you mean by this? Do you mean to enable and/or disable certain
> optimizations, run the perf suite, and also automate the process? Or
> something else?
> c. "etc" - more detail would be nice here. It would be nice to know what
> exactly you are committing to.
>
*- Will add more details in this section.*

>
> In the section titled Deliverables -
>
> You mention
> - "automation for all performance tests" - awesome! this is the primary
> task
> - "automatic scripts to test performance on a cloud provider" - this is
> great
> - "web dashboard" - awesome! this is a nice-to-have
>
> But before the "cloud provider" and "web dashboard" tasks, we'd like to
> robustly check for errors and record performance numbers and generate
> reports. (Tasks 2 - 6 on https://issues.apache.org/jira/browse/SYSTEMML-1451).
> I see that you've mentioned some of these tasks in your "Project milestones"
> section as "Understand metrics to be captured like time, memory, errors".
> It'd be good to put them here as well.
>
*- Will add this information under Deliverables*

>
> Remember, you might also need to change the way SystemML reports errors
> and performance numbers to complete your tasks. You, along with the
> currently active members of SystemML, might need to change the algorithms
> being tested as well.
>
*- Sure, I will keep this in mind and account for it in the proposal.*

>
> In the section titled "Project Milestones" -
> Your project timeline looks good, including the initial set of things to do
> before May 30 and the fact that you've set aside the final week as a buffer.
> You have dug down into a week-by-week schedule, which is good. I have some
> suggestions though:
>
> You need to
> T1. Understand what is happening now, try it out for yourself
>
*- Yes, I am following the documentation to simulate benchmarks on my local
system.*

> T2. You need to automate this process
> T3. You need to test that this automated process works as expected (and
> make it robust)
> T4. You need to add additional capabilities (like micro-benchmarks and/or
> parameterizing the tests and/or running it with sparse and dense sets)
>
*- I will account for T3 and T4 more explicitly in my proposal.*


> For each of the tasks that you mention in your deliverables, could you
> please think about how you'd spend each week doing either T1-3 for a
> deliverable that is now being done manually and T4 for one that is not
> being done at all right now?
> Please revisit some of the tasks on your timeline with this in mind.
>
> I'd also ask that you set some deliverable(s) for phase 1 (due on June
> 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>
> A suggestion for the deliverables, if you want to be really ambitious
> and complete every task possible:
> Phase 1 - implement infrastructure to launch perf suite and to detect
> errors & report performance numbers in a plain text file
> Phase 2 - implement scripts to compare performance against older versions
> of SystemML and other packages (Spark MLlib) and implement mechanism to
> generate report(s) with errors and performance information in a spreadsheet
> or pdf or on a web interface
> Phase 3 - add additional perf tests for more algorithms, different
> sparsity thresholds and optimization levels and include them in the
> reports. Also implement and test scripts to run the perf suite on a cloud
> provider; doing this through a web UI.
>
> Something very conservative could be:
> Phase 1 - automate perf suite and report perf numbers
> Phase 2 - make error reporting and handling robust, compare against
> previous versions of SystemML
> Phase 3 - add additional algorithms to the test suite
>
*- I would prefer taking the conservative approach here.*

>
> These are just suggestions; tweak them as you see fit.
> Having a deliverable attached to the end of a phase is a good thing.
>
> Hope I am not being too critical and hopefully this helps
>
*- Not at all, I appreciate your detailed feedback.*

*- Could you also let me know the co-mentors for this project? I am
working on the proposal and will share an updated version soon.*


> -Nakul
>
>
>
>
> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <kr...@gmail.com>
> wrote:
>
>> Hello All,
>> Based on "SYSTEMML-1451" and relevant SystemML source code, I have
>> updated the draft proposal. Please have a look and share your valuable
>> feedback.
>>
>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>
>> Regards,
>> Krishna
>>
>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>> wrote:
>>
>>> Hello All,
>>> I have created a proposal for
>>>
>>> d) Perftest : automated performance tests of algorithms
>>> (I am most comfortable with bash scripting and Python)
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Please share your feedback on the proposal. If someone from the
>>> community could mentor, it would be great.
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <
>>> krishnakalyan3@gmail.com> wrote:
>>>
>>>> Thanks Nakul,
>>>> Replied to the JIRA thread.
>>>>
>>>> Cheers,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Krishna,
>>>>>
>>>>> We have 2 proposals up :
>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>
>>>>> Would you be interested in any of these?
>>>>> If you are specifically interested in the Python DSL project, we can
>>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>>
>>>>> -Nakul
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We are working on putting together some proposals. I created one for a
>>>>>> GPU-based project.
>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>> Be on the lookout for more.
>>>>>>
>>>>>> Thanks,
>>>>>> Nakul
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Adina and Arvind, thank you for your reply.
>>>>>>> I am open to writing a proposal with a mentor and would appreciate it
>>>>>>> if we could take action quickly on this.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Krishna
>>>>>>>
>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>>> > SystemML could still participate as part of ASF if interested (record
>>>>>>> > your ideas in JIRA and put gsoc2017 as label). See messages on this
>>>>>>> > subject on the community.apache.org mailing list from Ulrich Stark.
>>>>>>> > The following page also has useful info, even if it is not updated for
>>>>>>> > this year: http://community.apache.org/gsoc.html - mentors need to
>>>>>>> > register very soon.
>>>>>>> >
>>>>>>> > Best regards,
>>>>>>> > Adina
>>>>>>> >
>>>>>>> >
>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Thanks Krishna for your interest.
>>>>>>> > > Unfortunately we could not submit a topic to GSoC on time. However,
>>>>>>> > > please feel free to leverage SystemML for your use cases and
>>>>>>> > > contribute to SystemML where possible.
>>>>>>> > > Please let us know if you have any questions.
>>>>>>> > >
>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>> > >
>>>>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>> > >
>>>>>>> > > Hello All,
>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would
>>>>>>> > > like to work on 'Support for Python DSLs'.
>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>> > >
>>>>>>> > > Thank you,
>>>>>>> > > Krishna
>>>>>>> > >
>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > > Yeah helping to build out our Python DSL into a full-out
>>>>>>> replacement
>>>>>>> > for
>>>>>>> > > > the current "DML" language would be great, and we'd be quite
>>>>>>> > supportive!
>>>>>>> > > >
>>>>>>> > > > -Mike
>>>>>>> > > >
>>>>>>> > > > --
>>>>>>> > > >
>>>>>>> > > > Mike Dusenberry
>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > >
>>>>>>> > > > Sent from my iPhone.
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>>> > > > >
>>>>>>> > > > > Hi Krishna,
>>>>>>> > > > >
>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>> > > > >
>>>>>>> > > > > From your list I personally think that a) and d) would be
>>>>>>> well suited
>>>>>>> > > > for projects, especially a good python DSL is a high priority.
>>>>>>> > > > >
>>>>>>> > > > > We will apply as an organization to GSoC once organization
>>>>>>> > applications
>>>>>>> > > > are open (Jan. 19th) and I think we will find mentors for at
>>>>>>> least a)
>>>>>>> > and
>>>>>>> > > > d). If you already want to take a look at what is currently
>>>>>>> there, I
>>>>>>> > > > suggest to look at our python APIs and documentation. If you
>>>>>>> want to
>>>>>>> > take
>>>>>>> > > > on the DSL project it might also be a good idea to look into
>>>>>>> the DML
>>>>>>> > > > documentation and related papers to see what we need to
>>>>>>> support.
>>>>>>> > > > >
>>>>>>> > > > > The proposals will probably circulate on the mailinglist,
>>>>>>> too, so
>>>>>>> > keep
>>>>>>> > > > an eye on that :)
>>>>>>> > > > >
>>>>>>> > > > > -Felix
>>>>>>> > > > >
>>>>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>>>>> > > > >> Hello All,
>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>> > > > >> c) GPU support
>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>> > > > >> I am also willing to work on the tasks that SystemML
>>>>>>> community think
>>>>>>> > > are
>>>>>>> > > > >> important.
>>>>>>> > > > >> Regards,
>>>>>>> > > > >> Krishna
>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <
>>>>>>> > > > dusenberrymw@gmail.com>
>>>>>>> > > > >> wrote:
>>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>>> > > > >>> We would definitely be excited to collaborate with you on
>>>>>>> a GSOC
>>>>>>> > > > project.
>>>>>>> > > > >>> We've started another thread to discuss possible new
>>>>>>> proposals, and
>>>>>>> > > we
>>>>>>> > > > >>> would also be quite interested in any particular proposal
>>>>>>> that you
>>>>>>> > > > might
>>>>>>> > > > >>> like to generate tailored towards your interests.  Copied
>>>>>>> from the
>>>>>>> > > > other
>>>>>>> > > > >>> thread, some possible ideas could include: building out a
>>>>>>> full ML
>>>>>>> > > demo
>>>>>>> > > > to
>>>>>>> > > > >>> solve a real, large-scale problem that would benefit from a
>>>>>>> > > distributed
>>>>>>> > > > >>> approach; overall performance improvements that address a
>>>>>>> full
>>>>>>> > class,
>>>>>>> > > > or
>>>>>>> > > > >>> wider area, of ML algorithms, rather than a single,
>>>>>>> specific
>>>>>>> > script;
>>>>>>> > > > >>> infrastructure for [performance] testing, and
>>>>>>> identification of
>>>>>>> > wide
>>>>>>> > > > areas
>>>>>>> > > > >>> of improvement; helping with building out fully-featured,
>>>>>>> clean,
>>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it
>>>>>>> would be
>>>>>>> > > > good to
>>>>>>> > > > >>> continue stressing them -- we could even aim to replace
>>>>>>> DML with
>>>>>>> > the
>>>>>>> > > > DSLs);
>>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user
>>>>>>> to work
>>>>>>> > on
>>>>>>> > > a
>>>>>>> > > > wide
>>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple
>>>>>>> and easy
>>>>>>> > > > manner
>>>>>>> > > > >>> on top of Spark.
>>>>>>> > > > >>> In the meantime, you could explore our recent open issues
>>>>>>> [1] and
>>>>>>> > > even
>>>>>>> > > > >>> begin discussions or contributions on any of the items.
>>>>>>> You could
>>>>>>> > > also
>>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing list,
>>>>>>> > > > >>> starting with the first email [2]:
>>>>>>> > > > >>> [1]:
>>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>> > > > >>> [2]:
>>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>>> > > > >>> - Mike
>>>>>>> > > > >>> --
>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
>>>>>>> > > > >>> wrote:
>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to
>>>>>>> > > > >>> > get you familiarized with SystemML.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer
>>>>>>> > > > >>> > list and start working on a project proposal which could be based
>>>>>>> > > > >>> > on the recent Roadmap discussion [1].
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in
>>>>>>> > > > >>> > GSOC, take a look at the following resources [2] and [3], and don't
>>>>>>> > > > >>> > hesitate to ask questions here.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > [1]
>>>>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>> > > > >>> > [3]
>>>>>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>>> > > > >>> > wrote:
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > > Hello Developers,
>>>>>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student (MSc. in Data
>>>>>>> > > > >>> > > Mining) in Barcelona, studying at Université Polytechnique de
>>>>>>> > > > >>> > > Catalogne. I am interested in contributing to SystemML this year
>>>>>>> > > > >>> > > under the GSoC program. Could anyone please guide me on how to go
>>>>>>> > > > >>> > > about it? (I understand that I need to write a proposal.)
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Related Experience:
>>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my
>>>>>>> > > > >>> > > masters, I was a data engineer with IBM (India). I was responsible
>>>>>>> > > > >>> > > for managing a 50 node Hadoop cluster for more than a year. Most of
>>>>>>> > > > >>> > > my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Webpage
>>>>>>> > > > >>> > > kkalyan.in
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Thank you so much,
>>>>>>> > > > >>> > > Krishna
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > --
>>>>>>> > > > >>> > Luciano Resende
>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>> > > > >>> >
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Dr. Adina Crainiceanu
>>>>>>> > Associate Professor, Computer Science Department
>>>>>>> > United States Naval Academy
>>>>>>> > 410-293-6822
>>>>>>> > adina@usna.edu
>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: GSoc 2017

Posted by Nakul Jindal <na...@gmail.com>.
Hi Krishna,

Here are some questions/remarks I have about parts of your proposal:

In the section titled Summary -

"The systematic evaluation of performance can be measured with performance
tests and micro-benchmarks"
We currently do not have any micro benchmarks. Do you plan on adding any?
(It would be awesome, but remember to keep the number of tasks reasonable
given the time frame and your familiarity with the project)

Your summary section feels like it's generally applicable for performance
testing on any project, which is good. However, when it comes to talking
about what you'd actually be doing, I see - " build a benchmark
infrastructure and conduct experiments, that compare different choices in
critical parts (sparsity thresholds, optimisation decisions, etc..)".
Going over each point:

1. "build a benchmark infrastructure" - OK, I guess this subsumes pretty
much all the tasks involved
2. "conduct experiments" - sure, although I think you mean testing your
benchmarking infrastructure; please correct me if this is not what you meant
3. "that compare different choices in critical parts"
a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML
already does and what to add.
b. "optimization decisions" - could you provide an example or two of what
exactly you mean by this? Do you mean to enable and/or disable certain
optimizations, run the perf suite, and also automate the process? Or
something else?
c. "etc" - more detail would be nice here. It would be nice to know what
exactly you are committing to.


In the section titled Deliverables -

You mention
- "automation for all performance tests" - awesome! this is the primary task
- "automatic scripts to test performance on a cloud provider" - this is
great
- "web dashboard" - awesome! this is a nice-to-have

But before the "cloud provider" and "web dashboard" tasks, we'd like to
robustly check for errors and record performance numbers and generate
reports. (Tasks 2 - 6 on https://issues.apache.org/jira/browse/SYSTEMML-1451).
I see that you've mentioned some of these tasks in your "Project milestones"
section as "Understand metrics to be captured like time, memory, errors".
It'd be good to put them here as well.

Remember, you might also need to change the way SystemML reports errors and
performance numbers to complete your tasks. You, along with the currently
active members of SystemML, might need to change the algorithms being tested
as well.

In the section titled "Project Milestones" -
Your project timeline looks good, including the initial set of things to do
before May 30 and the fact that you've set aside the final week as a buffer.
You have dug down into a week-by-week schedule, which is good. I have some
suggestions though:

You need to
T1. Understand what is happening now, try it out for yourself
T2. You need to automate this process
T3. You need to test that this automated process works as expected (and
make it robust)
T4. You need to add additional capabilities (like micro-benchmarks and/or
parameterizing the tests and/or running it with sparse and dense sets)
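
As a rough illustration of T4, parameterizing the tests over sparse and
dense inputs could look like the sketch below. Everything in it is an
assumption made for illustration: the algorithm labels, the sparsity values
and the data shapes are placeholders, not the settings SystemML's perf tests
actually use.

```python
from itertools import product

# Placeholder values for illustration only; not SystemML's real configuration.
ALGORITHMS = ["linear-regression", "svm", "kmeans"]
SPARSITIES = {"dense": 0.9, "sparse": 0.01}
SHAPES = [(10_000, 100), (100_000, 100)]


def build_test_matrix(algorithms=ALGORITHMS, sparsities=SPARSITIES, shapes=SHAPES):
    """Return one config dict per (algorithm, sparsity, shape) combination."""
    return [
        {"algorithm": algo, "format": fmt, "sparsity": sp, "rows": r, "cols": c}
        for algo, (fmt, sp), (r, c) in product(algorithms, sparsities.items(), shapes)
    ]


if __name__ == "__main__":
    # Each config would be handed to whatever launches a single perf test.
    for cfg in build_test_matrix():
        print(cfg)
```

Keeping the combinations in one place makes it easy to add a sparsity level
or a data size later without touching the code that launches the tests.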

For each of the tasks that you mention in your deliverables, could you
please think about how you'd spend each week doing either T1-3 for a
deliverable that is now being done manually and T4 for one that is not
being done at all right now?
Please revisit some of the tasks on your timeline with this in mind.

I'd also ask that you set some deliverable(s) for phase 1 (due on June 26),
phase 2 (due on July 26) and the final phase (ends on Aug 29).

A suggestion for the deliverables, if you want to be really ambitious and
complete every task possible:
Phase 1 - implement infrastructure to launch perf suite and to detect
errors & report performance numbers in a plain text file
Phase 2 - implement scripts to compare performance against older versions
of SystemML and other packages (Spark MLlib) and implement mechanism to
generate report(s) with errors and performance information in a spreadsheet
or pdf or on a web interface
Phase 3 - add additional perf tests for more algorithms, different sparsity
thresholds and optimization levels and include them in the reports. Also
implement and test scripts to run the perf suite on a cloud provider; doing
this through a web UI.
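
To make the Phase 1 idea concrete, here is a minimal sketch of a launcher
that runs each test as a subprocess, treats a non-zero exit code or a timeout
as an error, and writes the timings to a plain text file. The test commands
and the report file name are stand-ins invented for this example; a real
suite would call the existing perf-test scripts instead.

```python
import subprocess
import sys
import time


def run_perf_test(name, command, timeout_s=3600):
    """Run one test command and return (name, status, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        status = "OK" if proc.returncode == 0 else "ERROR(rc=%d)" % proc.returncode
    except subprocess.TimeoutExpired:
        status = "TIMEOUT"
    except OSError as exc:
        # For example, the executable could not be found.
        status = "ERROR(%s)" % type(exc).__name__
    return name, status, time.perf_counter() - start


def write_report(results, path):
    """Write one tab-separated line per test: name, status, seconds."""
    with open(path, "w") as fh:
        for name, status, seconds in results:
            fh.write("%s\t%s\t%.3f\n" % (name, status, seconds))


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere Python is installed.
    tests = {
        "passing-test": [sys.executable, "-c", "print('done')"],
        "failing-test": [sys.executable, "-c", "raise SystemExit(3)"],
    }
    results = [run_perf_test(name, cmd) for name, cmd in tests.items()]
    write_report(results, "perf_report.txt")
```

Relying only on the exit code is the simplest possible error check; scanning
the captured output for error messages would be a natural next step if the
scripts under test can fail while still exiting with 0.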

Something very conservative could be:
Phase 1 - automate perf suite and report perf numbers
Phase 2 - make error reporting and handling robust, compare against
previous versions of SystemML
Phase 3 - add additional algorithms to the test suite
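
For the "compare against previous versions" part, one simple approach is to
keep the timings from an earlier run as a baseline and flag any test that got
slower by more than a tolerance. The sketch below assumes timings are kept as
plain dicts of test name to seconds, and the 10% tolerance is an arbitrary
choice for illustration.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {test: (old_seconds, new_seconds)} for tests that slowed down.

    A test counts as a regression when its current time exceeds the
    baseline time by more than `tolerance` (a fraction, 0.10 = 10%).
    Tests missing from `current` are skipped rather than flagged.
    """
    regressions = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is not None and new > old * (1.0 + tolerance):
            regressions[name] = (old, new)
    return regressions


if __name__ == "__main__":
    previous_release = {"kmeans": 10.0, "svm": 5.0}
    this_build = {"kmeans": 10.5, "svm": 6.0}
    for test, (old, new) in find_regressions(previous_release, this_build).items():
        print("%s slowed down: %.1fs -> %.1fs" % (test, old, new))
```

A single run per test is noisy, so in practice one would probably compare
medians over several runs before calling something a regression.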

These are just suggestions; tweak them as you see fit.
Having a deliverable attached to the end of a phase is a good thing.

Hope I am not being too critical and hopefully this helps

-Nakul




On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <kr...@gmail.com>
wrote:

> Hello All,
> Based on "SYSTEMML-1451" and relevant SystemML source code, I have
> updated the draft proposal. Please have a look and share your valuable
> feedback.
>
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>
> Regards,
> Krishna
>
> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <kr...@gmail.com>
> wrote:
>
>> Hello All,
>> I have created a proposal for
>>
>> d) Perftest : automated performance tests of algorithms
>> (I am most comfortable with bash scripting and Python)
>>
>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>
>> Please share your feedback on the proposal. If someone from the community
>> could mentor, it would be great.
>>
>> Regards,
>> Krishna
>>
>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>> wrote:
>>
>>> Thanks Nakul,
>>> Replied to the JIRA thread.
>>>
>>> Cheers,
>>> Krishna
>>>
>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com> wrote:
>>>
>>>> Hi Krishna,
>>>>
>>>> We have 2 proposals up :
>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>
>>>> Would you be interested in any of these?
>>>> If you are specifically interested in the Python DSL project, we can
>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>
>>>> -Nakul
>>>>
>>>>
>>>>
>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Krishna,
>>>>>
>>>>> We are working on putting together some proposals. I created one for a
>>>>> GPU-based project.
>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>> Be on the lookout for more.
>>>>>
>>>>> Thanks,
>>>>> Nakul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>
>>>>>> Hello Adina and Arvind, thank you for your reply.
>>>>>> I am open to writing a proposal with a mentor and would appreciate it if
>>>>>> we could take action quickly on this.
>>>>>>
>>>>>> Best Regards,
>>>>>> Krishna
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>>>> wrote:
>>>>>>
>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>> > SystemML could still participate as part of ASF if interested (record
>>>>>> > your ideas in JIRA and put gsoc2017 as label). See messages on this
>>>>>> > subject on the community.apache.org mailing list from Ulrich Stark.
>>>>>> > The following page also has useful info, even if it is not updated for
>>>>>> > this year: http://community.apache.org/gsoc.html - mentors need to
>>>>>> > register very soon.
>>>>>> >
>>>>>> > Best regards,
>>>>>> > Adina
>>>>>> >
>>>>>> >
>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Thanks Krishna for your interest.
>>>>>> > > Unfortunately we could not submit a topic to GSoC on time. However,
>>>>>> > > please feel free to leverage SystemML for your use cases and
>>>>>> > > contribute to SystemML where possible.
>>>>>> > > Please let us know if you have any questions.
>>>>>> > >
>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>> > >
>>>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>> > >  Subject: Re: GSoc 2017
>>>>>> > >
>>>>>> > > Hello All,
>>>>>> > > A gentle ping. Student applications open in a couple of days. I would
>>>>>> > > like to work on 'Support for Python DSLs'.
>>>>>> > > However, for now I am not sure how to proceed.
>>>>>> > >
>>>>>> > > Thank you,
>>>>>> > > Krishna
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Dr. Adina Crainiceanu
>>>>>> > Associate Professor, Computer Science Department
>>>>>> > United States Naval Academy
>>>>>> > 410-293-6822
>>>>>> > adina@usna.edu
>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello All,
Based on "SYSTEMML-1451" and the relevant SystemML source code, I have updated
the draft proposal. Please have a look and share your valuable feedback.

https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing

Regards,
Krishna

On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <kr...@gmail.com>
wrote:

> Hello All,
> I have created a proposal for
>
> d) Perftest : automated performance tests of algorithms
> (I am most comfortable with bash scripting and Python)
>
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>
> Please share your feedback on the proposal. If someone from the community
> could mentor, it would be great.
>
> Regards,
> Krishna

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello All,
I have created a proposal for

d) Perftest : automated performance tests of algorithms
(I am most comfortable with bash scripting and Python)

https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing

Please share your feedback on the proposal. If someone from the community
could mentor, it would be great.

Regards,
Krishna

On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <kr...@gmail.com>
wrote:

> Thanks Nakul,
> Replied to the JIRA thread.
>
> Cheers,
> Krishna
>
> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> We have 2 proposals up:
>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>
>> Would you be interested in any of these?
>> If you are specifically interested in the Python DSL project, we can look
>> for more volunteers or I could just volunteer to mentor it.
>>
>> -Nakul
>>
>>
>>
>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com> wrote:
>>
>>> Hi Krishna,
>>>
>>> We are working on putting together some proposals. I created one for a
>>> GPU-based project.
>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>> Be on the lookout for more.
>>>
>>> Thanks,
>>> Nakul
>>>
>>>
>>>
>>>
>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>> krishnakalyan3@gmail.com> wrote:
>>>
>>>> Hello Adina and Arvind, thank you for your replies.
>>>> I am open to writing a proposal with a mentor and would appreciate it if we
>>>> could take action quickly on this.
>>>>
>>>> Best Regards,
>>>> Krishna
>>>>
>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>> wrote:
>>>>
>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>> > SystemML could still participate as part of ASF if interested (record your
>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>> > The following page also has useful info, even if it is not updated for this
>>>> > year: http://community.apache.org/gsoc.html - mentors need to register
>>>> > very soon.
>>>> >
>>>> > Best regards,
>>>> > Adina
>>>> >
>>>> >
>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <acs_s@yahoo.com.invalid>
>>>> > wrote:
>>>> >
>>>> > > Thanks Krishna for your interest.
>>>> > > Unfortunately we could not submit topics to GSoC on time. However, please
>>>> > > feel free to leverage SystemML for your use cases and contribute to
>>>> > > SystemML where possible.
>>>> > > Please let us know if you have any questions.
>>>> > >
>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>> > >
>>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>>> > >  To: dev@systemml.incubator.apache.org
>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>> > >  Subject: Re: GSoc 2017
>>>> > >
>>>> > > Hello All,
>>>> > > A gentle ping. Student applications open in a couple of days. I
>>>> > > would like to work on 'Support for Python DSLs'.
>>>> > > However, for now I am not sure how to proceed.
>>>> > >
>>>> > > Thank you,
>>>> > > Krishna
>>>> > >
>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:
>>>> > >
>>>> > > > Yeah helping to build out our Python DSL into a full-out replacement for
>>>> > > > the current "DML" language would be great, and we'd be quite supportive!
>>>> > > >
>>>> > > > -Mike
>>>> > > >
>>>> > > > --
>>>> > > >
>>>> > > > Mike Dusenberry
>>>> > > > GitHub: github.com/dusenberrymw
>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>> > > >
>>>> > > > Sent from my iPhone.
>>>> > > >
>>>> > > >
>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>> > > > >
>>>> > > > > Hi Krishna,
>>>> > > > >
>>>> > > > > cool to see that you're interested in SystemML!
>>>> > > > >
>>>> > > > > From your list I personally think that a) and d) would be well suited
>>>> > > > > for projects, especially since a good Python DSL is a high priority.
>>>> > > > >
>>>> > > > > We will apply as an organization to GSoC once organization applications
>>>> > > > > are open (Jan. 19th) and I think we will find mentors for at least a) and
>>>> > > > > d). If you already want to take a look at what is currently there, I
>>>> > > > > suggest looking at our Python APIs and documentation. If you want to take
>>>> > > > > on the DSL project it might also be a good idea to look into the DML
>>>> > > > > documentation and related papers to see what we need to support.
>>>> > > > >
>>>> > > > > The proposals will probably circulate on the mailing list, too, so keep
>>>> > > > > an eye on that :)
>>>> > > > >
>>>> > > > > -Felix
>>>> > > > >
>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>> > > > >> Hello All,
>>>> > > > >> Thank you for your wonderful replies.
>>>> > > > >> Tasks that I am interested in:
>>>> > > > >> a) Support for Python DSLs
>>>> > > > >> b) Python wrappers for all existing algorithms
>>>> > > > >> c) GPU support
>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>> > > > >> I am also willing to work on the tasks that the SystemML community thinks
>>>> > > > >> are important.
>>>> > > > >> Regards,
>>>> > > > >> Krishna
>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com>
>>>> > > > >> wrote:
>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC project.
>>>> > > > >>> We've started another thread to discuss possible new proposals, and we
>>>> > > > >>> would also be quite interested in any particular proposal that you might
>>>> > > > >>> like to generate tailored towards your interests.  Copied from the other
>>>> > > > >>> thread, some possible ideas could include: building out a full ML demo to
>>>> > > > >>> solve a real, large-scale problem that would benefit from a distributed
>>>> > > > >>> approach; overall performance improvements that address a full class, or
>>>> > > > >>> wider area, of ML algorithms, rather than a single, specific script;
>>>> > > > >>> infrastructure for [performance] testing, and identification of wide areas
>>>> > > > >>> of improvement; helping with building out fully-featured, clean,
>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
>>>> > > > >>> continue stressing them -- we could even aim to replace DML with the DSLs);
>>>> > > > >>> etc.  Overall, we want to improve the ability of the user to work on a wide
>>>> > > > >>> range of large-scale, distributed ML problems in a simple and easy manner
>>>> > > > >>> on top of Spark.
>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even
>>>> > > > >>> begin discussions or contributions on any of the items.  You could also
>>>> > > > >>> view our recent roadmap discussion thread on the mailing list, starting
>>>> > > > >>> with the first email [2]:
>>>> > > > >>> [1]:
>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>> > > > >>> [2]:
>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>> > > > >>> - Mike
>>>> > > > >>> --
>>>> > > > >>> Michael W. Dusenberry
>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
>>>> > > > >>> wrote:
>>>> > > > >>> > As some folks have described on this thread, it would be great to get you
>>>> > > > >>> > familiarized with SystemML.
>>>> > > > >>> >
>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and
>>>> > > > >>> > start working on a project proposal which could be based on the recent
>>>> > > > >>> > Roadmap discussion [1].
>>>> > > > >>> >
>>>> > > > >>> > If you are looking for some guidance on how Apache participates in GSOC,
>>>> > > > >>> > take a look at the following resources [2] and [3], and don't hesitate to
>>>> > > > >>> > ask questions here.
>>>> > > > >>> >
>>>> > > > >>> >
>>>> > > > >>> > [1]
>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>> > > > >>> > [3]
>>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>> > > > >>> >
>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>> > > > >>> > wrote:
>>>> > > > >>> >
>>>> > > > >>> > > Hello Developers,
>>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student (MSc. in Data Mining)
>>>> > > > >>> > > in Barcelona, studying at Université Polytechnique de Catalogne.
>>>> > > > >>> > > I am interested in contributing to SystemML this year under the GSoC
>>>> > > > >>> > > program.
>>>> > > > >>> > > Could anyone please guide me on how to go about it? (I understand that I
>>>> > > > >>> > > need to write a proposal.)
>>>> > > > >>> > >
>>>> > > > >>> > > Related Experience:
>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my masters,
>>>> > > > >>> > > I was a data engineer with IBM (India). I was responsible for managing a
>>>> > > > >>> > > 50-node Hadoop cluster for more than a year. Most of my time was spent
>>>> > > > >>> > > optimising and writing ETL (Apache Pig) jobs.
>>>> > > > >>> > >
>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>> > > > >>> > >
>>>> > > > >>> > > My Webpage
>>>> > > > >>> > > kkalyan.in
>>>> > > > >>> > >
>>>> > > > >>> > > My Spark Pull Requests
>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>> > > > >>> > >
>>>> > > > >>> > > Thank you so much,
>>>> > > > >>> > > Krishna
>>>> > > > >>> > >
>>>> > > > >>> >
>>>> > > > >>> >
>>>> > > > >>> >
>>>> > > > >>> > --
>>>> > > > >>> > Luciano Resende
>>>> > > > >>> > http://twitter.com/lresende1975
>>>> > > > >>> > http://lresende.blogspot.com/
>>>> > > > >>> >
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Dr. Adina Crainiceanu
>>>> > Associate Professor, Computer Science Department
>>>> > United States Naval Academy
>>>> > 410-293-6822
>>>> > adina@usna.edu
>>>> > http://www.usna.edu/Users/cs/adina/
>>>> >
>>>>
>>>
>>>
>>
>

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Thanks Nakul,
Replied to the JIRA thread.

Cheers,
Krishna

On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <na...@gmail.com> wrote:

> Hi Krishna,
>
> We have 2 proposals up:
> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>
> Would you be interested in any of these?
> If you are specifically interested in the Python DSL project, we can look
> for more volunteers or I could just volunteer to mentor it.
>
> -Nakul
>
>
>
> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> We are working on putting together some proposals. I created one for a
>> GPU-based project.
>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>> Be on the lookout for more.
>>
>> Thanks,
>> Nakul
>>
>>
>>
>>
>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>> krishnakalyan3@gmail.com> wrote:
>>
>>> Hello Adina and Arvind, thank you for your replies.
>>> I am open to writing a proposal with a mentor and would appreciate it if we
>>> could take action quickly on this.
>>>
>>> Best Regards,
>>> Krishna
>>>
>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>> wrote:
>>>
>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>> > SystemML could still participate as part of ASF if interested (record
>>> your
>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject
>>> on
>>> > the community.apache.org mailing list from Ulrich Stark.
>>> > The following page also has useful info, even if it is not updated for
>>> this
>>> > year: http://community.apache.org/gsoc.html - mentors need to register
>>> > very
>>> > soon.
>>> >
>>> > Best regards,
>>> > Adina
>>> >
>>> >
>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <acs_s@yahoo.com.invalid
>>> >
>>> > wrote:
>>> >
>>> > > Thanks Krishna for your interest.
>>> > > Unfortunately we could not submit topic to GSoc on time.However
>>> please
>>> > > feel free to leverage SystemML for your use cases and do possible
>>> > > contribution to SystemML.
>>> > > Please let us know if you have any question.
>>> > >
>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>> > >
>>> > >       From: Krishna Kalyan <kr...@gmail.com>
>>> > >  To: dev@systemml.incubator.apache.org
>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>> > >  Subject: Re: GSoc 2017
>>> > >
>>> > > Hello All,
>>> > > A Gentle ping. Student applications open in a couple of days. I like
>>> to
>>> > > work on 'Support for Python DSLs'.
>>> > > However for now I am not sure on how to proceed.
>>> > >
>>> > > Thank you,
>>> > > Krishna
>>> > >
>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:
>>> > >
>>> > > > Yeah helping to build out our Python DSL into a full-out
>>> replacement
>>> > for
>>> > > > the current "DML" language would be great, and we'd be quite
>>> > supportive!
>>> > > >
>>> > > > -Mike
>>> > > >
>>> > > > --
>>> > > >
>>> > > > Mike Dusenberry
>>> > > > GitHub: github.com/dusenberrymw
>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>> > > >
>>> > > > Sent from my iPhone.
>>> > > >
>>> > > >
>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>> > > > >
>>> > > > > Hi Krishna,
>>> > > > >
>>> > > > > cool to see that you're interested in SystemML!
>>> > > > >
>>> > > > > From your list I personally think that a) and d) would be well
>>> suited
>>> > > > for projects, especially a good python DSL is a high priority.
>>> > > > >
>>> > > > > We will apply as an organization to GSoC once organization
>>> > applications
>>> > > > are open (Jan. 19th) and I think we will find mentors for at least
>>> a)
>>> > and
>>> > > > d). If you already want to take a look at what is currently there,
>>> I
>>> > > > suggest to look at our python APIs and documentation. If you want
>>> to
>>> > take
>>> > > > on the DSL project it might also be a good idea to look into the
>>> DML
>>> > > > documentation and related papers to see what we need to support.
>>> > > > >
>>> > > > > The proposals will probably circulate on the mailinglist, too, so
>>> > keep
>>> > > > an eye on that :)
>>> > > > >
>>> > > > > -Felix
>>> > > > >
>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>> > > > >> Hello All,
>>> > > > >> Thank you for your wonderful replies.
>>> > > > >> Tasks that I am interested in:
>>> > > > >> a) Support for Python DSLs
>>> > > > >> b) Python wrappers for all existing algorithms
>>> > > > >> c) GPU support
>>> > > > >> d) Perftest : automated performance tests of algorithms
>>> > > > >> I am also willing to work on the tasks that SystemML community
>>> think
>>> > > are
>>> > > > >> important.
>>> > > > >> Regards,
>>> > > > >> Krishna
>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <
>>> > > > dusenberrymw@gmail.com>
>>> > > > >> wrote:
>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>> > > > >>> We would definitely be excited to collaborate with you on a
>>> GSOC
>>> > > > project.
>>> > > > >>> We've started another thread to discuss possible new
>>> proposals, and
>>> > > we
>>> > > > >>> would also be quite interested in any particular proposal that
>>> you
>>> > > > might
>>> > > > >>> like to generate tailored towards your interests.  Copied from
>>> the
>>> > > > other
>>> > > > >>> thread, some possible ideas could include: building out a full
>>> ML
>>> > > demo
>>> > > > to
>>> > > > >>> solve a real, large-scale problem that would benefit from a
>>> > > distributed
>>> > > > >>> approach; overall performance improvements that address a full
>>> > class,
>>> > > > or
>>> > > > >>> wider area, of ML algorithms, rather than a single, specific
>>> > script;
>>> > > > >>> infrastructure for [performance] testing, and identification of
>>> > wide
>>> > > > areas
>>> > > > >>> of improvement; helping with building out fully-featured,
>>> clean,
>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it
>>> would be
>>> > > > good to
>>> > > > >>> continue stressing them -- we could even aim to replace DML
>>> with
>>> > the
>>> > > > DSLs);
>>> > > > >>> etc.  Overall, we want to improve the ability of the user to
>>> work
>>> > on
>>> > > a
>>> > > > wide
>>> > > > >>> range of large-scale, distributed ML problems in a simple and
>>> easy
>>> > > > manner
>>> > > > >>> on top of Spark.
>>> > > > >>> In the meantime, you could explore our recent open issues [1]
>>> and
>>> > > even
>>> > > > >>> begin discussions or contributions on any of the items.  You
>>> could
>>> > > also
>>> > > > >>> view our recent roadmap discussion thread on the mailing list,
>>> > > starting
>>> > > > >>> with the first email [2]:
>>> > > > >>> [1]:
>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%
>>> > > > 20SYSTEMML%20AND%
>>> > > > >>> 20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DES
>>> C%2C%
>>> > > > >>> 20priority%20DESC
>>> > > > >>> [2]:
>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-
>>> > > > >>> systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-
>>> > > > >>> bad74059930d@gmail.com%3E
>>> > > > >>> - Mike
>>> > > > >>> --
>>> > > > >>> Michael W. Dusenberry
>>> > > > >>> GitHub: github.com/dusenberrymw
>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <
>>> > > luckbr1975@gmail.com
>>> > > > >
>>> > > > >>> wrote:
>>> > > > >>> > As some folks have described on this thread, it would be
>>> great to
>>> > > > get you
>>> > > > >>> > familiarized with SystemML.
>>> > > > >>> >
>>> > > > >>> > In parallel, I would look for a mentor from the active
>>> committer
>>> > > > list and
>>> > > > >>> > start working on a project proposal which could be based on
>>> the
>>> > > > recent
>>> > > > >>> > Roadmap discussion [1].
>>> > > > >>> >
>>> > > > >>> > If you are looking for some guidance on how Apache
>>> participate on
>>> > > > GSOC,
>>> > > > >>> > take a look at the following resources [2] and [3], and don't
>>> > > > hesitate to
>>> > > > >>> > ask questions here.
>>> > > > >>> >
>>> > > > >>> >
>>> > > > >>> > [1]
>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.o
>>> > > > >>> > rg/msg01199.html
>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>> > > > >>> > [3]
>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-
>>> > > > >>> > you-start-contributing-to-open-source
>>> > > > >>> >
>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <
>>> > > > krishnakalyan3@gmail.com
>>> > > > >>> >
>>> > > > >>> > wrote:
>>> > > > >>> >
>>> > > > >>> > > Hello Developers,
>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student in
>>> (MSc. in
>>> > > Data
>>> > > > >>> > Mining)
>>> > > > >>> > > currently in Barcelona studying at Université
>>> Polytechnique de
>>> > > > >>> Catalogne.
>>> > > > >>> > > I was interested in contributing to SystemML this year
>>> under
>>> > GSoc
>>> > > > >>> > program.
>>> > > > >>> > > Could anyone please guide on how to go about it?. (I
>>> understand
>>> > > > the I
>>> > > > >>> > need
>>> > > > >>> > > to write a proposal)
>>> > > > >>> > >
>>> > > > >>> > > Related Experience:
>>> > > > >>> > > My masters is mostly focussed on data mining techniques.
>>> Before
>>> > > my
>>> > > > >>> > masters,
>>> > > > >>> > > I was a  data engineer with IBM (India). I was responsible
>>> for
>>> > > > managing
>>> > > > >>> > 50
>>> > > > >>> > > node Hadoop Cluster for more than a year. Most of my time
>>> was
>>> > > spent
>>> > > > >>> > > optimising and writing ETL (Apache Pig) jobs.
>>> > > > >>> > >
>>> > > > >>> > > I am the most comfortable with Python followed by R and
>>> Scala.
>>> > > > >>> > >
>>> > > > >>> > > My Webpage
>>> > > > >>> > > kkalyan.in
>>> > > > >>> > >
>>> > > > >>> > > My Spark Pull Requests
>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=
>>> > > > >>> is%3Apr%20author%
>>> > > > >>> > > 3Akrishnakalyan3%20
>>> > > > >>> > >
>>> > > > >>> > > Thank you so much,
>>> > > > >>> > > Krishna
>>> > > > >>> > >
>>> > > > >>> >
>>> > > > >>> >
>>> > > > >>> >
>>> > > > >>> > --
>>> > > > >>> > Luciano Resende
>>> > > > >>> > http://twitter.com/lresende1975
>>> > > > >>> > http://lresende.blogspot.com/
>>> > > > >>> >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Dr. Adina Crainiceanu
>>> > Associate Professor, Computer Science Department
>>> > United States Naval Academy
>>> > 410-293-6822
>>> > adina@usna.edu
>>> > http://www.usna.edu/Users/cs/adina/
>>> >
>>>
>>
>>
>

Re: GSoc 2017

Posted by Nakul Jindal <na...@gmail.com>.
Hi Krishna,

We have 2 proposals up:
https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC

Would you be interested in any of these?
If you are specifically interested in the Python DSL project, we can look
for more volunteers or I could just volunteer to mentor it.

-Nakul



On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <na...@gmail.com> wrote:

> Hi Krishna,
>
> We are working on putting together some proposals. I created one for a
> GPU-based project.
> https://issues.apache.org/jira/browse/SYSTEMML-1436
> Be on the lookout for more.
>
> Thanks,
> Nakul

Re: GSoc 2017

Posted by Nakul Jindal <na...@gmail.com>.
Hi Krishna,

We are working on putting together some proposals. I created one for a
GPU-based project.
https://issues.apache.org/jira/browse/SYSTEMML-1436
Be on the lookout for more.

Thanks,
Nakul




On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <kr...@gmail.com>
wrote:

> Hello Adina and Arvind, thank you for your reply,
> I am open to writing a proposal with a mentor and would appreciate it if we
> could take action quickly on this.
>
> Best Regards,
> Krishna

Re: GSoc 2017

Posted by Krishna Kalyan <kr...@gmail.com>.
Hello Adina and Arvind, thank you for your reply,
I am open to writing a proposal with a mentor and would appreciate it if we
could take action quickly on this.

Best Regards,
Krishna

On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu> wrote:

> Apache Software Foundation applied and was accepted for GSOC. I believe
> SystemML could still participate as part of ASF if interested (record your
> ideas in JIRA and put gsoc2017 as label). See messages on this subject on
> the community.apache.org mailing list from Ulrich Stark.
> The following page also has useful info, even if it is not updated for this
> year: http://community.apache.org/gsoc.html - mentors need to register
> very soon.
>
> Best regards,
> Adina
>
>
> On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
> wrote:
>
> > Thanks Krishna for your interest.
> > Unfortunately we could not submit a topic to GSoC on time. However, please
> > feel free to leverage SystemML for your use cases and to contribute to
> > SystemML where possible.
> > Please let us know if you have any questions.
> >
> > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> >
> >       From: Krishna Kalyan <kr...@gmail.com>
> >  To: dev@systemml.incubator.apache.org
> >  Sent: Saturday, March 18, 2017 8:18 AM
> >  Subject: Re: GSoc 2017
> >
> > Hello All,
> > A gentle ping. Student applications open in a couple of days. I would like to
> > work on 'Support for Python DSLs'.
> > However, for now I am not sure how to proceed.
> >
> > Thank you,
> > Krishna
> >
> > On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:
> >
> > > Yeah helping to build out our Python DSL into a full-out replacement for
> > > the current "DML" language would be great, and we'd be quite supportive!
> > >
> > > -Mike
> > >
> > > --
> > >
> > > Mike Dusenberry
> > > GitHub: github.com/dusenberrymw
> > > LinkedIn: linkedin.com/in/mikedusenberry
> > >
> > > Sent from my iPhone.
> > >
> > >
> > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
> > > >
> > > > Hi Krishna,
> > > >
> > > > cool to see that you're interested in SystemML!
> > > >
> > > > From your list I personally think that a) and d) would be well suited
> > > > for projects, especially a good python DSL is a high priority.
> > > >
> > > > We will apply as an organization to GSoC once organization applications
> > > > are open (Jan. 19th) and I think we will find mentors for at least a) and
> > > > d). If you already want to take a look at what is currently there, I
> > > > suggest to look at our python APIs and documentation. If you want to take
> > > > on the DSL project it might also be a good idea to look into the DML
> > > > documentation and related papers to see what we need to support.
> > > >
> > > > The proposals will probably circulate on the mailinglist, too, so keep
> > > > an eye on that :)
> > > >
> > > > -Felix
> > > >
> > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
> > > >> Hello All,
> > > >> Thank you for your wonderful replies.
> > > >> Tasks that I am interested in:
> > > >> a) Support for Python DSLs
> > > >> b) Python wrappers for all existing algorithms
> > > >> c) GPU support
> > > >> d) Perftest : automated performance tests of algorithms
> > > >> I am also willing to work on the tasks that SystemML community think
> > are
> > > >> important.
> > > >> Regards,
> > > >> Krishna
> > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <
> > > dusenberrymw@gmail.com>
> > > >> wrote:
> > > >>> Hi Krishna!  Welcome, and thanks for your interest!
> > > >>> We would definitely be excited to collaborate with you on a GSOC
> > > project.
> > > >>> We've started another thread to discuss possible new proposals, and
> > we
> > > >>> would also be quite interested in any particular proposal that you
> > > might
> > > >>> like to generate tailored towards your interests.  Copied from the
> > > other
> > > >>> thread, some possible ideas could include: building out a full ML
> > demo
> > > to
> > > >>> solve a real, large-scale problem that would benefit from a
> > distributed
> > > >>> approach; overall performance improvements that address a full
> class,
> > > or
> > > >>> wider area, of ML algorithms, rather than a single, specific
> script;
> > > >>> infrastructure for [performance] testing, and identification of
> wide
> > > areas
> > > >>> of improvement; helping with building out fully-featured, clean,
> > > >>> well-tested DSLs in Python & Scala (we've started, but it would be
> > > good to
> > > >>> continue stressing them -- we could even aim to replace DML with
> the
> > > DSLs);
> > > >>> etc.  Overall, we want to improve the ability of the user to work
> on
> > a
> > > wide
> > > >>> range of large-scale, distributed ML problems in a simple and easy
> > > manner
> > > >>> on top of Spark.
> > > >>> In the meantime, you could explore our recent open issues [1] and
> > even
> > > >>> begin discussions or contributions on any of the items.  You could
> > also
> > > >>> view our recent roadmap discussion thread on the mailing list,
> > starting
> > > >>> with the first email [2]:
> > > >>> [1]:
> > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%
> > > 20SYSTEMML%20AND%
> > > >>> 20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%
> > > >>> 20priority%20DESC
> > > >>> [2]:
> > > >>> http://mail-archives.apache.org/mod_mbox/incubator-
> > > >>> systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-
> > > >>> bad74059930d@gmail.com%3E
> > > >>> - Mike
> > > >>> --
> > > >>> Michael W. Dusenberry
> > > >>> GitHub: github.com/dusenberrymw
> > > >>> LinkedIn: linkedin.com/in/mikedusenberry
> > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <
> > luckbr1975@gmail.com
> > > >
> > > >>> wrote:
> > > >>> > As some folks have described on this thread, it would be great to
> > > get you
> > > >>> > familiarized with SystemML.
> > > >>> >
> > > >>> > In parallel, I would look for a mentor from the active committer
> > > list and
> > > >>> > start working on a project proposal which could be based on the
> > > recent
> > > >>> > Roadmap discussion [1].
> > > >>> >
> > > >>> > If you are looking for some guidance on how Apache participate on
> > > GSOC,
> > > >>> > take a look at the following resources [2] and [3], and don't
> > > hesitate to
> > > >>> > ask questions here.
> > > >>> >
> > > >>> >
> > > >>> > [1]
> > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.o
> > > >>> > rg/msg01199.html
> > > >>> > [2] http://community.apache.org/gsoc.html
> > > >>> > [3]
> > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-
> > > >>> > you-start-contributing-to-open-source
> > > >>> >
> > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <
> > > krishnakalyan3@gmail.com
> > > >>> >
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Hello Developers,
> > > >>> > > I am Krishna, currently a 2nd year Masters student in (MSc. in
> > Data
> > > >>> > Mining)
> > > >>> > > currently in Barcelona studying at Université Polytechnique de
> > > >>> Catalogne.
> > > >>> > > I was interested in contributing to SystemML this year under
> GSoc
> > > >>> > program.
> > > >>> > > Could anyone please guide on how to go about it?. (I understand
> > > the I
> > > >>> > need
> > > >>> > > to write a proposal)
> > > >>> > >
> > > >>> > > Related Experience:
> > > >>> > > My masters is mostly focussed on data mining techniques. Before
> > my
> > > >>> > masters,
> > > >>> > > I was a  data engineer with IBM (India). I was responsible for
> > > managing
> > > >>> > 50
> > > >>> > > node Hadoop Cluster for more than a year. Most of my time was
> > spent
> > > >>> > > optimising and writing ETL (Apache Pig) jobs.
> > > >>> > >
> > > >>> > > I am the most comfortable with Python followed by R and Scala.
> > > >>> > >
> > > >>> > > My Webpage
> > > >>> > > kkalyan.in
> > > >>> > >
> > > >>> > > My Spark Pull Requests
> > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=
> > > >>> is%3Apr%20author%
> > > >>> > > 3Akrishnakalyan3%20
> > > >>> > >
> > > >>> > > Thank you so much,
> > > >>> > > Krishna
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> > Luciano Resende
> > > >>> > http://twitter.com/lresende1975
> > > >>> > http://lresende.blogspot.com/
> > > >>> >
> > >
> >
> >
> >
>
>
>
> --
> Dr. Adina Crainiceanu
> Associate Professor, Computer Science Department
> United States Naval Academy
> 410-293-6822
> adina@usna.edu
> http://www.usna.edu/Users/cs/adina/
>

Re: GSoc 2017

Posted by Adina Crainiceanu <ad...@usna.edu>.
The Apache Software Foundation applied and was accepted for GSoC. I believe
SystemML could still participate as part of the ASF if interested (record your
ideas in JIRA and label them gsoc2017). See the messages on this subject from
Ulrich Stark on the community.apache.org mailing list.
The following page also has useful info, even though it has not been updated
for this year: http://community.apache.org/gsoc.html. Note that mentors need
to register very soon.

Best regards,
Adina


On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
wrote:

> Thanks Krishna for your interest.
> Unfortunately we could not submit topic to GSoc on time.However please
> feel free to leverage SystemML for your use cases and do possible
> contribution to SystemML.
> Please let us know if you have any question.
>
> Arvind Surve | Spark Technology Center  | http://www.spark.tc/



-- 
Dr. Adina Crainiceanu
Associate Professor, Computer Science Department
United States Naval Academy
410-293-6822
adina@usna.edu
http://www.usna.edu/Users/cs/adina/

Re: GSoc 2017

Posted by Arvind Surve <ac...@yahoo.com.INVALID>.
Thanks, Krishna, for your interest.
Unfortunately, we could not submit a topic to GSoC in time. However, please feel free to use SystemML for your own use cases and to contribute to SystemML where you can.
Please let us know if you have any questions.
 
Arvind Surve | Spark Technology Center  | http://www.spark.tc/

      From: Krishna Kalyan <kr...@gmail.com>
 To: dev@systemml.incubator.apache.org 
 Sent: Saturday, March 18, 2017 8:18 AM
 Subject: Re: GSoc 2017
   
Hello All,
A Gentle ping. Student applications open in a couple of days. I like to
work on 'Support for Python DSLs'.
However for now I am not sure on how to proceed.

Thank you,
Krishna

On Thu, Jan 12, 2017 at 6:08 PM, <du...@gmail.com> wrote:

> Yeah helping to build out our Python DSL into a full-out replacement for
> the current "DML" language would be great, and we'd be quite supportive!
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
> >
> > Hi Krishna,
> >
> > cool to see that you're interested in SystemML!
> >
> > From your list I personally think that a) and d) would be well suited
> > for projects, especially a good python DSL is a high priority.
> >
> > We will apply as an organization to GSoC once organization applications
> > are open (Jan. 19th) and I think we will find mentors for at least a) and
> > d). If you already want to take a look at what is currently there, I
> > suggest to look at our python APIs and documentation. If you want to take
> > on the DSL project it might also be a good idea to look into the DML
> > documentation and related papers to see what we need to support.
> >
> > The proposals will probably circulate on the mailinglist, too, so keep
> > an eye on that :)
> >
> > -Felix
> >
> > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
> >> Hello All,
> >> Thank you for your wonderful replies.
> >> Tasks that I am interested in:
> >> a) Support for Python DSLs
> >> b) Python wrappers for all existing algorithms
> >> c) GPU support
> >> d) Perftest : automated performance tests of algorithms
> >> I am also willing to work on the tasks that SystemML community think are
> >> important.
> >> Regards,
> >> Krishna
> >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com>
> >> wrote:
> >>> Hi Krishna!  Welcome, and thanks for your interest!
> >>> We would definitely be excited to collaborate with you on a GSOC project.
> >>> We've started another thread to discuss possible new proposals, and we
> >>> would also be quite interested in any particular proposal that you might
> >>> like to generate tailored towards your interests.  Copied from the other
> >>> thread, some possible ideas could include: building out a full ML demo to
> >>> solve a real, large-scale problem that would benefit from a distributed
> >>> approach; overall performance improvements that address a full class, or
> >>> wider area, of ML algorithms, rather than a single, specific script;
> >>> infrastructure for [performance] testing, and identification of wide areas
> >>> of improvement; helping with building out fully-featured, clean,
> >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
> >>> continue stressing them -- we could even aim to replace DML with the DSLs);
> >>> etc.  Overall, we want to improve the ability of the user to work on a wide
> >>> range of large-scale, distributed ML problems in a simple and easy manner
> >>> on top of Spark.
> >>> In the meantime, you could explore our recent open issues [1] and even
> >>> begin discussions or contributions on any of the items.  You could also
> >>> view our recent roadmap discussion thread on the mailing list, starting
> >>> with the first email [2]:
> >>> [1]:
> >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> >>> [2]:
> >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
> >>> - Mike
> >>> --
> >>> Michael W. Dusenberry
> >>> GitHub: github.com/dusenberrymw
> >>> LinkedIn: linkedin.com/in/mikedusenberry
> >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
> >>> wrote:
> >>> > As some folks have described on this thread, it would be great to get you
> >>> > familiarized with SystemML.
> >>> >
> >>> > In parallel, I would look for a mentor from the active committer list and
> >>> > start working on a project proposal which could be based on the recent
> >>> > Roadmap discussion [1].
> >>> >
> >>> > If you are looking for some guidance on how Apache participate on GSOC,
> >>> > take a look at the following resources [2] and [3], and don't hesitate to
> >>> > ask questions here.
> >>> >
> >>> >
> >>> > [1]
> >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
> >>> > [2] http://community.apache.org/gsoc.html
> >>> > [3]
> >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
> >>> >
> >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Hello Developers,
> >>> > > I am Krishna, currently a 2nd year Masters student in (MSc. in Data Mining)
> >>> > > currently in Barcelona studying at Université Polytechnique de Catalogne.
> >>> > > I was interested in contributing to SystemML this year under GSoc program.
> >>> > > Could anyone please guide on how to go about it?. (I understand the I need
> >>> > > to write a proposal)
> >>> > >
> >>> > > Related Experience:
> >>> > > My masters is mostly focussed on data mining techniques. Before my masters,
> >>> > > I was a  data engineer with IBM (India). I was responsible for managing 50
> >>> > > node Hadoop Cluster for more than a year. Most of my time was spent
> >>> > > optimising and writing ETL (Apache Pig) jobs.
> >>> > >
> >>> > > I am the most comfortable with Python followed by R and Scala.
> >>> > >
> >>> > > My Webpage
> >>> > > kkalyan.in
> >>> > >
> >>> > > My Spark Pull Requests
> >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
> >>> > >
> >>> > > Thank you so much,
> >>> > > Krishna
> >>> > >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Luciano Resende
> >>> > http://twitter.com/lresende1975
> >>> > http://lresende.blogspot.com/
> >>> >
>