Posted to dev@mahout.apache.org by Samee Zahur <sa...@gmail.com> on 2008/04/03 07:18:43 UTC

Re: Undergrad stud interested in GSoC

I have updated my GSoC proposal (see below) to include a timeline,
and expanded its scope to cover ALL the algorithms in the NIPS
paper.

Frankly, I was a little confused when Isabel Drost said that's a lot
of work for one summer. So I spent the last week on background study,
going through the API and the existing code in Mahout, and I think I
understand them well enough to write similar code myself, and to do so
fast enough to implement all the algorithms described in the NIPS
paper this summer. To confirm this for myself, I even built an LWLR
implementation on Hadoop from scratch. It took me around 3-5 hours of
coding and testing.

I know it is only a draft implementation, and that it has lots of room
for improvement, but all that is doable.

Please, if you still think I have grossly underestimated the
difficulty in some way, let me know exactly which parts are the most
difficult and time-consuming, and I will adjust my position
accordingly.

My implementation of LWLR is attached here. I ran it against the
trunk code from svn. It is just the implementation; the test modules
are not there yet. But I think it is still simple enough for you to
inspect and comment on.


Samee

---- Proposal starts ---------

Introduction

I am Samee Zahur, a senior-year student expecting a BS degree in
Computer Engineering in less than a year. I was very excited to learn
that you would mentor students from the GSoC program as I feel that
the Apache Mahout project would provide me with an excellent
first-hand introduction into working with large-scale machine-learning
systems. Given that I have been involved in various forms of
programming tasks for over 10 years now, I strongly feel that I can
provide you with all the services that you require for this particular
task, and do so within the short time-frame.


Scope of Work

During the Google Summer of Code 2008, I intend to implement all of
the ten machine-learning algorithms described in the NIPS paper (or
fewer if specified by the mentor), following the project standards set
by the mentor, prepare illustratively large sample cases to
demonstrate each of those algorithms, and document the resulting API,
once again according to project standards.


Expected phases of my work

1) For the first two weeks I will just be reading documentation, doing
background study, going through the existing codebase, discussing
project conventions (how much to comment the code, data I/O format
design, where to use Float[] instead of mahout.util classes etc.), and
getting to know the project in general. This is also the time I will
use to gain greater familiarity with the Hadoop framework.

2) At this point, I will start coding. During this time I will code
the LWLR, NN and PCA algorithms. I expect to take a maximum of 2-3
days for a draft working system for each of these algorithms, along
with one more week for fine-tuning, bringing things in line with
project conventions, documenting, testing etc. All these activities
will be reported to the mahout-dev mailing list, so that mentors may
check progress and correct/adjust my activity accordingly.

3) Once these three algorithms have been completely implemented, I
expect to be quite thoroughly familiar with the project - codebase and
conventions. At this time I should be able to implement any and all of
the remaining algorithms without much trouble, once again, taking 2-3
days for each. At this point, I expect my progress to be slowed down
by the fact that I may start contributing to this project in ways
outside the scope of the GSoC. Nevertheless, I expect this phase to be
over within a maximum of 5 weeks.

4) By this time I should have delivered almost all my work related to
the GSoC, and explained it well to the mentor. Only the demonstration
and test data and the usage tutorial write-ups should remain at this
point. Once again, I expect this to take around 3-5 weeks, depending
on the actual size of the API I have implemented thus far.

Personal Background

As you may realize, I have been involved in programming ever since
middle-school, and since then I have used my skills sometimes for
semi-professional needs, sometimes purely as a hobby. From the
descriptions of my experience and interests below, you will realize
that I am usually a quick learner. And it is this particular quality
which I feel will make me the most suitable candidate for your work
here. I realize that any student taking up this task would be required
to gain quick familiarity with Hadoop and its implementation of
MapReduce, Mahout design plans and planned interfaces, and an in-depth
understanding of principles relating to scalability. You will find
that in the descriptions below, I have repeatedly proven myself to be
capable of rapidly acquiring new skills in this area and using them
within a tight time constraint.


Relevant Activities

I have been an active contestant in various algorithm-based
programming competitions for the last 2-3 years, at both national and
international levels. Competitions I have participated in include ACM
ICPC Regional Contests, TopCoder Collegiate Challenge and TopCoder
Open, and various other nationwide contests. You can see a graphical
representation of my performance over the years here:

http://www.topcoder.com/tc?module=MemberProfile&cr=16039001

This profile reflects my performance in programming competitions,
mainly the results of the weekly Single Round Matches I have
participated in on TopCoder to keep my skills sharp. As the graph
shows, at the start I was not quite adept at such competitions. But
with practice, my performance improved. The tasks required us to read
problem statements, analyze them, identify a feasible solution
approach and code an algorithm which solves the given problem. With
practice, I was able to go from opening the problem statement to
submitting the solution in under 10 minutes for moderately easy
problems. These 10 minutes included the time to read the problem, code,
test and debug the solution. At the peak of my performance a few
months back, I was ranked number one in the country and ranked better
than 250 out of 7000+ coders in the world. I have, however, decided to
concentrate on other tasks this summer and therefore will
not participate or practice there until the end of August. The point
is, however, that I went from nothing to being the top in the country
(and 250 in the world) within a matter of 2-3 years, even though I was
competing against international contestants who have been in such
competitions for 5-10 years longer than I have.

I have also had to work on real-life projects when I used to do
lightweight freelancing work on website backend development. There I
coded server-side scripts in popular languages like ASP, PHP etc. My
work ranged from simple order-placement scripts to payment gateway
authentication integration. At this time my skills at understanding
and integrating with existing codebases were put to the test. Small
projects frequently involved delivery within weeks, which meant I had
to be fast in getting acquainted with the existing systems in place
and in getting familiar with new technologies. I worked on such small
projects for approximately another 2 years.

Over the years my hobbies included work in C, C++, VB, 80x86 Assembly,
PHP, ASP and Java, among other languages. In the course of learning
several languages I have tested my abilities to learn fast and
thoroughly more than once.



---- Proposal Ends --------

Re: Undergrad stud interested in GSoC

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Friday 04 April 2008, Samee Zahur wrote:
> I do see others at least including signature.asc as an attachment at times.

I would guess there is some upper limit on the size of attachments. Don't know 
for sure though.

Isabel

-- 
You cannot use your friends and have them too.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Undergrad stud interested in GSoC

Posted by Samee Zahur <sa...@gmail.com>.
I do see others at least including signature.asc as an attachment at times.


On 4/4/08, Grant Ingersoll <gs...@apache.org> wrote:
>
>  On Apr 4, 2008, at 5:21 AM, Samee Zahur wrote:
>
> >
> > For future reference, just one question: what kind of attachments do
> > go through the mailing list here?
> >
>
>
>  None, I think
>

Re: Undergrad stud interested in GSoC

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 4, 2008, at 5:21 AM, Samee Zahur wrote:
>
> For future reference, just one question: what kind of attachments do
> go through the mailing list here?


None, I think

Re: Undergrad stud interested in GSoC

Posted by Samee Zahur <sa...@gmail.com>.
Will do. As soon as I write up the unit tests I will submit it. So far
I was testing it by manually invoking it from the command line just to
avoid having to download JUnit :p

For future reference, just one question: what kind of attachments do
go through the mailing list here?


Samee

On 4/3/08, Grant Ingersoll <gs...@apache.org> wrote:
> Please post the code as a patch in JIRA.  See the Wiki "How to Contribute"
> section.
>
>  -Grant
>
>

Re: Undergrad stud interested in GSoC

Posted by Grant Ingersoll <gs...@apache.org>.
Please post the code as a patch in JIRA.  See the Wiki "How to  
Contribute" section.

-Grant

On Apr 3, 2008, at 1:19 AM, Samee Zahur wrote:

> Please rename the attachment to LWLR.tar.gz for opening. Couldn't get
> it mailed otherwise.



Re: Undergrad stud interested in GSoC

Posted by Samee Zahur <sa...@gmail.com>.
Please rename the attachment to LWLR.tar.gz for opening. Couldn't get
it mailed otherwise.