Posted to dev@community.apache.org by Ross Gardler <rg...@apache.org> on 2009/12/18 11:18:27 UTC

Academic outreach activities for Hadoop

I had a meeting with Simon Metson of the University of Bristol and Steve 
Loughran of HP Labs (Bristol) yesterday (both cc'd). One of the topics 
of discussion was reaching out to the academic sector from the Hadoop 
project.

In short it is felt that the academic sector has big data on a scale 
equal to or greater than big players such as Yahoo!, Facebook and 
Cloudera (e.g. Simon works on data from various sources such as 
landslide modelling for cost benefit analysis and data collected from 
experiments such as those conducted at the Large Hadron Collider).

It was therefore agreed that there is a real need for the academic 
sector to get to grips with Hadoop. Having large data sets and practical 
applications such as these would undoubtedly help the Hadoop project in 
terms of testing and validation. It's hoped that there would eventually 
be code contributions from the sector too.

I suggested that the Community Development project would be the right 
vehicle for this via the mentoring programme [1]. We are also thinking 
of organising an event or two in the UK next year.

Since I'm not involved with the Hadoop project, Steve has offered to work 
with the Hadoop community to find suitable mentors. I'm posting here for 
transparency and also in the hope that others in the community may be 
interested in helping move this effort forwards.

I've not copied this mail to the Hadoop list; I'll let Steve and others 
do that.

Steve - It may be worth subscribing to dev@community.apache.org which is 
where we will be running mentoring programmes and may be able to support 
some of your other activities.

Ross

[1] http://community.apache.org/mentoringprogramme.html

Re: Academic outreach activities for Hadoop

Posted by Ross Gardler <rg...@apache.org>.
On 18/12/2009 17:10, Isabel Drost wrote:
 > On Fri Ross Gardler<rg...@apache.org>  wrote:
 >> It was therefore agreed that there is a real need for the academic
 >> sector to get to grips with Hadoop. Having large data sets and
 >> practical applications such as these would undoubtedly help the
 >> Hadoop project in terms of testing and validation. It's hoped that
 >> there would eventually be code contributions from the sector too.
 >
 > At least here in Berlin (TU Berlin as well as HPI Potsdam) there is
 > interest in contributing back to the community (in this case the
 > Hadoop and the Mahout community). Currently it is mostly student
 > projects done during labs that people (lecturers as well as some
 > students) are interested in contributing. I told them about the ASF
 > mentoring program already.

Excellent.

 > I have been talking to several local people; there are two to
 > three problems usually encountered in the academic sector:
 >
 > 1) Doing open source work does not give you any credits for your
 > scientific career, so there is little incentive to contribute back or
 > to release your work under an open source license. I personally have no
 > great idea how this problem could be fixed except through finding
 > interested individuals, discussing the advantages of free software in
 > general and personal participation in open source projects in
 > particular.

I face this problem every day in my day job. There are many incentives 
for contributing back; we just have to educate people about them. Some 
examples:

- better quality research
- reproducible research
- sustainable research outputs
- exposure to additional funding streams
- wider network of research collaborators

The problem is that they don't understand open source software 
development. In the commercial sector the equivalent argument is:

"There is no direct credit in my annual review, so there is little 
incentive to contribute back or to release my work under an open source 
licence."

 > 2) People are not really familiar with how to contribute to projects.
 > So there is a need for mentoring, explaining and getting the word out.

Again, I deal with that daily in my day job and now we have the 
Community Development project to help solve this problem. Of course this 
is true of the commercial world as well as the academic world.

 > 3) Some people are not familiar with the transparent, public model of
 > communication in most open source projects, especially here at the ASF.
 > Again, fixing this problem probably needs quite a bit of explanation
 > and "getting used to".

Most people, both academic and non-academic, are unfamiliar with this model.

In all cases there are lots of resources available at 
http://www.oss-watch.ac.uk

These are written for the academic sector but in most cases are 
applicable to the non-academic sector.

 > In my personal experience, it is comparatively easy to
 > get students convinced. It does get a little harder with PhD students
 > but is still possible. General lack of time when working on a PhD adds
 > to the problems.

Agreed. The key is to find people who actually understand the benefits 
and want to participate. With respect to Hadoop in the UK we have at 
least one research leader who wants to go this way (Simon Metson, cc'd).

Ross



Re: Academic outreach activities for Hadoop

Posted by Isabel Drost <is...@apache.org>.
On Fri Ross Gardler <rg...@apache.org> wrote:
> It was therefore agreed that there is a real need for the academic 
> sector to get to grips with Hadoop. Having large data sets and
> practical applications such as these would undoubtedly help the
> Hadoop project in terms of testing and validation. It's hoped that
> there would eventually be code contributions from the sector too.

At least here in Berlin (TU Berlin as well as HPI Potsdam) there is
interest in contributing back to the community (in this case the
Hadoop and the Mahout community). Currently it is mostly student
projects done during labs that people (lecturers as well as some
students) are interested in contributing. I told them about the ASF
mentoring program already.

I have been talking to several local people; there are two to
three problems usually encountered in the academic sector:

1) Doing open source work does not give you any credits for your
scientific career, so there is little incentive to contribute back or
to release your work under an open source license. I personally have no
great idea how this problem could be fixed except through finding
interested individuals, discussing the advantages of free software in
general and personal participation in open source projects in
particular.

2) People are not really familiar with how to contribute to projects.
So there is a need for mentoring, explaining and getting the word out.

3) Some people are not familiar with the transparent, public model of
communication in most open source projects, especially here at the ASF.
Again, fixing this problem probably needs quite a bit of explanation
and "getting used to".

In my personal experience, it is comparatively easy to
get students convinced. It does get a little harder with PhD students
but is still possible. General lack of time when working on a PhD adds
to the problems.

Isabel