Posted to dev@hdt.apache.org by Evert Lammerts <ev...@gmail.com> on 2013/03/16 22:54:09 UTC

Re: Early aims

Hi all,

I finally got a chance to look at what's been happening so far. I'm not
familiar with Eclipse plugin development, so I'm just going to spend some
time trying to fix some of Bob's JIRAs.

There's been a little discussion on the feature roadmap, but no resolution
yet. I share Bob's vision on where HDT should move to, at
http://wiki.apache.org/hdt/HDTProductExperience. I also like Adam's idea of
an early release, even if it's limited to MR job development and HDFS
browsing for 0.23 and 2.x. I think it would provide a fast feedback loop
that shows us whether we're on the right track with the small increments in
functionality we've added.

A little story to support this approach. The platform I set up at my last
job was a single Kerberos-secured cluster running 0.20.205.0, used by
researchers from lots of different institutes. Some work together, others
don't, and all connect from within their own networks from their own
machines - a true multi-tenant service with lots of heterogeneity in client
(lap- & desktop) configurations. I gave my users ant targets for submitting
their jobs from within Eclipse, which was a simple but effective way for
them to run MR jobs against the cluster. With the ant targets I gave them a
pre-configured Hadoop release and a krb5.ini. I also helped them export
JAVA_HOME, put the Kerberos config in the right place, and set up Firefox
or Chrome for accessing the SPNEGO-secured web interfaces of the JT and the
NN. This very basic setup worked well enough, but required quite a lot of
hands-on support, due to conflicts with existing Hadoop installations,
environment variables, Kerberos configs, network, and so on. Most of Hadoop
is platform independent, but, at least in 0.20.x, the devil is in the
details.
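
For concreteness, a rough sketch of what such an Ant target can look like
(the project name, paths, and class names here are hypothetical, not the
actual targets I shipped):

```xml
<!-- Illustrative sketch only: wraps a pre-configured Hadoop release and a
     bundled krb5.ini so users can submit a job from within Eclipse. -->
<project name="hadoop-jobs" default="submit-job" basedir=".">
  <!-- The Hadoop release distributed to users alongside this build file -->
  <property name="hadoop.home" value="${basedir}/hadoop-0.20.205.0"/>
  <property name="job.jar"     value="${basedir}/build/myjob.jar"/>
  <property name="job.class"   value="example.WordCount"/>

  <target name="submit-job">
    <exec executable="${hadoop.home}/bin/hadoop" failonerror="true">
      <env key="JAVA_HOME" value="${java.home}"/>
      <!-- Point the JVM at the bundled Kerberos config -->
      <env key="HADOOP_OPTS"
           value="-Djava.security.krb5.conf=${basedir}/conf/krb5.ini"/>
      <arg value="jar"/>
      <arg value="${job.jar}"/>
      <arg value="${job.class}"/>
      <arg value="input"/>
      <arg value="output"/>
    </exec>
  </target>
</project>
```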

Dealing with the heterogeneity on the client side was not easy, and then I
only supported Linux and OS X (although eventually I did manage to get
everything to work on Windows as well). It is going to be a challenge
getting even basic functionality to work across clients _and_ across Hadoop
installations and configurations. I'm not saying this to temper enthusiasm,
I'm just trying to argue that we should work in baby steps and get as much
feedback as possible, as soon as possible.

Anyway, just thought I'd share the experience. I'm going to dig through the
code a bit. I should be able to put some effort in during the remainder of
March. Although, probably not next week - any of you guys going to Hadoop
Summit here in my hometown?

Evert


On Thu, Jan 17, 2013 at 9:10 PM, Adam Berry <ad...@apache.org> wrote:

> Hello all,
>
> I thought while I'm making progress on the initial split of the code, that
> we could take a moment to talk about a rough early outline.
>
> So in the original project proposal, we laid out the initial goals of the
> project (http://wiki.apache.org/incubator/HadoopDevelopmentToolsProposal).
>
> Basically right now the features are MapReduce development (wizards, Hadoop
> launches) and HDFS access, so getting those working with multiple versions
> of Hadoop would be the first target. I think we could make this happen,
> including documentation and tests in the next few months, by end of Q1
> would be a nice (yes, it's also aggressive) thing to shoot for.
>
> With a release in hand we can target various places to grow our visibility
> (as Bob brought up) and hence grow the community. At that point I think we
> will start to feel where to go next, things like Pig are attractive targets
> for tools, but as we drive and build the community the direction will
> become clearer.
>
> So what else would people like to throw into the ring here?
>
> Cheers,
> Adam
>