Posted to common-dev@hadoop.apache.org by Sanjay Radia <sr...@yahoo-inc.com> on 2008/11/14 19:12:27 UTC
Hadoop 1.0 Tasklist/Prerequisites discussion
=========Hadoop 1.0 Tasks/Prerequisites (strawman) =============
================================================================
Release terminology used below:
Standard release numbering:
- Only bug fixes in dot releases: m.x.y
- no changes to API, disk format, protocols or config etc.
- new features in major (m.0) and minor (m.x.0) releases
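As a rough illustration of the numbering rule above, here is a minimal Java sketch that classifies a release string and says whether it may carry new features. The class name ReleaseVersion and its methods are hypothetical, and treating "m.x" as "m.x.0" is my assumption, not something we have agreed on.

```java
/** Hypothetical helper; the class and method names are illustrative only. */
final class ReleaseVersion {
    enum Kind { MAJOR, MINOR, DOT }

    final int major;
    final int minor;
    final int dot;

    private ReleaseVersion(int major, int minor, int dot) {
        this.major = major;
        this.minor = minor;
        this.dot = dot;
    }

    /** Parses "m.x" or "m.x.y"; a missing y is treated as 0 (assumption). */
    static ReleaseVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected m.x or m.x.y: " + text);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
            if (numbers[i] < 0) {
                throw new IllegalArgumentException("Negative component in: " + text);
            }
        }
        return new ReleaseVersion(numbers[0], numbers[1], numbers[2]);
    }

    /** m.0(.0) is a major release, m.x.0 a minor release, m.x.y a dot release. */
    Kind kind() {
        if (dot != 0) {
            return Kind.DOT;
        }
        return minor != 0 ? Kind.MINOR : Kind.MAJOR;
    }

    /** Dot releases carry bug fixes only; major and minor releases may add features. */
    boolean allowsNewFeatures() {
        return kind() != Kind.DOT;
    }
}
```

For example, ReleaseVersion.parse("1.2.3").allowsNewFeatures() is false, while the same call for "1.2.0" or "1.0" is true.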
The task list below has been separated into the following 3 categories
1. Cleanup and Interface work
2. Mechanisms to support versioning and compatibility (Manual or automated)
3. 1.0 Features or Hooks for 1.x Features
Features that we need before 2.0 and are likely to break compatibility.
This needs to be a small list, otherwise we will have feature creep and
1.0 will take too long.
1. =========Cleanup and Interface Work ======================
1a. Split hdfs, mapRed, core projects
1.b Decide on the visibility and stability of interfaces.
We need to decide which interfaces are external facing and which are
internal facing.
*I will shortly start a separate email thread to discuss Hadoop
interface classification.*
1.c Interfaces that may deserve cleanup before 1.0
* MapReduce (new context objects API) (targeted for 0.20)
* FileSystem
This is the most important API in the system; let us clean it up since
we are committing to it for a long time.
- declare the exceptions for each method (even if subclasses of IOException).
- special characters in path
* Config
For example, clients should need to specify only the NN address/default
file system. Many of the other parameters should be obtained from the NN.
* Shell cli interface and shell cli output
* Client protocols
- Make data transfer protocol "concrete" to enable versioning
* Mapred.lib
* Job logs - decide whether we have to make them external (stable or evolving).
* Intra Hadoop protocols (to enable rolling upgrades)
- HDFS, MapReduce
1.d Remove deprecated methods
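To make the FileSystem point in 1.c concrete (declaring the specific exceptions even when they are subclasses of IOException), here is a small sketch. SketchFileSystem and InMemoryFileSystem are hypothetical names and not the actual Hadoop FileSystem API; only java.io.FileNotFoundException and java.io.IOException are real classes.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface for illustration; not the actual Hadoop FileSystem API. */
interface SketchFileSystem {
    /**
     * Reads a whole file.
     *
     * @throws FileNotFoundException if the path does not exist
     * @throws IOException for any other I/O failure
     */
    byte[] readFully(String path) throws FileNotFoundException, IOException;
}

/** Toy in-memory implementation, used only to exercise the declared exceptions. */
final class InMemoryFileSystem implements SketchFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override
    public byte[] readFully(String path) throws FileNotFoundException {
        byte[] data = files.get(path);
        if (data == null) {
            // The specific subclass tells callers exactly what went wrong.
            throw new FileNotFoundException("No such file: " + path);
        }
        return data.clone();
    }
}
```

With the subclass listed in the signature, callers can see from the declaration alone that a missing path is a distinct, catchable case rather than a generic IOException.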
2. =========Mechanisms to support versioning ======================
2a. Serialization and RPC - manual or automated versioning
A mechanism for versioning (manual or automated) must be selected so that
we can easily support compatibility while allowing methods to be added and
fields to be added to RPC parameter data types.
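As one possible shape for the manual option, here is a sketch of an RPC parameter type that writes an explicit version number ahead of its fields, so that a field added later can be skipped or defaulted. FileStatusParam, its fields, and the default replication of 3 are all hypothetical; this is not an existing Hadoop class and not a proposal for the actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical RPC parameter type; names and fields are illustrative only. */
final class FileStatusParam {
    static final int CURRENT_VERSION = 2;
    static final short DEFAULT_REPLICATION = 3; // assumed default for version 1 senders

    final String path;
    final long length;
    final short replication; // field added in version 2

    FileStatusParam(String path, long length, short replication) {
        this.path = path;
        this.length = length;
        this.replication = replication;
    }

    /** Serializes in the layout of the given version so an older peer can read it. */
    byte[] toBytes(int version) {
        checkVersion(version);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(version);
            out.writeUTF(path);
            out.writeLong(length);
            if (version >= 2) {
                out.writeShort(replication);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads any supported version, defaulting fields the sender did not know about. */
    static FileStatusParam fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            checkVersion(version);
            String path = in.readUTF();
            long length = in.readLong();
            short replication = version >= 2 ? in.readShort() : DEFAULT_REPLICATION;
            return new FileStatusParam(path, length, replication);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void checkVersion(int version) {
        if (version < 1 || version > CURRENT_VERSION) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
    }
}
```

The cost of the manual approach is visible here: every added field needs its own version check in both directions, which is the main argument for considering an automated mechanism.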
2b. Dealing with old calling new
- new hdfs clients calling old
- new mapReduce framework calling client via old interface.
Note we may not need a new mechanism but merely an awareness in the
community to watch out for such issues.
2c. Support for Protocol Transition at major releases
Note we may be able to delay this work till release 1.9.
Since the protocol can break at major releases and customers have multiple
clusters that will not be upgraded simultaneously, we have to consider
issues related to cross-cluster access.
- Need a mechanism for transferring data out
- Today HTTP serves that purpose. If that is all we need then we are done.
- Today customers do not write apps that access data across clusters
because the wire protocol can break on any minor release. This will change
in the 1.x series, where Hadoop will provide wire protocol compatibility
across minor releases. As a result customers are likely to write
cross-cluster apps (easy to do using URI file names).
So we will need to consider our client side being able to talk multiple
versions of our protocols.
Again, the good news is that we can probably wait till 1.9 to do this.
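One simple way a client could talk to clusters on different protocol versions is to pick the highest version that both sides support. The sketch below assumes each side can advertise a set of integer version numbers; that assumption, and the name ProtocolNegotiation, are mine and not part of any existing Hadoop mechanism.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; assumes both sides advertise integer protocol versions. */
final class ProtocolNegotiation {
    private ProtocolNegotiation() {
    }

    /** Returns the highest version supported by both sides, or empty if none. */
    static Optional<Integer> highestCommonVersion(Set<Integer> clientVersions,
                                                  Set<Integer> serverVersions) {
        // Copy into a sorted set so the caller's set is not modified.
        TreeSet<Integer> common = new TreeSet<>(clientVersions);
        common.retainAll(serverVersions);
        return common.isEmpty() ? Optional.empty() : Optional.of(common.last());
    }
}
```

A client that speaks versions 1 and 2 would settle on 1 against an older cluster that only speaks 1, and would get an empty result (and should fail cleanly) against a cluster with no version in common.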
3. =========1.0 Features or Hooks for 1.x Features ======================
Hadoop 1.0 has backward compatibility rules (API and wire protocol) that
will require that changes that break compatibility happen only at major
release boundaries (i.e., 1.0, 2.0, 3.0, etc., and not 1.1, 1.2, etc.).
Hence features that we need before 2.0 that are likely to break
compatibility need to be considered now. This needs to be a small list,
otherwise we will have feature creep and 1.0 will take too long.
3a. Security - authentication is surely going to break the wire protocol.
3b. Clients survive NN and JT restarts
Others?
- Hooks for rolling upgrades
- Hooks for HA??