Posted to general@hadoop.apache.org by Gerd Stolpmann <in...@gerd-stolpmann.de> on 2011/02/01 17:10:55 UTC
[ANN] Plasma MapReduce, PlasmaFS, version 0.3
Hi,
This is about the release of Plasma-0.3, an alternative and independent
implementation of map/reduce with its own distributed filesystem. This might also be
interesting for Hadoop users and developers, because this project
incorporates a number of new ideas. So far, Plasma works on smaller
clusters and shows good signs of being scalable. HA support is still
very incomplete.
--
Plasma consists of two parts (for now), namely Plasma MapReduce, a
map/reduce compute framework, and PlasmaFS, the underlying distributed
filesystem.
Major changes in version 0.3:
* Optimized blocklist representation (extent-based)
* Improved block allocator to minimize disk seeks
* Allocating datanode access tickets in advance
* Sophisticated RAM management
* The command-line utility "plasma" supports wildcards
Of course, there are also numerous bug fixes and performance
improvements.
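To illustrate what "extent-based" means for a blocklist, here is a minimal
sketch in Python. It is not taken from the Plasma sources; the names
blocks_to_extents and lookup are hypothetical, and the (first_block, length)
pair is only an assumed representation. The idea is that a run of consecutive
block numbers is stored as one entry instead of one entry per block.

```python
from typing import List, Tuple

# Assumed representation: an extent is (first_block, length),
# i.e. a run of consecutive block numbers.
Extent = Tuple[int, int]


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a per-block list into runs of consecutive block numbers."""
    extents: List[Extent] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the last extent: extend that extent by one.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # b starts a new run.
            extents.append((b, 1))
    return extents


def lookup(extents: List[Extent], index: int) -> int:
    """Return the block number holding the index-th block of the file."""
    if index < 0:
        raise IndexError("negative block index")
    for start, length in extents:
        if index < length:
            return start + index
        index -= length
    raise IndexError("block index beyond end of file")
```

A file whose blocks were allocated mostly contiguously then needs only a
handful of entries, which is also why a seek-minimizing allocator and an
extent-based blocklist fit together well.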
Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme written in OCaml. PlasmaFS is the underlying
distributed filesystem, also written in OCaml. The PlasmaFS approach
in particular differs from HDFS in a number of ways:
* Data blocks are preallocated, and PlasmaFS takes care of block
  placement
* Blocklists are extent-based
* Metadata is stored in a PostgreSQL db
* 2-phase commit is used to distribute the metadata db
* The full set of file access functions is supported, including
  random writes
* File accesses can be transaction-based
* Shared memory can be used for speeding up the data path to
  locally stored data blocks
* We _think_ it is not possible to corrupt the namenode by
  accident or by crashes
* PlasmaFS volumes can be directly mounted via NFS
* PlasmaFS uses ONC RPC as its protocol rather than a home-grown one
  (and one of the next releases will add security via GSS-API)
* We got rid of multi-threading
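For readers not familiar with 2-phase commit, the general scheme can be
sketched as follows. This is a generic, in-memory illustration in Python and
not the way PlasmaFS talks to PostgreSQL; the Participant class and the
two_phase_commit function are hypothetical names made up for this sketch. The
point is only that every replica must first agree to an update before any of
them makes it permanent.

```python
class Participant:
    """Hypothetical stand-in for one replica of the metadata database."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}   # durable state
        self.pending = None   # staged, not yet visible

    def prepare(self, update):
        # Phase 1: stage the update and vote yes (True) or no (False).
        if not self.healthy:
            return False
        self.pending = dict(update)
        return True

    def commit(self):
        # Phase 2: make the staged update permanent.
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Discard the staged update.
        self.pending = None


def two_phase_commit(participants, update):
    """Apply update on all participants, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(update):
            prepared.append(p)
        else:
            # One "no" vote aborts the whole transaction.
            for q in prepared:
                q.rollback()
            return False
    # Everybody voted yes, so the update is committed everywhere.
    for p in participants:
        p.commit()
    return True
```

A real implementation additionally has to survive crashes between the two
phases, which this sketch leaves out entirely.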
User programs need not be written in OCaml, as Plasma also
supports a streaming mode.
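As a rough idea of what a non-OCaml user program could look like, here is a
word-count mapper in Python. Please take it as a sketch only: the record
format expected by Plasma's streaming mode is not described in this
announcement, so the code simply assumes line-oriented records on standard
input and tab-separated key/value lines on standard output. The names
map_lines and main are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word (assumed output format)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main() -> None:
    # Read input records from stdin, write mapped records to stdout.
    for record in map_lines(sys.stdin):
        print(record)

# To use this as a standalone program, call main() at the end of the file.
```

Keeping the mapping logic in map_lines, separate from the I/O in main, makes
it possible to try the function on a plain list of strings without a cluster.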
Both pieces of software are bundled together in one download. The
project page with further links is
http://projects.camlcity.org/projects/plasma.html
There is now also a homepage at
http://plasma.camlcity.org
This is an early alpha release (0.3). A lot of things work already, and
you can already run distributed map/reduce jobs. However, it is in no
way complete.
Plasma is installable via GODI for OCaml 3.12.
For discussions on specifics of Plasma there is a separate mailing list:
https://godirepo.camlcity.org/mailman/listinfo/plasma-list
Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------