Posted to general@hadoop.apache.org by Gerd Stolpmann <in...@gerd-stolpmann.de> on 2011/02/01 17:10:55 UTC

[ANN] Plasma MapReduce, PlasmaFS, version 0.3

Hi,

This announces the release of Plasma-0.3, an alternative, independent
implementation of map/reduce with its own distributed filesystem. It
may also be of interest to Hadoop users and developers, because the
project incorporates a number of new ideas. So far, Plasma works on
smaller clusters and shows good signs of being scalable. HA support is
still very incomplete.

--

Plasma consists of two parts (for now), namely Plasma MapReduce, a
map/reduce compute framework, and PlasmaFS, the underlying distributed
filesystem.

Major changes in version 0.3:

      * Optimized blocklist representation (extent-based)
      * Improved block allocator to minimize disk seeks
      * Allocating datanode access tickets in advance
      * Sophisticated RAM management
      * The command-line utility "plasma" supports wildcards

Of course, there are also numerous bug fixes and performance
improvements.
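
To illustrate what "extent-based" means for a blocklist, here is a
minimal sketch in Python. It is not Plasma's actual data structure or
code; the names blocks_to_extents and lookup are hypothetical. The idea
is only that a run of consecutive block numbers is stored as one
(first_block, length) pair instead of one entry per block.

```python
from typing import List, Tuple

# Assumption for illustration: an extent is (first_block, length), i.e.
# `length` consecutive block numbers starting at `first_block`.
Extent = Tuple[int, int]


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a per-block list into runs of consecutive block numbers."""
    extents: List[Extent] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the last extent, so extend that extent.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # b starts a new run.
            extents.append((b, 1))
    return extents


def lookup(extents: List[Extent], index: int) -> int:
    """Return the block number that stores the index-th block of a file."""
    if index < 0:
        raise IndexError("negative block index")
    for start, length in extents:
        if index < length:
            return start + index
        index -= length
    raise IndexError("block index beyond end of file")
```

A file whose blocks were allocated mostly contiguously needs only a few
extents, which is why a block allocator that minimizes seeks and an
extent-based blocklist fit well together.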

Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme written in OCaml. PlasmaFS is the underlying
distributed filesystem, also written in OCaml. The PlasmaFS approach
in particular differs from HDFS in numerous ways:

      * Data blocks are preallocated, and PlasmaFS takes care of block
        placement
      * Blocklists are extent-based
      * Metadata is stored in a PostgreSQL database
      * 2-phase commit is used to distribute the metadata database
      * The full set of file access functions is supported, including
        random writes
      * File accesses can be transaction-based
      * Shared memory can be used for speeding up the data path to
        locally stored data blocks
      * We _think_ it is not possible to corrupt the namenode by
        accident or by crashes
      * PlasmaFS volumes can be directly mounted via NFS
      * PlasmaFS uses ONC RPC as its protocol rather than a home-grown
        one (and one of the next releases will add security via
        GSS-API)
      * We got rid of multi-threading
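
For readers unfamiliar with 2-phase commit, the following Python sketch
shows the general protocol shape only. It is not PlasmaFS code and does
not use PostgreSQL; the Participant class and two_phase_commit function
are hypothetical stand-ins for metadata replicas and a coordinator.

```python
from typing import Dict, List, Optional


class Participant:
    """Hypothetical stand-in for one replica of the metadata database."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.committed: Dict[str, str] = {}
        self.pending: Optional[Dict[str, str]] = None

    def prepare(self, update: Dict[str, str]) -> bool:
        # Phase 1: stage the update and vote yes, or vote no.
        if not self.healthy:
            return False
        self.pending = dict(update)
        return True

    def commit(self) -> None:
        # Phase 2, success path: make the staged update visible.
        if self.pending is not None:
            self.committed.update(self.pending)
            self.pending = None

    def rollback(self) -> None:
        # Phase 2, failure path: discard the staged update.
        self.pending = None


def two_phase_commit(participants: List[Participant],
                     update: Dict[str, str]) -> bool:
    """Apply `update` on all participants, or on none of them."""
    votes = [p.prepare(update) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The point of the two phases is that no replica makes the update visible
until every replica has promised that it can.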

User programs do not need to be written in OCaml, as Plasma also
supports a streaming mode.
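
As a sketch of what a non-OCaml user program might look like, here is
a word-count mapper in Python. It assumes, purely for illustration,
that streaming works in the style known from Hadoop Streaming: input
records arrive as lines on stdin, and the program writes tab-separated
key/value lines to stdout. The actual Plasma streaming conventions are
not described in this announcement, and map_lines is a hypothetical
name.

```python
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


if __name__ == "__main__" and not sys.stdin.isatty():
    # Assumed convention: records in on stdin, records out on stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Keeping the map logic in a plain function makes it easy to test
without a cluster.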

Both pieces of software are bundled together in one download. The
project page with further links is

http://projects.camlcity.org/projects/plasma.html

There is now also a homepage at

http://plasma.camlcity.org

This is an early alpha release (0.3). A lot of things work already, and
you can already run distributed map/reduce jobs. However, it is in no
way complete.

Plasma is installable via GODI for OCaml 3.12.

For discussions on specifics of Plasma there is a separate mailing list:

https://godirepo.camlcity.org/mailman/listinfo/plasma-list

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------