Posted to general@hadoop.apache.org by "Xiaoyi Lu@cse.osu" <lu...@cse.ohio-state.edu> on 2015/03/24 01:08:04 UTC

[HiBD] Announcing the release of RDMA for Apache Hadoop-2.x 0.9.6

The High-Performance Big Data (HiBD) team is pleased to announce the
release of the RDMA for Apache Hadoop-2.x 0.9.6 package (for the Hadoop
2.x series) with the following features.

* RDMA for Apache Hadoop-2.x 0.9.6 Features

    - Based on Apache Hadoop 2.6.0
    - High performance design with native InfiniBand and RoCE support
      at the verbs level for HDFS, MapReduce, and RPC components
    - Compliant with Apache Hadoop 2.6.0 APIs and applications
    - Easily configurable for different running modes (HHH, HHH-M, HHH-L,
      and MapReduce over Lustre) and different protocols (native InfiniBand,
      RoCE, and IPoIB)
    - On-demand connection setup
    - HDFS over native InfiniBand and RoCE
        - RDMA-based write
        - RDMA-based replication
        - Parallel replication support
        - Overlapping in different stages of write and replication
        - Enhanced hybrid HDFS design with in-memory and heterogeneous
          storage (HHH)
            - Supports three modes of operation
                - HHH (default) with I/O operations over RAM disk, SSD, and HDD
                - HHH-M (in-memory) with I/O operations in-memory
                - HHH-L (Lustre-integrated) with I/O operations in local
                  storage and Lustre
            - Policies to efficiently utilize heterogeneous storage
              devices (RAM Disk, SSD, HDD, and Lustre)
                - Support for Greedy and Balanced policies
                - Automatic policy selection based on available storage types
            - Hybrid replication (in-memory and persistent storage) for
              HHH default mode
            - Memory replication (in-memory only with lazy persistence) for
              HHH-M mode
            - Lustre-based fault-tolerance for HHH-L mode
                - No HDFS replication
                - Reduced local storage space usage
    - MapReduce over native InfiniBand and RoCE
        - RDMA-based shuffle
        - Pre-fetching and caching of map output
        - In-memory merge
        - Advanced optimization in overlapping
            - map, shuffle, and merge
            - shuffle, merge, and reduce
        - Optional disk-assisted shuffle
        - High performance design of MapReduce over Lustre
            - Supports two shuffle approaches
                - Lustre read-based shuffle
                - RDMA-based shuffle
            - Hybrid shuffle based on both shuffle approaches
                - Configurable distribution support
            - In-memory merge and overlapping of different phases
    - RPC over native InfiniBand and RoCE
        - JVM-bypassed buffer management
        - RDMA or send/recv based adaptive communication
        - Intelligent buffer allocation and adjustment for serialization
    - Tested with
        - Mellanox InfiniBand adapters (DDR, QDR, and FDR)
        - RoCE support with Mellanox adapters
        - Various multi-core platforms
        - RAM Disks, SSDs, HDDs, and Lustre
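
The feature list above names Greedy and Balanced placement policies for
heterogeneous storage but does not define them. As a rough illustration
only, the sketch below assumes that "greedy" means filling the fastest
tier first and "balanced" means spreading blocks toward the tier with
the most free space. The Tier class, the function names, and the
capacities are all hypothetical and are not taken from the package or
its user guide.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A hypothetical storage tier; capacity is counted in blocks."""
    name: str
    free_blocks: int


def make_tiers():
    # Ordered fastest first. The capacities are made-up illustration values.
    return [Tier("RAM_DISK", 2), Tier("SSD", 3), Tier("HDD", 4)]


def place_greedy(tiers):
    """Assumed 'greedy' behavior: use the fastest tier that still has room."""
    for tier in tiers:
        if tier.free_blocks > 0:
            tier.free_blocks -= 1
            return tier.name
    raise RuntimeError("no storage capacity left")


def place_balanced(tiers):
    """Assumed 'balanced' behavior: use the tier with the most free blocks.

    max() returns the first maximal item, so ties go to the faster tier.
    """
    tier = max(tiers, key=lambda t: t.free_blocks)
    if tier.free_blocks == 0:
        raise RuntimeError("no storage capacity left")
    tier.free_blocks -= 1
    return tier.name


if __name__ == "__main__":
    tiers = make_tiers()
    print("greedy:  ", [place_greedy(tiers) for _ in range(5)])
    tiers = make_tiers()
    print("balanced:", [place_balanced(tiers) for _ in range(5)])
```

Under these assumptions, the greedy variant exhausts the RAM disk before
touching the SSD, while the balanced variant interleaves tiers so that
no single device fills up early. The actual policies in the package may
weigh other factors.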

* Bug Fixes (since RDMA for Apache Hadoop-2.x 0.9.5)

    - Fix a hang when running WordCount-like benchmarks
      - Thanks to Amit Sangroya@TCS for reporting the issue

    - Fix an issue with the NameNode running in HA-enabled mode
      - Thanks to Qihu Yang@AsiaInfo for reporting the issue

To download the RDMA for Apache Hadoop-2.x 0.9.6 package and the
associated user guide, please visit the following URL:

http://hibd.cse.ohio-state.edu

Sample performance numbers for benchmarks using RDMA for Apache
Hadoop-2.x 0.9.6 can be viewed on the `Performance' tab of the above
website.

All questions, feedback, and bug reports are welcome. Please post them
to the rdma-hadoop-discuss mailing list (rdma-hadoop-discuss at
cse.ohio-state.edu).

Thanks,

The High-Performance Big Data (HiBD) Team