Posted to common-user@hadoop.apache.org by Jimmy Lin <ji...@umd.edu> on 2010/05/08 20:25:01 UTC

Data-Intensive Text Processing with MapReduce

Hi everyone,

I'm pleased to announce the publication of a new book on MapReduce
algorithm design:

Data-Intensive Text Processing with MapReduce
by Jimmy Lin and Chris Dyer
Morgan & Claypool Publishers, 2010
http://mapreduce.me/

Abstract

Our world is being revolutionized by data-driven methods: access to 
large amounts of data has generated new insights and opened exciting new 
opportunities in commerce, science, and computing applications. 
Processing the enormous quantities of data necessary for these advances 
requires large clusters, making distributed computing paradigms more 
crucial than ever. MapReduce is a programming model for expressing 
distributed computations on massive datasets and an execution framework 
for large-scale data processing on clusters of commodity servers. The 
programming model provides an easy-to-understand abstraction for 
designing scalable algorithms, while the execution framework 
transparently handles many system-level details, ranging from scheduling 
to synchronization to fault tolerance. This book focuses on MapReduce 
algorithm design, with an emphasis on text processing algorithms common 
in natural language processing, information retrieval, and machine 
learning. We introduce the notion of MapReduce design patterns, which 
represent general reusable solutions to commonly occurring problems 
across a variety of problem domains. This book not only intends to help 
the reader "think in MapReduce", but also discusses limitations of the 
programming model.

Table of Contents

    1. Introduction
    2. MapReduce Basics
    3. MapReduce Algorithm Design
    4. Inverted Indexing for Text Retrieval
    5. Graph Algorithms
    6. EM Algorithms for Text Processing
    7. Closing Remarks

Enjoy!

-Jimmy

Re: Data-Intensive Text Processing with MapReduce

Posted by Mark Kerzner <ma...@gmail.com>.
Dear Jimmy and Chris:

I am reading your book (thank you for providing the pre-release version) and
I find it great in content and in style. Thank you!

Sincerely,
Mark
