Posted to common-user@hadoop.apache.org by "Black, Michael (IS)" <Mi...@ngc.com> on 2011/02/01 17:31:39 UTC

RE: How to speed up of Map/Reduce job?

Try this rather small C++ program...it will more than likely be a LOT faster than anything you could do in Hadoop.  Hadoop is not the hammer for every nail.  Too many people think that any "cluster" solution will automagically scale their problem...'tain't true.

I'd appreciate hearing your results with this.

#include <cstdio>     // for perror
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <filename>" << endl;
        return -1;
    }
    ifstream in(argv[1]);
    if (!in) {
        perror(argv[1]);
        return -1;
    }
    // Count whitespace-delimited words.  Testing the stream state directly
    // (rather than !in.eof()) also counts the final word when the file has
    // no trailing whitespace.
    string str;
    long n = 0;
    while (in >> str) {
        ++n;
        //cout << str << endl;
    }
    cout << n << " words" << endl;
    return 0;
}
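
If you want to try it, a minimal compile-and-run sketch, assuming g++ is available and the source is saved as wordcount.cpp with an input file input.txt (both file names are just placeholders):

    g++ -O2 -o wordcount wordcount.cpp
    time ./wordcount input.txt

Comparing the reported wall-clock time against the ~20 seconds of the Hadoop run gives a rough idea of how much of that time is framework overhead rather than actual work.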

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate



________________________________________
From: Igor Bubkin [igba14@gmail.com]
Sent: Tuesday, February 01, 2011 2:19 AM
To: common-issues@hadoop.apache.org
Cc: common-user@hadoop.apache.org
Subject: EXTERNAL:How to speed up of Map/Reduce job?

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, driven by user requests). A user
can't wait 20 seconds for a request; that is too long. Is it possible to reduce
the run time of a Map/Reduce job? Or maybe I misunderstand something?

BR,
Igor Babkin, Mifors.com

Re: How to speed up of Map/Reduce job?

Posted by madhu phatak <ph...@gmail.com>.
Most Hadoop use cases involve processing large amounts of data. But in real-time
applications the data provided by the user is relatively small, and in that case
using Hadoop is not advised.