Posted to user@hadoop.apache.org by Anseh Danesh <an...@gmail.com> on 2013/10/19 08:08:43 UTC

number of map and reduce task does not change in M/R program

Hi all, I have a question. I have a MapReduce program that gets its input
from Cassandra. My input is fairly large, about 100,000,000 records, and my
problem is that the program takes too long to run, even though MapReduce is
supposed to be fast on large volumes of data. So I suspect something is
wrong with the number of map and reduce tasks. I set the number of map and
reduce tasks with JobConf, with Job, and also in conf/mapred-site.xml, but
I don't see any change: at first my logs show map 0% reduce 0%, and after
about 2 hours of work they show map 1% reduce 0%! What should I do? Please
help me, I am really confused...
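A note on why the settings may appear to have no effect: in classic (1.x) MapReduce, the number of map tasks is decided by the InputFormat's splits, so mapred.map.tasks is only a hint, while mapred.reduce.tasks is actually honored. A minimal mapred-site.xml sketch (Hadoop 1.x property names; the values are made-up examples):

```xml
<configuration>
  <!-- Hint only: the InputFormat's split count usually wins for maps -->
  <property>
    <name>mapred.map.tasks</name>
    <value>16</value>
  </property>
  <!-- Honored: the actual number of reduce tasks for the job -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

If the job reads from Cassandra via ColumnFamilyInputFormat, the map count follows the Cassandra input splits (configured on the Cassandra side, e.g. the input split size in ConfigHelper), so changing mapred.map.tasks alone would not change the progress numbers in the logs.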

Re: number of map and reduce task does not change in M/R program

Posted by Anseh Danesh <an...@gmail.com>.
Thanks a lot for the reply..


On Mon, Oct 21, 2013 at 10:39 AM, Dieter De Witte <dr...@gmail.com> wrote:

> Anseh,
>
> Let's assume that your job is fully scalable, then it should take: 100 000
> 000 / 600 000 times the amount of time of the first job, which is 1000 / 6
> = 167 times longer. This is an ideal, probably it will be something like
> 200 times. Also try using units in your questions + scientific notation
> 10^8 records or 10^8 bytes?
>
> Regards, irW
>
>
> 2013/10/20 Anseh Danesh <an...@gmail.com>
>
>> OK... thanks a lot for the link... it is so useful... ;)
>>
>>
>> On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <am...@gmail.com> wrote:
>>
>>> Try profiling the job (
>>> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling)
>>> And yeah the machine specs could be the reason, that's why hadoop was
>>> invented in the first place ;)
>>>
>>>
>>> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com> wrote:
>>>
>>>> I try it in a small set of data, in about 600000 data and it does not
>>>> take too long. the execution time was reasonable. but in the set of
>>>> 100000000 data it really works too bad. any thing else, I have 2 processors
>>>> in my machine, I think this amount of data is very huge for my processor
>>>> and this way it takes too long to process... what do you think about this?
>>>>
>>>>
>>>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>>>>
>>>>> Try running the job locally on a small set of the data and see if it
>>>>> takes too long. If so, your map code might have some performance issues
>>>>>
>>>>>
>>>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com> wrote:
>>>>>
>>>>>> Hi all.. I have a question.. I have a mapreduce program that get
>>>>>> input from cassandra. my input is a little big, about 100000000 data. my
>>>>>> problem is that my program takes too long to process, but I think mapreduce
>>>>>> is good and fast for large volume of data. so I think maybe I have problems
>>>>>> in number of map and reduce tasks.. I set the number of map and reduce asks
>>>>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>>>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>>>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>>>>> me I really get confused...
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
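On the hardware point in the quoted thread: with 2 cores on a single node, concurrency is capped by the per-TaskTracker slot counts rather than by the job's task counts. A mapred-site.xml sketch (Hadoop 1.x property names; values are example sizes for a 2-core box):

```xml
<!-- Upper bound on map/reduce tasks running at once on this node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```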

Re: number of map and reduce task does not change in M/R program

Posted by Dieter De Witte <dr...@gmail.com>.
Anseh,

Let's assume your job is fully scalable; then it should take 100,000,000 /
600,000 = 1000 / 6 ≈ 167 times as long as the first job. That is the ideal
case; in practice it will probably be more like 200 times. Also, try to use
units and scientific notation in your questions: is it 10^8 records or
10^8 bytes?

Regards, irW


2013/10/20 Anseh Danesh <an...@gmail.com>

> OK... thanks a lot for the link... it is so useful... ;)
>
>
> On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <am...@gmail.com> wrote:
>
>> Try profiling the job (
>> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling)
>> And yeah the machine specs could be the reason, that's why hadoop was
>> invented in the first place ;)
>>
>>
>> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com> wrote:
>>
>>> I try it in a small set of data, in about 600000 data and it does not
>>> take too long. the execution time was reasonable. but in the set of
>>> 100000000 data it really works too bad. any thing else, I have 2 processors
>>> in my machine, I think this amount of data is very huge for my processor
>>> and this way it takes too long to process... what do you think about this?
>>>
>>>
>>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>>>
>>>> Try running the job locally on a small set of the data and see if it
>>>> takes too long. If so, your map code might have some performance issues
>>>>
>>>>
>>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com> wrote:
>>>>
>>>>> Hi all.. I have a question.. I have a mapreduce program that get input
>>>>> from cassandra. my input is a little big, about 100000000 data. my problem
>>>>> is that my program takes too long to process, but I think mapreduce is good
>>>>> and fast for large volume of data. so I think maybe I have problems in
>>>>> number of map and reduce tasks.. I set the number of map and reduce asks
>>>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>>>> me I really get confused...
>>>>>
>>>>
>>>>
>>>
>>
>
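The estimate above is just linear scaling; a quick sketch to reproduce the numbers (the 30-minute small-job runtime is a made-up placeholder, not from the thread):

```python
# Linear-scaling estimate from the reply above.
small_records = 600_000        # records in the small test run
large_records = 100_000_000    # records in the full input (10^8)

ideal_factor = large_records / small_records   # = 1000 / 6
print(f"ideal slowdown: {ideal_factor:.1f}x")  # ~166.7x, the "167 times" above

# The reply suggests ~200x in practice; with a hypothetical 30-minute
# small run, that would put the full job at roughly:
practical_factor = 200
small_minutes = 30
print(f"estimated full run: {practical_factor * small_minutes / 60:.0f} hours")
```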

Re: number of map and reduce task does not change in M/R program

Posted by Anseh Danesh <an...@gmail.com>.
OK... thanks a lot for the link... it is so useful... ;)


On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <am...@gmail.com> wrote:

> Try profiling the job (
> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling)
> And yeah the machine specs could be the reason, that's why hadoop was
> invented in the first place ;)
>
>
> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com> wrote:
>
>> I try it in a small set of data, in about 600000 data and it does not
>> take too long. the execution time was reasonable. but in the set of
>> 100000000 data it really works too bad. any thing else, I have 2 processors
>> in my machine, I think this amount of data is very huge for my processor
>> and this way it takes too long to process... what do you think about this?
>>
>>
>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>>
>>> Try running the job locally on a small set of the data and see if it
>>> takes too long. If so, your map code might have some performance issues
>>>
>>>
>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com> wrote:
>>>
>>>> Hi all.. I have a question.. I have a mapreduce program that get input
>>>> from cassandra. my input is a little big, about 100000000 data. my problem
>>>> is that my program takes too long to process, but I think mapreduce is good
>>>> and fast for large volume of data. so I think maybe I have problems in
>>>> number of map and reduce tasks.. I set the number of map and reduce asks
>>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>>> me I really get confused...
>>>>
>>>
>>>
>>
>

Re: number of map and reduce task does not change in M/R program

Posted by Amr Shahin <am...@gmail.com>.
Try profiling the job (
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
And yes, the machine specs could be the reason; that's why Hadoop was
invented in the first place ;)


On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com> wrote:

> I try it in a small set of data, in about 600000 data and it does not take
> too long. the execution time was reasonable. but in the set of 100000000
> data it really works too bad. any thing else, I have 2 processors in my
> machine, I think this amount of data is very huge for my processor and this
> way it takes too long to process... what do you think about this?
>
>
> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>
>> Try running the job locally on a small set of the data and see if it
>> takes too long. If so, your map code might have some performance issues
>>
>>
>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com> wrote:
>>
>>> Hi all.. I have a question.. I have a mapreduce program that get input
>>> from cassandra. my input is a little big, about 100000000 data. my problem
>>> is that my program takes too long to process, but I think mapreduce is good
>>> and fast for large volume of data. so I think maybe I have problems in
>>> number of map and reduce tasks.. I set the number of map and reduce asks
>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>> me I really get confused...
>>>
>>
>>
>
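The profiling link above boils down to a few job properties; a sketch with Hadoop 1.x property names (the task ranges are example values; profile only a few attempts, since profiling slows them down):

```xml
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>0-2</value>   <!-- profile map task attempts 0..2 only -->
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-2</value>
</property>
```

The same can be done in code via JobConf (setProfileEnabled and setProfileTaskRange); the profiler output lands in the task logs.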

Re: number of map and reduce task does not change in M/R program

Posted by Amr Shahin <am...@gmail.com>.
Try profiling the job (
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling)
And yeah the machine specs could be the reason, that's why hadoop was
invented in the first place ;)


On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com>wrote:

> I try it in a small set of data, in about 600000 data and it does not take
> too long. the execution time was reasonable. but in the set of 100000000
> data it really works too bad. any thing else, I have 2 processors in my
> machine, I think this amount of data is very huge for my processor and this
> way it takes too long to process... what do you think about this?
>
>
> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>
>> Try running the job locally on a small set of the data and see if it
>> takes too long. If so, you map code might have some performance issues
>>
>>
>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com>wrote:
>>
>>> Hi all.. I have a question.. I have a mapreduce program that get input
>>> from cassandra. my input is a little big, about 100000000 data. my problem
>>> is that my program takes too long to process, but I think mapreduce is good
>>> and fast for large volume of data. so I think maybe I have problems in
>>> number of map and reduce tasks.. I set the number of map and reduce asks
>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>> me I really get confused...
>>>
>>
>>
>

Re: number of map and reduce task does not change in M/R program

Posted by Amr Shahin <am...@gmail.com>.
Try profiling the job (
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling)
And yeah the machine specs could be the reason, that's why hadoop was
invented in the first place ;)


On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <an...@gmail.com>wrote:

> I try it in a small set of data, in about 600000 data and it does not take
> too long. the execution time was reasonable. but in the set of 100000000
> data it really works too bad. any thing else, I have 2 processors in my
> machine, I think this amount of data is very huge for my processor and this
> way it takes too long to process... what do you think about this?
>
>
> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:
>
>> Try running the job locally on a small set of the data and see if it
>> takes too long. If so, you map code might have some performance issues
>>
>>
>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com>wrote:
>>
>>> Hi all.. I have a question.. I have a mapreduce program that get input
>>> from cassandra. my input is a little big, about 100000000 data. my problem
>>> is that my program takes too long to process, but I think mapreduce is good
>>> and fast for large volume of data. so I think maybe I have problems in
>>> number of map and reduce tasks.. I set the number of map and reduce asks
>>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>>> me I really get confused...
>>>
>>
>>
>
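
The profiling switch that the linked tutorial describes can also be turned on from configuration. A minimal sketch for the Hadoop 1.x property names; the task ranges shown (`0-2`) are illustrative, not required values:

```xml
<!-- mapred-site.xml (or per-job -D flags): enable JVM profiling,
     but only for a few tasks so the output stays manageable. -->
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<property>
  <!-- profile only the first three map tasks (illustrative range) -->
  <name>mapred.task.profile.maps</name>
  <value>0-2</value>
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-2</value>
</property>
```

The profiler output for each profiled task ends up alongside that task's logs, where a long-running mapper's hot spots should stand out.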

Re: number of map and reduce task does not change in M/R program

Posted by Anseh Danesh <an...@gmail.com>.
I tried it on a small set of data, about 600000 records, and it did not
take too long; the execution time was reasonable. But on the set of
100000000 records it performs really badly. One more thing: I have 2
processors in my machine, and I think this amount of data is too large for
them, which is why it takes so long to process... what do you think about this?


On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <am...@gmail.com> wrote:

> Try running the job locally on a small set of the data and see if it takes
> too long. If so, you map code might have some performance issues
>
>
> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com>wrote:
>
>> Hi all.. I have a question.. I have a mapreduce program that get input
>> from cassandra. my input is a little big, about 100000000 data. my problem
>> is that my program takes too long to process, but I think mapreduce is good
>> and fast for large volume of data. so I think maybe I have problems in
>> number of map and reduce tasks.. I set the number of map and reduce asks
>> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
>> any changes.. in my logs at first there is map 0% reduce 0% and after about
>> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
>> me I really get confused...
>>
>
>
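
As a back-of-envelope check on whether the slowdown is just data volume: under ideal linear scaling, the big run should take roughly the input-size ratio times the small run. A self-contained sketch of that estimate (the record counts are the ones from this thread; the 10-minute small-run time is a made-up placeholder):

```java
public class ScalingEstimate {
    public static void main(String[] args) {
        long smallRecords = 600_000L;      // the quick test run
        long bigRecords = 100_000_000L;    // the full dataset
        double smallMinutes = 10.0;        // hypothetical measured time

        // Under perfectly linear scaling, runtime grows with record count.
        double ratio = (double) bigRecords / smallRecords;  // ~166.7x
        double bigMinutes = smallMinutes * ratio;

        System.out.printf("scale factor: %.1fx%n", ratio);
        System.out.printf("ideal estimate for the big run: %.0f minutes (~%.1f hours)%n",
                bigMinutes, bigMinutes / 60.0);
    }
}
```

If the real run is far above this ideal estimate, the bottleneck is more likely in the job itself (for example the Cassandra input path or the map code) than in raw data volume.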

Re: number of map and reduce task does not change in M/R program

Posted by Amr Shahin <am...@gmail.com>.
Try running the job locally on a small set of the data and see if it takes
too long. If so, your map code might have some performance issues.


On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <an...@gmail.com>wrote:

> Hi all.. I have a question.. I have a mapreduce program that get input
> from cassandra. my input is a little big, about 100000000 data. my problem
> is that my program takes too long to process, but I think mapreduce is good
> and fast for large volume of data. so I think maybe I have problems in
> number of map and reduce tasks.. I set the number of map and reduce asks
> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
> any changes.. in my logs at first there is map 0% reduce 0% and after about
> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
> me I really get confused...
>

Re: number of map and reduce task does not change in M/R program

Posted by Dieter De Witte <dr...@gmail.com>.
Hi Anseh,

It doesn't depend on the number of map tasks, and since your reducers
haven't started yet it doesn't depend on that either. Maybe check the
counters of your job: is the number of map input records going up? If not,
you're stuck somewhere; otherwise you might just have a really big dataset
:). If you want to provide more details, you could also post it on Stack
Overflow; it is easier to look at your code there.

Regards, irW


2013/10/19 Anseh Danesh <an...@gmail.com>

> Hi all.. I have a question.. I have a mapreduce program that get input
> from cassandra. my input is a little big, about 100000000 data. my problem
> is that my program takes too long to process, but I think mapreduce is good
> and fast for large volume of data. so I think maybe I have problems in
> number of map and reduce tasks.. I set the number of map and reduce asks
> with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see
> any changes.. in my logs at first there is map 0% reduce 0% and after about
> 2 hours working it shows map 1% reduce 0%..!! what should I do? please Help
> me I really get confused...
>
