Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2012/11/20 02:20:59 UTC

Fwd: debugging hadoop streaming programs (first code)

Hi,
  This is my first attempt at learning the MapReduce abstraction.

My problem is as follows. I have a text file with these fields:
id1, id2, date, time, mrps, code, code2

3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0


Now what I want to do is count the number of transactions
happening in every half-hour interval between 7 am and 11 am.

So here are the intervals:

7:00-7:30  -> 0
7:30-8:00  -> 1
8:00-8:30  -> 2
...
10:30-11:00 -> 7

So ultimately what I am doing is creating a 2D dictionary:

d[id2][interval] = count_transactions


My mappers and reducers are attached (sample input also).
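
Since attachments sometimes get stripped on the list, here is a rough sketch of the idea (illustrative only; the attached files are the real code, and the field positions follow the header above):

#!/usr/bin/env python
# mapper.py (sketch): emit "id2,interval<TAB>1" for records whose time
# falls between 7:00 and 11:00; everything else is skipped.
import sys

for line in sys.stdin:
    fields = [f.strip() for f in line.split(',')]
    if len(fields) < 7:
        continue                               # header or malformed line
    id2, time_str = fields[1], fields[3]       # time like "12:39:38.000"
    try:
        minutes = int(time_str[:2]) * 60 + int(time_str[3:5])
    except ValueError:
        continue
    if 7 * 60 <= minutes < 11 * 60:
        interval = (minutes - 7 * 60) // 30    # bucket index 0..7
        print('%s,%d\t1' % (id2, interval))

#!/usr/bin/env python
# reducer.py (sketch): streaming delivers lines sorted by key, so equal
# keys ("id2,interval") arrive adjacent; sum the 1s for each run.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if key != current_key:
        if current_key is not None:
            print('%s\t%d' % (current_key, total))
        current_key, total = key, 0
    total += int(value or 1)
if current_key is not None:
    print('%s\t%d' % (current_key, total))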

The code runs just fine if I run it via

cat input.txt | python mapper.py | sort | python reducer.py

and it gives me the output, but when I run it on the cluster it throws an error
that is not helpful (the terminal basically just says the job was
unsuccessful, reason: NA).

Any suggestions on what I am doing wrong?


Jamal

Re: debugging hadoop streaming programs (first code)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
The MapReduce web UI gives you all the information you need for debugging your code. Depending on where your JobTracker is, go hit $JT_HOST_NAME:50030, and check the job page as well as the task, task-attempt, and log pages.
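
For streaming jobs in particular, anything your mapper or reducer writes to stderr shows up in the task attempt's stderr log on those pages, and specially formatted stderr lines update job counters on the job page. A minimal sketch (Python, to match the scripts in this thread; the helper names are just illustrative):

import sys

def debug(msg):
    # plain stderr lines end up in the task attempt's stderr log
    sys.stderr.write('DEBUG: %s\n' % msg)

def bump_counter(group, counter, amount=1):
    # stderr lines of this exact form increment a job counter
    sys.stderr.write('reporter:counter:%s,%s,%d\n' % (group, counter, amount))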

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Nov 20, 2012, at 5:33 AM, jamal sasha wrote:

> Hi,
>    If I just use pipes, the code runs just fine.. the issue is when I deploy it on the cluster...
> :(
> Any suggestions on how to debug it?

Re: debugging hadoop streaming programs (first code)

Posted by jamal sasha <ja...@gmail.com>.
Hi,
   If I just use pipes, the code runs just fine.. the issue is when I
deploy it on the cluster...
:(
Any suggestions on how to debug it?

Re: debugging hadoop streaming programs (first code)

Posted by Mahesh Balija <ba...@gmail.com>.
Hi Jamal,

          If your MapReduce program is written in Java, you can debug it by
running the MR job in LocalJobRunner mode via Eclipse.
          Or you can add some debug statements (S.O.Ps, i.e.
System.out.printlns) to your code so that you can check where your job fails.

          I am not sure about Python, but one suggestion is to run
your Python code (map unit and reduce unit) locally on your input data and
see whether your logic has any issues.
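
          One more streaming-specific note: a single bad input line that
raises an exception makes the task exit non-zero, and the whole job can then
fail with exactly that kind of unhelpful message. A rough sketch of guarding
the mapper (illustrative only, since I have not seen your code):

import sys

for line in sys.stdin:
    try:
        fields = [f.strip() for f in line.split(',')]
        # ... your existing per-record logic goes here ...
    except Exception as e:
        # log and skip the bad record instead of crashing the task
        sys.stderr.write('skipping bad record %r: %s\n' % (line, e))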

Best,
Mahesh Balija,
Calsoft Labs.
