
Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2012/12/26 02:56:26 UTC

good way to debug map reduce code

Hi,
  I have been using the Python Hadoop Streaming framework to write my code, and
now I am slowly moving towards the core Java APIs.
I am getting comfortable with them, but what is the quickest way to debug
native MapReduce code?
In Hadoop Streaming, this worked great:
% cat input.txt | python mapper.py | sort | python reducer.py

If there was any coding error, it was thrown right away, so it
was very fast to debug as I coded.
Is there a similar way, where I don't have to run Hadoop jobs to debug,
wait, and go through the Hadoop logs only to see that maybe I missed a semicolon?
Thanks
Jamal

Re: good way to debug map reduce code

Posted by Rishi Yadav <ri...@infoobjects.com>.

As far as I know, pseudo-distributed mode is the only way you can test your
code. This means that you are running a single-node cluster. Are you using
Eclipse?

Thanks and Regards,

Rishi Yadav






Re: good way to debug map reduce code

Posted by Harsh J <ha...@cloudera.com>.

For Java MR jobs, there is Apache MRUnit, which provides a good way of
writing test cases. See http://mrunit.apache.org




-- 
Harsh J

Re: good way to debug map reduce code

Posted by SUJIT PAL <su...@comcast.net>.

Hi Jamal,

A missing semicolon should get flagged by the Java compiler, but one way to keep your debug cycles short is to (1) use local mode and (2) use small data sets that you can run through in under a minute. Once you are happy that your code works, move to distributed mode and your target data sets.
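
In the same spirit, here is a minimal sketch of an even shorter loop: running the map and reduce logic in-process on a tiny data set, with no Hadoop involved at all. This is not Hadoop's local mode itself, only a plain-Java stand-in for the `cat | mapper | sort | reducer` pipeline. The class name `LocalPipelineSketch`, the word-count logic, and the sample input are all assumptions made up for illustration; a real job would call the same logic from its Mapper and Reducer classes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-count logic, kept free of Hadoop types so it can run in-process.
class LocalPipelineSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum all values seen for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for: cat input.txt | mapper | sort | reducer
    static Map<String, Integer> run(List<String> lines) {
        // TreeMap keeps keys sorted, playing the role of the sort step.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up data set; prints {a=2, b=2, c=1}
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Logic errors show up within seconds this way, although anything that depends on Hadoop itself (input formats, serialization, configuration) still needs a local-mode or cluster run.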

HTH
Sujit

