Posted to mapreduce-user@hadoop.apache.org by Alexey Tigarev <al...@gmail.com> on 2009/11/12 23:06:44 UTC

Hadoop Streaming overhead

Hi All!

How much overhead does Hadoop Streaming add compared with native Java steps?

How can I estimate the performance increase I could get by rewriting my
streaming job in Java? What parameters does the overhead depend on
(number of instances, size of input/output, etc.), and how?

Regards,
Alexey.

Re: Hadoop Streaming overhead

Posted by Jason Venner <ja...@gmail.com>.

All of your data has to be converted to and from strings and passed
through pipes, from the JVM to your task and back from your task to the JVM.
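To make the string-conversion part of that cost concrete, here is a minimal Python sketch of a Streaming-style mapper. It assumes the default Streaming convention of one record per line, with the key and value separated by the first tab. The function names (parse_record, format_record, double_values, time_round_trip) are hypothetical, chosen for illustration. Every record is parsed from text and formatted back into text around a trivial computation, and time_round_trip compares that against the same arithmetic on values that are already parsed. It only times the in-process conversion on the task side, not the pipe transfer or the conversion done on the JVM side, so it gives a feel for one component of the overhead rather than a measurement of Hadoop itself.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def parse_record(line: str) -> Tuple[str, int]:
    # Streaming hands each record over as a line of text: split at the
    # first tab into key and value, then convert the value to a number.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def format_record(key: str, value: int) -> str:
    # Results must be turned back into "key<TAB>value" text before they
    # are written out for the framework to read from the pipe.
    return f"{key}\t{value}\n"


def double_values(lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical mapper: the real "work" (doubling a number) is tiny,
    # and it is wrapped in a parse/format round trip for every record.
    for line in lines:
        key, value = parse_record(line)
        yield format_record(key, value * 2)


def time_round_trip(n: int) -> Tuple[float, float]:
    # Time the same arithmetic with and without the text round trip.
    records: List[Tuple[str, int]] = [(f"k{i}", i) for i in range(n)]
    lines = [format_record(k, v) for k, v in records]

    start = time.perf_counter()
    text_out = list(double_values(lines))
    with_text = time.perf_counter() - start

    start = time.perf_counter()
    native_out = [(k, v * 2) for k, v in records]
    without_text = time.perf_counter() - start

    assert len(text_out) == len(native_out)
    return with_text, without_text
```

In an actual job the script would read standard input and write standard output (for example `sys.stdout.writelines(double_values(sys.stdin))`) and be handed to the streaming jar with `-mapper`. Because the round trip happens once per record in each direction, this part of the overhead grows with the number of records and the volume of data, not with how expensive the computation itself is.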
