Posted to user@hive.apache.org by Bhavesh Shah <bh...@gmail.com> on 2012/05/06 05:24:46 UTC

About the performance of job execution on Amazon EMR

Hello,
Does performance improve if we increase the number of mappers and reduce
the number of reducers?
I have never tried setting the number of mappers and reducers, and I don't
know how to set them.
With multiple nodes, how many mappers and reducers should I set, relative
to the number of instances, to improve performance?
Also, if I have a large table that needs a large number of mappers, will
having set the number of mappers manually cause a problem?

Several factors in my task could, I think, hurt performance somewhat:
1) I create many tables at run time and drop them once I have finished
reusing them.
2) I use a lot of joins in my queries. (Most queries take 10-11 jobs to
execute.)
3) I use indexing in the task. I create the index after inserting the
values into the table, because I have to reuse it.
4) I also alter tables dynamically and insert values into them.

Should each of these factors be addressed separately to improve
performance, or will setting the number of mappers and reducers take care
of them?

Thanks for helping me.
Sorry for the inconvenience of my repeated questions.




-- 
Regards,
Bhavesh Shah