Posted to user@hive.apache.org by Jie Li <ji...@cs.duke.edu> on 2012/12/14 03:01:42 UTC

A tool to analyze and tune performance for Hive?

Hi everyone,

Is there any tool available to analyze and tune the performance of
Hive queries? And what is the state of the art?

I have some experience tuning Pig, which meant manually clicking through
JT web pages, collecting pieces of data from here and there, and guessing
what might be wrong. That was a slow and uncomfortable process. So
before I dive into Hive, I'd like to hear about your experience.

PS: for individual jobs, we built a tool called Starfish:
http://www.cs.duke.edu/starfish/release.html . It can be used to
analyze a job's performance and profile the job for auto-tuning. It
could be used for Hive too, but it currently doesn't capture
Hive-related info or the interactions among jobs.

Thanks,
Jie

Re: A tool to analyze and tune performance for Hive?

Posted by Jie Li <ji...@cs.duke.edu>.
Thanks for the pointer to HiTune. The dataflow graphs in the paper look nice.

The potential issues I can see:
1) Data collection requires a Chukwa cluster to be set up. That seems
too heavyweight?
2) Drill-down analysis. Besides the graphs shown in the paper, can
users drill down further to individual queries or jobs?

It would be nice to have some sample data available, so users can try a quick demo.

Jie

On Thu, Dec 13, 2012 at 9:12 PM, Zheng, Kai <ka...@intel.com> wrote:
> You could try HiTune and HiBench. Just google them.

RE: A tool to analyze and tune performance for Hive?

Posted by "Zheng, Kai" <ka...@intel.com>.
You could try HiTune and HiBench. Just google them.