Posted to dev@hive.apache.org by Alexis De La Cruz Toledo <al...@gmail.com> on 2012/02/09 03:17:11 UTC

optimization of tree plan generated by Hive...

Hi! My name is Alexis. I am a master's student at Cinvestav, DF, México.
I am currently working on my thesis and I would like to participate in
Google Summer of Code 2012 (
http://google-melange.appspot.com/gsoc/events/google/gsoc2012).
I am interested in improving Hive, and I have been studying Hadoop and Hive.

I am interested in the plan tree generated by Hive.
What caught my attention is that Hive reads the same table many times
and generates many Hadoop jobs, when the query could be
expressed in fewer jobs and with only one read of the table
if I programmed the same query directly in Hadoop.

I think I can reduce the number of jobs needed to process a query
and read each table only once, even if it is used in several jobs.

The solution could be approached in two ways:

1. Changing the part where the DAG is created, applying the optimizations at
that point.
2. Applying the optimizations after the DAG has been created; these
optimizations can be implemented in a separate class.
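
The second option can be sketched as a pass over an already-built job DAG. What follows is only a toy illustration in Python, not Hive code: the plan dictionary, the job and table names, and the functions shared_scans, depends_on, and merge_candidates are all hypothetical. It finds tables scanned by more than one job and reports pairs of jobs that do not depend on each other, which is the kind of pair a real optimizer might consider merging into a single scan.

```python
from collections import defaultdict

# Hypothetical toy model of a compiled plan: each job lists the tables it
# scans and the jobs it depends on. None of these names come from Hive.
plan = {
    "job1": {"scans": ["lineitem"], "deps": []},
    "job2": {"scans": ["lineitem"], "deps": []},
    "job3": {"scans": [], "deps": ["job1", "job2"]},
}

def shared_scans(plan):
    """Return {table: [jobs]} for every table scanned by more than one job."""
    readers = defaultdict(list)
    for job, info in plan.items():
        for table in info["scans"]:
            readers[table].append(job)
    return {t: jobs for t, jobs in readers.items() if len(jobs) > 1}

def depends_on(plan, job, other):
    """Return True if `job` transitively depends on `other`."""
    stack = list(plan[job]["deps"])
    seen = set()
    while stack:
        cur = stack.pop()
        if cur == other:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(plan[cur]["deps"])
    return False

def merge_candidates(plan):
    """List (table, job_a, job_b) for independent jobs scanning the same table."""
    pairs = []
    for table, jobs in shared_scans(plan).items():
        for i, a in enumerate(jobs):
            for b in jobs[i + 1:]:
                if not depends_on(plan, a, b) and not depends_on(plan, b, a):
                    pairs.append((table, a, b))
    return pairs

print(merge_candidates(plan))
```

Keeping the check as a separate pass means the DAG construction itself stays untouched, which is the appeal of the second option.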

Where could I do this? I think the method that compiles queries is
the compile method in the Driver class, am I right?
Can someone point me to where I could implement it?

There is a paper that discusses what I am describing:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
We can take it and improve on it, or implement our own ideas.

Personally, I would prefer the second option because of time constraints.

On another note, is anyone interested in working with me and being my mentor in
Google Summer of Code 2012?

Thanks.

Regards.

-- 
Ing. Alexis de la Cruz Toledo.
*Av. Instituto Politécnico Nacional No. 2508 Col. San Pedro Zacatenco. México,
D.F, 07360 *
*CINVESTAV, DF.*

Re: optimization of tree plan generated by Hive...

Posted by Carl Steinbach <ca...@cloudera.com>.

Hi Alexis,

Work is already underway to add the YSmart optimizer to Hive. Please take a
look at https://issues.apache.org/jira/browse/HIVE-2206.

Thanks.

Carl
