You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Ashutosh Chauhan <as...@gmail.com> on 2009/11/14 23:07:51 UTC

optimizer hints in Pig

Hi All,

We would like to know what Pig devs feel about optimizer hints.
Traditionally, optimizer hints have been received with mixed reactions
in RDBMS world.  Oracle provides lots of knobs[1][2] to turn and tune,
while postgres[3][4] have tried to stay away from them. Mysql have few
of them (e.g., straight_join). Surajit Chaudhary [5] (Microsoft) is
making case in favor of them.
More specifically, I am talking of hints like following

a = filter 'mydata' by myudf ($1) with "selectivity 0.5";
// This is letting user to tell Pig that  myudf filters out nearly
half of tuples of 'mydata'.

c = join a by $0, b by $0 with "selectivity a.$0 = b.$0, 0.1";
// This is letting user to tell Pig that only 10% of keys in a will
match with those in b.

Exact syntax isn't important it could be adapted. But, question is
does it seem to be  a useful enough idea to be added in Pig Latin.
Pig's case is slightly different from other sql engines in that while
other systems treats them as "hints" and thus are free to ignore them
Pig treats hints as commands in a sense that it will fail even if it
can figure out that hint will result in failure of query. Perhaps, Pig
can interpret "using" as command and "with" as hint.

Thoughts?

Ashutosh

[1] http://www.dba-oracle.com/art_otn_cbo_p7.htm
[2] http://www.dba-oracle.com/oracle11g/oracle_11g_extended_optimizer_statistics.htm
[3] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php
[4] http://archives.postgresql.org/pgsql-hackers/2006-08/msg00506.php
[5] portal.acm.org/ft_gateway.cfm?id=1559955&type=pdf

Re: optimizer hints in Pig

Posted by Alan Gates <ga...@yahoo-inc.com>.
In general I think optimizer hints fit well with Pig's approach to  
data processing, as expressed in our philosophic statement that Pigs  
are domestic animals (see http://hadoop.apache.org/pig/ 
philosophy.html ).

At least in the examples you give, I don't see 'with' as binding.  The  
user is giving Pig information; it can choose how to use it, or to not  
use it all.  I would like 'using' to continue to be binding as in that  
case the user is explicitly telling Pig to do something in a  
particular way.

Alan.

On Nov 14, 2009, at 2:07 PM, Ashutosh Chauhan wrote:

> Hi All,
>
> We would like to know what Pig devs feel about optimizer hints.
> Traditionally, optimizer hints have been received with mixed reactions
> in RDBMS world.  Oracle provides lots of knobs[1][2] to turn and tune,
> while postgres[3][4] have tried to stay away from them. Mysql have few
> of them (e.g., straight_join). Surajit Chaudhary [5] (Microsoft) is
> making case in favor of them.
> More specifically, I am talking of hints like following
>
> a = filter 'mydata' by myudf ($1) with "selectivity 0.5";
> // This is letting user to tell Pig that  myudf filters out nearly
> half of tuples of 'mydata'.
>
> c = join a by $0, b by $0 with "selectivity a.$0 = b.$0, 0.1";
> // This is letting user to tell Pig that only 10% of keys in a will
> match with those in b.
>
> Exact syntax isn't important it could be adapted. But, question is
> does it seem to be  a useful enough idea to be added in Pig Latin.
> Pig's case is slightly different from other sql engines in that while
> other systems treats them as "hints" and thus are free to ignore them
> Pig treats hints as commands in a sense that it will fail even if it
> can figure out that hint will result in failure of query. Perhaps, Pig
> can interpret "using" as command and "with" as hint.
>
> Thoughts?
>
> Ashutosh
>
> [1] http://www.dba-oracle.com/art_otn_cbo_p7.htm
> [2] http://www.dba-oracle.com/oracle11g/oracle_11g_extended_optimizer_statistics.htm
> [3] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php
> [4] http://archives.postgresql.org/pgsql-hackers/2006-08/msg00506.php
> [5] portal.acm.org/ft_gateway.cfm?id=1559955&type=pdf


Re: optimizer hints in Pig

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.


This should be interesting to pursue.
For non-trivial plans, it can be used to model how join's are to be 
optimized for example - instead of having users specify it, infer based 
on the hints from various stages.
Similarly, an oft-repeated request is making pig infer the value of 
PARALLEL - which can also be influenced by this :

a-> load, filter1, project
b-> load, filter2, project, filter3.
join a, b.

Now we can estimate 'sizes' of a and b based on input and arrive at 
things like : reducers to be used for a, b ; what kind of joint to use 
(traditional, fragment replicate, etc), and so on.


This should be interesting if it scales to slightly more complex scripts.

The example generator gets into some basic aspects of this iirc - though 
not from a costing function point of view (atleast I remember discussing 
it with Shubham).



Regards,
Mridul



Ashutosh Chauhan wrote:
> Hi All,
> 
> We would like to know what Pig devs feel about optimizer hints.
> Traditionally, optimizer hints have been received with mixed reactions
> in RDBMS world.  Oracle provides lots of knobs[1][2] to turn and tune,
> while postgres[3][4] have tried to stay away from them. Mysql have few
> of them (e.g., straight_join). Surajit Chaudhary [5] (Microsoft) is
> making case in favor of them.
> More specifically, I am talking of hints like following
> 
> a = filter 'mydata' by myudf ($1) with "selectivity 0.5";
> // This is letting user to tell Pig that  myudf filters out nearly
> half of tuples of 'mydata'.
> 
> c = join a by $0, b by $0 with "selectivity a.$0 = b.$0, 0.1";
> // This is letting user to tell Pig that only 10% of keys in a will
> match with those in b.
> 
> Exact syntax isn't important it could be adapted. But, question is
> does it seem to be  a useful enough idea to be added in Pig Latin.
> Pig's case is slightly different from other sql engines in that while
> other systems treats them as "hints" and thus are free to ignore them
> Pig treats hints as commands in a sense that it will fail even if it
> can figure out that hint will result in failure of query. Perhaps, Pig
> can interpret "using" as command and "with" as hint.
> 
> Thoughts?
> 
> Ashutosh
> 
> [1] http://www.dba-oracle.com/art_otn_cbo_p7.htm
> [2] http://www.dba-oracle.com/oracle11g/oracle_11g_extended_optimizer_statistics.htm
> [3] http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php
> [4] http://archives.postgresql.org/pgsql-hackers/2006-08/msg00506.php
> [5] portal.acm.org/ft_gateway.cfm?id=1559955&type=pdf