You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Navis류승우 <na...@nexr.com> on 2013/07/02 09:50:13 UTC

Re: Override COUNT() function

MetadataOnlyOptimizer changes GBY on partition columns to simple
TableScan with one line dummy.

I think similar things can be done with stats.

2013/6/28 Peter Marron <Pe...@trilliumsoftware.com>:
> Hi,
>
>
>
> I feel sure that someone has asked for this before, but here goes…
>
>
>
> In the case where I have the query
>
>
>
>                 SELECT COUNT(*) FROM table;
>
>
>
> There are many cases where I can determine the count immediately.
>
> (For example if I have run something like:
>
>
>
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)]
> COMPUTE STATISTICS [noscan];
>
>
>
> then there seems to be a table property “numRows” which holds a count of the
> number of rows.
>
> Now I know that the COUNT function can’t always be determined easily.
>
> If the query is more complicated, like
>
>
>
>                 SELECT COUNT(*) FROM table GROUP BY column;
>
>
>
> then obviously a simple scalar count is of no real use. But is there some
> way
>
> to intercept the simple case and avoid running a table scan?
>
>
>
> One problem that I see is that the COUNT function is a UDAF and I am
>
> assuming that the presence of any aggregate function like this is enough
>
> to force the query planner to require a Map/Reduce. Is there anyway
>
> to make the function look like a simple UDF for some queries? Or
>
> just for some tables? I guess that I’d be prepared to sacrifice the full
>
> generality of the normal COUNT function for one which
>
> only functions correctly for the simple query on my tables.
>
>
>
> So is it possible to have a different COUNT function only on certain tables?
>
>
>
> Regards,
>
>
>
> Z
>
>

Re: Override COUNT() function

Posted by Navis류승우 <na...@nexr.com>.

As you expected, there is no documentation on it (like other optimizers)

Javadoc of the class might be helpful but seemed not in detail enough.

2013/7/2 Peter Marron <Pe...@trilliumsoftware.com>:
> Thanks Navis,
>
> This is a very interesting class which I feel pretty sure that I would never have found.
> Are  there any descriptions, motivations, documentation or examples anywhere?
> I suspect that there's nothing other than the source itself, but I had to ask.
>
> Regards,
>
> Z
> -----Original Message-----
> From: Navis류승우 [mailto:navis.ryu@nexr.com]
> Sent: 02 July 2013 08:50
> To: user@hive.apache.org
> Subject: Re: Override COUNT() function
>
> MetadataOnlyOptimizer changes GBY on partition columns to simple TableScan with one line dummy.
>
> I think similar things can be done with stats.
>
> 2013/6/28 Peter Marron <Pe...@trilliumsoftware.com>:
>> Hi,
>>
>>
>>
>> I feel sure that someone has asked for this before, but here goes…
>>
>>
>>
>> In the case where I have the query
>>
>>
>>
>>                 SELECT COUNT(*) FROM table;
>>
>>
>>
>> There are many cases where I can determine the count immediately.
>>
>> (For example if I have run something like:
>>
>>
>>
>> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2],
>> ...)] COMPUTE STATISTICS [noscan];
>>
>>
>>
>> then there seems to be a table property “numRows” which holds a count
>> of the number of rows.
>>
>> Now I know that the COUNT function can’t always be determined easily.
>>
>> If the query is more complicated, like
>>
>>
>>
>>                 SELECT COUNT(*) FROM table GROUP BY column;
>>
>>
>>
>> then obviously a simple scalar count is of no real use. But is there
>> some way
>>
>> to intercept the simple case and avoid running a table scan?
>>
>>
>>
>> One problem that I see is that the COUNT function is a UDAF and I am
>>
>> assuming that the presence of any aggregate function like this is
>> enough
>>
>> to force the query planner to require a Map/Reduce. Is there anyway
>>
>> to make the function look like a simple UDF for some queries? Or
>>
>> just for some tables? I guess that I’d be prepared to sacrifice the
>> full
>>
>> generality of the normal COUNT function for one which
>>
>> only functions correctly for the simple query on my tables.
>>
>>
>>
>> So is it possible to have a different COUNT function only on certain tables?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Z
>>
>>
>

RE: Override COUNT() function

Posted by Peter Marron <Pe...@trilliumsoftware.com>.

Thanks Navis,

This is a very interesting class which I feel pretty sure that I would never have found.
Are  there any descriptions, motivations, documentation or examples anywhere?
I suspect that there's nothing other than the source itself, but I had to ask.

Regards,

Z
-----Original Message-----
From: Navis류승우 [mailto:navis.ryu@nexr.com] 
Sent: 02 July 2013 08:50
To: user@hive.apache.org
Subject: Re: Override COUNT() function

MetadataOnlyOptimizer changes GBY on partition columns to simple TableScan with one line dummy.

I think similar things can be done with stats.

2013/6/28 Peter Marron <Pe...@trilliumsoftware.com>:
> Hi,
>
>
>
> I feel sure that someone has asked for this before, but here goes…
>
>
>
> In the case where I have the query
>
>
>
>                 SELECT COUNT(*) FROM table;
>
>
>
> There are many cases where I can determine the count immediately.
>
> (For example if I have run something like:
>
>
>
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], 
> ...)] COMPUTE STATISTICS [noscan];
>
>
>
> then there seems to be a table property “numRows” which holds a count 
> of the number of rows.
>
> Now I know that the COUNT function can’t always be determined easily.
>
> If the query is more complicated, like
>
>
>
>                 SELECT COUNT(*) FROM table GROUP BY column;
>
>
>
> then obviously a simple scalar count is of no real use. But is there 
> some way
>
> to intercept the simple case and avoid running a table scan?
>
>
>
> One problem that I see is that the COUNT function is a UDAF and I am
>
> assuming that the presence of any aggregate function like this is 
> enough
>
> to force the query planner to require a Map/Reduce. Is there anyway
>
> to make the function look like a simple UDF for some queries? Or
>
> just for some tables? I guess that I’d be prepared to sacrifice the 
> full
>
> generality of the normal COUNT function for one which
>
> only functions correctly for the simple query on my tables.
>
>
>
> So is it possible to have a different COUNT function only on certain tables?
>
>
>
> Regards,
>
>
>
> Z
>
>