Posted to user@hive.apache.org by Keith Wiley <kw...@keithwiley.com> on 2013/10/02 20:48:14 UTC
Use distribute to spread across reducers
I'm trying to create a subset of a large table for testing. The following approach works:
create table subset_table as
select * from large_table limit 1000
...but it only uses one reducer. I would like to speed up the process of creating a subset by distributing across multiple reducers. I already tried explicitly setting mapred.reduce.tasks and hive.exec.reducers.max to values larger than 1, but in this particular case, those values seem to be overridden by Hive's internal query-to-MapReduce conversion; it ignores those parameters.
So, I tried this:
create table subset_table as
select * from large_table limit 1000
distribute by column_name
...but that doesn't parse. I get the following error:
OK FAILED: ParseException line 3:0 missing EOF at 'distribute' near '1000'.
I have tried NUMEROUS applications of parentheses, nested queries, etc. For example, here's just one (amongst perhaps ten variations on a theme):
create table subset_table as
select * from (
from (
select * from large_table limit 1000
distribute by column_name
)) s
Like I said, I've tried all sorts of combinations of the elements shown above. So far I have not even gotten any syntax to parse, much less run. Only the original query at the top will even pass the parsing stage of processing.
Any ideas?
Thanks.
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
-- Galileo Galilei
________________________________________________________________________________
Re: Use distribute to spread across reducers
Posted by Timothy Potter <th...@gmail.com>.
Hi Keith,
Have you tried the TABLESAMPLE command?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
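For example, a minimal sketch using the table names from your post. The bucket count of 100 is an arbitrary choice for illustration, and the number of rows you get back is approximate rather than exactly 1000:

```sql
-- Keep roughly 1 in 100 rows: rand() is hashed into 100 buckets
-- and only bucket 1 is selected. The result size varies from run to run.
create table subset_table as
select * from large_table tablesample(bucket 1 out of 100 on rand()) s;
```

As far as I understand, sampling on rand() still scans the whole table, but the filtering happens in the mappers instead of being funneled through a single reducer the way LIMIT is.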
Tim
On Thu, Oct 3, 2013 at 11:58 AM, Yin Huai <hu...@gmail.com> wrote:
> Hello Keith,
>
> Hive will not launch a MR job for your query because it basically reads
> all columns from a table. Hive will fetch the data for you directly from
> the underlying filesystem.
>
> Thanks,
>
> Yin
Re: Use distribute to spread across reducers
Posted by Yin Huai <hu...@gmail.com>.
Hello Keith,
Hive will not launch an MR job for your query because it basically reads all
columns from a table. Hive will fetch the data for you directly from the
underlying filesystem.
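One way to check this, as a sketch using the table names from your post, is to run the statements through EXPLAIN and look at the stages in the plan. Whether a fetch-only plan is chosen depends on the Hive version and, in versions that have it, on the hive.fetch.task.conversion setting, so the comments below describe what to look for rather than guaranteed output:

```sql
-- Bare SELECT * with LIMIT: look for a plan with only a fetch stage
-- and no map-reduce stage.
explain
select * from large_table limit 1000;

-- The CTAS form from the original post, for comparison. It has to write
-- a new table, so its plan may differ from the bare SELECT.
explain
create table subset_table as
select * from large_table limit 1000;
```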
Thanks,
Yin