Posted to user@hive.apache.org by Mahender Sarangam <Ma...@outlook.com> on 2017/01/21 00:58:18 UTC
Hive ORC Table
Hi All,
We have an ORC table that is about 2 GB in size. Whenever we run an
operation on top of this table, Tez always comes up with 1009 reducers. From
what I found, 1009 is treated as the maximum number of Tez reduce tasks. Is
there a way to reduce the number of reducers? I also see that the files
underlying the ORC table vary in size, some 500 MB, some 1 GB, and so on. Is
there a way to make the files roughly the same size?
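For what it's worth, 1009 is the default value of hive.exec.reducers.max in
recent Hive releases, and Hive's reducer estimate is roughly the stage's
estimated input size divided by hive.exec.reducers.bytes.per.reducer (256 MB
by default), clamped to that cap. A minimal sketch of that arithmetic,
assuming those defaults (the real planner also factors in table statistics
and Tez grouping settings):

```python
import math

# Defaults assumed from Hive 0.14+; the 1009 ceiling matches the
# reducer count observed in this thread.
BYTES_PER_REDUCER = 256 * 1024 * 1024   # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 1009                     # hive.exec.reducers.max

def estimated_reducers(estimated_input_bytes: int) -> int:
    """Approximate Hive's reducer count: estimated data size divided by
    bytes-per-reducer, clamped to at least 1 and at most the max."""
    wanted = math.ceil(estimated_input_bytes / BYTES_PER_REDUCER)
    return max(1, min(wanted, MAX_REDUCERS))

# A 2 GB input would only ask for 8 reducers under these defaults.
print(estimated_reducers(2 * 1024**3))    # 8

# Hitting the 1009 cap implies Hive's size estimate for the stage is
# far larger than 2 GB, e.g. with missing or stale table statistics.
print(estimated_reducers(500 * 1024**3))  # 1009
```

If a 2 GB estimate were what Hive actually used, the plan would have 8
reducers, so a 1009-reducer plan suggests a much larger estimated
intermediate stage; lowering hive.exec.reducers.max or raising
hive.exec.reducers.bytes.per.reducer would cap the count either way.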
My second scenario: we have a join across 5 tables, all of them left joins.
The query runs quickly until it reaches 99%; getting from 99% to 100% takes
far too long. We are not using our partition column in the LEFT JOIN
statement. Is there a better way to resolve this hang at 99%? Our table is
about 20 GB, and we are left joining it with another table of
9,00,00,000 (90 million) records.
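A common cause of this 99% pattern is a single overloaded reducer rather
than the query as a whole. A hypothetical sketch (plain Python, not Hive
code) of how one hot or NULL join key sends almost all rows to the same
reducer:

```python
from collections import Counter

NUM_REDUCERS = 10

def reducer_for(key) -> int:
    # The shuffle routes each row by hashing its join key, so every
    # row sharing one hot key (or a NULL placeholder) lands on the
    # same reducer.
    return hash(key) % NUM_REDUCERS

# Simulated skew: 90% of the rows carry the same (NULL) join key.
keys = [None] * 90 + list(range(10))
load = Counter(reducer_for(k) for k in keys)

# The reducer holding the NULL bucket does ~90% of the work; it is
# the task that sits at "99% complete" while the others finish.
print(load.most_common(1))
```

Checking the Tez task view for one long-running reducer, and counting rows
per join-key value in the big table, would confirm or rule this out.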
Mahens
Re: Hive ORC Table
Posted by goun na <go...@gmail.com>.
Please refer to the document below as well:
Hive on Tez Performance Tuning - Determining Reducer Counts
https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html
I hope it gives you some clue to understanding Tez internals.
2017-01-21 23:35 GMT+09:00 Mahender Sarangam <Ma...@outlook.com>:
> Yes, I tried the option below, but I'm not sure about the workload (data
> ingestion), so I can't go with a fixed hard-coded value. I would like to
> know the reason for getting 1009 reducer tasks.
>
> On 1/20/2017 7:45 PM, goun na wrote:
>
> Hi Mahender ,
>
> 1st :
> Didn't the following option work in Tez?
>
> set mapreduce.job.reduces=100
> or
> set mapred.reduce.tasks=100 (deprecated)
>
> 2nd :
> It could be data skew; that sometimes happens when handling NULL keys.
>
> Goun
Re: Hive ORC Table
Posted by Mahender Sarangam <Ma...@outlook.com>.
Yes, I tried the option below, but I'm not sure about the workload (data ingestion), so I can't go with a fixed hard-coded value. I would like to know the reason for getting 1009 reducer tasks.
On 1/20/2017 7:45 PM, goun na wrote:
Hi Mahender ,
1st :
Didn't the following option work in Tez?
set mapreduce.job.reduces=100
or
set mapred.reduce.tasks=100 (deprecated)
2nd :
It could be data skew; that sometimes happens when handling NULL keys.
Goun
Re: Hive ORC Table
Posted by goun na <go...@gmail.com>.
Hi Mahender Sarangam,
1st :
Didn't the following option work in Tez?
set mapreduce.job.reduces=100
or
set mapred.reduce.tasks=100 (deprecated)
2nd :
It could be data skew; that sometimes happens when handling NULL keys.
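To expand on that: in a LEFT JOIN a NULL key can never match the right side
anyway, so a common workaround is to spread the NULL rows across reducers
with a random salt (in HiveQL this is typically written as something like
COALESCE on the join key with a rand()-based value, assuming the salt
values cannot collide with real keys). A hypothetical Python sketch of the
idea:

```python
import random
from collections import Counter

NUM_REDUCERS = 10
random.seed(0)  # deterministic for the demo

def salted_bucket(key) -> int:
    # NULL keys can't match anything in the join, so sending each
    # NULL row to a random reducer changes the load, not the result.
    if key is None:
        return random.randrange(NUM_REDUCERS)
    return hash(key) % NUM_REDUCERS

# Skewed input: 90% of the rows carry a NULL join key.
keys = [None] * 90 + list(range(10))
load = Counter(salted_bucket(k) for k in keys)

# With salting, no single reducer carries anywhere near 90 rows.
print(max(load.values()))
```

Filtering NULL keys out before the join (and UNIONing them back in) is the
other common variant of the same fix.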
Goun