Posted to user@hive.apache.org by Mahender Sarangam <Ma...@outlook.com> on 2017/01/21 00:58:18 UTC

Hive ORC Table

Hi All,

We have an ORC table that is about 2 GB in size. Whenever we run an
operation on top of this ORC table, Tez always comes up with 1009
reducers. From what I found, 1009 is treated as the maximum number of
Tez reduce tasks. Is there a way to reduce the number of reducers? I
also see that the files generated under the ORC table are uneven, some
500 MB, some 1 GB, etc. Is there a way to distribute the data so the
files come out at roughly the same size?
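For reference, these are the settings I found that are supposed to
govern the count; a sketch of what I am planning to try, with values
that are just guesses for our workload:

-- Bytes of input per reducer; Hive derives the reducer count from
-- (estimated data size / this value), so raising it should lower the count.
set hive.exec.reducers.bytes.per.reducer=1073741824;  -- ~1 GB

-- Hard upper bound on the reducer count (the default is 1009).
set hive.exec.reducers.max=200;

-- Let Tez adjust reducer parallelism at runtime from observed data sizes.
set hive.tez.auto.reducer.parallelism=true;

-- And for the uneven file sizes, the post-job merge settings:
set hive.merge.tezfiles=true;                 -- merge small output files after the job
set hive.merge.size.per.task=268435456;       -- target ~256 MB per merged file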


My second scenario: we have a join across 5 tables, all of them LEFT
JOINs. The query runs fast until it reaches 99%, but getting from 99%
to 100% takes far too long. We are not involving our partition column
in the LEFT JOIN statement. Is there a better way to resolve this hang
at 99%? My table is 20 GB, and we are left joining it with another
table of 90,000,000 records.
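We have not yet ruled out a hot join key; this is the check we plan to
run (a sketch, where fact_table and join_key stand in for our actual
table and key):

-- Find the most frequent join keys; a single dominant value
-- (NULL included) points at one overloaded reducer.
SELECT join_key, count(*) AS cnt
FROM fact_table
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;

If one value dwarfs the rest, the single reducer that receives it would
explain the stall at 99%.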


Mahens


Re: Hive ORC Table

Posted by goun na <go...@gmail.com>.
Please refer to the document below as well:

Hive on Tez Performance Tuning - Determining Reducer Counts
https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html

I hope it gives you some clues to how Tez works inside.
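To make the formula from that article concrete (the data size below is
purely illustrative), Hive on Tez estimates

  reducers = min(hive.exec.reducers.max, max(1, estimated bytes / hive.exec.reducers.bytes.per.reducer))

so with the default cap of 1009 and, say, 300 GB of estimated
intermediate data against a 256 MB per-reducer target, the raw estimate
of about 1200 reducers gets clamped to exactly 1009, which would match
the number you keep seeing.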


Re: Hive ORC Table

Posted by Mahender Sarangam <Ma...@outlook.com>.
Yes, I tried the option below, but I'm not sure about the workload (data ingestion), so I can't go with a fixed hard-coded value. I would like to know the reason for getting 1009 reducer tasks.
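
What I may try instead (a sketch, assuming our cluster's Tez defaults)
is letting Tez settle on the final number at runtime rather than
pinning it:

set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=2.0;   -- start with extra partitions, then shrink

With this, Tez begins from Hive's estimate but scales the reducer count
down once it sees the real intermediate data sizes, so nothing is
hard-coded.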


Re: Hive ORC Table

Posted by goun na <go...@gmail.com>.
Hi Mahender Sarangam,

1st:
Did the following option not work in Tez?

set mapreduce.job.reduces=100
or
set mapred.reduce.tasks=100 (deprecated)

2nd:
There is a possibility of data skew. It sometimes happens when handling NULLs.
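
If NULL keys turn out to be the hot spot, one workaround (a sketch;
fact, dim, and join_key are placeholder names, and it assumes no real
key starts with 'NULL_') is to scatter the NULLs so they stop landing
on a single reducer, together with Hive's skew-join handling:

set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;   -- rows per key before it is treated as skewed

-- Replace NULL keys with unique non-matching values before joining.
SELECT f2.*, d.attr
FROM (
  SELECT f.*,
         CASE WHEN f.join_key IS NULL
              THEN concat('NULL_', cast(rand() AS string))
              ELSE f.join_key END AS scattered_key
  FROM fact f
) f2
LEFT JOIN dim d
  ON f2.scattered_key = d.join_key;

The CASE keeps the LEFT JOIN semantics, since NULL keys never match
anyway; they just get spread across reducers instead of piling onto one.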

Goun

