Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/21 07:21:01 UTC
[jira] [Updated] (HIVE-7277) how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?
[ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stamatis Zampetakis updated HIVE-7277:
--------------------------------------
Fix Version/s: (was: 0.13.0)
I cleared the fixVersion field since this ticket is still open. Please review this ticket and if the fix is already committed to a specific version please set the version accordingly and mark the ticket as RESOLVED.
According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] the fixVersion should be set only when the issue is resolved/closed.
> how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-7277
> URL: https://issues.apache.org/jira/browse/HIVE-7277
> Project: Hive
> Issue Type: New Feature
> Reporter: WangMeng
> Priority: Major
>
> As far as I know, Hive currently decides the number of reducers simply as (input size of the map stage) / hive.exec.reducers.bytes.per.reducer (default 1G).
> However, the output size of the map stage may differ greatly from the original input size, so I think this strategy for deciding the number of reducers may be improper.
> So is there any feature that can decide the number of reducers according to the output size of the map stage? Thanks.
> As far as I know, the reduce stage actually begins as soon as some map tasks have finished rather than waiting until the whole map stage has finished, so I think it is also improper to decide the number of reducers only after the whole map stage has finished.
> As someone pointed out, we could use the output size of the earliest finished map tasks to estimate the total number of reducers. However, Hive now uses filter pushdown (WHERE), which may result in a big difference in output size from one map task to another.
> So this estimation is improper as well.
> Thanks.
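
To make the concern in the description concrete, here is a minimal Python sketch of the size-based estimate and of the extrapolation from early map tasks. It is an illustration only, not Hive's actual code: the function names, the cap of 999 reducers, and all byte counts are hypothetical, and the 1G per-reducer figure is simply the default quoted in the description.

```python
import math

# "default 1G" as quoted in the issue description.
BYTES_PER_REDUCER = 1_000_000_000
# Hypothetical upper bound chosen for this sketch only.
MAX_REDUCERS = 999


def estimate_reducers(total_bytes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Size-based estimate: ceil(total / per-reducer), clamped to [1, max]."""
    if total_bytes <= 0:
        return 1
    wanted = math.ceil(total_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Guess total map output by scaling the mean output of finished tasks."""
    if not finished_output_bytes:
        raise ValueError("need at least one finished map task")
    mean = sum(finished_output_bytes) / len(finished_output_bytes)
    return int(mean * total_map_tasks)


# Case 1: a selective query shrinks 200 GB of input to 5 GB of map output.
by_map_input = estimate_reducers(200 * 10**9)   # estimate from map input
by_map_output = estimate_reducers(5 * 10**9)    # estimate from map output

# Case 2: 100 map tasks; the first 4 to finish were almost fully filtered
# (10 MB each), while the remaining 96 each emit 500 MB.
early_tasks = [10_000_000] * 4
guessed_total = extrapolate_map_output(early_tasks, total_map_tasks=100)
actual_total = sum(early_tasks) + 96 * 500_000_000

by_early_sample = estimate_reducers(guessed_total)
by_actual_output = estimate_reducers(actual_total)

print(by_map_input, by_map_output)        # input-based vs output-based
print(by_early_sample, by_actual_output)  # early-sample guess vs actual
```

Under these made-up numbers the input-based estimate is far larger than the output-based one, and the early-sample extrapolation badly underestimates the reducers needed once the unfiltered tasks finish, which is the skew the reporter attributes to filter pushdown.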
--
This message was sent by Atlassian Jira
(v8.20.10#820010)