Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/07/19 07:56:00 UTC
[jira] [Created] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column
Gopal V created HIVE-17124:
------------------------------
Summary: PlanUtils: Rand() is not a failure-tolerant distribution column
Key: HIVE-17124
URL: https://issues.apache.org/jira/browse/HIVE-17124
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 2.3.0, 3.0.0
Reporter: Gopal V
{code}
else {
  // numPartitionFields = -1 means random partitioning
  partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
}
{code}
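This causes known data corruption when failed tasks are re-executed: rand() yields different values on each attempt, so a retried task can route rows to different reducers than the original attempt did, and rows end up duplicated or lost.
There is a failure-tolerant distribution function inside ReduceSinkOperator, which kicks in automatically when no partition columns are used: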
{code}
if (partitionEval.length == 0) {
  // If no partition cols, just distribute the data uniformly
  // to provide better load balance. If the requirement is to have a single reducer, we should
  // set the number of reducers to 1. Use a constant seed to make the code deterministic.
  if (random == null) {
    random = new Random(12345);
  }
  keyHashCode = random.nextInt();
}
{code}
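To illustrate why the constant seed matters, here is a minimal standalone sketch (not Hive code). The class DistributionSketch, the helper route(), and the way the hash is folded into a reducer index are all hypothetical and chosen only for illustration. It assumes a retried task sees its rows in the same order as the original attempt.

```java
import java.util.Arrays;
import java.util.Random;

public class DistributionSketch {
    // Hypothetical helper: assigns each of numRows rows to one of numReducers
    // buckets using successive values from the supplied Random.
    static int[] route(Random random, int numRows, int numReducers) {
        int[] buckets = new int[numRows];
        for (int i = 0; i < numRows; i++) {
            int keyHashCode = random.nextInt();
            // Clear the sign bit so the modulo result is never negative.
            buckets[i] = (keyHashCode & Integer.MAX_VALUE) % numReducers;
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Constant seed: the original attempt and the retry route rows identically.
        int[] first = route(new Random(12345), 1000, 4);
        int[] retry = route(new Random(12345), 1000, 4);
        System.out.println("seeded attempts agree: " + Arrays.equals(first, retry));

        // No fixed seed: each attempt draws a different sequence, so the retry
        // will almost certainly send some rows to different reducers.
        int[] a = route(new Random(), 1000, 4);
        int[] b = route(new Random(), 1000, 4);
        System.out.println("unseeded attempts agree: " + Arrays.equals(a, b));
    }
}
```

With the fixed seed, a re-run produces the same row-to-reducer assignment as the first attempt, which is the property a rand() partition column lacks.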