Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/07/19 07:56:00 UTC

[jira] [Created] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column

Gopal V created HIVE-17124:
------------------------------

             Summary: PlanUtils: Rand() is not a failure-tolerant distribution column
                 Key: HIVE-17124
                 URL: https://issues.apache.org/jira/browse/HIVE-17124
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 3.0.0
            Reporter: Gopal V


{code}
else {
  // numPartitionFields = -1 means random partitioning
  partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
}
{code}

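This is known to corrupt data when a failed task is re-executed: rand() returns different values on each attempt, so a retried task can send rows to different reducers than the original attempt did.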
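
A rough simulation of that failure mode, as a sketch only. The class and method names are hypothetical and none of this is Hive code. It assumes two reducers, where reducer 0 has already read the first attempt's output and reducer 1 reads the output of the retried attempt. Two different seeds stand in for rand() producing different values on each attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomPartitionRetrySketch {
    // Hypothetical stand-in for one map task attempt: routes row ids
    // 0..rowCount-1 to numReducers partitions using the given random source.
    static List<List<Integer>> runAttempt(Random random, int rowCount, int numReducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int row = 0; row < rowCount; row++) {
            partitions.get(random.nextInt(numReducers)).add(row);
        }
        return partitions;
    }

    // Counts how often each row is delivered when reducer 0 reads the first
    // attempt's partition 0 and reducer 1 reads the retried attempt's partition 1.
    static int[] deliveredCounts(List<List<Integer>> first, List<List<Integer>> retry, int rowCount) {
        int[] counts = new int[rowCount];
        for (int row : first.get(0)) {
            counts[row]++;
        }
        for (int row : retry.get(1)) {
            counts[row]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int rows = 1000;
        // Assumption: different seeds model rand() differing between attempts.
        List<List<Integer>> first = runAttempt(new Random(1L), rows, 2);
        List<List<Integer>> retry = runAttempt(new Random(2L), rows, 2);
        int lost = 0;
        int duplicated = 0;
        for (int count : deliveredCounts(first, retry, rows)) {
            if (count == 0) {
                lost++;
            }
            if (count > 1) {
                duplicated++;
            }
        }
        System.out.println("lost=" + lost + " duplicated=" + duplicated);
    }
}
```

If both attempts route rows identically, every row is delivered exactly once. Once the routing differs between attempts, any row that moved between partitions is either dropped or delivered twice.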

ReduceSinkOperator already contains a failure-tolerant distribution function, which kicks in automatically when there are no partition columns:

{code}
    if (partitionEval.length == 0) {
      // If no partition cols, just distribute the data uniformly
      // to provide better load balance. If the requirement is to have a single reducer, we should
      // set the number of reducers to 1. Use a constant seed to make the code deterministic.
      if (random == null) {
        random = new Random(12345);
      }
      keyHashCode = random.nextInt();
    }
{code}
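
To illustrate why the constant seed matters, here is a minimal standalone sketch. It is not Hive code: the class and method names are hypothetical, and the mask-and-modulo step that turns keyHashCode into a reducer index is an assumption made for the example. It shows that two attempts seeded with the same constant assign rows to the same reducers, provided each attempt sees the rows in the same order.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededDistributionSketch {
    // Hypothetical helper: draws one keyHashCode per row, as the snippet above
    // does, and maps it to a reducer index (the mapping is an assumption).
    static int[] assignReducers(Random random, int rowCount, int numReducers) {
        int[] reducers = new int[rowCount];
        for (int row = 0; row < rowCount; row++) {
            int keyHashCode = random.nextInt();
            // Clear the sign bit so the modulo result is never negative.
            reducers[row] = (keyHashCode & Integer.MAX_VALUE) % numReducers;
        }
        return reducers;
    }

    public static void main(String[] args) {
        // Both attempts start from the same constant seed used in the snippet above.
        int[] firstAttempt = assignReducers(new Random(12345), 10, 4);
        int[] retryAttempt = assignReducers(new Random(12345), 10, 4);
        System.out.println(Arrays.equals(firstAttempt, retryAttempt)); // prints "true"
    }
}
```

The guarantee only holds when the retried task reads its input in the same order as the original attempt, since the assignment depends on each row's position in the sequence and not on its contents.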



