Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/25 03:49:00 UTC
[jira] [Commented] (IMPALA-7352) HdfsTableSink doesn't take into account insert clustering

    [ https://issues.apache.org/jira/browse/IMPALA-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626723#comment-16626723 ] 

ASF subversion and git services commented on IMPALA-7352:
---------------------------------------------------------

Commit f970b755c38ec86425f911400605d0018355ebb5 in impala's branch refs/heads/master from poojanilangekar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f970b75 ]

IMPALA-7352: Account for clustering in HdfsTableSink

Previously, HdfsTableSink::computeResourceProfile() didn't account
for clustering when estimating the memory requirement of an
insert fragment. With this change, the resource estimate reflects
that a clustered insert writes a single partition at a time.

Testing: Modified testResourceRequirements PlannerTest to account
for clustering while generating insert plans.

Change-Id: I75f8baf5fc3e1c357edf6d0cebd1e5dbafc8a3a8
Reviewed-on: http://gerrit.cloudera.org:8080/11485
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> HdfsTableSink doesn't take into account insert clustering
> ---------------------------------------------------------
>
>                 Key: IMPALA-7352
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7352
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Pooja Nilangekar
>            Priority: Major
>              Labels: resource-management
>
> I noticed that the code below doesn't check whether the insert is clustered. A clustered insert only produces a single partition at a time, but the estimate still scales with the number of partitions per instance.
> {code}
>   @Override
>   public void computeResourceProfile(TQueryOptions queryOptions) {
>     HdfsTable table = (HdfsTable) targetTable_;
>     // TODO: Estimate the memory requirements more accurately by partition type.
>     HdfsFileFormat format = table.getMajorityFormat();
>     PlanNode inputNode = fragment_.getPlanRoot();
>     int numInstances = fragment_.getNumInstances(queryOptions.getMt_dop());
>     // Compute the per-instance number of partitions, taking the number of nodes
>     // and the data partition of the fragment executing this sink into account.
>     long numPartitionsPerInstance =
>         fragment_.getPerInstanceNdv(queryOptions.getMt_dop(), partitionKeyExprs_);
>     if (numPartitionsPerInstance == -1) {
>       numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
>     }
>     long perPartitionMemReq = getPerPartitionMemReq(format);
>     long perInstanceMemEstimate;
>     // The estimate is based purely on the per-partition mem req if the input cardinality
>     // or the avg row size is unknown.
>     if (inputNode.getCardinality() == -1 || inputNode.getAvgRowSize() == -1) {
>       perInstanceMemEstimate = numPartitionsPerInstance * perPartitionMemReq;
>     } else {
>       // The per-partition estimate may be higher than the memory required to buffer
>       // the entire input data.
>       long perInstanceInputCardinality =
>           Math.max(1L, inputNode.getCardinality() / numInstances);
>       long perInstanceInputBytes =
>           (long) Math.ceil(perInstanceInputCardinality * inputNode.getAvgRowSize());
>       long perInstanceMemReq =
>           PlanNode.checkedMultiply(numPartitionsPerInstance, perPartitionMemReq);
>       perInstanceMemEstimate = Math.min(perInstanceInputBytes, perInstanceMemReq);
>     }
>     resourceProfile_ = ResourceProfile.noReservation(perInstanceMemEstimate);
>   }
> {code}
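
For illustration only, here is a minimal sketch of how the estimate in the quoted method could take clustering into account. It is not the committed change: the class SinkMemEstimateSketch, the method estimatePerInstanceMem and the clustered parameter are hypothetical names, the planner state is passed in as plain arguments, and Math.multiplyExact stands in for PlanNode.checkedMultiply (it throws on overflow). The one assumption it encodes is the one stated in the issue: a clustered insert produces a single partition at a time, so only one partition's memory requirement is counted.

```java
// Hypothetical, standalone sketch; not Impala's actual implementation.
final class SinkMemEstimateSketch {
  private SinkMemEstimateSketch() {}

  /**
   * Estimates per-instance memory for an insert sink.
   * A value of -1 for inputCardinality or avgRowSize means "unknown",
   * mirroring the convention in the quoted method.
   */
  static long estimatePerInstanceMem(long numPartitionsPerInstance,
      long perPartitionMemReq, boolean clustered, long inputCardinality,
      double avgRowSize, int numInstances) {
    // Assumption: a clustered insert writes one partition at a time,
    // so only a single partition's buffers are counted.
    long livePartitions = clustered ? 1L : numPartitionsPerInstance;
    long perInstanceMemReq = Math.multiplyExact(livePartitions, perPartitionMemReq);

    // Without input stats, fall back to the per-partition requirement alone.
    if (inputCardinality == -1 || avgRowSize == -1) {
      return perInstanceMemReq;
    }

    // The per-partition estimate may exceed the memory needed to buffer
    // the entire input, so cap it at the per-instance input size.
    long perInstanceInputCardinality = Math.max(1L, inputCardinality / numInstances);
    long perInstanceInputBytes =
        (long) Math.ceil(perInstanceInputCardinality * avgRowSize);
    return Math.min(perInstanceInputBytes, perInstanceMemReq);
  }
}
```

With 100 partitions per instance and a 1024-byte per-partition requirement (made-up numbers), the unclustered estimate without stats is 100 times the clustered one, which is the gap this issue describes.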


