Posted to issues-all@impala.apache.org by "Pooja Nilangekar (JIRA)" <ji...@apache.org> on 2018/09/25 17:32:00 UTC
[jira] [Resolved] (IMPALA-7352) HdfsTableSink doesn't take into account insert clustering
[ https://issues.apache.org/jira/browse/IMPALA-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pooja Nilangekar resolved IMPALA-7352.
--------------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
> HdfsTableSink doesn't take into account insert clustering
> ---------------------------------------------------------
>
> Key: IMPALA-7352
> URL: https://issues.apache.org/jira/browse/IMPALA-7352
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Pooja Nilangekar
> Priority: Major
> Labels: resource-management
> Fix For: Impala 3.1.0
>
>
> I noticed that the code doesn't check whether the insert is clustered. A clustered insert writes only a single partition at a time, so the per-instance memory estimate below (which scales with the partition NDV) is higher than necessary in that case.
> {code}
> @Override
> public void computeResourceProfile(TQueryOptions queryOptions) {
>   HdfsTable table = (HdfsTable) targetTable_;
>   // TODO: Estimate the memory requirements more accurately by partition type.
>   HdfsFileFormat format = table.getMajorityFormat();
>   PlanNode inputNode = fragment_.getPlanRoot();
>   int numInstances = fragment_.getNumInstances(queryOptions.getMt_dop());
>   // Compute the per-instance number of partitions, taking the number of nodes
>   // and the data partition of the fragment executing this sink into account.
>   long numPartitionsPerInstance =
>       fragment_.getPerInstanceNdv(queryOptions.getMt_dop(), partitionKeyExprs_);
>   if (numPartitionsPerInstance == -1) {
>     numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
>   }
>   long perPartitionMemReq = getPerPartitionMemReq(format);
>   long perInstanceMemEstimate;
>   // The estimate is based purely on the per-partition mem req if the input
>   // cardinality or the avg row size is unknown.
>   if (inputNode.getCardinality() == -1 || inputNode.getAvgRowSize() == -1) {
>     perInstanceMemEstimate = numPartitionsPerInstance * perPartitionMemReq;
>   } else {
>     // The per-partition estimate may be higher than the memory required to buffer
>     // the entire input data.
>     long perInstanceInputCardinality =
>         Math.max(1L, inputNode.getCardinality() / numInstances);
>     long perInstanceInputBytes =
>         (long) Math.ceil(perInstanceInputCardinality * inputNode.getAvgRowSize());
>     long perInstanceMemReq =
>         PlanNode.checkedMultiply(numPartitionsPerInstance, perPartitionMemReq);
>     perInstanceMemEstimate = Math.min(perInstanceInputBytes, perInstanceMemReq);
>   }
>   resourceProfile_ = ResourceProfile.noReservation(perInstanceMemEstimate);
> }
> {code}
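
For context, the core of the reported issue is that a clustered insert keeps at most one partition writer open per instance, so the estimate should not scale with the partition NDV in that case. A minimal, self-contained sketch of that adjustment follows; the class name, method signature, and the DEFAULT_NUM_PARTITIONS value here are hypothetical stand-ins, not the actual Impala fix:

```java
// Illustrative sketch only; not Impala's real code. Shows how a clustered
// insert would cap the per-instance partition count at 1 before computing
// the memory estimate.
public class ClusteredSinkEstimate {
    // Fallback when the partition NDV is unknown (value is an assumption).
    static final long DEFAULT_NUM_PARTITIONS = 10;

    /**
     * @param isClustered        whether the insert is clustered (rows sorted by
     *                           partition key, so one partition is open at a time)
     * @param partitionNdv       estimated distinct partition keys per instance,
     *                           or -1 if unknown
     * @param perPartitionMemReq memory required per open partition writer
     */
    static long estimate(boolean isClustered, long partitionNdv, long perPartitionMemReq) {
        long numPartitionsPerInstance;
        if (isClustered) {
            // Only one partition writer is ever open at a time.
            numPartitionsPerInstance = 1;
        } else if (partitionNdv == -1) {
            numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
        } else {
            numPartitionsPerInstance = partitionNdv;
        }
        return numPartitionsPerInstance * perPartitionMemReq;
    }
}
```

With clustering, the estimate collapses to a single partition's requirement regardless of the partition NDV, which is why an estimate that ignores the clustered flag overstates the sink's memory needs.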
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org