Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/10/05 00:30:47 UTC
[GitHub] [incubator-pinot] mcvsubbu opened a new pull request #4679: [#4667]
Fix auto-tuning algorithm to update when parameters are changed
URL: https://github.com/apache/incubator-pinot/pull/4679
When any of the segment auto-tuning parameters are changed, we currently
fail to pick them up, since we cache the FlushThresholdUpdater in
memory in the controller. A controller restart will pick up the
new parameters, but we can do better.
Changed the auto-tuning mechanism to take parameters on every call, so
that we can recognize changes and act accordingly.
Extra logic added:
We used to consider the time limit hit whenever the number of rows
in the committing segment was lower than the target we set for it.
Now, we also check that the target segment size is lower than
the desired size. If the target segment size
is higher (most likely because the operator set it higher),
we need to fall through to the computation based on ratio.
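The old and new checks can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, and it assumes that the size being compared is the size of the committing segment.

```java
// Hypothetical sketch; names are illustrative and not taken from the Pinot code base.
final class TimeLimitCheckSketch {
    private TimeLimitCheckSketch() {}

    /** Old rule: fewer rows than the target is treated as a time limit hit. */
    static boolean hitTimeLimitOld(long rowsConsumed, long rowsTarget) {
        return rowsConsumed < rowsTarget;
    }

    /**
     * New rule: additionally require the segment size to be below the desired size.
     * Assumption: the size compared is that of the committing segment.
     */
    static boolean hitTimeLimitNew(long rowsConsumed, long rowsTarget,
                                   long segmentSizeBytes, long desiredSizeBytes) {
        return rowsConsumed < rowsTarget && segmentSizeBytes < desiredSizeBytes;
    }
}
```

With 3.8M rows consumed against a 4M target, a 190M segment and a 180M desired size, the old rule reports a time limit hit while the new rule does not, so the caller would fall through to the ratio-based computation.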
Further, when we hit the time limit in the committing segment, it may
be the case that the new time limit is even lower than the time we
spent consuming the segment being committed.
In that case, we should reduce the number of rows consumed by
the committing segment (as per the average consumption rate)
before applying the standard multiplier for a time limit hit.
Some examples are useful, since the logic is a bit involved:
Assume segment size was set to 200M, and time limit was 3h, and we
set the number of rows to 4M, and let a segment start consuming.
Case 1:
While it is consuming, the operator changes the optimal segment
size to 180M.
The segment comes back with a 190M size after hitting the time limit,
having consumed 3.8M rows.
Previously, we would have increased the number of rows to 3.8M * 1.1.
After this change, we will fall through to computing the rows using the
ratio (and effectively reduce the number of rows).
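To make the Case 1 numbers concrete, here is a small sketch. The PR description does not spell out the ratio formula, so the plain proportional scaling below (rows * desired size / actual size) is an assumption, and the real computation may differ, for example by smoothing the ratio across segments. The class and method names are hypothetical.

```java
// Hypothetical sketch of Case 1; the proportional formula is an assumption.
final class Case1RatioSketch {
    private Case1RatioSketch() {}

    /** Scales the consumed row count by desiredSize / actualSize. */
    static long rowsFromRatio(long rowsConsumed, long segmentSizeBytes, long desiredSizeBytes) {
        return Math.round(rowsConsumed * ((double) desiredSizeBytes / segmentSizeBytes));
    }

    public static void main(String[] args) {
        long rowsConsumed = 3_800_000L;
        // Previous behaviour: treat it as a time limit hit and grow the target by 10%.
        long oldTarget = Math.round(rowsConsumed * 1.1);
        // New behaviour: shrink towards the 180M desired size from the observed 190M.
        long newTarget = rowsFromRatio(rowsConsumed, 190_000_000L, 180_000_000L);
        System.out.println("old target = " + oldTarget + ", new target = " + newTarget);
    }
}
```

Under this assumption the new target comes out at about 3.6M rows, lower than the 3.8M rows consumed, whereas the previous behaviour would have raised it to about 4.18M.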
Case 2:
The operator changes the time limit to 1 hr.
The segment comes back the same way as before, 190M size, 3.8M rows
consumed in 3 hrs.
Previously, we would have treated this as a time limit hit, and
increased the number of rows to 3.8M * 1.1.
After this change, we will assume that the segment actually consumed
(3.8M * 1 hr)/3 hrs (i.e. approx 1.3M rows), and then apply the multiplier,
getting a target of about 1.4M rows.
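The Case 2 adjustment can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: the class, method, and constant names are hypothetical, times are taken in milliseconds, and the 1.1 multiplier is the one used in the examples above.

```java
// Hypothetical sketch of Case 2; names and units are illustrative assumptions.
final class Case2ProrateSketch {
    /** Multiplier applied on a time limit hit, as used in the examples. */
    static final double TIME_LIMIT_MULTIPLIER = 1.1;

    private Case2ProrateSketch() {}

    /**
     * If the segment consumed for longer than the current time limit, scale the
     * row count down to what it would have consumed within the limit, using the
     * average consumption rate. Otherwise return the row count unchanged.
     */
    static long prorateRows(long rowsConsumed, long timeConsumedMs, long timeLimitMs) {
        if (timeConsumedMs <= timeLimitMs) {
            return rowsConsumed;
        }
        return Math.round(rowsConsumed * ((double) timeLimitMs / timeConsumedMs));
    }

    /** Row target for the next segment after a time limit hit. */
    static long nextRowTarget(long rowsConsumed, long timeConsumedMs, long timeLimitMs) {
        long effectiveRows = prorateRows(rowsConsumed, timeConsumedMs, timeLimitMs);
        return Math.round(effectiveRows * TIME_LIMIT_MULTIPLIER);
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60L * 1000L;
        // 3.8M rows consumed in 3 hrs, time limit now 1 hr.
        System.out.println(nextRowTarget(3_800_000L, 3 * hourMs, hourMs));
    }
}
```

For 3.8M rows consumed over 3 hrs with a 1 hr limit, the prorated count is roughly 1.27M rows and the resulting target is roughly 1.39M rows, in line with the rounded figures above. If the time limit had stayed at 3 hrs, the same function would simply return 3.8M * 1.1.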