
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/30 23:13:44 UTC

[GitHub] [incubator-pinot] siddharthteotia opened a new pull request #5470: Derive numDocsPerChunk for var byte raw index from metadata only if config is enabled.

siddharthteotia opened a new pull request #5470:
URL: https://github.com/apache/incubator-pinot/pull/5470

(1) PR https://github.com/apache/incubator-pinot/pull/5256 added support for deriving the number of docs per chunk from the column length when creating the var byte raw index. This was done specifically
to support large text values. High-QPS use cases that don't want this feature see a negative impact, because the chunk size increases (numDocsPerChunk was previously
hardcoded to 1000) and, depending on the access pattern, we might end up uncompressing a bigger chunk to get the values for a set of docIds. This PR makes the change configurable: the default behavior is the same as before (1000 docs per chunk), and the derivation can be enabled as follows:

`"fieldConfigList":[
  {
    "name":"textCol",
    "encodingType":"RAW",
    "indexType":"TEXT",
    "properties":{
      "derive.num.chunks.raw.index":"true"
    }
  }
]
`
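
To make the trade-off concrete, here is a minimal sketch of how the flag could gate the chunk sizing. It is not the code in this PR: the class name, the 1 MB target chunk size, and the 4-byte per-row offset are assumptions made for illustration. Only the default of 1000 docs per chunk comes from the description above.

```java
// Hypothetical sketch, not the actual Pinot implementation.
final class ChunkSizingSketch {
    // From the PR description: the value used before PR 5256.
    static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
    // Assumption: target uncompressed chunk size of 1 MB.
    static final int TARGET_MAX_CHUNK_SIZE_BYTES = 1024 * 1024;
    // Assumption: each row costs one int offset in the chunk header.
    static final int ROW_OFFSET_SIZE_BYTES = Integer.BYTES;

    /**
     * Returns the number of docs per chunk. When the flag is off, the old
     * hardcoded default is used; when it is on, the value is derived from
     * the maximum value length recorded for the column.
     */
    static int numDocsPerChunk(boolean deriveFromMetadata, int maxValueLengthBytes) {
        if (!deriveFromMetadata) {
            return DEFAULT_NUM_DOCS_PER_CHUNK;
        }
        if (maxValueLengthBytes < 0) {
            throw new IllegalArgumentException("maxValueLengthBytes must be >= 0");
        }
        // Use long arithmetic so a very large max length cannot overflow.
        long bytesPerDoc = (long) maxValueLengthBytes + ROW_OFFSET_SIZE_BYTES;
        // Always keep at least one doc per chunk, even for huge values.
        return (int) Math.max(TARGET_MAX_CHUNK_SIZE_BYTES / bytesPerDoc, 1L);
    }

    public static void main(String[] args) {
        System.out.println(numDocsPerChunk(false, 100));       // flag off
        System.out.println(numDocsPerChunk(true, 100));        // short values
        System.out.println(numDocsPerChunk(true, 2_000_000));  // large text values
    }
}
```

Under these assumed constants, a column with short values ends up with far more than 1000 docs per chunk once derivation is on, which is the larger-chunk effect that hurts high-QPS lookups; a column with very large values drops to a single doc per chunk.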

(2) PR https://github.com/apache/incubator-pinot/pull/4791 added support for noDict on STRING/BYTES columns in consuming segments. Before PR 4791, even if the user had configured a STRING/BYTES column as no-dictionary in the table config, the consuming segment still created a dictionary because raw index support was lacking. That change affects use cases that set noDict on their STRING dimension columns for other performance reasons and also want metrics aggregation. These use cases no longer get aggregateMetrics, because the new implementation honors the table config's noDict setting on STRING/BYTES and creates a raw index. Without metrics aggregation, memory pressure increases. So, to continue aggregating metrics for such cases, we create a dictionary for STRING/BYTES columns even if the column is part of the noDictionary set in the table config.
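
The decision described in (2) can be sketched roughly as below. This is an illustration only: the class, enum, and method names are hypothetical, and gating the override on aggregateMetrics being enabled is an assumption about how "such cases" are detected, not something the description spells out.

```java
import java.util.Set;

// Hypothetical sketch, not the actual Pinot implementation.
final class ConsumingSegmentDictionarySketch {
    // Simplified stand-in for the column data types relevant here.
    enum DataType { INT, LONG, FLOAT, DOUBLE, STRING, BYTES }

    /**
     * Decides whether a consuming segment should build a dictionary for a column.
     * Columns outside the noDictionary set always get one. Columns inside it get
     * a raw index, except STRING/BYTES columns when metrics aggregation is on
     * (assumed trigger), which keep a dictionary so aggregation still works.
     */
    static boolean shouldCreateDictionary(String column, DataType dataType,
            Set<String> noDictionaryColumns, boolean aggregateMetricsEnabled) {
        if (!noDictionaryColumns.contains(column)) {
            return true;
        }
        boolean isVarLength = dataType == DataType.STRING || dataType == DataType.BYTES;
        // Override the table config only for the case described in (2).
        return aggregateMetricsEnabled && isVarLength;
    }
}
```

With this shape, a STRING dimension listed under noDictionary still gets a dictionary in the consuming segment when aggregation is enabled, while fixed-width columns and tables without aggregation keep honoring the noDict setting.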

