Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/01/13 19:19:08 UTC
[GitHub] [incubator-pinot] amitchopraait opened a new issue #6437: Getting numGroupsLimitReached for medium cardinality column (10000 distinct values)
URL: https://github.com/apache/incubator-pinot/issues/6437
I have a column with 10000 distinct values, confirmed by a distinct count:

select count(distinct(device)) from metrics
{
  "resultTable": {
    "dataSchema": {
      "columnNames": ["distinctcount(device)"],
      "columnDataTypes": ["INT"]
    },
    "rows": [[10000]]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 3999999,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 3999999,
  "timeUsedMs": 21,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0
}
When I do a group by on this column, I get numGroupsLimitReached = true in the response, even though the documentation states the default limit is 100k:
select device, count(device) as aggreg from metrics group by device order by aggreg desc limit 10
{
  "resultTable": {
    "dataSchema": {
      "columnNames": ["device", "aggreg"],
      "columnDataTypes": ["STRING", "LONG"]
    },
    "rows": [
      ["device-6230", 475],
      ["device-3277", 470],
      ["device-2311", 469],
      ["device-3933", 469],
      ["device-4059", 468],
      ["device-6002", 468],
      ["device-621", 466],
      ["device-2903", 465],
      ["device-3900", 463],
      ["device-9324", 463]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 1,
  "numSegmentsProcessed": 1,
  "numSegmentsMatched": 1,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 3999999,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 3999999,
  "numGroupsLimitReached": true,
  "totalDocs": 3999999,
  "timeUsedMs": 87,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0
}
Per a conversation in Slack:
Do you have it configured explicitly? The config key is pinot.server.query.executor.num.groups.limit
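If that key were set explicitly, it would go in the server configuration file. A hypothetical excerpt (the value shown is the documented 100k default, not a recommendation):

```properties
# pinot-server.conf (hypothetical excerpt)
# Per-server cap on the number of group-by groups held during aggregation.
pinot.server.query.executor.num.groups.limit=100000
```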
Amit Chopra 23 minutes ago
@Jackie - I believe you mean in pinot-server.conf? No, I haven't set it.
Jackie 21 minutes ago
Hmm.. That is unexpected
Jackie 20 minutes ago
Do you run the query in PQL mode or SQL mode?
Amit Chopra 20 minutes ago
SQL mode
Jackie 16 minutes ago
I just checked the code and we don't set it in SQL mode..
Jackie 15 minutes ago
Could you please file a github issue and put the details?
Amit Chopra 14 minutes ago
Sure, let me file an issue. Just so that I understand: you mean the 100k limit is not set? But what is the default limit in SQL mode today, then?
Jackie 13 minutes ago
We don't put the numGroupsLimitReached in SQL mode. I don't know how it shows up in the response
Amit Chopra 12 minutes ago
got it
Amit Chopra 11 minutes ago
As for the second part of the question: given there was a limit of 10 on the query, shouldn't this be handled by the engine (even if it were a column with more than 100k distinct values)?
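To see why a small LIMIT cannot prevent the groups cap from being hit, note that ORDER BY ... LIMIT runs only after aggregation has materialized the groups, while the num-groups cap is enforced during aggregation. The following is a hypothetical Python sketch of that interplay, not Pinot's actual implementation (the function names and the 5k cap are illustrative):

```python
from collections import Counter

def group_by_count(rows, num_groups_limit):
    """Aggregate COUNT per group with a cap on the number of distinct
    groups held in memory, mimicking (in spirit) a per-server
    num.groups.limit. Hypothetical simplification, not Pinot code."""
    counts = Counter()
    limit_reached = False
    for key in rows:
        if key in counts:
            counts[key] += 1
        elif len(counts) < num_groups_limit:
            counts[key] = 1
        else:
            # A new group beyond the cap is dropped; results may be inexact.
            limit_reached = True
    return counts, limit_reached

def top_n(counts, n):
    # ORDER BY count DESC LIMIT n is applied only after aggregation
    # finishes, so a small n cannot keep the groups cap from tripping.
    return sorted(counts.items(), key=lambda kv: -kv[1])[:n]

# 40k rows over 10k distinct keys, aggregated under a 5k-group cap.
rows = [f"device-{i % 10_000}" for i in range(40_000)]
counts, limit_reached = group_by_count(rows, num_groups_limit=5_000)
print(limit_reached)           # True: 10k distinct keys exceed the 5k cap
print(len(top_n(counts, 10)))  # 10
```

The sketch shows the flag firing even though the query only asked for 10 rows, which matches the behavior reported above.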