Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/14 16:18:05 UTC

[GitHub] [incubator-pinot] lgo opened a new issue #6144: Querying an indexed field for `distinct` values is slow

lgo opened a new issue #6144:
URL: https://github.com/apache/incubator-pinot/issues/6144


   A query intended to pull out all values of a dimension was slow and timed out.
   
   ```sql
   select type from adjustment group by type
   ```
   
   Meanwhile, a query that forces the star-tree index worked fine and returned very quickly. It's also worth noting that this column has an inverted index, which I assume would also be a suitable index to pull the values from. Or the dictionary itself!
   
   ```sql
   select type, count(*) from adjustment group by type
   ```
   
   Here is a simplified version of the configuration we were using, in case this turns out to be specific to the table config (irrelevant columns were pruned from the example).
   
   ## Schema
   
   ```json
   {
     "schemaName": "adjustment",
     "dimensionFieldSpecs": [
       {
         "name": "type",
         "dataType": "STRING"
       }
     ],
     "metricFieldSpecs": [
       {
         "name": "balance",
         "dataType": "DOUBLE"
       }
     ],
     "dateTimeFieldSpecs": [
       {
         "name": "created_at",
         "dataType": "LONG",
         "format": "1:SECONDS:EPOCH",
         "granularity": "1:HOURS"
       }
     ]
   }
   ```
   
   ## Table config
   
   ```json
   {
     "OFFLINE": {
       "tableName": "adjustment_OFFLINE",
       "tableType": "OFFLINE",
       "segmentsConfig": {
         "timeType": "SECONDS",
         "schemaName": "adjustment",
         "segmentPushFrequency": "HOURLY",
         "segmentPushType": "APPEND",
         "timeColumnName": "created_at",
         "replication": "1"
       },
       "tenants": {
         "broker": "DefaultTenant",
         "server": "DefaultTenant"
       },
       "tableIndexConfig": {
         "bloomFilterColumns": [
           "type"
         ],
         "loadMode": "MMAP",
         "noDictionaryColumns": [],
         "enableDefaultStarTree": false,
         "starTreeIndexConfigs": [
           {
             "dimensionsSplitOrder": [
               "type"
             ],
             "functionColumnPairs": [
               "COUNT"
             ],
             "maxLeafRecords": 1
           }
         ],
         "enableDynamicStarTreeCreation": false,
         "segmentPartitionConfig": {
           "columnPartitionMap": {
             "type": {
               "functionName": "Murmur",
               "numPartitions": 100
             }
           }
         },
         "aggregateMetrics": false,
         "nullHandlingEnabled": false,
         "invertedIndexColumns": [
           "type"
         ],
         "autoGeneratedInvertedIndex": false,
         "createInvertedIndexDuringSegmentGeneration": false
       },
       "metadata": {},
       "routing": {
         "segmentPrunerTypes": [
           "partition"
         ]
       }
     }
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6144: Querying an indexed field for distinct values is slow

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6144:
URL: https://github.com/apache/incubator-pinot/issues/6144#issuecomment-709505265


   The inverted index only helps during filtering. The performance difference between these 2 queries should mainly come from the group-by using the dictionary ids instead of the raw values.
   (`select type from adjustment group by type` will be automatically rewritten to `select distinct type from adjustment`)
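
To make the dictionary-id point concrete, here is a minimal Python sketch of a dictionary-encoded column. It is an illustration under stated assumptions, not Pinot's actual code: the function names and the sample values (`refund`, `charge`, `fee`) are hypothetical. It contrasts counting by dictionary id, which only touches integers per row, with a distinct over raw values, which decodes and hashes a string per row.

```python
def encode_column(raw_values):
    """Build a sorted dictionary plus a forward index of dictionary ids."""
    dictionary = sorted(set(raw_values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    forward_index = [id_of[v] for v in raw_values]
    return dictionary, forward_index


def count_by_dict_id(dictionary, forward_index):
    """Group by dictionary id: counts live in a flat array indexed by id."""
    counts = [0] * len(dictionary)
    for dict_id in forward_index:
        counts[dict_id] += 1  # integer work only, no string handling per row
    # Each string is looked up once per group, not once per row.
    return {dictionary[i]: c for i, c in enumerate(counts) if c}


def distinct_by_raw_value(dictionary, forward_index):
    """Distinct on raw values: every row is decoded to a string and hashed."""
    seen = set()
    for dict_id in forward_index:
        seen.add(dictionary[dict_id])  # per-row decode plus string hash
    return seen


def distinct_from_dictionary(dictionary):
    """Shortcut hinted at in the issue: read the dictionary directly.

    Assumption: only valid with no filter, and only if every dictionary
    entry occurs in at least one row (true here by construction).
    """
    return set(dictionary)


# Hypothetical sample data for the `type` column.
rows = ["refund", "charge", "refund", "fee", "charge", "refund"]
dictionary, forward_index = encode_column(rows)
print(count_by_dict_id(dictionary, forward_index))
print(distinct_by_raw_value(dictionary, forward_index))
```

Both paths visit every row, so the difference in this sketch is purely the per-row cost; the dictionary shortcut avoids the row scan entirely but only applies under the assumptions noted in its docstring.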




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6144: Querying an indexed field for distinct values is slow

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6144:
URL: https://github.com/apache/incubator-pinot/issues/6144#issuecomment-709502945


   This is related to #6081 




[GitHub] [incubator-pinot] lgo commented on issue #6144: Querying an indexed field for distinct values is slow

Posted by GitBox <gi...@apache.org>.
lgo commented on issue #6144:
URL: https://github.com/apache/incubator-pinot/issues/6144#issuecomment-709619746


   Ah I did not find that original ticket! Feel free to close this in favor of that one since it sounds like the same thing (:




[GitHub] [incubator-pinot] Jackie-Jiang closed issue #6144: Querying an indexed field for distinct values is slow

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #6144:
URL: https://github.com/apache/incubator-pinot/issues/6144


   

