Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/13 15:11:29 UTC

[GitHub] [incubator-pinot] troywinter opened a new issue #6910: Segment Pruning is not happening when filtering transformed timecolumn

troywinter opened a new issue #6910:
URL: https://github.com/apache/incubator-pinot/issues/6910


   When filtering on a transformed time column, segment pruning does not happen. The query below takes more than 2 s to finish:
   ```
   SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
          COUNT(*)
   FROM product_log
   WHERE datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830760000
     AND datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') < 1620917160000
     AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
     AND container_name = 'whale-device'
     AND error > '0'
   GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
   ORDER BY COUNT(*) DESC
   ```
   The query result is:
   ```
   {
       "resultTable": {
           "dataSchema": {
               "columnDataTypes": [
                   "LONG",
                   "LONG"
               ],
               "columnNames": [
                   "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
                   "count(*)"
               ]
           },
           "rows": [
               [
                   1620873000000,
                   180
               ],
               [
                   1620869400000,
                   179
               ],
               [
                   1620871200000,
                   178
               ],
               [
                   1620894600000,
                   172
               ],
               [
                   1620892800000,
                   166
               ],
               [
                   1620874800000,
                   164
               ],
               [
                   1620876600000,
                   163
               ],
               [
                   1620896400000,
                   163
               ],
               [
                   1620867600000,
                   162
               ],
               [
                   1620885600000,
                   161
               ]
           ]
       },
       "exceptions": [],
       "numServersQueried": 1,
       "numServersResponded": 1,
       "numSegmentsQueried": 41,
       "numSegmentsProcessed": 41,
       "numSegmentsMatched": 12,
       "numConsumingSegmentsQueried": 3,
       "numDocsScanned": 7706,
       "numEntriesScannedInFilter": 195554753,
       "numEntriesScannedPostFilter": 7706,
       "numGroupsLimitReached": false,
       "totalDocs": 165272282,
       "timeUsedMs": 2335,
       "segmentStatistics": [],
       "traceInfo": {},
       "minConsumingFreshnessTimeMs": 1620917392724
   }
   ```
   Without the transformed time column in the filter, the query returns in 647 ms and only 12 of the 41 queried segments are processed.
   Query:
   ```
   SELECT datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES'),
          COUNT(*)
   FROM product_log
   WHERE __time >= 1620830760000
     AND __time < 1620917160000
     AND method = 'DeviceInternalService.CheckDeviceInSameGroup'
     AND container_name = 'whale-device'
     AND error > '0'
   GROUP BY datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES')
   ORDER BY COUNT(*) DESC
   ```
   The result is:
   ```
   {
       "resultTable": {
           "dataSchema": {
               "columnDataTypes": [
                   "LONG",
                   "LONG"
               ],
               "columnNames": [
                   "datetimeconvert(__time,'1:MILLISECONDS:EPOCH','1:MILLISECONDS:EPOCH','30:MINUTES')",
                   "count(*)"
               ]
           },
           "rows": [
               [
                   1620873000000,
                   180
               ],
               [
                   1620869400000,
                   179
               ],
               [
                   1620871200000,
                   178
               ],
               [
                   1620894600000,
                   172
               ],
               [
                   1620892800000,
                   166
               ],
               [
                   1620874800000,
                   164
               ],
               [
                   1620876600000,
                   163
               ],
               [
                   1620896400000,
                   163
               ],
               [
                   1620867600000,
                   162
               ],
               [
                   1620865800000,
                   161
               ]
           ]
       },
       "exceptions": [],
       "numServersQueried": 1,
       "numServersResponded": 1,
       "numSegmentsQueried": 41,
       "numSegmentsProcessed": 12,
       "numSegmentsMatched": 12,
       "numConsumingSegmentsQueried": 3,
       "numDocsScanned": 7770,
       "numEntriesScannedInFilter": 68503679,
       "numEntriesScannedPostFilter": 7770,
       "numGroupsLimitReached": false,
       "totalDocs": 165381107,
       "timeUsedMs": 647,
       "segmentStatistics": [],
       "traceInfo": {},
       "minConsumingFreshnessTimeMs": 1620917833431
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] troywinter commented on issue #6910: Segment Pruning is not happening when filtering transformed timecolumn

Posted by GitBox <gi...@apache.org>.
troywinter commented on issue #6910:
URL: https://github.com/apache/incubator-pinot/issues/6910#issuecomment-841946011


   Is Pinot planning to add this kind of query rewrite/optimization to fix this issue?




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6910: Segment Pruning is not happening when filtering transformed timecolumn

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6910:
URL: https://github.com/apache/incubator-pinot/issues/6910#issuecomment-842581998


   @troywinter Yes. We'll post updates under this issue.




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6910: Segment Pruning is not happening when filtering transformed timecolumn

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6910:
URL: https://github.com/apache/incubator-pinot/issues/6910#issuecomment-840723171


   Filtering on a transform function (compared with filtering on a column) is inefficient for both segment pruning and filter processing. One possible solution would be to build query rewrite rules for certain transform functions that rewrite them into column predicates, e.g. `datetimeconvert(__time, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '30:MINUTES') >= 1620830760000` to `__time >= 1620831600000`
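
   The arithmetic behind that example rewrite can be sketched as follows. This is a minimal illustration, not Pinot's implementation: it assumes the input and output formats are both epoch milliseconds and that the transform floors each timestamp to a bucket boundary aligned to the epoch. All function names here are hypothetical.

```python
BUCKET_MS = 30 * 60 * 1000  # one 30-minute bucket, in epoch milliseconds


def floor_to_bucket(ts_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Floor an epoch-millisecond timestamp to the start of its bucket."""
    return ts_ms - ts_ms % bucket_ms


def ceil_to_bucket(ts_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Round an epoch-millisecond timestamp up to the nearest bucket boundary."""
    return -(-ts_ms // bucket_ms) * bucket_ms


def rewrite_lower_bound(literal_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Rewrite `floor(t) >= literal` into `t >= result`.

    floor(t) is always a bucket boundary, so it is >= literal exactly when
    t has reached the first boundary at or after the literal.
    """
    return ceil_to_bucket(literal_ms, bucket_ms)


def rewrite_upper_bound(literal_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Rewrite `floor(t) < literal` into `t < result` (negation of the rule above)."""
    return ceil_to_bucket(literal_ms, bucket_ms)


if __name__ == "__main__":
    # Literals taken from the query in this issue.
    print(rewrite_lower_bound(1620830760000))
    print(rewrite_upper_bound(1620917160000))
```

   With the rewritten predicates on `__time` itself, the filter compares the raw column against constants, which is the form the second query in this issue uses.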






[GitHub] [incubator-pinot] troywinter commented on issue #6910: Segment Pruning is not happening when filtering transformed timecolumn

Posted by GitBox <gi...@apache.org>.
troywinter commented on issue #6910:
URL: https://github.com/apache/incubator-pinot/issues/6910#issuecomment-844849550


   One thing I think Druid got right is separating datetime format conversion from time bucketing. Its timefloor function only floors the time column to a period and does not change the time format, so it is easier to prune segments in Druid.
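
   Why a pure floor-to-period function keeps pruning simple can be sketched as below. This is an illustration under stated assumptions, not Druid or Pinot code: the `Segment` class, its min/max time fields, and the `prune` helper are hypothetical. Because flooring is monotonic, the floored values of a segment's rows lie between the floors of its min and max times, so a segment whose floored range cannot overlap the filter range can be skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment metadata: name plus min/max row time in epoch ms."""
    name: str
    min_time_ms: int
    max_time_ms: int


def time_floor(ts_ms: int, period_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its period."""
    return ts_ms - ts_ms % period_ms


def prune(segments: List[Segment], lower_ms: int, upper_ms: int,
          period_ms: int) -> List[Segment]:
    """Keep segments that may hold rows with lower_ms <= time_floor(t) < upper_ms.

    The check is conservative: a kept segment may still match no rows,
    but a dropped segment cannot match any.
    """
    kept = []
    for seg in segments:
        lo = time_floor(seg.min_time_ms, period_ms)
        hi = time_floor(seg.max_time_ms, period_ms)
        if hi >= lower_ms and lo < upper_ms:
            kept.append(seg)
    return kept


if __name__ == "__main__":
    period = 30 * 60 * 1000
    segments = [
        Segment("a", 1620820000000, 1620829000000),
        Segment("b", 1620830000000, 1620840000000),
        Segment("c", 1620918000000, 1620920000000),
        Segment("d", 1620917000000, 1620917500000),
    ]
    for seg in prune(segments, 1620830760000, 1620917160000, period):
        print(seg.name)
```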

