Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/04/23 05:33:34 UTC
[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #4156:
Refactor HelixExternalViewBasedTimeBoundaryService to support all time units
URL: https://github.com/apache/incubator-pinot/pull/4156
Currently we pick the segment end time as the time boundary, then
append the filter 'timeColumn < boundary' to the offline table and
the filter 'timeColumn >= boundary' to the realtime table to achieve
hybrid table federation. The problem is that if the time unit is
not DAYS (for example, MILLISECONDS) and the offline table has
multiple daily segments to push, we might get incomplete results
before all offline segments are pushed.
The solution is to always use (end time - 1 DAY) as the time
boundary, and to append the filter 'timeColumn <= boundary' to the
offline table and 'timeColumn > boundary' to the realtime table.
This ensures that all daily-pushed or hourly-pushed segments are
covered regardless of the time unit.
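The boundary rule above can be sketched roughly as follows. This is
an illustration only, not the code in this pull request: the class
TimeBoundarySketch and its methods are hypothetical names, and it
assumes the end time is the segment end time expressed in the time
column's own unit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual HelixExternalViewBasedTimeBoundaryService.
final class TimeBoundarySketch {

    // Boundary = end time minus one day, expressed in the time column's unit.
    // Assumption: endTime is already in timeUnit.
    static long computeTimeBoundary(long endTime, TimeUnit timeUnit) {
        long oneDayInUnit = timeUnit.convert(1, TimeUnit.DAYS);
        return endTime - oneDayInUnit;
    }

    // Filter appended for the offline table: inclusive upper bound.
    static String offlineFilter(String timeColumn, long boundary) {
        return timeColumn + " <= " + boundary;
    }

    // Filter appended for the realtime table: exclusive lower bound.
    static String realtimeFilter(String timeColumn, long boundary) {
        return timeColumn + " > " + boundary;
    }

    public static void main(String[] args) {
        // Time column in MILLISECONDS: the boundary moves back by 86,400,000.
        long millisBoundary = computeTimeBoundary(1556000000000L, TimeUnit.MILLISECONDS);
        System.out.println(offlineFilter("timeColumn", millisBoundary));
        System.out.println(realtimeFilter("timeColumn", millisBoundary));

        // Time column in DAYS: the boundary moves back by 1.
        long daysBoundary = computeTimeBoundary(18009L, TimeUnit.DAYS);
        System.out.println(offlineFilter("timeColumn", daysBoundary));
    }
}
```

Because the two filters are complementary ('<=' and '>'), each row
falls on exactly one side of the boundary, so the offline and
realtime results do not overlap.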
Also, we should use the time spec in the schema as the source of
truth for the time column, because data is generated based on the
schema. In the future we might remove the timeColumnName and
timeType fields from SegmentsValidationAndRetentionConfig.