You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/23 22:04:47 UTC

[GitHub] [incubator-pinot] ianvkoeppe opened a new pull request #5745: Allow serving of offline segments immediately/inclusively.

ianvkoeppe opened a new pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745


   ## Description
   
   Today, Pinot serves queries to Hybrid tables by applying a time filter
   based on data availability. For example, when the most recent offline
   segment is for 1/2/2000, and a query has no time filter, then the broker
   queries the offline table for * - 1/2/2000 (exclusive), and the realtime
   table with 1/2/2000 (inclusive) - *.
   
   This means that when you upload a new offline segment, it technically
   doesn't begin to be served by the broker until another offline segment
   pushes the watermark higher.
   
   This change permits a new config which tells the broker to immediately
   start serving offline segments once available; or in other words, to
   query the offline table with the most recent time value inclusively.
   
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [x] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663300262


   This is a fantastic contribution!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r459830306



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/HelixBrokerStarter.java
##########
@@ -235,7 +234,7 @@ public void start()
     FunctionRegistry.init();
     _brokerRequestHandler =
         new SingleConnectionBrokerRequestHandler(_brokerConf, _routingManager, _accessControlFactory, queryQuotaManager,
-            _brokerMetrics, _propertyStore);
+            _brokerMetrics, new TableCache(_propertyStore));

Review comment:
       Just want to call out every deployment will now read the table config for the entire cluster, per broker. I think we do that for controller, and recently added for server.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663328375


   There's also a new api being added in the controller that makes it a protocol. Perhaps you could piggy-back on it to update time-boundary? @snleee What do you think?
   > #5712
   
   Subbu mentioned this functionality might be possible via the controller in the future. I have just a couple concerns with that approach.
   
   1) We are planning to build a reporting pipeline within an execution environment where we may not have the flexibility or control to configure jobs which call the controller. Pinot segment creation, upload, etc. is all abstracted away.
   2) Managing the time boundary ourselves might be error prone since it means we have to have custom code to track our latest successful job runs and update the time boundary correctly afterwards. I'm worried about maintaining a custom job step that could potentially update the time boundary incorrectly, or having to think about how we'd rollback an update if we wanted to delete a badly pushed segment.
   
   I'm interested to hear your thoughts on these.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663311913


   @mayankshriv, thanks for reviewing. See answers inline below.
   
   > 1. The first offline segment should immediately push the high-water-mark to (maxTime-1). Are you referring to a case with single segment push per day?
   
   Yes, this is for cases where there is a single segment push per day.
   
   > 2. Your requirement seems contrary to the other requests we are getting, where the high-water-mark being pushed up pre-maturely (before all segments from the new date are ONLINE) causes inconsistent results during the push time. And we are getting multiple requests to address the inconsistency by deferring the push of high-water-mark until the entire push is complete.
   
   This sounds like the current supported case. In our case, we have a Hybrid table where we serve current day metrics from realtime and once per day upload an offline segment. The issue is, since we only upload one segment per day, the latest offline segment isn't used until the next day's data is pushed.
   
   Today we get around this by creating "dummy" segments with empty rows and +1 day as a way of forcing the time value to increment further and serve the latest actual offline segment. Unfortunately, this won't work for us any longer and we need a better long-term solution.
   
   I hope this clarifies, but let me know if you have any further questions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r459803167



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -140,6 +141,8 @@ public BaseBrokerRequestHandler(PinotConfiguration config, RoutingManager routin
 
     _enableQueryLimitOverride = _config.getProperty(Broker.CONFIG_OF_ENABLE_QUERY_LIMIT_OVERRIDE, false);
 
+    _serveOfflineSegmentsImmediately = _config.getProperty(Broker.CONFIG_OF_BROKER_SERVE_OFFLINE_SEGMENTS_IMMEDIATELY, false);

Review comment:
       this should be a table level config. We don't want this to apply to all the tables in the system.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663299681


   @ianvkoeppe I am unclear on the PR description:
   1. The first offline segment should immediately push the high-water-mark.
   2. Your requirement seems contrary to the other requests we are getting, where the high-water-mark being pushed up pre-maturely (before all segments from the new date are ONLINE) causes inconsistent results during the push time. And we are getting multiple requests to address the inconsistency by deferring the push of high-water-mark until the entire push is complete.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460214241



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/HelixBrokerStarter.java
##########
@@ -235,7 +234,7 @@ public void start()
     FunctionRegistry.init();
     _brokerRequestHandler =
         new SingleConnectionBrokerRequestHandler(_brokerConf, _routingManager, _accessControlFactory, queryQuotaManager,
-            _brokerMetrics, _propertyStore);
+            _brokerMetrics, new TableCache(_propertyStore));

Review comment:
       @ianvkoeppe like we spoke:
   
   Controller API that updates time boundary is probably the best solution. Offline push jobs must pay attention to table configs so that they can do other things that are needed for the table (partitioning, inverted index generation, etc.). It is possible for the push job to invoke the controller API to push the time boundary. This will also help in use cases that push multiple segments so that the time boundary is advanced after all segments are pushed. 
   
   In addition to this, the brokers must also realize on their own that the time boundary should be moved forward for these tables. This is because a broker can get restarted after the push, and go into old behavior.  The broker can do this on the basis of the table config when it recognzes it has to serve a table. We alreeady read the table config to get the quota information in the broker.
   
   We should not be changing the query. Instead, update the time boundary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460132216



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -122,19 +122,16 @@
 
   public BaseBrokerRequestHandler(PinotConfiguration config, RoutingManager routingManager,
       AccessControlFactory accessControlFactory, QueryQuotaManager queryQuotaManager, BrokerMetrics brokerMetrics,
-      ZkHelixPropertyStore<ZNRecord> propertyStore) {
+      TableCache tableCache) {
     _config = config;
     _routingManager = routingManager;
     _accessControlFactory = accessControlFactory;
     _queryQuotaManager = queryQuotaManager;
     _brokerMetrics = brokerMetrics;
+    _tableCache = tableCache;
 
     _enableCaseInsensitive = _config.getProperty(CommonConstants.Helix.ENABLE_CASE_INSENSITIVE_KEY, false);
-    if (_enableCaseInsensitive) {
-      _tableCache = new TableCache(propertyStore);

Review comment:
       It's to avoid instantiating the cache in cases it isn't needed. Mayank also called this out above...but it should be fine.
   
   > Just want to call out every deployment will now read the table config for the entire cluster, per broker. I think we do that for controller, and recently added for server.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460209486



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -1087,6 +1084,22 @@ private void attachTimeBoundary(String rawTableName, BrokerRequest brokerRequest
     }
   }
 
+  /**
+   * The time boundary filter string splits queries to offline and realtime pinot tables based on the current high
+   * watermark of available segments in the offline table. By default, Pinot serves all segments from offline up to
+   * the latest segment (exclusive), and everything else from realtime. However, one can configure Pinot to serve
+   * the latest offline segment inclusively via config.
+   */
+  private String getTimeBoundaryFilter(String offlineTableName, String timeValue, boolean isOfflineRequest) {
+    boolean shouldServeOfflineSegmentsImmediately = Optional.ofNullable(_tableCache.getTableConfig(offlineTableName).getQueryConfig())
+        .map(QueryConfig::getServeOfflineSegmentsImmediately)
+        .orElse(false);
+    String open = isOfflineRequest || !shouldServeOfflineSegmentsImmediately ? "(" : "[";

Review comment:
       Refactored similar to the above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460197217



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/QueryConfig.java
##########
@@ -37,15 +37,23 @@
   // If the server times out, it will directly interrupt the query execution. The server response does not matter much
   // because by the time the server times out, the broker should already timed out and returned the response.
   private final Long _timeoutMs;
+  private final Boolean _serveOfflineSegmentsImmediately;

Review comment:
       Not finding where the docs reside within repo, but I've added documentation to the config class.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv edited a comment on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mayankshriv edited a comment on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663299681


   @ianvkoeppe I am unclear on the PR description:
   1. The first offline segment should immediately push the high-water-mark to (maxTime-1). Are you referring to a case with single segment push per day?
   2. Your requirement seems contrary to the other requests we are getting, where the high-water-mark being pushed up pre-maturely (before all segments from the new date are ONLINE) causes inconsistent results during the push time. And we are getting multiple requests to address the inconsistency by deferring the push of high-water-mark until the entire push is complete.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663674214


   @kishoreg and @mcvsubbu I've created this issue: https://github.com/apache/incubator-pinot/issues/5751. I've described the problem statement and a couple potential approaches. Let's discuss there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460169975



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/HelixBrokerStarter.java
##########
@@ -235,7 +234,7 @@ public void start()
     FunctionRegistry.init();
     _brokerRequestHandler =
         new SingleConnectionBrokerRequestHandler(_brokerConf, _routingManager, _accessControlFactory, queryQuotaManager,
-            _brokerMetrics, _propertyStore);
+            _brokerMetrics, new TableCache(_propertyStore));

Review comment:
       Don't we pick up all table configs to get query quotas (at least, the tables managed by the current broker).
   In any case, we should restrict it to tables managed by this broker, and not all tables on the cluster.
   With shared brokers in clusters that still does not help, so we need to find a better solution.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] cbrentharris commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
cbrentharris commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r460111885



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/QueryConfig.java
##########
@@ -37,15 +37,23 @@
   // If the server times out, it will directly interrupt the query execution. The server response does not matter much
   // because by the time the server times out, the broker should already timed out and returned the response.
   private final Long _timeoutMs;
+  private final Boolean _serveOfflineSegmentsImmediately;

Review comment:
       Can you beef up the documentation to this config in the java class. Also, are the pinot docs in repo as well? Should be updated to reflect the new type of config you can set.

##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -122,19 +122,16 @@
 
   public BaseBrokerRequestHandler(PinotConfiguration config, RoutingManager routingManager,
       AccessControlFactory accessControlFactory, QueryQuotaManager queryQuotaManager, BrokerMetrics brokerMetrics,
-      ZkHelixPropertyStore<ZNRecord> propertyStore) {
+      TableCache tableCache) {
     _config = config;
     _routingManager = routingManager;
     _accessControlFactory = accessControlFactory;
     _queryQuotaManager = queryQuotaManager;
     _brokerMetrics = brokerMetrics;
+    _tableCache = tableCache;
 
     _enableCaseInsensitive = _config.getProperty(CommonConstants.Helix.ENABLE_CASE_INSENSITIVE_KEY, false);
-    if (_enableCaseInsensitive) {
-      _tableCache = new TableCache(propertyStore);

Review comment:
       Why was previously the cache only instantiated if case insensitivity was enabled, but now it is regardless? What impact will that have?

##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -1087,6 +1084,22 @@ private void attachTimeBoundary(String rawTableName, BrokerRequest brokerRequest
     }
   }
 
+  /**
+   * The time boundary filter string splits queries to offline and realtime pinot tables based on the current high
+   * watermark of available segments in the offline table. By default, Pinot serves all segments from offline up to
+   * the latest segment (exclusive), and everything else from realtime. However, one can configure Pinot to serve
+   * the latest offline segment inclusively via config.
+   */
+  private String getTimeBoundaryFilter(String offlineTableName, String timeValue, boolean isOfflineRequest) {
+    boolean shouldServeOfflineSegmentsImmediately = Optional.ofNullable(_tableCache.getTableConfig(offlineTableName).getQueryConfig())
+        .map(QueryConfig::getServeOfflineSegmentsImmediately)
+        .orElse(false);
+    String open = isOfflineRequest || !shouldServeOfflineSegmentsImmediately ? "(" : "[";

Review comment:
       Perhaps a little more self documenting:
   
   ```
   boolean openRangeInclusive = !isOfflineRequest || shouldServeOfflineSegments;
   String open = openRangeInclusive ? INCLUSIVE_OPEN_RANGE : EXCLUSIVE_OPEN_RANGE;
   String range = isOfflineRequest ? ALL_VALUES_BEFORE + timeValue : timeValue + ALL_VALUES_AFTER;
   boolean closeRangeInclusive = isOfflineRequest && !shouldServeOfflineSegmentsImmediately;
   ```
   
   Etc
   
   As rightt now this is a little confusing because I assume that `]` is inclusive and `)` is exclusive, but `)` seems to be inclusive when its offline and not configured to serve immediately, which seems counter intuitive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663672482


   @ianvkoeppe can u please create an issue first? It is better to agree on an implementation strategy in an issue. In case of external API changes, let us define the APIs clearly in a doc so that we can all comment on it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663326776


   > @mayankshriv, thanks for reviewing. See answers inline below.
   > 
   > > 1. The first offline segment should immediately push the high-water-mark to (maxTime-1). Are you referring to a case with single segment push per day?
   > 
   > Yes, this is for cases where there is a single segment push per day.
   > 
   > > 1. Your requirement seems contrary to the other requests we are getting, where the high-water-mark being pushed up pre-maturely (before all segments from the new date are ONLINE) causes inconsistent results during the push time. And we are getting multiple requests to address the inconsistency by deferring the push of high-water-mark until the entire push is complete.
   > 
   > This sounds like the current supported case. In our case, we have a Hybrid table where we serve current day metrics from realtime and once per day upload an offline segment. The issue is, since we only upload one segment per day, the latest offline segment isn't used until the next day's data is pushed.
   > 
   > Today we get around this by creating "dummy" segments with empty rows and +1 day as a way of forcing the time value to increment further and serve the latest actual offline segment. Unfortunately, this won't work for us any longer and we need a better long-term solution.
   > 
   > I hope this clarifies, but let me know if you have any further questions.
   
   I see, the single segment per day clarifies the confusion. There's also a new api being added in the controller that makes it a protocol. Perhaps you could piggy-back on it to update time-boundary? @snleee What do you think? 
   https://github.com/apache/incubator-pinot/pull/5712 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r459814064



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -140,6 +141,8 @@ public BaseBrokerRequestHandler(PinotConfiguration config, RoutingManager routin
 
     _enableQueryLimitOverride = _config.getProperty(Broker.CONFIG_OF_ENABLE_QUERY_LIMIT_OVERRIDE, false);
 
+    _serveOfflineSegmentsImmediately = _config.getProperty(Broker.CONFIG_OF_BROKER_SERVE_OFFLINE_SEGMENTS_IMMEDIATELY, false);

Review comment:
       @kishoreg, that makes sense. Any recommendation for which subconfig of table config? There are routing configs, segment configs, etc. I'm not sure the right one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe edited a comment on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe edited a comment on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663328375


   > There's also a new api being added in the controller that makes it a protocol. Perhaps you could piggy-back on it to update time-boundary? @snleee What do you think?
   > #5712
   
   Subbu mentioned this functionality might be possible via the controller in the future. I have just a couple concerns with that approach.
   
   1) We are planning to build a reporting pipeline within an execution environment where we may not have the flexibility or control to configure jobs which call the controller. Pinot segment creation, upload, etc. is all abstracted away.
   2) Managing the time boundary ourselves might be error prone since it means we have to have custom code to track our latest successful job runs and update the time boundary correctly afterwards. I'm worried about maintaining a custom job step that could potentially update the time boundary incorrectly, or having to think about how we'd rollback an update if we wanted to delete a badly pushed segment.
   
   I'm interested to hear your thoughts on these.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe commented on a change in pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe commented on a change in pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#discussion_r459822723



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
##########
@@ -140,6 +141,8 @@ public BaseBrokerRequestHandler(PinotConfiguration config, RoutingManager routin
 
     _enableQueryLimitOverride = _config.getProperty(Broker.CONFIG_OF_ENABLE_QUERY_LIMIT_OVERRIDE, false);
 
+    _serveOfflineSegmentsImmediately = _config.getProperty(Broker.CONFIG_OF_BROKER_SERVE_OFFLINE_SEGMENTS_IMMEDIATELY, false);

Review comment:
       I've done the refactor to add to the QueryConfig, but let me know if a different subconfig makes more sense. It won't be difficult to move to a different config class and it should look structurally the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] ianvkoeppe closed pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
ianvkoeppe closed pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on pull request #5745: Allow serving of offline segments immediately/inclusively.

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #5745:
URL: https://github.com/apache/incubator-pinot/pull/5745#issuecomment-663664728


   @ianvkoeppe this is definitely a good feature to have.  However, the solution is too restrictive and we generally prefer table level config (for multi-tenancy).
   Can you please create an issue to discuss high level requirements and solution.
   
   My requirements
   - should be configurable per table. Have something like timeBoundaryUpdateStrategy which will allow us to add more strategies. RoutingConfig is probably the right place
   - An explicit API to update the time boundary for the table. This can be handy where users can call the API after all the segments are uploaded for a day.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org