You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficcontrol.apache.org by GitBox <gi...@apache.org> on 2020/05/01 02:36:01 UTC
[GitHub] [trafficcontrol] rawlinp commented on a change in pull request #4665: Support IMS header in Get endpoints

rawlinp commented on a change in pull request #4665:
URL: https://github.com/apache/trafficcontrol/pull/4665#discussion_r418390844



##########
File path: traffic_ops/app/db/migrations/20200424000000_add_deleted_tables.sql
##########
@@ -0,0 +1,608 @@
+/*
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+*/
+
+-- +goose Up
+-- SQL in section 'Up' is executed when this migration is applied
+
+CREATE TABLE IF NOT EXISTS deleted_api_capability (
+    id bigserial PRIMARY KEY,

Review comment:
       >  I see solutions like denormalized outside-the-API JSON blob snapshots or intermediary proxies as hacks to work around a broken system, not the ideal.
   
   I wouldn't really call it a hack. I think it's actually a step in the right direction that will help us achieve much greater performance and efficiency in dishing out config data to thousands of caches at once. Since caches are by far the biggest client base of the TO API, it makes sense to optimize their requests as much as possible, and I think the cache-config snapshots idea is the simplest way to give us the most optimal solution for those requests. It can even have all those nice features -- perfect IMS, delta encoding, read-while-writer, 1s caching -- at a much cheaper price than implementing them all across the entire API since it's just a single endpoint.
   
   _Back to the PR at hand:_
   
   I've also been thinking, we really need to make sure the IMS DB queries are significantly cheaper compared to the non-IMS DB queries. Currently, I'm pretty sure that they will do a full sequential scan in order to find the most recent `last_updated` because it isn't indexed, and that isn't really much cheaper on the DB than sequentially scanning and returning all of the data. So IMS hits would save bandwidth, but not load on the DB, which I think may be more important. It also basically doubles the cost of an IMS miss, and we don't really know what our IMS hit percentage would be in practice.
   
   I was playing around with indexing `last_updated`, and this seems to make the `select max(last_updated)` queries much cheaper:
   ```
   create index deliveryservice_last_updated_idx on deliveryservice (last_updated DESC NULLS LAST);
   ```
   The `DESC` seems to be important because, in theory, the first timestamp in the index should be the max, so this should be a very cheap query compared to a sequential scan.
   
   There's also the chance that we're wrong about the DB performance in practice, if things are updated often enough to cause way more IMS misses than expected. In that case, it would be nice to have a feature flag to disable IMS handling. That also means we should probably log whether a request was IMS_HIT, IMS_MISS, NON_IMS, etc. Then we will be able to determine what our hit percentage is from the logs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org