Posted to commits@trafficcontrol.apache.org by ra...@apache.org on 2021/06/22 22:56:31 UTC

[trafficcontrol] branch master updated: Blueprint/refetch (#5910)

This is an automated email from the ASF dual-hosted git repository.

rawlin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/trafficcontrol.git


The following commit(s) were added to refs/heads/master by this push:
     new dd1a333  Blueprint/refetch (#5910)
dd1a333 is described below

commit dd1a33381ec649fe58bb9ab005387b56e2538b51
Author: Taylor Clayton Frey <ta...@gmail.com>
AuthorDate: Tue Jun 22 16:56:19 2021 -0600

    Blueprint/refetch (#5910)
    
    * Added blueprint for Refetch Cache Invalidation proposal
    
    * Add json tag to code syntax blocks. Reformat json for readability.
    
    * Fix go struct spacing alignment
    
    * Incorporate feedback for blueprint.
    RFC3339 for Time format
    Explain impact for T3C
    Explicitly set 'refresh' or 'refetch' for all API interaction
    Minor formatting fixes
    
    * Incorporate Feedback
    Updated InvalidationJobInput fields
    Incorporate global Parameter check
    Add clarity around parent vs origin
    
    * Remove omitempty
    
    * Update Upgrade Impact section to account for the Parameter
    
    Co-authored-by: Taylor Frey <ta...@comcast.com>
---
 blueprints/refetch-invalidation.md | 344 +++++++++++++++++++++++++++++++++++++
 1 file changed, 344 insertions(+)

diff --git a/blueprints/refetch-invalidation.md b/blueprints/refetch-invalidation.md
new file mode 100644
index 0000000..11bcc11
--- /dev/null
+++ b/blueprints/refetch-invalidation.md
@@ -0,0 +1,344 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+# Add REFETCH capability option for Content Invalidation
+
+## Problem Description
+
+Currently, within ATC, there is a concept of Invalidation Jobs. These Invalidation Jobs give a user the ability to queue an invalidation for a resource, primarily based on regular expressions. The invalidation is gathered and treated as though there was a cache **STALE**, allowing the CDN to query the origin server to **REFRESH** the resource. However, should the cache policy still be incorrect or misconfigured, the resource could be updated on the origin server, but the CDN will still r [...]
+
+It should be noted that this problem originally arose from a misconfigured origin; however, it applies to any parent cache within the topology.
+
+## Proposed Change
+
+To address this potential conflict, this blueprint proposes adding **REFETCH** as an option for Invalidation Jobs. Caches will then treat such a job as a **MISS** (rather than a **STALE**), allowing the cache to retrieve the resource regardless of cache policies. The original **REFRESH**/**STALE** will remain the default option; **REFETCH**/**MISS** will be the addition.
+
+### Traffic Portal Impact
+
+##### Create and Update
+Traffic Portal will need to update the Invalidation Job to account for the different options. When creating an Invalidation Job both options will need to be present (Perhaps a radio button? Default will be the original **REFRESH**).
+
+Tooltips should be added to ensure an understanding of this feature at a high level.
+
+##### Read
+When displaying the information, the **Invalidation Requests** table currently shows the `Parameters` field, which is only the TTL in hours in the format `TTL:%dh`. The `Parameters` field will be changed to a `TTL` field. This can be displayed directly in the table; however, it should be made clear that the value is in hours and no other unit of time.
+
+Additionally, since we derive and calculate the expiration field based on the TTL some code will need to be modified to account for the change in field name and value.
+
+Speaking of Time, time values returned by the server will be formatted to follow RFC3339 per API guidelines.
+
+### Traffic Ops Impact
+
+Both the API and the database schema will likely be updated, which in turn will result in changes downstream (such as T3C/ORT, clients) as well.
+
+#### REST API Impact
+
+No new endpoints will be required. However the current invalidation job will now include an optional field during `Create`. Invalidation jobs are added by submitting a POST to the jobs endpoint.
+
+Globally, a new parameter will be added (something akin to `refetch_enabled` is recommended, defaulting to **false**) that will be used to validate any **POST** calls to the _/jobs_ endpoint. This initial check verifies that the CDN is configured to process and return `refetch` jobs. If a `refetch` is submitted and the value of the parameter is **true**, the **POST** will succeed. If the value is `refetch` and the parameter is set to **false**, the **POST** will fail w [...]
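+
+The gate described above could be sketched roughly as follows. This is only an illustration, not the Traffic Ops implementation: the function name `checkRefetchAllowed` and its arguments are hypothetical, and the sketch assumes the value of the proposed `refetch_enabled` parameter has already been read into a boolean.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// checkRefetchAllowed is a hypothetical helper. It rejects a "refetch" job
+// when the (assumed) global refetch_enabled parameter is false and lets
+// every other combination through.
+func checkRefetchAllowed(invalidationType string, refetchEnabled bool) error {
+	if invalidationType == "refetch" && !refetchEnabled {
+		return errors.New("refetch invalidation jobs are not enabled")
+	}
+	return nil
+}
+
+func main() {
+	// Walk through the combinations discussed in the text.
+	fmt.Println(checkRefetchAllowed("refetch", true))
+	fmt.Println(checkRefetchAllowed("refetch", false))
+	fmt.Println(checkRefetchAllowed("refresh", false))
+}
+```
+
+Under this sketch, `refresh` jobs are never blocked by the parameter; only `refetch` submissions depend on it.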
+
+**POST** /api/4.0/jobs
+
+##### Current Request
+
+Body:
+```json
+{
+	"startTime":"2021-06-02T15:23:21.348Z",
+	"deliveryService":11,
+	"regex":"/path/.*\\.jpeg",
+	"ttl":24
+}
+```
+
+This is mapped to a Go `struct` in the `go-tc` lib.
+```go
+type InvalidationJobInput struct {
+	DeliveryService *interface{} `json:"deliveryService"`
+	Regex           *string      `json:"regex"`
+	StartTime       *Time        `json:"startTime"`
+	TTL             *interface{} `json:"ttl"`
+	dsid            *uint
+	ttl             *time.Duration
+}
+```
+
+##### Proposed
+
+Add an "InvalidationType" to signify a specific type of invalidation request. If the field is included, it _must_ be either "refetch" or "refresh".
+
+Body:
+```json
+{
+	"startTime":"2021-06-02T15:23:21.348Z",
+	"deliveryService":11,
+	"regex":"/path/.*\\.jpeg",
+	"ttl":24,
+	"invalidationType":"refresh"
+}
+```
+
+This struct now contains the `InvalidationType *string` field. Additionally, the `DeliveryService` and `TTL` are no longer empty interfaces. Also `DeliveryService` and `Regex` are no longer optional fields. 
+
+```go
+type InvalidationJobInput struct {
+	DeliveryService  string  `json:"deliveryService"`
+	Regex            string  `json:"regex"`
+	InvalidationType *string `json:"invalidationType"`
+	StartTime        *Time   `json:"startTime"`
+	TTL              *uint   `json:"ttl"`
+}
+```
+
+##### Parsing the value
+
+The value can only pass validation if it is explicitly either `refresh` or `refetch`. Any other value (including a missing/omitted field) will be rejected with a 400 Bad Request.
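+
+A minimal sketch of that rule, assuming an exact, case-sensitive match and a pointer field as in the proposed `InvalidationJobInput`. The name `validateInvalidationType` is hypothetical, and translating the returned error into the 400 response is left out:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// validateInvalidationType is a hypothetical validator. A nil pointer models
+// a missing/omitted field; only the exact strings "refresh" and "refetch"
+// are accepted.
+func validateInvalidationType(t *string) error {
+	if t == nil {
+		return errors.New("invalidationType is required")
+	}
+	switch *t {
+	case "refresh", "refetch":
+		return nil
+	default:
+		return fmt.Errorf("invalidationType must be 'refresh' or 'refetch', got %q", *t)
+	}
+}
+
+func main() {
+	ok, bad := "refetch", "purge"
+	fmt.Println(validateInvalidationType(&ok))
+	fmt.Println(validateInvalidationType(&bad))
+	fmt.Println(validateInvalidationType(nil))
+}
+```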
+
+##### Response
+
+The response will be modified, then, to return this new value as well. Additionally, _Time_ values will be formatted to RFC3339 to follow API guidelines. This, plus some database changes, will require changes to the struct used for reading the values from the DB.
+
+The current struct:
+
+```go
+type InvalidationJob struct {
+	AssetURL        *string `json:"assetUrl"`
+	CreatedBy       *string `json:"createdBy"`
+	DeliveryService *string `json:"deliveryService"`
+	ID              *uint64 `json:"id"`
+	Keyword         *string `json:"keyword"`
+	Parameters      *string `json:"parameters"`
+	StartTime       *Time   `json:"startTime"`
+}
+```
+
+Will be changed to:
+
+```go
+type InvalidationJob struct {
+	AssetURL         *string `json:"assetUrl"`
+	CreatedBy        *string `json:"createdBy"`
+	DeliveryService  *string `json:"deliveryService"`
+	ID               *uint64 `json:"id"`
+	Keyword          *string `json:"keyword"`
+	TTL              *int    `json:"ttl"`
+	InvalidationType *string `json:"invalidationType"`
+	StartTime        *Time   `json:"startTime"`
+}
+```
+
+Sample current response:
+```json
+{
+	"alerts":[
+		{
+			"text":"Invalidation request created for http://amc-linear-origin.local.tld/path/.*\\.jpeg, start:2021-06-02 15:23:21.348 +0000 UTC end 2021-06-03 15:23:21.348 +0000 UTC",
+			"level":"success"
+		}
+	],
+	"response":{
+		"assetUrl":"http://amc-linear-origin.local.tld/path/.*\\.jpeg",
+		"createdBy":"admin",
+		"deliveryService":"amc-live",
+		"id":1,
+		"keyword":"PURGE",
+		"parameters":"TTL:24h",
+		"startTime":"2021-06-02 09:23:21-06"
+	}
+}
+```
+
+Sample new response (includes the new `invalidationType` field, a `ttl` field in place of `parameters`, an updated alert `text`, and an RFC3339-formatted `startTime`):
+```json
+{
+	"alerts":[
+		{
+			"text":"Invalidation (refresh) request created for http://amc-linear-origin.local.tld/path/.*\\.jpeg, start:2021-06-02 15:23:21.348 +0000 UTC end 2021-06-03 15:23:21.348 +0000 UTC",
+			"level":"success"
+		}
+	],
+	"response":{
+		"assetUrl":"http://amc-linear-origin.local.tld/path/.*\\.jpeg",
+		"createdBy":"admin",
+		"deliveryService":"amc-live",
+		"id":1,
+		"keyword":"PURGE",
+		"ttl":24,
+		"invalidationType":"refresh",
+		"startTime":"2021-06-02T09:23:21-06:00"
+	}
+}
+```
+
+___
+
+> Note: There are still 1.x routes that reference `UserInvalidationJob`, such as `user/current/jobs(/|\.json/?)?$`. These routes are currently deprecated and the corresponding `structs` will be removed in a future release as well.
+
+#### Client Impact
+
+As with Traffic Portal, the Go clients will need to be updated to provide this additional functionality. Since an additional field has been added to `InvalidationJobInput` in the `go-tc` lib, it can be set by the client as well.
+
+There is also
+
+#### Data Model / Database Impact
+
+_Jobs_ are stored in three database tables in the TO DB: _jobs_, _job\_agent_, and _job\_status_.
+
+> The _jobs_ concept appears to have been intended to be generic and flexible, however it's only ever been implemented to record invalidation jobs.
+
+The current column `parameters` will be converted to `ttl` and contain an INT datatype representing the TTL in hours. 
+```
+ttl:48h
+```
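+
+If existing rows need to be carried over, the legacy `parameters` string has to be reduced to its hour count. A minimal sketch, assuming the stored value always follows the `TTL:%dh` format mentioned earlier; `parseLegacyTTL` is a hypothetical helper, not an existing function in the codebase:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// parseLegacyTTL extracts the hour count from a legacy parameters value
+// such as "TTL:48h". The prefix is matched case-insensitively.
+func parseLegacyTTL(parameters string) (int, error) {
+	s := strings.TrimSpace(parameters)
+	if len(s) < 6 || !strings.EqualFold(s[:4], "TTL:") || !strings.HasSuffix(s, "h") {
+		return 0, fmt.Errorf("unexpected parameters format: %q", parameters)
+	}
+	hours, err := strconv.Atoi(s[4 : len(s)-1])
+	if err != nil {
+		return 0, fmt.Errorf("parsing TTL hours from %q: %w", parameters, err)
+	}
+	if hours < 0 {
+		return 0, fmt.Errorf("negative TTL in %q", parameters)
+	}
+	return hours, nil
+}
+
+func main() {
+	fmt.Println(parseLegacyTTL("TTL:48h"))
+	fmt.Println(parseLegacyTTL("48"))
+}
+```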
+
+I propose adding an additional column, _invalidation\_type_. This column will be non-nullable. Default value will be `refresh` (rather than a nullable `NULL` value).
+
+```
+invalidation_type: refresh
+```
+
+The jobs table will then look more like:
+
+| id | agent | object\_type | object\_name | invalidation\_type | keyword | ttl | asset\_url | asset\_type | status | start\_time | entered\_time | job\_user | last\_updated | job\_deliveryservice |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
+
+
+*OPTIONAL, BUT RECOMMENDED*: As part of this effort, the _Boy Scout Rule_ will be applied ("Always leave the campground cleaner than you found it."). The `agent`, `status`, `asset_type`, `keyword`, `object_type`, and `object_name` columns will be removed. `agent` and `status` are currently hardcoded to the value of 1 and don't appear to be accessed beyond the INSERT. Similarly, `asset_type` is always "file" and `keyword` is always "PURGE". `object_type` and `object_name` are not used at  [...]
+
+> The removal of these columns will impact the `UserInvalidationJob` struct. Even though the only endpoint utilizing this struct (v1.1) is deprecated, it is still in use.
+
+### ORT/T3C Impact
+
+The changes required to implement the final functionality have already been implemented within ORT/T3C. The generation of the invalidation regex jobs previously resembled:
+
+```
+# regex purgeExpiryTime
+refreshasset 1623861151
+```
+
+They now include an optional third field that is either **STALE** or **MISS**.
+
+```
+# regex purgeExpiryTime (optional)type
+refreshasset 1623861151
+refreshasset 1623861151 STALE
+refreshasset 1623861151 MISS
+```
+
+The `regexrevalidatedotconfig.go` file in _lib/go-atscfg_ currently has a function called `filterJobs` that is responsible for parsing the _parameters_ information from the job. Right now, it parses only the **TTL**. It will need to account for the new dedicated TTL field as well as the new InvalidationType field.
+
+```go
+type revalJob struct {
+	AssetURL string
+	PurgeEnd time.Time
+	Type     string // MISS or STALE (default)
+}
+```
+
+As we can see above, if the value **invalidationType** is missing or equal to **refresh**, then the `revalJob` struct's `Type` field will be set to **STALE**. Otherwise, if the **invalidationType** is **refetch**, the `Type` field will be set to **MISS**. The struct is already prepared for this information; the value only needs to be parsed during the `filterJobs` function call.
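+
+The mapping can be sketched as below. The helper `revalTypeFor` and the two constants are hypothetical names introduced for illustration, and how `filterJobs` would actually obtain the `invalidationType` value is not shown:
+
+```go
+package main
+
+import "fmt"
+
+// Values written as the optional third field of a regex_revalidate line.
+const (
+	revalTypeStale = "STALE"
+	revalTypeMiss  = "MISS"
+)
+
+// revalTypeFor is a hypothetical helper that maps the API-level
+// invalidationType to the revalJob Type. Anything other than "refetch",
+// including an empty (missing) value, falls back to STALE.
+func revalTypeFor(invalidationType string) string {
+	if invalidationType == "refetch" {
+		return revalTypeMiss
+	}
+	return revalTypeStale
+}
+
+func main() {
+	for _, t := range []string{"", "refresh", "refetch"} {
+		fmt.Printf("%q -> %s\n", t, revalTypeFor(t))
+	}
+}
+```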
+
+
+### Traffic Monitor Impact
+
+N/A - No changes
+
+### Traffic Router Impact
+
+N/A - No changes
+
+### Traffic Stats Impact
+
+N/A - No changes
+
+### Traffic Vault Impact
+
+N/A - No changes
+
+### Documentation Impact
+
+This information will need to be added to the current **Forcing Content Invalidation** section under the **_Quick How To Guides_** section under the **_Administrator's Guide_**
+[Content Invalidation](https://traffic-control-cdn.readthedocs.io/en/latest/admin/quick_howto/content_invalidation.html)
+
+Additionally, the **_Traffic Ops API_** `jobs` routes will need to be updated with the changes, API V1-V4.
+[V4 Jobs](https://traffic-control-cdn.readthedocs.io/en/latest/admin/quick_howto/content_invalidation.html)
+
+### Testing Impact
+
+##### Unit tests
+
+For ORT/T3C there are already unit tests that will need to be updated to account for this contingency.
+
+Update:
+```
+github.com/apache/trafficcontrol/lib/go-atscfg/regexrevalidatedotconfig_test.go
+```
+
+There are no unit tests for invalidation jobs in `traffic_ops_golang`. This provides an opportunity to create unit tests to validate current and new functionality.
+
+Add:
+```
+github.com/apache/trafficcontrol/traffic_ops/traffic_ops_golang/invalidationjobs/invalidationjobs.go
+```
+
+##### Integration/E2E Tests:
+There are already existing integration tests for the various APIs (v1-v4) for Traffic Ops. Each will need to have this optional functionality tested as well.
+```
+github.com/apache/trafficcontrol/traffic_ops/testing/api/v1/jobs_test.go
+github.com/apache/trafficcontrol/traffic_ops/testing/api/v2/jobs_test.go
+github.com/apache/trafficcontrol/traffic_ops/testing/api/v3/jobs_test.go
+github.com/apache/trafficcontrol/traffic_ops/testing/api/v4/jobs_test.go
+```
+
+### Performance Impact
+
+There will be no performance impact for Traffic Control.
+
+> Note: This functionality may create a performance impact on caches that implement a REFETCH/MISS manual override based on a regex.
+
+### Security Impact
+
+Traffic Ops will validate the `invalidationType` field so that it can only be explicitly set to **refresh** or **refetch**. Any other value (missing, malformed, wrong data type, etc.) will result in a 400-level error. No other permissions are modified.
+
+> Current permissions require `PrivLevelPortal` to create, update, delete. For read, only `PrivLevelReadOnly` is needed.
+
+### Upgrade Impact
+
+The v4 API and the database schema will change; however, because the new Parameter gates the feature as a safeguard, the impact on upgrades is minimal. Clients will continue to create invalidation jobs (`refresh`) as they did before without impacting the caches downstream. However, the downstream caches must implement the proper regexrevalidation plugin before the feature is enabled via Parameters.
+
+Those utilizing the clients will need to update to be able to utilize the new type of invalidation job.
+
+Once an upgrade is complete, manual tests can verify that the changes were applied correctly (for example, through code utilizing a client or through the Traffic Portal interface).
+
+### Operations Impact
+
+Operators should be made aware in the documentation of the potential performance hit the cache might experience by using **reFETCH** resulting in a **MISS** over **reFRESH** resulting in **STALE** (Default).
+
+### Developer Impact
+
+There will be no impact for developers moving forward. If the database columns were removed/cleaned up it may lighten a slight cognitive load since the fields aren't representative or used in the non-deprecated implementation.
+
+## Alternatives
+
+Utilizing the currently existing `parameters` field will require some code modifications, but it will be minimally invasive. Another field could be added and become an additional column in the database schema. It would then be returned in the response object from the API; however, the primary concern is that the generic `jobs` table would now be specifically aware of invalidation jobs (it appears to have been written generically originally), and this may result in unintended consequences f [...]
+
+## Dependencies
+
+N/A - No changes