Posted to dev@trafficcontrol.apache.org by Nir Sopher <ni...@qwilt.com> on 2017/08/07 17:18:41 UTC

Re: Delivery Service based config generation and Cache Manager

Hi,

What would you say about the following refinement, merging the ideas of
configuration versioning and per-DS individual JSON files:

   - The configuration of the different delivery-services is propagated to
   the caches using separate JSON files - a JSON per DS
   - The different versions are kept in the DB as raw records in the DS
   table.
   - Whenever the ORT script connects to Traffic Ops, it (see the sketch
   after this list):
      1. Requests the list of DS configuration "file identifiers" to be
      pulled - each file identifier represents the configuration of a
      single DS.
      2. For each such identifier, pulls the associated JSON if it was not
      pulled before.
      Note: The "to be pulled" JSON is generated from the active version's
      configuration during this process.
      3. Removes JSONs associated with identifiers that were previously
      pulled but are no longer in the list.
      4. Uses the gathered files to adjust the ATS configuration.
   - A "file identifier" changes under the following circumstances:
      1. A delivery-service "active version" change
      2. A format change (e.g. due to a newly added field, a bug in the
      previous JSON creation, newly introduced pattern support, etc.)
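
A minimal sketch, in Go, of the pull loop described above. The endpoint
paths ("/api/ds-config/identifiers", "/api/ds-config/<id>") and the local
cache directory layout are hypothetical placeholders for illustration, not
an existing Traffic Ops API.

package ortsketch

import (
    "encoding/json"
    "io"
    "net/http"
    "os"
    "path/filepath"
    "strings"
)

// syncDSConfigs pulls the per-DS JSON files this server should have and
// removes the ones it should no longer have.
func syncDSConfigs(toURL, cacheDir string) error {
    // 1. Request the list of DS configuration "file identifiers" for this server.
    resp, err := http.Get(toURL + "/api/ds-config/identifiers")
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    var identifiers []string
    if err := json.NewDecoder(resp.Body).Decode(&identifiers); err != nil {
        return err
    }

    wanted := map[string]bool{}
    for _, id := range identifiers {
        wanted[id] = true
        path := filepath.Join(cacheDir, id+".json")
        if _, err := os.Stat(path); err == nil {
            continue // 2. this identifier was already pulled; skip it
        }
        r, err := http.Get(toURL + "/api/ds-config/" + id)
        if err != nil {
            return err
        }
        data, err := io.ReadAll(r.Body)
        r.Body.Close()
        if err != nil {
            return err
        }
        if err := os.WriteFile(path, data, 0644); err != nil {
            return err
        }
    }

    // 3. Remove JSONs whose identifiers are no longer in the list.
    entries, err := os.ReadDir(cacheDir)
    if err != nil {
        return err
    }
    for _, e := range entries {
        if !wanted[strings.TrimSuffix(e.Name(), ".json")] {
            os.Remove(filepath.Join(cacheDir, e.Name()))
        }
    }

    // 4. The gathered files would then be used to regenerate the ATS configuration.
    return nil
}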

In this paradigm:

   1. Every DS configuration is pulled only once by each server
   2. Caching may reduce TO's file-generation work (since multiple servers
   pull the same configuration)
   3. We do not save JSON products in the DB.

Nir

On Fri, Jul 28, 2017 at 5:51 PM, Dewayne Richardson <de...@gmail.com>
wrote:

> If Postgres performance becomes a concern, we can make ORT send its
> "Update" to a different endpoint than the Read; we can also take advantage
> of Postgres HA to have dedicated Postgres replicas that will allow us to
> scale even more.
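>
> A minimal sketch of that split, assuming hypothetical connection strings
> and the standard database/sql package with the lib/pq driver: the ORT
> "Update" path would write through the primary pool, while read-heavy
> config-generation queries would go to a replica pool.
>
> import (
>     "database/sql"
>
>     _ "github.com/lib/pq" // Postgres driver
> )
>
> // Pools keeps two connection pools so reads can be pointed at a Postgres
> // replica while updates keep going to the primary.
> type Pools struct {
>     Primary *sql.DB // serves the ORT "Update" writes
>     Replica *sql.DB // serves read-only config-generation queries
> }
>
> func openPools(primaryDSN, replicaDSN string) (*Pools, error) {
>     p, err := sql.Open("postgres", primaryDSN)
>     if err != nil {
>         return nil, err
>     }
>     r, err := sql.Open("postgres", replicaDSN)
>     if err != nil {
>         return nil, err
>     }
>     return &Pools{Primary: p, Replica: r}, nil
> }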
>
> -Dew
>
> On Thu, Jul 27, 2017 at 8:28 AM, Robert Butts <ro...@gmail.com>
> wrote:
>
> > As long as we have the proper indexes, database performance shouldn't be
> > an issue. Postgres is capable of handling, and I've personally worked
> > with, databases several orders of magnitude larger than ours. Now, if you
> > don't have the indexes, querying will absolutely be slow. But as long as
> > we add indexes to the columns we query on, it won't be an issue.
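> >
> > For instance (a sketch only; the table and column names are placeholders,
> > not the actual schema), adding an index on a column the config queries
> > filter by is a one-liner in a migration:
> >
> > // Error handling omitted; "deliveryservice"/"last_updated" stand in for
> > // whatever table and column the queries actually hit.
> > db.Exec(`CREATE INDEX IF NOT EXISTS idx_ds_last_updated
> >          ON deliveryservice (last_updated)`)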
> >
> > I'm likewise not concerned about size. Our current production database is
> > 162 MB. But I'm not opposed to writing a truncate utility, if we think
> > it's necessary. The queries are simple; it should be easy enough to write
> > a function and a GUI button. I would oppose it being used unless
> > absolutely necessary though; history is invaluable.
> >
> > On Thu, Jul 27, 2017 at 8:15 AM, Gelinas, Derek <Derek_Gelinas@comcast.com>
> > wrote:
> >
> > > I'm down with this! But I'm worried about database performance, for
> > > one, and table size. I think we need to have a utility for removing
> > > older entries if we are to go this route.
> > >
> > > On Jul 27, 2017, at 10:12 AM, Robert Butts <robert.o.butts@gmail.com> wrote:
> > >
> > > Can I propose an adjustment?
> > >
> > > If we add a timestamp to every table, we can generate the JSON
> > > on-the-fly. Then, the snapshot becomes a timestamp field,
> > > `snapshot_time`, and all the data `select` queries add `where timestamp
> > > <= snapshot_time limit 1`. Instead of updating rows, we only ever
> > > insert new rows with new timestamps. This gives us snapshots back to
> > > eternity, and if a snapshot ever breaks, rolling back is as simple as
> > > updating the `snapshot_time`. Our data is so tiny, space is almost
> > > certainly not a problem, but if it is, truncating is as easy as `delete
> > > where count > X and timestamp < Y`.
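> > >
> > > A rough sketch of that read path in Go; the table and column names
> > > (deliveryservice, xml_id, created_at) and the snapshot_time parameter
> > > are placeholders for illustration, not the real schema:
> > >
> > > // Fetch the newest row for a delivery service that existed at or
> > > // before the CDN's snapshot_time, i.e. "the DS as of the snapshot".
> > > row := db.QueryRow(`
> > >     SELECT config_json
> > >     FROM deliveryservice
> > >     WHERE xml_id = $1 AND created_at <= $2
> > >     ORDER BY created_at DESC
> > >     LIMIT 1`, xmlID, snapshotTime)
> > > var configJSON string
> > > if err := row.Scan(&configJSON); err != nil {
> > >     return "", err // no row: the DS did not exist at that snapshot
> > > }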
> > >
> > > That gives us all the benefits of your plan, plus the benefits of
> > > relational data, type safety, more powerful querying, etc. And it
> > > shouldn't be much more work to implement: add timestamp columns,
> > > snapshotting updates the snapshot field, and getting the config simply
> > > runs what the snapshot otherwise would to create the JSON. If
> > > generation performance is an issue (it may be in Perl, probably not in
> > > Go), we can always cache the latest snapshot in memory, and only
> > > regenerate it when the `snapshot_time` changes.
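> > >
> > > A minimal sketch of that in-memory cache (hypothetical names; assumes
> > > the standard sync and time packages):
> > >
> > > type snapshotCache struct {
> > >     mu           sync.Mutex
> > >     snapshotTime time.Time
> > >     json         []byte
> > > }
> > >
> > > // Get returns the cached JSON, regenerating it only when the stored
> > > // snapshot_time differs from the one currently in the database.
> > > func (c *snapshotCache) Get(current time.Time, generate func() ([]byte, error)) ([]byte, error) {
> > >     c.mu.Lock()
> > >     defer c.mu.Unlock()
> > >     if c.json != nil && c.snapshotTime.Equal(current) {
> > >         return c.json, nil
> > >     }
> > >     j, err := generate()
> > >     if err != nil {
> > >         return nil, err
> > >     }
> > >     c.snapshotTime, c.json = current, j
> > >     return j, nil
> > > }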
> > >
> > > On Wed, Jul 26, 2017 at 9:08 AM, Gelinas, Derek <Derek_Gelinas@comcast.com>
> > > wrote:
> > >
> > > That’s not a terrible idea.  Fewer changes to the code that way for
> > > sure, really just the DS interface page.
> > >
> > > On Jul 26, 2017, at 10:30 AM, Nir Sopher <nirs@qwilt.com> wrote:
> > >
> > > Hi Derek,
> > >
> > > As discussed at the summit, we also see significant value in
> > >
> > >  1. DS Deployment Granularity - using DS individual config files.
> > >  2. Delivery Service Configuration Versioning (DSCV) - separating the
> > >  "provisioning" from the "deployment".
> > >  3. Improving the roll-out procedure, combining capabilities #1 & #2.
> > >
> > > We are on the same page with these needs:)
> > >
> > > However, as I see it, #1 & #2 are two separate features, each with
> > > different requirements.
> > > For example, for DSCV, I would suggest managing the versions as standard
> > > rows in the Delivery-Service table, side by side with the "hot" DS
> > > configuration.
> > > This will allow the existing code (with minor adjustments) to properly
> > > work on these rows.
> > > Furthermore, it also allows you to simply "restore" the DS "hot"
> > > configuration to a specified revision.
> > > It is also more resilient to DS table schema updates.
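> > >
> > > A rough sketch of what such rows could look like - a Go struct with
> > > hypothetical field names, just to illustrate versions living next to
> > > the "hot" row in the same table:
> > >
> > > // DeliveryServiceRow is one row in the delivery-service table. Versions
> > > // are ordinary rows sharing the same XMLID; exactly one of them is the
> > > // "hot" (active) configuration at any time.
> > > type DeliveryServiceRow struct {
> > >     XMLID   string // delivery service identity, shared by all versions
> > >     Version int    // monotonically increasing revision number
> > >     Active  bool   // true for the single "hot" row of this DS
> > >     // ...the existing DS configuration columns follow unchanged...
> > > }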
> > >
> > > I'll soon share, on another thread, a link to a "DSCV functional spec" I
> > > was working on. It extends the presentation
> > > <https://cwiki.apache.org/confluence/download/attachments/69407844/TC%20Summit%20-%20Spring%202017%20-%20Self-Service.pptx?version=1&modificationDate=1495451091000&api=v2>
> > > we had at the summit.
> > > I would appreciate any input on this spec.
> > >
> > > Nir
> > >
> > > On Tue, Jul 25, 2017 at 10:13 PM, Gelinas, Derek <Derek_Gelinas@comcast.com>
> > > wrote:
> > >
> > > At the summit, there was some talk about changing the manner in which
> > > we generate configuration files.  The early stages of this idea had me
> > > creating large CDN definition files, but in the course of our discussion
> > > it became clear that we would be better served by creating delivery
> > > service configuration files instead.  This would shift us from a
> > > server-generated implementation, as we have now, to generating the
> > > configuration files for the caches locally.  The data for this would
> > > come from a new API that would provide the delivery service definitions
> > > in json format.
> > >
> > > What I’m envisioning is creating delivery service “snapshots” which are
> > > saved to the database as json objects.  These snapshots would have the
> > > full range of information specific to the delivery service, including
> > > the new DS profiles.  The database would store up to five of these
> > > objects per DS, and one DS object would be set to “active” through the
> > > UI or API.
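> > >
> > > A rough sketch of that storage model (hypothetical table and column
> > > names, shown as the SQL a Go handler might run; error handling omitted):
> > >
> > > // Save a new snapshot for this DS, then keep only the five newest.
> > > tx.Exec(`INSERT INTO ds_snapshot (ds_id, created_at, active, config_json)
> > >          VALUES ($1, now(), false, $2)`, dsID, configJSON)
> > > tx.Exec(`DELETE FROM ds_snapshot
> > >          WHERE ds_id = $1 AND id NOT IN (
> > >              SELECT id FROM ds_snapshot
> > >              WHERE ds_id = $1 ORDER BY created_at DESC LIMIT 5)`, dsID)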
> > >
> > > In this way, we could create multiple versions of a delivery service,
> > > or safely modify the definition currently “live” (but not necessarily
> > > active) in the database without changing the configuration in the
> > > field.  Configuration would only be changed when the DS was saved and
> > > then that saved version was set to become active.  In the reverse
> > > manner, existing saved delivery services could be restored to the live
> > > DB for modification.
> > >
> > > By divorcing the “live” db from the active configuration we prevent
> > > the possibility of accidental edits affecting the field, or
> > > edits-in-progress from being sent out prematurely when one person is
> > > working on a delivery service and another is queueing updates.
> > >
> > > Once set, it would be this active delivery service definition that
> > > would be provided to the rest of traffic ops for any delivery service
> > > operations.  For config file generation, new API endpoints would be
> > > created that do the following:
> > >
> > > - List the delivery services and the active versions of each assigned
> > > to the specific server.
> > > - Provide the json object from the database when requested - I’m
> > > thinking that the endpoint would send the current active by default, or
> > > a specific version if specified.
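> > >
> > > A minimal sketch of the second endpoint (the path, query parameters,
> > > and loadSnapshotJSON helper are hypothetical, just to make the shape
> > > concrete):
> > >
> > > // GET /api/deliveryservices/snapshot?xmlId=...&version=...
> > > func dsSnapshotHandler(w http.ResponseWriter, r *http.Request) {
> > >     xmlID := r.URL.Query().Get("xmlId")
> > >     version := r.URL.Query().Get("version") // empty means "current active"
> > >     blob, err := loadSnapshotJSON(xmlID, version) // hypothetical DB lookup
> > >     if err != nil {
> > >         http.Error(w, err.Error(), http.StatusNotFound)
> > >         return
> > >     }
> > >     w.Header().Set("Content-Type", "application/json")
> > >     w.Write(blob)
> > > }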
> > >
> > > These definitions would be absurdly cacheable - we would not need to
> > > worry about sending stale data because each new version would have a
> > > completely different name - and so could be generated once and sent to
> > > thousands of caches with greatly reduced load on traffic ops.  The load
> > > would consist of the initial creation of the json object, and the
> > > minimal serving of that object, so this would still result in greatly
> > > reduced load on the traffic ops host(s) even without the use of
> > > caching.  Because of this, the new cache management service could check
> > > with traffic ops multiple times per minute for updates.  Once a
> > > delivery service was changed, the new json would be downloaded and
> > > configs generated on the cache itself.
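> > >
> > > Because each version has its own name, the handler sketched above could
> > > also mark its responses as effectively immutable (the header value here
> > > is just one plausible choice):
> > >
> > > w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")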
> > >
> > > Other benefits of the use of a cache manager service rather than the
> > > ORT script include:
> > >
> > > - Decreased load from logins - once the cache has logged in, it could
> > > use the cookie from the previous session and only re-login when that
> > > cookie has expired.  We could also explore the use of certificates or
> > > keys instead, and eliminate logins altogether.  (See the sketch after
> > > this list.)
> > > - Multiple checks per minute rather than every X minutes - faster
> > > checks, more agile CDN.
> > > - Service could provide regular status updates to traffic ops, giving
> > > us the ability to keep an eye out for drastic shifts in i/o, unwanted
> > > behavior, problems with the ATS service, etc.  This leads to building
> > > a traffic ops that can adapt itself on the fly to changing conditions
> > > and adjust accordingly.
> > > - Queue commands to run on the host from traffic ops.  ATS restarts,
> > > system reboots, all manner of things could be triggered and scheduled
> > > right from traffic ops.
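> > >
> > > A minimal sketch of the cookie-reuse idea from the first point above
> > > (login() is a hypothetical helper, not the actual TO client API; assumes
> > > net/http and net/http/cookiejar):
> > >
> > > // newTOClient returns an http.Client that keeps its session cookie in
> > > // a jar, so later requests reuse it instead of logging in every time.
> > > func newTOClient() (*http.Client, error) {
> > >     jar, err := cookiejar.New(nil)
> > >     if err != nil {
> > >         return nil, err
> > >     }
> > >     return &http.Client{Jar: jar}, nil
> > > }
> > >
> > > // getWithRelogin retries once after re-authenticating if the cached
> > > // session cookie has expired.
> > > func getWithRelogin(c *http.Client, url string) (*http.Response, error) {
> > >     resp, err := c.Get(url)
> > >     if err == nil && resp.StatusCode == http.StatusUnauthorized {
> > >         resp.Body.Close()
> > >         if err := login(c); err != nil {
> > >             return nil, err
> > >         }
> > >         resp, err = c.Get(url)
> > >     }
> > >     return resp, err
> > > }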
> > >
> > > Thoughts?
> > >
> > > Derek
> > >
> > >
> > >
> >
>