You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@trafficserver.apache.org by zw...@apache.org on 2017/10/25 21:35:20 UTC

[trafficserver] 01/03: cachekey/cacheurl key string compatibility enhancements.

This is an automated email from the ASF dual-hosted git repository.

zwoop pushed a commit to branch 7.1.x
in repository https://gitbox.apache.org/repos/asf/trafficserver.git

commit 46f4a33da53f8d359ad7e1e74374f1adf7580a5f
Author: Gancho Tenev <ga...@apache.com>
AuthorDate: Mon Jul 31 14:43:33 2017 -0700

    cachekey/cacheurl key string compatibility enhancements.
    
    - Added a parameter to override the default cache key element separator.
      --separator=<string>
    
    - Added flags allowing path and prefix not to be appended by default.
      --remove-path=<true|false|yes|no|0|1>
      --remove-prefix=<true|false|yes|no|0|1>
    
    - Documentated the new features and provided cacheurl to cachekey migrationion examples.
    
    - If /replacement/ in <capture_definition> is empty don't add any captures, i.e.
      --capture-path=/.*// - would remove the whole path (nothing captured);
      --capture-prefix=/(.*)// - would remove the whole prefix (nothing captured);
---
 doc/admin-guide/plugins/cachekey.en.rst   | 52 +++++++++++++++++++++++++++++++
 plugins/experimental/cachekey/cachekey.cc | 13 ++++----
 plugins/experimental/cachekey/cachekey.h  |  5 +--
 plugins/experimental/cachekey/configs.cc  | 40 +++++++++++++++++++++++-
 plugins/experimental/cachekey/configs.h   | 26 +++++++++++++++-
 plugins/experimental/cachekey/pattern.cc  | 24 ++++++++------
 plugins/experimental/cachekey/pattern.h   |  5 ++-
 plugins/experimental/cachekey/plugin.cc   | 12 ++++---
 8 files changed, 151 insertions(+), 26 deletions(-)

diff --git a/doc/admin-guide/plugins/cachekey.en.rst b/doc/admin-guide/plugins/cachekey.en.rst
index 2f374ff..fb75c4e 100644
--- a/doc/admin-guide/plugins/cachekey.en.rst
+++ b/doc/admin-guide/plugins/cachekey.en.rst
@@ -74,6 +74,8 @@ Cache key structure and related plugin parameters
 * ``--capture-prefix=<capture_definition>`` (default: empty string) - if specified and not empty then strings are captured from ``host:port`` based on the ``<capture_definition>`` and are added to the cache key.
 * ``--capture-prefix-uri=<capture_definition>`` (default: empty string) - if specified and not empty then strings are captured from the entire URI based on the ``<capture_definition>`` and are added to the cache key.
 * If any of the "Prefix" related plugin parameters are used together in the plugin configuration they are added to the cache key in the order shown in the diagram.
+* ``--remove-prefix=<true|false|yes|no|0|1`` (default: false) - if specified the prefix elements (host, port) are not processed nor appended to the cachekey. All prefix related plugin paramenters are ignored if this parameter is ``true``, ``yes`` or ``1``.
+
 
 
 "User-Agent" section
@@ -139,6 +141,7 @@ Cache key structure and related plugin parameters
 * if no path related plugin parameters are used, the URI path string is included in the cache key.
 * ``--capture-path=<capture_definition>`` (default: empty string) - if specified and not empty then strings are captured from URI path based on the ``<capture_definition>`` and are added to the cache key.
 * ``--capture-path-uri=<capture_definition>`` (default: empty string) - if specified and not empty then strings are captured from the entire URI based on the ``<capture_definition>`` and are added to the cache key.
+* ``--remove-path=<true|false|yes|no|0|1`` (default: false) - if specified the HTTP URI path element is not processed nor appended to the cachekey. All path related plugin paramenters are ignored if this parameter is ``true``, ``yes`` or ``1``.
 
 "Query" section
 ^^^^^^^^^^^^^^^
@@ -162,6 +165,11 @@ All parameters are optional, and if not used, their default values are as mentio
     * ``/<regex>/<replacement>/`` - ``<regex>`` defines regex capturing groups, ``<replacement>`` defines a pattern where the captured strings referenced with ``$0`` ... ``$9`` will be substituted and the result will be added to the cache key.
 
 
+Cache key elements separator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* ``--separator=<string>`` - the cache key is constructed by extracting elements from HTTP URI and headers or by using the UA classifiers and they are appended during the key construction and separated by ``/`` (by default). This options allows to override the dafault separator to any string (including an empty string)
+
+
 Detailed examples and troubleshooting
 =====================================
 
@@ -504,3 +512,47 @@ and if ``tool_agents.config`` contains: ::
   ^curl.*
 
 then ``browser`` will be used when constructing the key.
+
+
+Cacheurl plugin to cachekey plugin migration
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The plugin `cachekey` was not meant to replace the cacheurl plugin in terms of having exactly the same cache key strings generated. It just allows the operator to exctract elements from the HTTP URI in the same way the `cacheurl` does (through a regular expression, please see `<capture_definition>` above).
+
+The following examples demonstrate different ways to achieve `cacheurl` compatibility on a cache key string level in order to avoid invalidation of the cache. 
+
+The operator could use `--capture-path-uri`, `--capture-path`, `--capture-prefix-uri`, `--capture-prefix` to capture elements from the URI, path and authority elements.
+
+By using `--separator=<string>` the operator could override the default separator to an empty string `--separator=` and thus make sure there are no cache key element separators.
+
+
+Example 1: Let us say we have a capture definition used in `cacheurl`. Now by using `--capture-prefix-uri` one could extract elements through the same caplture definition used with `cacheurl`, remove the cache key element separator `--separator=` and by using `--capture-path-uri` could remove the URI path and by using `--remove-all-params=true` could remove the query string::
+
+  @plugin=cachekey.so \
+      @pparam=--capture-prefix-uri=/.*/$0/ \
+      @pparam=--capture-path-uri=/.*// \
+      @pparam=--remove-all-params=true \
+      @pparam=--separator=
+
+Example 2: A more efficient way would be achieved by using `--capture-prefix-uri` to capture from the URI, remove the cache key element separator `--separator=`  and by using `--remove-path` to remove the URI path and `--remove-all-params=true` to remove the query string::
+
+  @plugin=cachekey.so \
+      @pparam=--capture-prefix-uri=/.*/$0/ \
+      @pparam=--remove-path=true \
+      @pparam=--remove-all-params=true \
+      @pparam=--separator=
+
+Example 3: Same result as the above but this time by using `--capture-path-uri` to capture from the URI, remove the cache key element separator `--separator=` and by using `--remove-prefix` to remove the URI authority elements and by using `--remove-all-params=true` to remove the query string::
+
+    @plugin=cachekey.so \
+        @pparam=--capture-path-uri=/(.*)/$0/ \
+        @pparam=--remove-prefix=true \
+        @pparam=--remove-all-params=true \
+        @pparam=--separator=
+
+Example 4: Let us say that we would like to capture from URI in similar to `cacheurl` way but also sort the query parameters (which is not supported by `cacheurl`). We could achieve that by using `--capture-prefix-uri` to capture by using a caplture definition to process the URI before `?`  and using `--remove-path` to remove the URI path and `--sort-params=true` to sort the query parameters::
+
+    @plugin=cachekey.so \
+        @pparam=--capture-prefix-uri=/([^?]*)/$1/ \
+        @pparam=--remove-path=true \
+        @pparam=--sort-params=true \
+        @pparam=--separator=
diff --git a/plugins/experimental/cachekey/cachekey.cc b/plugins/experimental/cachekey/cachekey.cc
index 8a6d62b..2a0b140 100644
--- a/plugins/experimental/cachekey/cachekey.cc
+++ b/plugins/experimental/cachekey/cachekey.cc
@@ -179,7 +179,8 @@ classifyUserAgent(const Classifier &c, TSMBuffer buf, TSMLoc hdrs, String &class
  * @param url URI handle
  * @param hdrs headers handle
  */
-CacheKey::CacheKey(TSHttpTxn txn, TSMBuffer buf, TSMLoc url, TSMLoc hdrs) : _txn(txn), _buf(buf), _url(url), _hdrs(hdrs)
+CacheKey::CacheKey(TSHttpTxn txn, TSMBuffer buf, TSMLoc url, TSMLoc hdrs, String separator)
+  : _txn(txn), _buf(buf), _url(url), _hdrs(hdrs), _separator(separator)
 {
   _key.reserve(512);
 }
@@ -191,7 +192,7 @@ CacheKey::CacheKey(TSHttpTxn txn, TSMBuffer buf, TSMLoc url, TSMLoc hdrs) : _txn
 void
 CacheKey::append(unsigned n)
 {
-  _key.append("/");
+  _key.append(_separator);
   ::append(_key, n);
 }
 
@@ -202,7 +203,7 @@ CacheKey::append(unsigned n)
 void
 CacheKey::append(const String &s)
 {
-  _key.append("/");
+  _key.append(_separator);
   ::appendEncoded(_key, s.data(), s.size());
 }
 
@@ -213,7 +214,7 @@ CacheKey::append(const String &s)
 void
 CacheKey::append(const char *s)
 {
-  _key.append("/");
+  _key.append(_separator);
   ::appendEncoded(_key, s, strlen(s));
 }
 
@@ -225,7 +226,7 @@ CacheKey::append(const char *s)
 void
 CacheKey::append(const char *s, unsigned n)
 {
-  _key.append("/");
+  _key.append(_separator);
   ::appendEncoded(_key, s, n);
 }
 
@@ -417,7 +418,7 @@ CacheKey::appendHeaders(const ConfigHeaders &config)
   }
 
   /* It doesn't make sense to have the headers unordered in the cache key. */
-  String headers_key = containerToString<StringSet, StringSet::const_iterator>(hset, "", "/");
+  String headers_key = containerToString<StringSet, StringSet::const_iterator>(hset, "", _separator);
   if (!headers_key.empty()) {
     append(headers_key);
   }
diff --git a/plugins/experimental/cachekey/cachekey.h b/plugins/experimental/cachekey/cachekey.h
index ed156fb..d065b63 100644
--- a/plugins/experimental/cachekey/cachekey.h
+++ b/plugins/experimental/cachekey/cachekey.h
@@ -49,7 +49,7 @@
 class CacheKey
 {
 public:
-  CacheKey(TSHttpTxn txn, TSMBuffer buf, TSMLoc url, TSMLoc hdrs);
+  CacheKey(TSHttpTxn txn, TSMBuffer buf, TSMLoc url, TSMLoc hdrs, String separator);
 
   void append(unsigned number);
   void append(const String &);
@@ -75,7 +75,8 @@ private:
   TSMLoc _url;    /**< @brief URI handle */
   TSMLoc _hdrs;   /**< @brief headers handle */
 
-  String _key; /**< @brief cache key */
+  String _key;       /**< @brief cache key */
+  String _separator; /**< @brief a separator used to separate the cache key elements extracted from the URI */
 };
 
 #endif /* PLUGINS_EXPERIMENTAL_CACHEKEY_CACHEKEY_H_ */
diff --git a/plugins/experimental/cachekey/configs.cc b/plugins/experimental/cachekey/configs.cc
index b5fb65a..788b56d 100644
--- a/plugins/experimental/cachekey/configs.cc
+++ b/plugins/experimental/cachekey/configs.cc
@@ -43,7 +43,7 @@ commaSeparateString(ContainerType &c, const String &input)
 static bool
 isTrue(const char *arg)
 {
-  return (0 == strncasecmp("true", arg, 4) || 0 == strncasecmp("1", arg, 1) || 0 == strncasecmp("yes", arg, 3));
+  return (nullptr == arg || 0 == strncasecmp("true", arg, 4) || 0 == strncasecmp("1", arg, 1) || 0 == strncasecmp("yes", arg, 3));
 }
 
 void
@@ -341,6 +341,9 @@ Configs::init(int argc, char *argv[])
     {const_cast<char *>("capture-prefix-uri"), optional_argument, nullptr, 'n'},
     {const_cast<char *>("capture-path"), optional_argument, nullptr, 'o'},
     {const_cast<char *>("capture-path-uri"), optional_argument, nullptr, 'p'},
+    {const_cast<char *>("remove-prefix"), optional_argument, nullptr, 'q'},
+    {const_cast<char *>("remove-path"), optional_argument, nullptr, 'r'},
+    {const_cast<char *>("separator"), optional_argument, nullptr, 's'},
     {nullptr, 0, nullptr, 0},
   };
 
@@ -430,6 +433,15 @@ Configs::init(int argc, char *argv[])
         status = false;
       }
       break;
+    case 'q': /* remove-prefix */
+      _prefixToBeRemoved = isTrue(optarg);
+      break;
+    case 'r': /* remove-path */
+      _pathToBeRemoved = isTrue(optarg);
+      break;
+    case 's': /* separator */
+      setSeparator(optarg);
+      break;
     }
   }
 
@@ -448,3 +460,29 @@ Configs::finalize()
 {
   return _query.finalize() && _headers.finalize() && _cookies.finalize();
 }
+
+bool
+Configs::prefixToBeRemoved()
+{
+  return _prefixToBeRemoved;
+}
+
+bool
+Configs::pathToBeRemoved()
+{
+  return _pathToBeRemoved;
+}
+
+void
+Configs::setSeparator(const char *arg)
+{
+  if (nullptr != arg) {
+    _separator.assign(arg);
+  }
+}
+
+const String &
+Configs::getSeparator()
+{
+  return _separator;
+}
diff --git a/plugins/experimental/cachekey/configs.h b/plugins/experimental/cachekey/configs.h
index f1376fe..9bf71cb 100644
--- a/plugins/experimental/cachekey/configs.h
+++ b/plugins/experimental/cachekey/configs.h
@@ -122,7 +122,7 @@ private:
 class Configs
 {
 public:
-  Configs() {}
+  Configs() : _prefixToBeRemoved(false), _pathToBeRemoved(false), _separator("/") {}
   /**
    * @brief initializes plugin configuration.
    * @param argc number of plugin parameters
@@ -137,6 +137,26 @@ public:
    */
   bool finalize();
 
+  /**
+   * @brief Tells the caller if the prefix is to be removed (not processed at all).
+   */
+  bool prefixToBeRemoved();
+
+  /**
+   * @brief Tells the caller if the path is to be removed (not processed at all).
+   */
+  bool pathToBeRemoved();
+
+  /**
+   * @brief set the cache key elements separator string.
+   */
+  void setSeparator(const char *arg);
+
+  /**
+   * @brief get the cache key elements separator string.
+   */
+  const String &getSeparator();
+
   /* Make the following members public to avoid unnecessary accessors */
   ConfigQuery _query;        /**< @brief query parameter related configuration */
   ConfigHeaders _headers;    /**< @brief headers related configuration */
@@ -157,6 +177,10 @@ private:
    * @return true if successful, false otherwise.
    */
   bool loadClassifiers(const String &args, bool blacklist = true);
+
+  bool _prefixToBeRemoved; /**< @brief instructs the prefix (i.e. host:port) not to added to the cache key */
+  bool _pathToBeRemoved;   /**< @brief instructs the path not to added to the cache key */
+  String _separator;       /**< @brief a separator used to separate the cache key elements extracted from the URI */
 };
 
 #endif // PLUGINS_EXPERIMENTAL_CACHEKEY_CONFIGS_H_
diff --git a/plugins/experimental/cachekey/pattern.cc b/plugins/experimental/cachekey/pattern.cc
index 870dd60..04ecf19 100644
--- a/plugins/experimental/cachekey/pattern.cc
+++ b/plugins/experimental/cachekey/pattern.cc
@@ -38,7 +38,7 @@ replaceString(String &str, const String &from, const String &to)
   }
 }
 
-Pattern::Pattern() : _re(nullptr), _extra(nullptr), _pattern(""), _replacement(""), _tokenCount(0)
+Pattern::Pattern() : _re(nullptr), _extra(nullptr), _pattern(""), _replacement(""), _replace(false), _tokenCount(0)
 {
 }
 
@@ -49,12 +49,13 @@ Pattern::Pattern() : _re(nullptr), _extra(nullptr), _pattern(""), _replacement("
  * @return true if successful, false if failure
  */
 bool
-Pattern::init(const String &pattern, const String &replacenemt)
+Pattern::init(const String &pattern, const String &replacenemt, bool replace)
 {
   pcreFree();
 
   _pattern.assign(pattern);
   _replacement.assign(replacenemt);
+  _replace = replace;
 
   _tokenCount = 0;
 
@@ -115,9 +116,9 @@ Pattern::init(const String &config)
     ::replaceString(pattern, "\\/", "/");
     ::replaceString(replacement, "\\/", "/");
 
-    return this->init(pattern, replacement);
+    return this->init(pattern, replacement, /* replace */ true);
   } else {
-    return this->init(config, "");
+    return this->init(config, /* replacement */ "", /*replace */ false);
   }
 
   /* Should never get here. */
@@ -170,7 +171,7 @@ Pattern::~Pattern()
 bool
 Pattern::process(const String &subject, StringVector &result)
 {
-  if (!_replacement.empty()) {
+  if (_replace) {
     /* Replacement pattern was provided in the configuration - capture and replace. */
     String element;
     if (replace(subject, element)) {
@@ -234,9 +235,10 @@ Pattern::capture(const String &subject, StringVector &result)
   int matchCount;
   int ovector[OVECOUNT];
 
-  CacheKeyDebug("matching '%s' to '%s'", _pattern.c_str(), subject.c_str());
+  CacheKeyDebug("capturing '%s' from '%s'", _pattern.c_str(), subject.c_str());
 
   if (!_re) {
+    CacheKeyError("regular expression not initialized");
     return false;
   }
 
@@ -272,9 +274,10 @@ Pattern::replace(const String &subject, String &result)
   int matchCount;
   int ovector[OVECOUNT];
 
-  CacheKeyDebug("matching '%s' to '%s'", _pattern.c_str(), subject.c_str());
+  CacheKeyDebug("replacing:'%s' in pattern:'%s', subject:'%s'", _replacement.c_str(), _pattern.c_str(), subject.c_str());
 
-  if (!_re) {
+  if (!_re || !_replace) {
+    CacheKeyError("regular expression not initialized or not configured to replace");
     return false;
   }
 
@@ -327,7 +330,8 @@ Pattern::compile()
   const char *errPtr; /* PCRE error */
   int errOffset;      /* PCRE error offset */
 
-  CacheKeyDebug("compiling pattern:'%s', replacement:'%s'", _pattern.c_str(), _replacement.c_str());
+  CacheKeyDebug("compiling pattern:'%s', replace: %s, replacement:'%s'", _pattern.c_str(), _replace ? "true" : "false",
+                _replacement.c_str());
 
   _re = pcre_compile(_pattern.c_str(), /* the pattern */
                      0,                /* options */
@@ -351,7 +355,7 @@ Pattern::compile()
     return false;
   }
 
-  if (_replacement.empty()) {
+  if (!_replace) {
     /* No replacement necessary - we are done. */
     return true;
   }
diff --git a/plugins/experimental/cachekey/pattern.h b/plugins/experimental/cachekey/pattern.h
index 7fc0cf4..1e7a3bb 100644
--- a/plugins/experimental/cachekey/pattern.h
+++ b/plugins/experimental/cachekey/pattern.h
@@ -46,7 +46,7 @@ public:
   Pattern();
   virtual ~Pattern();
 
-  bool init(const String &pattern, const String &replacenemt);
+  bool init(const String &pattern, const String &replacenemt, bool replace);
   bool init(const String &config);
   bool empty() const;
   bool match(const String &subject);
@@ -65,6 +65,9 @@ private:
   String _pattern;     /**< @brief PCRE pattern string, containing PCRE patterns and capturing groups. */
   String _replacement; /**< @brief PCRE replacement string, containing $0..$9 to be replaced with content of the capturing groups */
 
+  bool _replace; /**< @brief true if a replacement is needed, false if not, this is to distinguish between an empty replacement
+                    string and no replacement needed case */
+
   int _tokenCount;              /**< @brief number of replacements $0..$9 found in the replacement string if not empty */
   int _tokens[TOKENCOUNT];      /**< @brief replacement index 0..9, since they can be used in the replacement string in any order */
   int _tokenOffset[TOKENCOUNT]; /**< @brief replacement offset inside the replacement string */
diff --git a/plugins/experimental/cachekey/plugin.cc b/plugins/experimental/cachekey/plugin.cc
index 2378264..d413bd3 100644
--- a/plugins/experimental/cachekey/plugin.cc
+++ b/plugins/experimental/cachekey/plugin.cc
@@ -92,11 +92,12 @@ TSRemapDoRemap(void *instance, TSHttpTxn txn, TSRemapRequestInfo *rri)
 
   if (nullptr != config) {
     /* Initial cache key facility from the requested URL. */
-    CacheKey cachekey(txn, rri->requestBufp, rri->requestUrl, rri->requestHdrp);
+    CacheKey cachekey(txn, rri->requestBufp, rri->requestUrl, rri->requestHdrp, config->getSeparator());
 
     /* Append custom prefix or the host:port */
-    cachekey.appendPrefix(config->_prefix, config->_prefixCapture, config->_prefixCaptureUri);
-
+    if (!config->prefixToBeRemoved()) {
+      cachekey.appendPrefix(config->_prefix, config->_prefixCapture, config->_prefixCaptureUri);
+    }
     /* Classify User-Agent and append the class name to the cache key if matched. */
     cachekey.appendUaClass(config->_classifier);
 
@@ -110,8 +111,9 @@ TSRemapDoRemap(void *instance, TSHttpTxn txn, TSRemapRequestInfo *rri)
     cachekey.appendCookies(config->_cookies);
 
     /* Append the path to the cache key. */
-    cachekey.appendPath(config->_pathCapture, config->_pathCaptureUri);
-
+    if (!config->pathToBeRemoved()) {
+      cachekey.appendPath(config->_pathCapture, config->_pathCaptureUri);
+    }
     /* Append query parameters to the cache key. */
     cachekey.appendQuery(config->_query);
 

-- 
To stop receiving notification emails like this one, please contact
"commits@trafficserver.apache.org" <co...@trafficserver.apache.org>.