You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/01 06:39:52 UTC

[GitHub] [druid] zachjsh opened a new pull request #11863: Warn if cache size of lookup is beyond max size

zachjsh opened a new pull request #11863:
URL: https://github.com/apache/druid/pull/11863


   ### Description
   
   Enhanced the `ExtractionNamespace` interface in `lookups-cached-global` core extension with the ability to set a `maxSize` in bytes for the cache of the respective namespace. The reason for adding this functionality, is make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory.. The default value of maxSize for the interface is -1, which indicates that no maxSize has been set. For the `JdbcExtractionNamespace` and `UriExtractionNamespace` implementations, the default value is null, which will cause the respective service that the lookup is loaded in, to warn when its cache is beyond the maximum of either 5MB, or 10% of the service's configured max heap size. If a non-null value is set for the namespace's `maxSize` config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond 
 this size.  Warnings are logged everytime that either Uri based on Jdbc based lookups are regenerated, no other implementations will log warnings at this time. No error is thrown when the size exceeds the maxSize at this time, as doing so could break functionality for existing users. Previously the `JdbcCacheGenerator` generated its cache by materializing all rows of the underling table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query exceeded the max size. To help with this, I've made it so that the jdbc query results are instead streamed through an iterator.
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742426892



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       See comment in populate function `// this top level check so that we dont keep logging inability to determine`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh merged pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh merged pull request #11863:
URL: https://github.com/apache/druid/pull/11863


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11863:
URL: https://github.com/apache/druid/pull/11863#issuecomment-956007091


   This pull request **introduces 2 alerts** when merging d38e71193413a2a37c8b1b4cafa89f33c7986aed into 90640bb316283d7377cb0f1fee3c45dd4b87e68a - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-a8c0f0419cec3f80da74f1b2fd4860c689ef56e4)
   
   **new alerts:**
   
   * 2 for Implicit narrowing conversion in compound assignment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740468584



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       I think it'd make more sense to express this as a percentage of available heap instead of a fixed byte limit, since the amount of available memory can differ across the various nodes where lookups might be loaded

##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       Are there cases where the value is not a string? If so, suggest adding support for those cases, this log seems likely to result in excessive logging since it's per entry




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742426892



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       See comment in populate function `// this top level check so that we dont keep logging inability to determine`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740476489



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       Are there cases where the value is not a string? If so, suggest adding support for those cases, this log seems likely to result in excessive logging since it's per entry




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742421264



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       can you adjust so that it doesn't potentially log on every value

##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       can you adjust so that it doesn't potentially log on every value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742426892



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       See comment in populate function `// this top level check so that we dont keep logging inability to determine`

##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       See comment in populate function `// this top level check so that we dont keep logging inability to determine`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740788541



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       Good suggestion, fixed.

##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       thanks for suggestion, should be fixed now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742421264



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       can you adjust so that it doesn't potentially log on every value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740468584



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       I think it'd make more sense to express this as a percentage of available heap instead of a fixed byte limit, since the amount of available memory can differ across the various nodes where lookups might be loaded




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740788541



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       Good suggestion, fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740788541



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       Good suggestion, fixed.

##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       thanks for suggestion, should be fixed now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742421264



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       can you adjust so that it doesn't potentially log on every value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11863: Warn if cache size of lookup is beyond max size

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740788688



##########
File path: extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning should be shown in the log
+   * @param name      The name of the map that is being populated. Used to identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       thanks for suggestion, should be fixed now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org