You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2021/09/02 00:09:01 UTC

[GitHub] [geode] DonalEvans opened a new pull request #6831: GEODE-9519: Implement Radish ZSCAN command

DonalEvans opened a new pull request #6831:
URL: https://github.com/apache/geode/pull/6831


   This PR is intentionally split into two commits, the first refactors the existing *SCAN implementations and some related code. The second implements the ZSCAN command and adds tests. The changes in the first commit also address GEODE-9427 as a side-effect.
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [x] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to dev@geode.apache.org.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] upthewaterspout commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
upthewaterspout commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r706443424



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisHash.java
##########
@@ -243,10 +243,9 @@ public int hstrlen(byte[] field) {
       throw new IllegalArgumentException("Requested array size exceeds VM limit");
     }
     List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
-    do {
-      cursor = hash.scan(cursor, 1,
-          (list, key, value) -> addIfMatching(matchPattern, list, key, value), resultList);
-    } while (cursor != 0 && resultList.size() < maximumCapacity);
+
+    cursor = hash.scan(cursor, count,
+        (list, key, value) -> addIfMatching(matchPattern, list, key, value), resultList);

Review comment:
       I think this is going to underpopulate the resultList if there is a matchPattern. The scan method is only going to call `addIfMatching` for `maximumCapacity` iterations, rather than waiting until resultList has reached `maximumCapacity`.

##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/executor/key/AbstractScanExecutor.java
##########
@@ -15,25 +15,158 @@
  */
 package org.apache.geode.redis.internal.executor.key;
 
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_CURSOR;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_NOT_INTEGER;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_SYNTAX;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_WRONG_TYPE;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToLong;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToString;
+import static org.apache.geode.redis.internal.netty.Coder.equalsIgnoreCaseBytes;
+import static org.apache.geode.redis.internal.netty.Coder.narrowLongToInt;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bCOUNT;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bMATCH;
+
 import java.math.BigInteger;
+import java.util.List;
 import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.logging.log4j.Logger;
 
+import org.apache.geode.annotations.VisibleForTesting;
+import org.apache.geode.logging.internal.log4j.api.LogService;
+import org.apache.geode.redis.internal.RedisException;
+import org.apache.geode.redis.internal.data.RedisData;
+import org.apache.geode.redis.internal.data.RedisDataType;
+import org.apache.geode.redis.internal.data.RedisDataTypeMismatchException;
+import org.apache.geode.redis.internal.data.RedisKey;
 import org.apache.geode.redis.internal.executor.AbstractExecutor;
 import org.apache.geode.redis.internal.executor.GlobPattern;
+import org.apache.geode.redis.internal.executor.RedisResponse;
+import org.apache.geode.redis.internal.netty.Command;
+import org.apache.geode.redis.internal.netty.ExecutionHandlerContext;
 
 public abstract class AbstractScanExecutor extends AbstractExecutor {
-
-  protected final BigInteger UNSIGNED_LONG_CAPACITY = new BigInteger("18446744073709551615");
+  private static final Logger logger = LogService.getLogger();
+  public static final BigInteger UNSIGNED_LONG_CAPACITY = new BigInteger("18446744073709551615");
   protected final int DEFAULT_COUNT = 10;
 
+  @Override
+  public RedisResponse executeCommand(Command command, ExecutionHandlerContext context) {
+    List<byte[]> commandElems = command.getProcessedCommand();
+
+    int cursor;
+    try {
+      cursor = getIntCursor(bytesToString(commandElems.get(2)));
+    } catch (RedisException ex) {
+      return RedisResponse.error(ERROR_CURSOR);
+    }
+
+    RedisKey key = command.getKey();
+
+    // Because we're trying to preserve the same semantics of error conditions, with native redis,
+    // the ordering of input validation is reflected here. To that end the first check ends up
+    // being an existence check of the key. That causes a race since the data value needs to be
+    // accessed again when the actual command does its work. If the relevant bucket doesn't get
+    // locked throughout the call, the bucket may move producing inconsistent results.
+    return context.getRegionProvider().execute(key, () -> {
+      RedisData value = context.getRegionProvider().getRedisData(key);
+      if (value.isNull()) {
+        context.getRedisStats().incKeyspaceMisses();
+        return RedisResponse.emptyScan();
+      }
+
+      if (value.getType() != getDataType()) {
+        throw new RedisDataTypeMismatchException(ERROR_WRONG_TYPE);
+      }
+
+      command.getCommandType().checkDeferredParameters(command, context);
+      int count = DEFAULT_COUNT;
+      String globPattern = null;
+
+      for (int i = 3; i < commandElems.size(); i = i + 2) {
+        byte[] commandElemBytes = commandElems.get(i);
+        if (equalsIgnoreCaseBytes(commandElemBytes, bMATCH)) {
+          commandElemBytes = commandElems.get(i + 1);
+          globPattern = bytesToString(commandElemBytes);
+
+        } else if (equalsIgnoreCaseBytes(commandElemBytes, bCOUNT)) {
+          commandElemBytes = commandElems.get(i + 1);
+          try {
+            count = narrowLongToInt(bytesToLong(commandElemBytes));
+          } catch (NumberFormatException e) {
+            return RedisResponse.error(ERROR_NOT_INTEGER);
+          }
+
+          if (count < 1) {
+            return RedisResponse.error(ERROR_SYNTAX);
+          }
+
+        } else {
+          return RedisResponse.error(ERROR_SYNTAX);
+        }
+      }
+
+      Pattern matchPattern;
+      try {
+        matchPattern = convertGlobToRegex(globPattern);
+      } catch (PatternSyntaxException e) {
+        logger.warn(
+            "Could not compile the pattern: '{}' due to the following exception: '{}'. {} will return an empty list.",
+            globPattern, e.getMessage(), command.getCommandType().name());
+
+        return RedisResponse.emptyScan();
+      }
+
+      Pair<Integer, List<byte[]>> scanResult;
+      try {
+        scanResult = executeScan(context, key, matchPattern, count, cursor);
+      } catch (IllegalArgumentException ex) {
+        return RedisResponse.error(ERROR_NOT_INTEGER);
+      }
+
+      return RedisResponse.scan(scanResult.getLeft(), scanResult.getRight());
+    });
+  }
+
   /**
    * @param pattern A glob pattern.
    * @return A regex pattern to recognize the given glob pattern.
    */
-  protected Pattern convertGlobToRegex(String pattern) {
+  @VisibleForTesting
+  public Pattern convertGlobToRegex(String pattern) {
     if (pattern == null) {
       return null;
     }
     return GlobPattern.compile(pattern);
   }
+  // Redis allows values for CURSOR up to UNSIGNED_LONG_CAPACITY, but internally it only makes sense
+  // for us to use values up to Integer.MAX_VALUE, so safely narrow any cursor values to int here
+
+  @VisibleForTesting
+  public int getIntCursor(String cursorString) {

Review comment:
       How about we just use Integer.parseInt here (or Long.parseLong, if you prefer). Using BigInteger is a performance impact. Since the behavior is undetermined if the user passes as a cursor value other than 0 or a number that we previously returned, it seems like we could just treat numbers that are too large as 0 or integer max value or whatever. https://redis.io/commands/scan#calling-scan-with-a-corrupted-cursor




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] dschneider-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
dschneider-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701364726



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       Something I find out about our scan code is that we implemented SizeableObject2ObjectOpenCustomHashMapWithCursor#scan to take a "count" parameter that allows it to loop but then when we call it we always give it "1" and the caller does a loop. It seems like we could refactor it so that the shared scan method on our HashMap does all the looping.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701388381



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/executor/key/ScanExecutor.java
##########
@@ -103,7 +104,7 @@ public RedisResponse executeCommand(Command command, ExecutionHandlerContext con
         scan(getDataRegion(context).keySet(), matchPattern, count, cursor);
     context.setScanCursor(scanResult.getLeft());
 
-    return RedisResponse.scan(scanResult.getLeft(), scanResult.getRight());
+    return RedisResponse.scan(scanResult.getLeft().intValue(), scanResult.getRight());
   }
 
   private Pair<BigInteger, List<Object>> scan(Collection<RedisKey> list,

Review comment:
       I didn't want to have this PR get too big, so I haven't changed the existing implementations for SCAN or SSCAN. They're currently not supported commands, so I've left that work for whoever eventually implements them in the future.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] jdeppe-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
jdeppe-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r702117757



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       That's pretty annoying :) I'd still argue for restoring the prior behavior (or some variant) since it should be more performant overall in that it will reduce the potential number of client calls. If we only process `count` entries for every `scan` call, the client is going to make ~`set_size / count` requests until they are done. The downside of the original implementation is that the key is locked for the duration of the call as more entries are iterated. Since the semantics are so loose, it would be awesome if the scan could be interrupted by a write operation, but that's just crazy talk...
   
   Even Redis requires a lot of calls when the matches are sparse. With a set of 10003 entries where 3 entries will match the `zscan` call (and count=3), Redis still takes 2769 calls before the whole set is processed, so I guess leaving this change is OK for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r702144208



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       Yeah the COUNT/MATCH interaction isn't ideal. From Redis' SCAN documentation:
   > It is important to note that the MATCH filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very little elements inside the collection, SCAN will likely return no elements in most iterations.
   
   so it'd be a significant departure from Redis' way of doing things if we kept the old implementation (even if it's better).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] jdeppe-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
jdeppe-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701955842



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       I think this breaks things a bit. @DonalEvans the assertion that was removed in `RedisHashTest.hscanOnlyReturnsElementsMatchingPattern` should remain. If that test is translated into an integration test you'll see it passes in native redis (with a cursor result of `0`).
   The javadoc for `SizeableBytes2ObjectOpenCustomHashMapWithCursor.scan` is slightly misleading (for me at least) in that it says:
   > This method will scan until at least count entries are returned...
   
   It should preferably say: _"until at least count entries are consumed"_. The caller of this method,`RedisHash.hscan`, has a different semantic for `count` in that it wants to _return_ that many entries that _match_ the pattern not just process that many entries in the hash. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] jdeppe-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
jdeppe-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701959224



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       This is the integration test I tried:
   ```
     @Test
     public void wat() {
       Map<String, String> hashy = new HashMap<>();
       hashy.put("ak1", "v1");
       hashy.put("k2", "v2");
       hashy.put("ak3", "v3");
       hashy.put("k4", "v4");
       jedis.hset(HASH_KEY, hashy);
   
       ScanResult<Map.Entry<byte[], byte[]>> result =
           jedis.hscan(HASH_KEY.getBytes(), "0".getBytes(), new ScanParams().count(3).match("a*"));
   
       assertThat(result.getCursor()).isEqualTo("0");
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701393351



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       Good call. It seems that all the tests pass with this change, so I've applied it to simplify things.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] jdeppe-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
jdeppe-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701959224



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       This is the integration test I tried:
   ```java
     @Test
     public void wat() {
       Map<String, String> hashy = new HashMap<>();
       hashy.put("ak1", "v1");
       hashy.put("k2", "v2");
       hashy.put("ak3", "v3");
       hashy.put("k4", "v4");
       jedis.hset(HASH_KEY, hashy);
   
       ScanResult<Map.Entry<byte[], byte[]>> result =
           jedis.hscan(HASH_KEY.getBytes(), "0".getBytes(), new ScanParams().count(3).match("a*"));
   
       assertThat(result.getCursor()).isEqualTo("0");
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kirklund commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
kirklund commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r702186251



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -568,6 +591,19 @@ private int getMaxElementsToReturn(AbstractSortedSetRangeOptions<?> rangeOptions
     return result;
   }
 
+  private void addIfMatching(Pattern matchPattern, List<byte[]> resultList, byte[] key,
+      double score) {
+    if (matchPattern != null) {
+      if (matchPattern.matcher(bytesToString(key)).matches()) {
+        resultList.add(key);
+        resultList.add(doubleToBytes(score));
+      }
+    } else {
+      resultList.add(key);
+      resultList.add(doubleToBytes(score));
+    }
+  }

Review comment:
       This change should be unit tested.

##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,24 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(

Review comment:
       `LogService.getLogger()` constructs a new instance of `FastLogger` every time it's called. You should always store it in a `static final` reference instead of inlining calls like this.
   ```
   private static final Logger logger = LogService.getLogger();
   ```

##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/executor/key/AbstractScanExecutor.java
##########
@@ -15,17 +15,120 @@
  */
 package org.apache.geode.redis.internal.executor.key;
 
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_CURSOR;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_NOT_INTEGER;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_SYNTAX;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_WRONG_TYPE;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToLong;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToString;
+import static org.apache.geode.redis.internal.netty.Coder.equalsIgnoreCaseBytes;
+import static org.apache.geode.redis.internal.netty.Coder.narrowLongToInt;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bCOUNT;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bMATCH;
+
 import java.math.BigInteger;
+import java.util.List;
 import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.logging.log4j.Logger;
 
+import org.apache.geode.logging.internal.log4j.api.LogService;
+import org.apache.geode.redis.internal.RedisException;
+import org.apache.geode.redis.internal.data.RedisData;
+import org.apache.geode.redis.internal.data.RedisDataType;
+import org.apache.geode.redis.internal.data.RedisDataTypeMismatchException;
+import org.apache.geode.redis.internal.data.RedisKey;
 import org.apache.geode.redis.internal.executor.AbstractExecutor;
 import org.apache.geode.redis.internal.executor.GlobPattern;
+import org.apache.geode.redis.internal.executor.RedisResponse;
+import org.apache.geode.redis.internal.netty.Command;
+import org.apache.geode.redis.internal.netty.ExecutionHandlerContext;
 
 public abstract class AbstractScanExecutor extends AbstractExecutor {
-
+  private static final Logger logger = LogService.getLogger();
   protected final BigInteger UNSIGNED_LONG_CAPACITY = new BigInteger("18446744073709551615");
   protected final int DEFAULT_COUNT = 10;
 
+  @Override
+  public RedisResponse executeCommand(Command command, ExecutionHandlerContext context) {

Review comment:
       There are a lot of changes here that should be unit tested.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kirklund commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
kirklund commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r708421976



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -69,6 +70,7 @@
 
 public class RedisSortedSet extends AbstractRedisData {
   protected static final int REDIS_SORTED_SET_OVERHEAD = memoryOverhead(RedisSortedSet.class);
+  public static final Logger logger = LogService.getLogger();

Review comment:
       You should probably make this private.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] dschneider-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
dschneider-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701358638



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/executor/key/ScanExecutor.java
##########
@@ -103,7 +104,7 @@ public RedisResponse executeCommand(Command command, ExecutionHandlerContext con
         scan(getDataRegion(context).keySet(), matchPattern, count, cursor);
     context.setScanCursor(scanResult.getLeft());
 
-    return RedisResponse.scan(scanResult.getLeft(), scanResult.getRight());
+    return RedisResponse.scan(scanResult.getLeft().intValue(), scanResult.getRight());
   }
 
   private Pair<BigInteger, List<Object>> scan(Collection<RedisKey> list,

Review comment:
       why does this BigInteger flavor still exist




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] jdeppe-pivotal commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
jdeppe-pivotal commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r701955842



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       I think this breaks things a bit. @DonalEvans the assertion that was removed in `RedisHashTest.hscanOnlyReturnsElementsMatchingPattern` should remain. If that test is translated into an integration test you'll see it passes in native redis (with a cursor result of `0`).
   The javadoc for `SizeableBytes2ObjectOpenCustomHashMapWithCursor.scan` is slightly misleading (for me at least) in that it says:
   > This method will scan until at least count entries are returned...
   It should preferably say: _"until at least count entries are consumed"_. The caller of this method,`RedisHash.hscan`, has a different semantic for `count` in that it wants to _return_ that many entries that _match_ the pattern not just process that many entries in the hash. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r706509027



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/executor/key/AbstractScanExecutor.java
##########
@@ -15,25 +15,158 @@
  */
 package org.apache.geode.redis.internal.executor.key;
 
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_CURSOR;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_NOT_INTEGER;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_SYNTAX;
+import static org.apache.geode.redis.internal.RedisConstants.ERROR_WRONG_TYPE;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToLong;
+import static org.apache.geode.redis.internal.netty.Coder.bytesToString;
+import static org.apache.geode.redis.internal.netty.Coder.equalsIgnoreCaseBytes;
+import static org.apache.geode.redis.internal.netty.Coder.narrowLongToInt;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bCOUNT;
+import static org.apache.geode.redis.internal.netty.StringBytesGlossary.bMATCH;
+
 import java.math.BigInteger;
+import java.util.List;
 import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.logging.log4j.Logger;
 
+import org.apache.geode.annotations.VisibleForTesting;
+import org.apache.geode.logging.internal.log4j.api.LogService;
+import org.apache.geode.redis.internal.RedisException;
+import org.apache.geode.redis.internal.data.RedisData;
+import org.apache.geode.redis.internal.data.RedisDataType;
+import org.apache.geode.redis.internal.data.RedisDataTypeMismatchException;
+import org.apache.geode.redis.internal.data.RedisKey;
 import org.apache.geode.redis.internal.executor.AbstractExecutor;
 import org.apache.geode.redis.internal.executor.GlobPattern;
+import org.apache.geode.redis.internal.executor.RedisResponse;
+import org.apache.geode.redis.internal.netty.Command;
+import org.apache.geode.redis.internal.netty.ExecutionHandlerContext;
 
 public abstract class AbstractScanExecutor extends AbstractExecutor {
-
-  protected final BigInteger UNSIGNED_LONG_CAPACITY = new BigInteger("18446744073709551615");
+  private static final Logger logger = LogService.getLogger();
+  public static final BigInteger UNSIGNED_LONG_CAPACITY = new BigInteger("18446744073709551615");
   protected final int DEFAULT_COUNT = 10;
 
+  @Override
+  public RedisResponse executeCommand(Command command, ExecutionHandlerContext context) {
+    List<byte[]> commandElems = command.getProcessedCommand();
+
+    int cursor;
+    try {
+      cursor = getIntCursor(bytesToString(commandElems.get(2)));
+    } catch (RedisException ex) {
+      return RedisResponse.error(ERROR_CURSOR);
+    }
+
+    RedisKey key = command.getKey();
+
+    // Because we're trying to preserve the same semantics of error conditions, with native redis,
+    // the ordering of input validation is reflected here. To that end the first check ends up
+    // being an existence check of the key. That causes a race since the data value needs to be
+    // accessed again when the actual command does its work. If the relevant bucket doesn't get
+    // locked throughout the call, the bucket may move producing inconsistent results.
+    return context.getRegionProvider().execute(key, () -> {
+      RedisData value = context.getRegionProvider().getRedisData(key);
+      if (value.isNull()) {
+        context.getRedisStats().incKeyspaceMisses();
+        return RedisResponse.emptyScan();
+      }
+
+      if (value.getType() != getDataType()) {
+        throw new RedisDataTypeMismatchException(ERROR_WRONG_TYPE);
+      }
+
+      command.getCommandType().checkDeferredParameters(command, context);
+      int count = DEFAULT_COUNT;
+      String globPattern = null;
+
+      for (int i = 3; i < commandElems.size(); i = i + 2) {
+        byte[] commandElemBytes = commandElems.get(i);
+        if (equalsIgnoreCaseBytes(commandElemBytes, bMATCH)) {
+          commandElemBytes = commandElems.get(i + 1);
+          globPattern = bytesToString(commandElemBytes);
+
+        } else if (equalsIgnoreCaseBytes(commandElemBytes, bCOUNT)) {
+          commandElemBytes = commandElems.get(i + 1);
+          try {
+            count = narrowLongToInt(bytesToLong(commandElemBytes));
+          } catch (NumberFormatException e) {
+            return RedisResponse.error(ERROR_NOT_INTEGER);
+          }
+
+          if (count < 1) {
+            return RedisResponse.error(ERROR_SYNTAX);
+          }
+
+        } else {
+          return RedisResponse.error(ERROR_SYNTAX);
+        }
+      }
+
+      Pattern matchPattern;
+      try {
+        matchPattern = convertGlobToRegex(globPattern);
+      } catch (PatternSyntaxException e) {
+        logger.warn(
+            "Could not compile the pattern: '{}' due to the following exception: '{}'. {} will return an empty list.",
+            globPattern, e.getMessage(), command.getCommandType().name());
+
+        return RedisResponse.emptyScan();
+      }
+
+      Pair<Integer, List<byte[]>> scanResult;
+      try {
+        scanResult = executeScan(context, key, matchPattern, count, cursor);
+      } catch (IllegalArgumentException ex) {
+        return RedisResponse.error(ERROR_NOT_INTEGER);
+      }
+
+      return RedisResponse.scan(scanResult.getLeft(), scanResult.getRight());
+    });
+  }
+
   /**
    * @param pattern A glob pattern.
    * @return A regex pattern to recognize the given glob pattern.
    */
-  protected Pattern convertGlobToRegex(String pattern) {
+  @VisibleForTesting
+  public Pattern convertGlobToRegex(String pattern) {
     if (pattern == null) {
       return null;
     }
     return GlobPattern.compile(pattern);
   }
+  // Redis allows values for CURSOR up to UNSIGNED_LONG_CAPACITY, but internally it only makes sense
+  // for us to use values up to Integer.MAX_VALUE, so safely narrow any cursor values to int here
+
+  @VisibleForTesting
+  public int getIntCursor(String cursorString) {

Review comment:
       In order to match Redis' error behaviour, we need to be able to tell the difference between a value for cursor that's not an integer of any kind ("123abc" for example), and a value that's larger than `Long.MAX_VALUE` but smaller than the maximum capacity of an unsigned long. Redis returns an error for the first case, but doesn't for the second (even though it's a nonsense value for the cursor). In order to check if the cursor value is in that second range, BigInteger is needed.
   
   The alternative is to deviate from Redis' error behaviour here, but that's something we've avoided doing everywhere else. It's definitely an obscure corner case, since users realistically shouldn't ever be passing values for CURSOR greater than `Integer.MAX_VALUE`, let alone as big as an unsigned long, so maybe we're okay with having a difference here that we can document, where a cursor value greater than `Long.MAX_VALUE` returns an error, since we have no way to differentiate between a value that's too big and a value that's gibberish.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans merged pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans merged pull request #6831:
URL: https://github.com/apache/geode/pull/6831


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r702030694



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
     return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
   }
 
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,

Review comment:
       The difficulty here is that Redis has different behaviour depending on the size of the sorted set/hash. For sizes less than a certain amount, they internally use a different data structure, which results in different behaviour:
   
   >"When iterating Sets encoded as intsets (small sets composed of just integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets composed of small individual values), usually all the elements are returned in the first SCAN call regardless of the COUNT value."
   
   from https://redis.io/commands/scan#the-count-option
   
   But for larger sizes, a scan command may return no results (except a cursor value) when used with MATCH and COUNT. I created a hash with ~500 entries, three of which matched the pattern `a*`, and called HSCAN with `MATCH a*` and various COUNT values. Redis returned an empty array for COUNT values up to 106, and only one result for COUNT values up to 243, so it's not accurate to say that the COUNT argument indicates the number of matching entries that should be returned.
   
   Given the inconsistent behaviour of Redis' scan commands, I'm not sure that we want to try to exactly match behaviour for situations where a full iteration isn't guaranteed, because our internal implementation will not behave the same for small set/hash sizes. This can be tested by creating a hash with 20 entries, then calling HSCAN with a COUNT value of 5. Redis will return all entries, but our implementation will return 5.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] DonalEvans commented on a change in pull request #6831: GEODE-9519: Implement Radish ZSCAN command

Posted by GitBox <gi...@apache.org>.
DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r706505532



##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisHash.java
##########
@@ -243,10 +243,9 @@ public int hstrlen(byte[] field) {
       throw new IllegalArgumentException("Requested array size exceeds VM limit");
     }
     List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
-    do {
-      cursor = hash.scan(cursor, 1,
-          (list, key, value) -> addIfMatching(matchPattern, list, key, value), resultList);
-    } while (cursor != 0 && resultList.size() < maximumCapacity);
+
+    cursor = hash.scan(cursor, count,
+        (list, key, value) -> addIfMatching(matchPattern, list, key, value), resultList);

Review comment:
       That's the behaviour that Redis has. See this comment: https://github.com/apache/geode/pull/6831#discussion_r702144208




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org