Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/10 07:57:41 UTC

[GitHub] [pinot] richardstartin commented on a diff in pull request #8878: Optimize the immutable STRING/BYTES dictionary lookup

richardstartin commented on code in PR #8878:
URL: https://github.com/apache/pinot/pull/8878#discussion_r894256721


##########
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkStringDictionary.java:
##########
@@ -60,59 +70,85 @@ public class BenchmarkStringDictionary {
 
   private PinotDataBufferMemoryManager _memoryManager;
   private String[] _values;
-  private StringOffHeapMutableDictionary _offHeapDictionary;
-  private StringOnHeapMutableDictionary _onHeapDictionary;
+  private StringDictionary _stringDictionary;
+  private OnHeapStringDictionary _onHeapStringDictionary;
+  private StringOffHeapMutableDictionary _offHeapMutableDictionary;
+  private StringOnHeapMutableDictionary _onHeapMutableDictionary;
 
   @Setup
-  public void setUp() {
+  public void setUp()
+      throws IOException {
+    FileUtils.deleteDirectory(INDEX_DIR);
+    FileUtils.forceMkdir(INDEX_DIR);
     _memoryManager = new DirectMemoryManager("");
-    _offHeapDictionary =
+    _offHeapMutableDictionary =
         new StringOffHeapMutableDictionary(CARDINALITY, CARDINALITY / 10, _memoryManager, null, _maxValueLength / 2);
-    _onHeapDictionary = new StringOnHeapMutableDictionary();
-    String[] uniqueValues = new String[CARDINALITY];
-    for (int i = 0; i < CARDINALITY; i++) {
-      String value = generateRandomString(RANDOM.nextInt(_maxValueLength + 1));
-      uniqueValues[i] = value;
-      _offHeapDictionary.index(value);
-      _onHeapDictionary.index(value);
+    _onHeapMutableDictionary = new StringOnHeapMutableDictionary();
+    TreeSet<String> uniqueValues = new TreeSet<>();
+    while (uniqueValues.size() < CARDINALITY) {
+      String value = RandomStringUtils.randomAscii(RANDOM.nextInt(_maxValueLength + 1));

Review Comment:
   The generated values need to be varied to get a fuller picture: random ASCII strings rarely share a prefix, whereas repository URLs from the GitHub events data set share long common prefixes and would lead to very different numbers.
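
   One possible way to add such a case to the benchmark, as a rough sketch rather than a concrete proposal for this PR: generate URL-like values that share a long prefix and draw from a small pool of owner names. The class name `UrlLikeValueGenerator`, the `generate` signature, the `https://github.com/` prefix, and the token lengths are all assumptions made for illustration; none of them come from the PR or from the GitHub events data set itself.

```java
import java.util.Random;
import java.util.TreeSet;

// Hypothetical helper, not part of Pinot: builds sorted, unique, URL-like values.
public class UrlLikeValueGenerator {
  // Assumed prefix, chosen only to mimic repository URLs.
  private static final String PREFIX = "https://github.com/";
  private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-";

  // Returns a random token whose length is in [minLength, maxLength].
  private static String randomToken(Random random, int minLength, int maxLength) {
    int length = minLength + random.nextInt(maxLength - minLength + 1);
    StringBuilder builder = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      builder.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
    }
    return builder.toString();
  }

  // Generates `cardinality` unique values of the form PREFIX + owner + "/" + repo.
  // A small owner pool makes many values share a prefix longer than PREFIX alone.
  public static TreeSet<String> generate(int cardinality, int numOwners, long seed) {
    if (numOwners <= 0) {
      throw new IllegalArgumentException("numOwners must be positive");
    }
    Random random = new Random(seed);
    String[] owners = new String[numOwners];
    for (int i = 0; i < numOwners; i++) {
      owners[i] = randomToken(random, 3, 12);
    }
    TreeSet<String> values = new TreeSet<>();
    while (values.size() < cardinality) {
      String owner = owners[random.nextInt(numOwners)];
      values.add(PREFIX + owner + "/" + randomToken(random, 3, 20));
    }
    return values;
  }
}
```

   The result could stand in for the `uniqueValues` set behind a benchmark parameter, so the random ASCII case and the shared-prefix case are measured side by side.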



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org