You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/03 07:24:53 UTC

[GitHub] [druid] clintropolis commented on a diff in pull request #13732: various nested column (and other) fixes

clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095431083


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
         );
         if (metadata.hasNulls()) {
           columnBuilder.setHasNulls(true);
-          final ByteBuffer nullIndexBuffer = loadInternalFile(mapper, NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+          final ByteBuffer nullIndexBuffer = loadInternalFile(
+              mapper,
+              metadata,
+              NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+          );
           nullValues = metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
         } else {
           nullValues = metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
         }
+
+        return new NestedDataColumnSupplier(
+            version,
+            metadata,
+            fields,
+            fieldInfo,
+            compressedRawColumnSupplier,
+            nullValues,
+            stringDictionary,
+            frontCodedStringDictionarySupplier,
+            longDictionarySupplier,
+            doubleDictionarySupplier,
+            columnConfig,
+            mapper,
+            simpleType
+        );

Review Comment:
   i didn't actually change anything here, i just moved the logic out of the constructor into a static method per a request of yours on a previous PR. the nested column segments are pretty light due to the use of suppliers, particularly when using `FrontCodedIndexed` for the string dictionary https://github.com/apache/druid/pull/12277#discussion_r1004350233, so they consist mostly of a reference to the underlying buffer from the mmap segment and the supplier object until materialized into an indexed for queries. I did this with heap usage specifically in mind.
   
   `GenericIndexed` is the most expensive part of segment heap usage afaict, and the limited use of it here helps keep things light, because its both materialized into an `Indexed` and also still functions like a supplier with the 'singleThreaded' method, and usually is the driver of heap size in the dumps i've seen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org