You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "SabrinaZhaozyf (via GitHub)" <gi...@apache.org> on 2023/11/03 00:16:13 UTC

[PR] Add instrumentation for DataTable Creation [pinot]

SabrinaZhaozyf opened a new pull request, #11942:
URL: https://github.com/apache/pinot/pull/11942

   When the InstanceResponseBlock is creating the DataTable based on the final results in the combine operator, that data table creation is hitting OOM (see examples below). This PR adds instrumentation for this path under the query killing framework.
   
   Examples from heap dumps
   **Selection**
   ```
   java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
   java.util.Arrays.copyOf(Arrays.java:3745)
     Local variables
   java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
     Local variables
   java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
   java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
     Local variables
   java.io.OutputStream.write(OutputStream.java:122)
   org.apache.pinot.core.common.datatable.BaseDataTableBuilder.finishRow(BaseDataTableBuilder.java:163)
   org.apache.pinot.core.query.selection.SelectionOperatorUtils.getDataTableFromRows(SelectionOperatorUtils.java:342)
     Local variables
   org.apache.pinot.core.operator.blocks.results.SelectionResultsBlock.getDataTable(SelectionResultsBlock.java:85)
   org.apache.pinot.core.operator.blocks.InstanceResponseBlock.toDataOnlyDataTable(InstanceResponseBlock.java:128)
   org.apache.pinot.core.operator.blocks.InstanceResponseBlock.toDataTable(InstanceResponseBlock.java:121)
     Local variables
   org.apache.pinot.core.query.scheduler.QueryScheduler.serializeResponse(QueryScheduler.java:337)
   ```
   **GroupBy**
   ```
   java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
   java.util.Arrays.copyOf(Arrays.java:3745)
     Local variables
   java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
     Local variables
   java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
   java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
     Local variables
   java.io.OutputStream.write(OutputStream.java:122)
   org.apache.pinot.core.common.datatable.BaseDataTableBuilder.setColumn(BaseDataTableBuilder.java:112)
   org.apache.pinot.core.operator.blocks.results.GroupByResultsBlock.setDataTableColumn(GroupByResultsBlock.java:262)
   org.apache.pinot.core.operator.blocks.results.GroupByResultsBlock.getDataTable(GroupByResultsBlock.java:209)
     Local variables
   org.apache.pinot.core.operator.blocks.InstanceResponseBlock.toDataOnlyDataTable(InstanceResponseBlock.java:117)
   org.apache.pinot.core.operator.blocks.InstanceResponseBlock.toDataTable(InstanceResponseBlock.java:110)
     Local variables
   org.apache.pinot.core.query.scheduler.QueryScheduler.serializeResponse(QueryScheduler.java:337)
     Local variables
   ```
   
   **Testing**
   
   - Tests added to`ResourceManagerAccountingTest` verify that memory usage is sampled, interruption is checked, and `EarlyTerminationException` is thrown in `getDataTable` by manually setting up the accountant **after** rows/result blocks are created (only samples for data table creation). Without instrumentation, tests will throw OOM exception.
   - Deterministic integration tests to check that queries are killed exactly at data table creation can be hard. Reasons being that triggering threshold too low can cause queries to be killed at combine level before hitting data table creation while a higher threshold can only kill the query on the server nondeterministically. 
   
   Below is an example of query exception in the response when a kill happened exactly at data table creation. While it is not deterministic, to reproduce this, you can set server `accounting.oom.critical.heap.usage.ratio` to be higher in `OfflineClusterMemBasedServerQueryKillingTest` and try running `testSelectionOnlyOOM` .
   
   ```
   {"message":"QueryCancellationError:\norg.apache.pinot.spi.exception.QueryCancelledException: Cancelled while building data table java.lang.RuntimeException:  Query Broker_172.25.200.137_18099_752082720000000006_O got killed because using 2353711296 bytes of memory on SERVER: Server_localhost_8098, exceeding the quota\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.serializeResponse(QueryScheduler.java:227)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:156)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:124)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n...\nCaused by: org.apache.pinot.spi.exception.EarlyTerminationException: Interrupted while merging records\n\tat org.apache.pinot.spi.trace.Tracing$ThreadAccountantOps.sampleAndCheckInterruption(Tracing.java:288)\n\tat org.apache.pinot.spi.trace.Tracing$ThreadAccountantOps.s
 ampleAndCheckInterruptionPeriodically(Tracing.java:304)\n\tat org.apache.pinot.core.query.selection.SelectionOperatorUtils.getDataTableFromRows(SelectionOperatorUtils.java:388)\n\tat org.apache.pinot.core.operator.blocks.results.SelectionResultsBlock.getDataTable(SelectionResultsBlock.java:84)","errorCode":503}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "SabrinaZhaozyf (via GitHub)" <gi...@apache.org>.
SabrinaZhaozyf commented on PR #11942:
URL: https://github.com/apache/pinot/pull/11942#issuecomment-1804276003

   > Could you take a look at the tests? They are failing and the log size looks pretty large, Maybe set the log level to info in the new tests?
   The failed tests were due to a flaky test which [has been fixed here](https://github.com/apache/pinot/pull/11963). 
   Rebased and turned off logging for the tests added. 
   
   Thank you for the review! @jasperjiaguo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11942:
URL: https://github.com/apache/pinot/pull/11942#issuecomment-1791758562

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11942](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (65a961c) into [master](https://app.codecov.io/gh/apache/pinot/commit/53e80331a4e8cb1593e01b8133a6d90aa39cef22?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (53e8033) will **decrease** coverage by `61.31%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #11942       +/-   ##
   =============================================
   - Coverage     61.30%    0.00%   -61.31%     
   =============================================
     Files          2378     2302       -76     
     Lines        128865   125132     -3733     
     Branches      19927    19370      -557     
   =============================================
   - Hits          78998        0    -78998     
   - Misses        44147   125132    +80985     
   + Partials       5720        0     -5720     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (?)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.31%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.27%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.31%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11942/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...t/core/query/selection/SelectionOperatorUtils.java](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9zZWxlY3Rpb24vU2VsZWN0aW9uT3BlcmF0b3JVdGlscy5qYXZh) | `0.00% <0.00%> (-92.77%)` | :arrow_down: |
   | [...e/operator/blocks/results/GroupByResultsBlock.java](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9ibG9ja3MvcmVzdWx0cy9Hcm91cEJ5UmVzdWx0c0Jsb2NrLmphdmE=) | `0.00% <0.00%> (-82.93%)` | :arrow_down: |
   | [...che/pinot/core/query/scheduler/QueryScheduler.java](https://app.codecov.io/gh/apache/pinot/pull/11942?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9zY2hlZHVsZXIvUXVlcnlTY2hlZHVsZXIuamF2YQ==) | `0.00% <0.00%> (-62.50%)` | :arrow_down: |
   
   ... and [1980 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11942/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia merged PR #11942:
URL: https://github.com/apache/pinot/pull/11942


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "SabrinaZhaozyf (via GitHub)" <gi...@apache.org>.
SabrinaZhaozyf commented on code in PR #11942:
URL: https://github.com/apache/pinot/pull/11942#discussion_r1385693119


##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -223,6 +238,145 @@ public void testWorkerThreadInterruption()
     Assert.fail("Expected EarlyTerminationException to be thrown");
   }
 
+  /**
+   * Test instrumentation during {@link DataTable} creation
+   */
+  @Test
+  public void testGetDataTableOOMSelect()
+      throws Exception {
+
+    // generate random rows
+    String[] columnNames = {
+        "int", "long", "float", "double", "big_decimal", "string", "bytes", "int_array", "long_array", "float_array",
+        "double_array", "string_array"
+    };
+    DataSchema.ColumnDataType[] columnDataTypes = {
+        DataSchema.ColumnDataType.INT, DataSchema.ColumnDataType.LONG, DataSchema.ColumnDataType.FLOAT,
+        DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.BIG_DECIMAL, DataSchema.ColumnDataType.STRING,
+        DataSchema.ColumnDataType.BYTES, DataSchema.ColumnDataType.INT_ARRAY, DataSchema.ColumnDataType.LONG_ARRAY,
+        DataSchema.ColumnDataType.FLOAT_ARRAY, DataSchema.ColumnDataType.DOUBLE_ARRAY,
+        DataSchema.ColumnDataType.STRING_ARRAY
+    };
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    List<Object[]> rows = DataBlockTestUtils.getRandomRows(dataSchema, NUM_ROWS, 0);
+
+    // set up logging and configs
+    LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.DEBUG);
+    LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.DEBUG);
+    ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+    HashMap<String, Object> configs = new HashMap<>();
+    ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+        "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING, true);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING, false);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY, true);
+    PinotConfiguration config = getConfig(1, 2, configs);
+    ResourceManager rm = getResourceManager(1, 2, 1, 1, configs);
+    // init accountant and start watcher task
+    Tracing.ThreadAccountantOps.initializeThreadAccountant(config, "testSelect");
+
+    ListenableFuture<?> future = rm.getQueryRunners().submit(() -> {
+      Tracing.ThreadAccountantOps.setupRunner("testSelectQueryId");
+      try {
+        int numIterations = 0;
+        List<DataTable> dataTables = new ArrayList<>();
+        while (numIterations < 100) {
+          // build data table with sampling
+          dataTables.add(SelectionOperatorUtils.getDataTableFromRows(rows, dataSchema, false));

Review Comment:
   Updated both tests. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11942:
URL: https://github.com/apache/pinot/pull/11942#discussion_r1382267013


##########
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java:
##########
@@ -215,6 +219,14 @@ private byte[] serializeResponse(ServerQueryRequest queryRequest, InstanceRespon
     byte[] responseByte = null;
     try {
       responseByte = instanceResponse.toDataTable().toBytes();

Review Comment:
   Could we verify whether there's memory allocation in toBytes()? Do we need to instrument that part as well? Meanwhile, IMO it might be helpful to very briefly explain why we only added instrumentation only for groupby and selection operation utils in the PR description @SabrinaZhaozyf 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11942:
URL: https://github.com/apache/pinot/pull/11942#discussion_r1382267013


##########
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java:
##########
@@ -215,6 +219,14 @@ private byte[] serializeResponse(ServerQueryRequest queryRequest, InstanceRespon
     byte[] responseByte = null;
     try {
       responseByte = instanceResponse.toDataTable().toBytes();

Review Comment:
   Could we verify whether there's memory allocation in toBytes()? We might need to instrument that part as well. Meanwhile IMO it might be helpful to very briefly explain why we only added instrumentation only for groupby and selection operation utils in the PR description @SabrinaZhaozyf 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11942:
URL: https://github.com/apache/pinot/pull/11942#discussion_r1382289868


##########
pinot-core/src/test/java/org/apache/pinot/core/accounting/ResourceManagerAccountingTest.java:
##########
@@ -223,6 +238,145 @@ public void testWorkerThreadInterruption()
     Assert.fail("Expected EarlyTerminationException to be thrown");
   }
 
+  /**
+   * Test instrumentation during {@link DataTable} creation
+   */
+  @Test
+  public void testGetDataTableOOMSelect()
+      throws Exception {
+
+    // generate random rows
+    String[] columnNames = {
+        "int", "long", "float", "double", "big_decimal", "string", "bytes", "int_array", "long_array", "float_array",
+        "double_array", "string_array"
+    };
+    DataSchema.ColumnDataType[] columnDataTypes = {
+        DataSchema.ColumnDataType.INT, DataSchema.ColumnDataType.LONG, DataSchema.ColumnDataType.FLOAT,
+        DataSchema.ColumnDataType.DOUBLE, DataSchema.ColumnDataType.BIG_DECIMAL, DataSchema.ColumnDataType.STRING,
+        DataSchema.ColumnDataType.BYTES, DataSchema.ColumnDataType.INT_ARRAY, DataSchema.ColumnDataType.LONG_ARRAY,
+        DataSchema.ColumnDataType.FLOAT_ARRAY, DataSchema.ColumnDataType.DOUBLE_ARRAY,
+        DataSchema.ColumnDataType.STRING_ARRAY
+    };
+    DataSchema dataSchema = new DataSchema(columnNames, columnDataTypes);
+    List<Object[]> rows = DataBlockTestUtils.getRandomRows(dataSchema, NUM_ROWS, 0);
+
+    // set up logging and configs
+    LogManager.getLogger(PerQueryCPUMemResourceUsageAccountant.class).setLevel(Level.DEBUG);
+    LogManager.getLogger(ThreadResourceUsageProvider.class).setLevel(Level.DEBUG);
+    ThreadResourceUsageProvider.setThreadMemoryMeasurementEnabled(true);
+    HashMap<String, Object> configs = new HashMap<>();
+    ServerMetrics.register(Mockito.mock(ServerMetrics.class));
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ALARMING_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_CRITICAL_LEVEL_HEAP_USAGE_RATIO, 0.00f);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME,
+        "org.apache.pinot.core.accounting.PerQueryCPUMemAccountantFactory");
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_MEMORY_SAMPLING, true);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_ENABLE_THREAD_CPU_SAMPLING, false);
+    configs.put(CommonConstants.Accounting.CONFIG_OF_OOM_PROTECTION_KILLING_QUERY, true);
+    PinotConfiguration config = getConfig(1, 2, configs);
+    ResourceManager rm = getResourceManager(1, 2, 1, 1, configs);
+    // init accountant and start watcher task
+    Tracing.ThreadAccountantOps.initializeThreadAccountant(config, "testSelect");
+
+    ListenableFuture<?> future = rm.getQueryRunners().submit(() -> {
+      Tracing.ThreadAccountantOps.setupRunner("testSelectQueryId");
+      try {
+        int numIterations = 0;
+        List<DataTable> dataTables = new ArrayList<>();
+        while (numIterations < 100) {
+          // build data table with sampling
+          dataTables.add(SelectionOperatorUtils.getDataTableFromRows(rows, dataSchema, false));

Review Comment:
   Could we add DataTable.toBytes() as well here?  Meanwhile, the testing would probably fit better to the real use case if we call  `SelectionOperatorUtils.getDataTableFromRows(rows, dataSchema, false)` only once in each thread and submit multiple runner threads. Similar for the groupby case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add instrumentation for DataTable Creation [pinot]

Posted by "SabrinaZhaozyf (via GitHub)" <gi...@apache.org>.
SabrinaZhaozyf commented on code in PR #11942:
URL: https://github.com/apache/pinot/pull/11942#discussion_r1385691506


##########
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java:
##########
@@ -215,6 +219,14 @@ private byte[] serializeResponse(ServerQueryRequest queryRequest, InstanceRespon
     byte[] responseByte = null;
     try {
       responseByte = instanceResponse.toDataTable().toBytes();

Review Comment:
   > Could we verify whether there's memory allocation in toBytes()? Do we need to instrument that part as well?
   
   Good point. There's memory allocation in toBytes (mostly from `writeLeadingSections`) and we have seen OOM here as well. Added instrumentation to [serializeStringDictionary](https://github.com/apache/pinot/blob/9092244e0be9f27158320c987833bdb3b6179bdd/pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java#L332) in toBytes.
   
   > IMO it might be helpful to very briefly explain why we only added instrumentation only for groupby and selection operation utils in the PR description
   
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org