You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/03/15 19:16:15 UTC

[GitHub] [druid] gianm opened a new pull request, #13939: Lower default maxRowsInMemory for realtime ingestion to 100k.

gianm opened a new pull request, #13939:
URL: https://github.com/apache/druid/pull/13939

   The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory. So, we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety: in case we have a large number of very small rows, we don't want to get overwhelmed by per-row overheads.
   
   However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion. Query performance is also important. And because query performance is not as good on the in-memory dataset, it's helpful to keep it from growing too large. 100k seems like a reasonable balance here. It means that for a typical 5 million row segment, we won't trigger more than 50 persists due to this limit, which is a reasonable number of persists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13939:
URL: https://github.com/apache/druid/pull/13939#discussion_r1138934876


##########
server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java:
##########
@@ -129,7 +129,7 @@ public RealtimeTuningConfig(
   )
   {
     this.appendableIndexSpec = appendableIndexSpec == null ? DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
-    this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY : maxRowsInMemory;
+    this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY_BATCH : maxRowsInMemory;

Review Comment:
   Thanks, that was a mistake!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on PR #13939:
URL: https://github.com/apache/druid/pull/13939#issuecomment-1546386194

   Yes, correct, the intent of this change is to adjust the balance of the defaults towards caring more about query perf.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] xvrl commented on pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "xvrl (via GitHub)" <gi...@apache.org>.
xvrl commented on PR #13939:
URL: https://github.com/apache/druid/pull/13939#issuecomment-1546333085

   @gianm in the draft release notes we indicate this change improves throughput, but from reading your reasoning it sounds like this actually improves query performance at the expense of throughput, am I reading this right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a diff in pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "kfaraz (via GitHub)" <gi...@apache.org>.
kfaraz commented on code in PR #13939:
URL: https://github.com/apache/druid/pull/13939#discussion_r1138551175


##########
server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java:
##########
@@ -129,7 +129,7 @@ public RealtimeTuningConfig(
   )
   {
     this.appendableIndexSpec = appendableIndexSpec == null ? DEFAULT_APPENDABLE_INDEX : appendableIndexSpec;
-    this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY : maxRowsInMemory;
+    this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY_BATCH : maxRowsInMemory;

Review Comment:
   Probably got missed.
   ```suggestion
       this.maxRowsInMemory = maxRowsInMemory == null ? DEFAULT_MAX_ROWS_IN_MEMORY_REALTIME : maxRowsInMemory;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on PR #13939:
URL: https://github.com/apache/druid/pull/13939#issuecomment-1546366288

   > @gianm in the draft release notes we indicate this change improves throughput, but from reading your reasoning it sounds like this actually improves query performance at the expense of throughput, am I reading this right? cc @clintropolis
   
   yeah, this is optimizing for query perf by flushing to disk more frequently, fixed the release notes to say instead
   
   >Optimized query performance by lowering the default maxRowsInMemory for real-time ingestion, which might lower overall ingestion throughput https://github.com/apache/druid/pull/13939
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #13939: Lower default maxRowsInMemory for realtime ingestion.

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm merged PR #13939:
URL: https://github.com/apache/druid/pull/13939


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org