Posted to commits@doris.apache.org by "Kikyou1997 (via GitHub)" <gi...@apache.org> on 2023/04/28 08:46:18 UTC

[GitHub] [doris] Kikyou1997 opened a new pull request, #19205: [enhancement](stats) Make stats cache item size configurable

Kikyou1997 opened a new pull request, #19205:
URL: https://github.com/apache/doris/pull/19205

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (if NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Kikyou1997 commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1541946172

   run buildall




[GitHub] [doris] morrySnow merged pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow merged PR #19205:
URL: https://github.com/apache/doris/pull/19205




[GitHub] [doris] Kikyou1997 commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1527215668

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1543290217

   PR approved by anyone and no changes requested.




[GitHub] [doris] Kikyou1997 commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1535626137

   run buildall




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #19205:
URL: https://github.com/apache/doris/pull/19205#discussion_r1186649242


##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -1919,5 +1919,20 @@ public class Config extends ConfigBase {
 
     @ConfField(mutable = true)
     public static boolean disable_datev1  = true;
+
+    /**
+     * The actual memory taken by the stats cache depends heavily on the characteristics of the data, since the
+     * average size of the max/min literals and the histogram bucket count vary widely across datasets and
+     * scenarios. The JVM version etc. also has some influence, though not as much as the data itself.
+     * As a reference, here is the memory taken by the stats cache with 10_0000 items, where the avg length of each
+     * item's max/min literal is 32, the avg column name length is 16, and each column has a histogram with
+     * 128 buckets: in this case, the stats cache takes 911.954833984MiB in total.
+     * Without histograms, the stats cache takes 61.2777404785MiB in total.
+     * Analyzing a column that contains very large STRING values is strongly discouraged, since it can cause
+     * FE OOM.
+     */
+    @ConfField
+    public static long stats_cache_size = 10_0000;

Review Comment:
   Once the cache size is determined, it can't be changed during the lifetime of the FE process.
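
The point above can be illustrated with a minimal Java sketch. It is not the Doris implementation: `FixedSizeStatsCache` and `statsCacheSize` are hypothetical names (the latter stands in for `Config.stats_cache_size`), and a plain `LinkedHashMap`-based LRU is swapped in for whatever cache the FE actually uses. The sketch only shows that a limit read once at construction does not follow later changes to the config value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a stats cache; not the Doris implementation. */
final class FixedSizeStatsCache<K, V> {
    // Assumption: stands in for Config.stats_cache_size from the diff.
    static long statsCacheSize = 100_000L;

    private final int capacity;
    private final Map<K, V> entries;

    FixedSizeStatsCache() {
        // The limit is read exactly once, when the cache is built.
        final int limit = (int) Math.min(Integer.MAX_VALUE, statsCacheSize);
        this.capacity = limit;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > limit;
            }
        };
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized int size() {
        return entries.size();
    }

    int capacity() {
        return capacity;
    }

    public static void main(String[] args) {
        statsCacheSize = 2;
        FixedSizeStatsCache<String, Integer> cache = new FixedSizeStatsCache<>();
        statsCacheSize = 10; // changing the config later has no effect on this cache
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);
        System.out.println("capacity=" + cache.capacity() + ", size=" + cache.size());
    }
}
```

Under this assumption, marking the field `mutable = true` would let the value change at runtime without the already-built cache ever seeing it, which is consistent with the plain `@ConfField` in the diff.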





[GitHub] [doris] Kikyou1997 commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1536014483

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1543290200

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] hello-stephen commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1535671130

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.18 seconds
    stream load tsv:          426 seconds loaded 74807831229 Bytes, about 167 MB/s
    stream load json:         21 seconds loaded 2358488459 Bytes, about 107 MB/s
    stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
    stream load parquet:      30 seconds loaded 861443392 Bytes, about 27 MB/s
    insert into select:       6.6 seconds inserted 5,000,000 Rows, about 757K ops/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230505040856_clickbench_pr_138406.html




[GitHub] [doris] morrySnow commented on a diff in pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "morrySnow (via GitHub)" <gi...@apache.org>.
morrySnow commented on code in PR #19205:
URL: https://github.com/apache/doris/pull/19205#discussion_r1186606926


##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -1919,5 +1919,20 @@ public class Config extends ConfigBase {
 
     @ConfField(mutable = true)
     public static boolean disable_datev1  = true;
+
+    /**
+     * The actual memory taken by the stats cache depends heavily on the characteristics of the data, since the
+     * average size of the max/min literals and the histogram bucket count vary widely across datasets and
+     * scenarios. The JVM version etc. also has some influence, though not as much as the data itself.
+     * As a reference, here is the memory taken by the stats cache with 10_0000 items, where the avg length of each
+     * item's max/min literal is 32, the avg column name length is 16, and each column has a histogram with
+     * 128 buckets: in this case, the stats cache takes 911.954833984MiB in total.
+     * Without histograms, the stats cache takes 61.2777404785MiB in total.
+     * Analyzing a column that contains very large STRING values is strongly discouraged, since it can cause
+     * FE OOM.
+     */
+    @ConfField
+    public static long stats_cache_size = 10_0000;

Review Comment:
   Why put it in Config rather than in a session variable?
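
The memory figures quoted in the `Config.java` comment can be turned into a rough per-item cost, which may help when choosing a value for `stats_cache_size`. The sketch below is only arithmetic on the two totals given in the diff (911.954833984 MiB with 128-bucket histograms, 61.2777404785 MiB without, both for 10_0000 = 100,000 items). The class and method names are hypothetical, and scaling linearly with the item count is an assumption, not something the PR states.

```java
/** Hypothetical helper; only redoes the arithmetic on the figures from the diff. */
final class StatsCacheMemoryEstimate {
    // 10_0000 in the diff is the same number as 100_000.
    static final long MEASURED_ITEMS = 100_000L;
    static final double BYTES_PER_MIB = 1024.0 * 1024.0;
    static final double WITH_HISTOGRAM_MIB = 911.954833984;
    static final double WITHOUT_HISTOGRAM_MIB = 61.2777404785;

    /** Average bytes per cached item implied by a measured total. */
    static double bytesPerItem(double measuredTotalMib) {
        return measuredTotalMib * BYTES_PER_MIB / MEASURED_ITEMS;
    }

    /** Estimated total in MiB for another item count, assuming linear scaling. */
    static double estimateMib(long items, double measuredTotalMib) {
        return bytesPerItem(measuredTotalMib) * items / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        System.out.printf("with histogram:    %.1f bytes/item%n", bytesPerItem(WITH_HISTOGRAM_MIB));
        System.out.printf("without histogram: %.1f bytes/item%n", bytesPerItem(WITHOUT_HISTOGRAM_MIB));
        System.out.printf("200000 items with histogram: %.1f MiB%n",
                estimateMib(200_000L, WITH_HISTOGRAM_MIB));
    }
}
```

With the quoted numbers this works out to roughly 9.3 KiB per item with histograms and roughly 640 bytes per item without, so the histograms account for most of the footprint in that measurement.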





[GitHub] [doris] Kikyou1997 commented on pull request #19205: [enhancement](stats) Make stats cache item size configurable

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on PR #19205:
URL: https://github.com/apache/doris/pull/19205#issuecomment-1538213299

   run buildall

