Posted to commits@doris.apache.org by "Gabriel39 (via GitHub)" <gi...@apache.org> on 2023/04/18 10:36:54 UTC

[GitHub] [doris] Gabriel39 opened a new pull request, #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Gabriel39 opened a new pull request, #18785:
URL: https://github.com/apache/doris/pull/18785

   # Proposed changes
   
   TPCH Q3: 1.2s -> 0.8s
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior?
   * [ ] Have unit tests been added?
   * [ ] Has documentation been added or modified?
   * [ ] Does it need to update dependencies?
   * [ ] Does this PR support rollback? (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 merged pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #18785:
URL: https://github.com/apache/doris/pull/18785




[GitHub] [doris] github-actions[bot] commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1515606703

   PR approved by anyone and no changes requested.




[GitHub] [doris] HappenLee commented on a diff in pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #18785:
URL: https://github.com/apache/doris/pull/18785#discussion_r1171363864


##########
be/src/exprs/bloom_filter_func.h:
##########
@@ -91,7 +91,31 @@ class BloomFilterFuncBase {
 
     void set_length(int64_t bloom_filter_length) { _bloom_filter_length = bloom_filter_length; }
 
-    Status init_with_fixed_length() { return init_with_fixed_length(_bloom_filter_length); }
+    void set_build_bf_exactly(bool build_bf_exactly) { _build_bf_exactly = build_bf_exactly; }
+
+    Status init_with_fixed_length() {
+        if (_build_bf_exactly) {
+            return Status::OK();
+        } else {
+            return init_with_fixed_length(_bloom_filter_length);
+        }
+    }
+
+    Status init_with_cardinality(const size_t build_bf_cardinality) {
+        if (_build_bf_exactly) {
+            // Use the same algorithm as org.apache.doris.planner.RuntimeFilter#calculateFilterSize
+            constexpr double fpp = 0.05;
+            constexpr double k = 8; // BUCKET_WORDS
+            // m is the number of bits we would need to get the fpp specified
+            double m = -k * build_bf_cardinality / std::log(1 - std::pow(fpp, 1.0 / k));
+
+            // Handle case where ndv == 1 => ceil(log2(m/8)) < 0.
+            int log_filter_size = std::max(0, (int)(std::ceil(std::log(m / 8) / std::log(2))));
+            return init_with_fixed_length(((int64_t)1) << log_filter_size);

Review Comment:
   This call seems to have no effect?
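
For readers following the sizing arithmetic in the hunk above, here is a minimal standalone sketch of the same calculation. The function name `bloom_filter_length_for` is hypothetical, and the early return for a cardinality of zero is an added assumption that is not in the diff (without it, `log(0)` yields negative infinity, which cannot be safely cast to `int`). The constants `fpp = 0.05` and `k = 8` are taken from the diff. Reading the result as a length in bytes assumes that `m / 8` is a bits-to-bytes conversion, which the hunk does not state explicitly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): reproduces the arithmetic from
// init_with_cardinality() in the diff and returns a power-of-two length.
int64_t bloom_filter_length_for(std::size_t cardinality) {
    // Assumption added for this sketch: avoid log(0) when nothing was built.
    if (cardinality == 0) {
        return 1;
    }
    constexpr double fpp = 0.05; // target false-positive probability
    constexpr double k = 8;      // labelled BUCKET_WORDS in the diff
    // m: number of bits needed to reach the requested fpp.
    const double m = -k * static_cast<double>(cardinality) /
                     std::log(1 - std::pow(fpp, 1.0 / k));
    // Round log2(m / 8) up; clamp at 0 because a cardinality of 1 gives a
    // value slightly below zero.
    const int log_filter_size =
            std::max(0, static_cast<int>(std::ceil(std::log2(m / 8))));
    return int64_t{1} << log_filter_size;
}
```

With these constants, m is roughly 6.87 bits per distinct value, so 1000 distinct values round up to a length of 1024 and a single value is clamped to a length of 1.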





[GitHub] [doris] github-actions[bot] commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1514349530

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Gabriel39 commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1514030273

   run buildall




[GitHub] [doris] Gabriel39 commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1514331951

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1512855360

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1512857761

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Gabriel39 commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1513091962

   run buildall




[GitHub] [doris] hello-stephen commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1513172274

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.37 seconds
    stream load tsv:          432 seconds loaded 74807831229 Bytes, about 165 MB/s
    stream load json:         23 seconds loaded 2358488459 Bytes, about 97 MB/s
    stream load orc:          60 seconds loaded 1101869774 Bytes, about 17 MB/s
    stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230418133519_clickbench_pr_130685.html




[GitHub] [doris] github-actions[bot] commented on pull request #18785: [Improvement](bloom filter) initialize bloom filter with adaptive size

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18785:
URL: https://github.com/apache/doris/pull/18785#issuecomment-1515606681

   PR approved by at least one committer and no changes requested.

