You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/12 09:37:06 UTC

[GitHub] [incubator-doris] morningman opened a new issue #4083: [Bitmap][Perf] bitmap union performance is bad

morningman opened a new issue #4083:
URL: https://github.com/apache/incubator-doris/issues/4083


   We found that for the same set of integer data. After loading using `bitmap_hash()`, the performance of query using `bitmap_union_count is obviously` bad. So I did the following test:
   
   1. Generate 1 million random uint32 numbers.
   2. Sort the 1 million numbers to generate 1 million BitmapValue, and then perform the OR operation.
   2. Randomly scramble these 1 million numbers, then generate 1 million BitmapValue, and then perform OR operation.
   
   The final result shows that the sequential execution time is 1.8 seconds, while the random execution time is 4.4 seconds.
   
   Test case as follow:
   ```
   TEST(BitmapValueTest, perf) {
   	// 1. generate 1000000 random uint32
       std::vector<uint32_t> numbers;
       Random rand(0);
       for (int i = 100000000; i < 101000000; i++) {
           numbers.push_back(rand.Next());
       }
   
       // 2. sort the numbers
       std::sort(numbers.begin(), numbers.end());
       std::vector<BitmapValue> v1;
       for (int i = 0; i < numbers.size(); i++) {
           v1.emplace_back(numbers[i]);
       }
   
       // 3. test bitmap or operation with sorted integers
       MonotonicStopWatch w;
       w.start();
       BitmapValue b = v1[0];
       for (int i = 1; i < v1.size(); i++) {
           b &= v1[i];
       }
       std::cout << "get seq bitmap or cost: " << w.elapsed_time() << std::endl;
   
       // 4. shuffle the numbers
    	std::random_shuffle (numbers.begin(), numbers.end());
       std::sort(numbers.begin(), numbers.end());
       std::vector<BitmapValue> v2;
       for (int i = 0; i < numbers.size(); i++) {
           v2.emplace_back(numbers[i]);
       }
   
       // 3. test bitmap or operation with random shuffled integers
       MonotonicStopWatch w2;
       w2.start();
       BitmapValue b2 = v2[0];
       for (int i = 1; i < v2.size(); i++) {
           b2 &= v2[i];
       }
       std::cout << "get random bitmap or cost: " << w2.elapsed_time();
   }
   ```
   
   From the user level, in some scenarios, the efficiency of bitmap may be significantly lower than `ndv` or even directly `count dinstinct`.
   I don't know the specific reason yet and need further investigation.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org