Posted to issues@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/05/15 10:34:00 UTC

[jira] [Commented] (ARROW-8553) [C++] Optimize unaligned bitmap operations

    [ https://issues.apache.org/jira/browse/ARROW-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108159#comment-17108159 ] 

Yibo Cai commented on ARROW-8553:
---------------------------------

[~wesm], the aligned case is simple enough for the compiler to auto-vectorize the code.

Did a quick test with the patch below; no obvious performance difference found.
{code:c}
diff --git a/cpp/src/arrow/util/bit_util.cc b/cpp/src/arrow/util/bit_util.cc
index 395801f5e..8beaf6cb8 100644
--- a/cpp/src/arrow/util/bit_util.cc
+++ b/cpp/src/arrow/util/bit_util.cc
@@ -261,7 +261,7 @@ template <template <typename> class BitOp>
 void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
                      int64_t right_offset, uint8_t* out, int64_t out_offset,
                      int64_t length) {
-  BitOp<uint8_t> op;
+  BitOp<uint64_t> op;
   DCHECK_EQ(left_offset % 8, right_offset % 8);
   DCHECK_EQ(left_offset % 8, out_offset % 8);
 
@@ -269,8 +269,11 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
   left += left_offset / 8;
   right += right_offset / 8;
   out += out_offset / 8;
-  for (int64_t i = 0; i < nbytes; ++i) {
-    out[i] = op(left[i], right[i]);
+  uint64_t *out64 = (uint64_t*)out;
+  uint64_t *left64 = (uint64_t*)left;
+  uint64_t *right64 = (uint64_t*)right;
+  for (int64_t i = 0; i < nbytes/8; ++i) {
+    out64[i] = op(left64[i], right64[i]);
   }
 }
{code}
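For reference, the pre-patch aligned path boils down to a byte-at-a-time loop like the standalone sketch below (function and parameter names are illustrative, not the actual Arrow code). Modern GCC/Clang typically auto-vectorize it at -O3, which is consistent with the manual uint64_t widening showing no measurable gain.
{code:c}
#include <cstdint>

// Byte-at-a-time AND over two already byte-aligned bitmaps (a standalone
// sketch, not the actual Arrow code).  Compiled at -O3, GCC and Clang
// typically report this loop as vectorized (e.g. g++ -O3 -fopt-info-vec,
// clang++ -O3 -Rpass=loop-vectorize), so the compiler already processes
// many bytes per iteration without any manual widening in the source.
void AlignedAndBytes(const uint8_t* left, const uint8_t* right, uint8_t* out,
                     int64_t nbytes) {
  for (int64_t i = 0; i < nbytes; ++i) {
    out[i] = static_cast<uint8_t>(left[i] & right[i]);
  }
}
{code}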
Benchmark before this patch (uint8_t loop):
{code:c}
BenchmarkBitmapAnd/32768/0                   4253 ns         4251 ns       164715 bytes_per_second=7.17813G/s
BenchmarkBitmapAnd/131072/0                 16767 ns        16760 ns        41875 bytes_per_second=7.28348G/s
BenchmarkBitmapAnd/32768/0                   4264 ns         4262 ns       165145 bytes_per_second=7.15959G/s
BenchmarkBitmapAnd/131072/0                 16702 ns        16695 ns        41952 bytes_per_second=7.31158G/s
{code}
Benchmark after this patch (uint64_t loop):
{code:c}
BenchmarkBitmapAnd/32768/0                   4133 ns         4131 ns       171808 bytes_per_second=7.38787G/s
BenchmarkBitmapAnd/131072/0                 17167 ns        17157 ns        40529 bytes_per_second=7.11491G/s
BenchmarkBitmapAnd/32768/0                   4103 ns         4101 ns       171883 bytes_per_second=7.44151G/s
BenchmarkBitmapAnd/131072/0                 17351 ns        17343 ns        43299 bytes_per_second=7.0385G/s
{code}
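The unaligned case is the one this ticket targets. Below is a minimal, self-contained sketch of the word-at-a-time idea (conceptually what processing via {{Bitmap::VisitWords}} enables): load 64 input bits at a time from each side, shift them into alignment, and emit whole output words instead of looping bit by bit. The names ({{LoadWordAt}}, {{UnalignedBitmapAndSketch}}) and the simplified edge handling are hypothetical, not the Arrow implementation; it assumes a little-endian host and a byte-aligned output, and leaves the ragged tail to the existing bit-by-bit path.
{code:c}
#include <cstdint>
#include <cstring>

namespace {

// Read the 64 bits starting at absolute bit position `bit_pos` of `bitmap`
// (LSB-first bit numbering, as in Arrow bitmaps).  The window generally
// straddles two aligned 64-bit words, so both are loaded and shifted into
// place.  The caller must ensure the second load stays within the buffer.
uint64_t LoadWordAt(const uint8_t* bitmap, int64_t bit_pos) {
  const uint8_t* base = bitmap + (bit_pos / 64) * 8;
  uint64_t lo;
  std::memcpy(&lo, base, 8);  // little-endian host assumed
  const int shift = static_cast<int>(bit_pos % 64);
  if (shift == 0) return lo;
  uint64_t hi;
  std::memcpy(&hi, base + 8, 8);
  return (lo >> shift) | (hi << (64 - shift));
}

}  // namespace

// AND `length` bits of `left` (starting at bit `left_offset`) with `right`
// (starting at bit `right_offset`) into a byte-aligned `out`.  Only full
// 64-bit output words are handled here; the remaining length % 64 bits would
// fall back to the existing bit-by-bit path.
void UnalignedBitmapAndSketch(const uint8_t* left, int64_t left_offset,
                              const uint8_t* right, int64_t right_offset,
                              uint8_t* out, int64_t length) {
  const int64_t full_words = length / 64;
  for (int64_t i = 0; i < full_words; ++i) {
    const uint64_t word = LoadWordAt(left, left_offset + i * 64) &
                          LoadWordAt(right, right_offset + i * 64);
    std::memcpy(out + i * 8, &word, 8);  // little-endian store
  }
  // ... handle the trailing length % 64 bits bit by bit ...
}
{code}
Compared with the aligned path, each 64-bit input word here straddles two aligned words, so every load costs two reads and two shifts; that is still far cheaper than 64 per-bit iterations.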

> [C++] Optimize unaligned bitmap operations
> ------------------------------------------
>
>                 Key: ARROW-8553
>                 URL: https://issues.apache.org/jira/browse/ARROW-8553
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.17.0
>            Reporter: Antoine Pitrou
>            Assignee: Yibo Cai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using {{Bitmap::VisitWords}} instead would probably yield a manyfold performance increase.


