Posted to issues@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/05/15 10:34:00 UTC
[jira] [Commented] (ARROW-8553) [C++] Optimize unaligned bitmap operations
[ https://issues.apache.org/jira/browse/ARROW-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108159#comment-17108159 ]
Yibo Cai commented on ARROW-8553:
---------------------------------
[~wesm], the aligned case is simple enough for the compiler to auto-vectorize the code.
I ran a quick test with the patch below and found no obvious performance difference.
{code:c}
diff --git a/cpp/src/arrow/util/bit_util.cc b/cpp/src/arrow/util/bit_util.cc
index 395801f5e..8beaf6cb8 100644
--- a/cpp/src/arrow/util/bit_util.cc
+++ b/cpp/src/arrow/util/bit_util.cc
@@ -261,7 +261,7 @@ template <template <typename> class BitOp>
void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- BitOp<uint8_t> op;
+ BitOp<uint64_t> op;
DCHECK_EQ(left_offset % 8, right_offset % 8);
DCHECK_EQ(left_offset % 8, out_offset % 8);
@@ -269,8 +269,11 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
left += left_offset / 8;
right += right_offset / 8;
out += out_offset / 8;
- for (int64_t i = 0; i < nbytes; ++i) {
- out[i] = op(left[i], right[i]);
+ uint64_t *out64 = (uint64_t*)out;
+ uint64_t *left64 = (uint64_t*)left;
+ uint64_t *right64 = (uint64_t*)right;
+ for (int64_t i = 0; i < nbytes/8; ++i) {
+ out64[i] = op(left64[i], right64[i]);
}
}
{code}
Benchmark before this patch (in uint8)
{code:c}
BenchmarkBitmapAnd/32768/0 4253 ns 4251 ns 164715 bytes_per_second=7.17813G/s
BenchmarkBitmapAnd/131072/0 16767 ns 16760 ns 41875 bytes_per_second=7.28348G/s
BenchmarkBitmapAnd/32768/0 4264 ns 4262 ns 165145 bytes_per_second=7.15959G/s
BenchmarkBitmapAnd/131072/0 16702 ns 16695 ns 41952 bytes_per_second=7.31158G/s
{code}
Benchmark after this patch (in uint64)
{code:c}
BenchmarkBitmapAnd/32768/0 4133 ns 4131 ns 171808 bytes_per_second=7.38787G/s
BenchmarkBitmapAnd/131072/0 17167 ns 17157 ns 40529 bytes_per_second=7.11491G/s
BenchmarkBitmapAnd/32768/0 4103 ns 4101 ns 171883 bytes_per_second=7.44151G/s
BenchmarkBitmapAnd/131072/0 17351 ns 17343 ns 43299 bytes_per_second=7.0385G/s
{code}
> [C++] Optimize unaligned bitmap operations
> ------------------------------------------
>
> Key: ARROW-8553
> URL: https://issues.apache.org/jira/browse/ARROW-8553
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.17.0
> Reporter: Antoine Pitrou
> Assignee: Yibo Cai
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using {{Bitmap::VisitWords}} instead would probably yield a manyfold performance increase.
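To illustrate the idea in the issue description, here is a minimal sketch of a word-at-a-time AND for bitmaps with arbitrary bit offsets. It does not use Bitmap::VisitWords and is not Arrow's implementation; LoadBits, StoreBits and UnalignedBitmapAnd are hypothetical names, and the sketch assumes Arrow's LSB-first bit order (bit i lives in byte i / 8 at position i % 8).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: reads nbits (1..64) bits starting at bit `offset`,
// touching only the bytes that actually hold those bits.
uint64_t LoadBits(const uint8_t* bits, int64_t offset, int nbits) {
  const uint8_t* p = bits + offset / 8;
  const int shift = static_cast<int>(offset % 8);
  const int nbytes = (shift + nbits + 7) / 8;  // 1..9 bytes
  uint64_t lo = 0;
  for (int k = 0; k < std::min(nbytes, 8); ++k) {
    lo |= static_cast<uint64_t>(p[k]) << (8 * k);
  }
  uint64_t word = lo >> shift;
  // A 9th byte is needed only when shift > 0, so (64 - shift) is in 1..63.
  if (nbytes == 9) word |= static_cast<uint64_t>(p[8]) << (64 - shift);
  if (nbits < 64) word &= (uint64_t{1} << nbits) - 1;
  return word;
}

// Hypothetical helper: writes the low nbits of `word` at bit `offset`,
// preserving neighbouring bits in partially covered bytes.
void StoreBits(uint8_t* bits, int64_t offset, int nbits, uint64_t word) {
  for (int k = 0; k < nbits;) {
    uint8_t* p = bits + (offset + k) / 8;
    const int shift = static_cast<int>((offset + k) % 8);
    const int n = std::min(8 - shift, nbits - k);  // bits landing in this byte
    const uint8_t mask = static_cast<uint8_t>(((1u << n) - 1) << shift);
    *p = static_cast<uint8_t>((*p & ~mask) | (((word >> k) << shift) & mask));
    k += n;
  }
}

// ANDs `length` bits from two bitmaps with independent bit offsets,
// processing up to 64 bits per iteration instead of one bit at a time.
void UnalignedBitmapAnd(const uint8_t* left, int64_t left_offset,
                        const uint8_t* right, int64_t right_offset,
                        uint8_t* out, int64_t out_offset, int64_t length) {
  for (int64_t i = 0; i < length; i += 64) {
    const int n = static_cast<int>(std::min<int64_t>(64, length - i));
    const uint64_t w = LoadBits(left, left_offset + i, n) &
                       LoadBits(right, right_offset + i, n);
    StoreBits(out, out_offset + i, n, w);
  }
}
```

The sketch favours clarity over speed: a tuned version would likely write whole output words directly and special-case the leading and trailing partial words.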