You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/09 08:58:32 UTC
[GitHub] [arrow] cyb70289 opened a new pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
cyb70289 opened a new pull request #7135:
URL: https://github.com/apache/arrow/pull/7135
Unaligned bitmap operations are currently calculated bit-by-bit.
Processing it word-by-word in uint64 improves performance significantly.
BenchmarkBitmapAnd/32768/1 jumps from 56M/s to 4.1G/s on my test machine.
This patch also improves unit test to cover new code thoroughly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422574885
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
[Tests](https://travis-ci.org/github/apache/arrow/builds/684989945) have just started on TravisCI. Unfortunately, we are seeing a lot of error messages that are not related to this PR.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422476036
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
Review comment:
I'm not using Bitmap::VisitWords() as the first word returned may be not full (e.g, only 63 bits are valid), and there's no easy way to get that valid bit count in visitor callback. It also causes special handling of first (and last) word.
So I'm handling the raw data here. In each iteration, pick 64 bits from `left` and `right`, and/or/xor them, then write to `out`. Remaining bits are treated bit by bit.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628976390
@ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=BenchmarkBitmapAnd
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628945753
Cool. We really ought to show the times in more readable units (microseconds)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
kou commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422718116
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
FYI: You can test big-endian case on local by `ARCH=s390x UBUNTU=20.04 docker-compose run --rm ubuntu-cpp` and qemu-user-static that can be installed by `apt install -y qemu-user-static` on Debian family.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628922958
@ursabot --help
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628930884
[AMD64 Ubuntu 18.04 C++ Benchmark (#105789)](https://ci.ursalabs.org/#builders/73/builds/63) builder has been succeeded.
Revision: 4d43fbee975c260be0c3a572a786f5fc8c086951
```diff
====================================== =========== =========== ============
benchmark baseline contender change
====================================== =========== =========== ============
GenerateBits/8192 9.26e+07 9.26362e+07 0.000391577
BenchmarkBitmapAnd/32768/1 3.41575e+09 3.4195e+09 0.00109956
BenchmarkBitmapVisitBitsetAnd/131072/0 1.83472e+07 1.83361e+07 -0.000607876
BenchmarkBitmapAnd/32768/0 7.66145e+09 7.65814e+09 -0.000432931
BenchmarkBitmapVisitBitsetAnd/131072/1 1.8333e+07 1.83381e+07 0.000275159
BenchmarkBitmapVisitUInt8And/131072/1 3.25753e+08 3.25672e+08 -0.000250421
VisitBitsUnrolled/8192 2.986e+08 2.98714e+08 0.000381681
BenchmarkBitmapVisitUInt64And/131072/1 2.83167e+09 2.82881e+09 -0.00101033
BenchmarkBitmapVisitUInt64And/32768/0 3.08718e+09 3.08527e+09 -0.000620327
BenchmarkBitmapVisitUInt8And/32768/2 3.23412e+08 3.2332e+08 -0.000282581
BitmapWriter/8192 8.97172e+07 8.97776e+07 0.00067319
BenchmarkBitmapVisitBitsetAnd/32768/0 1.82468e+07 1.82821e+07 0.00193189
BenchmarkBitmapAnd/32768/2 3.41822e+09 3.42276e+09 0.00132961
FirstTimeBitmapWriter/8192 1.02886e+08 1.0289e+08 3.80922e-05
BitmapReader/8192 1.03019e+08 1.03185e+08 0.00160311
CopyBitmapWithOffset/8192 5.95599e+08 5.95919e+08 0.000537691
BenchmarkBitmapVisitUInt64And/32768/2 2.49914e+09 2.49059e+09 -0.00342357
CopyBitmapWithoutOffset/8192 6.66709e+10 6.67222e+10 0.000769451
BenchmarkBitmapAnd/131072/2 3.50395e+09 3.48935e+09 -0.00416637
GenerateBitsUnrolled/8192 1.41166e+08 1.41621e+08 0.00322268
BenchmarkBitmapVisitUInt64And/131072/2 2.83148e+09 2.82898e+09 -0.000885996
BenchmarkBitmapVisitBitsetAnd/131072/2 1.83322e+07 1.83431e+07 0.000594924
BenchmarkBitmapVisitUInt8And/131072/0 3.34548e+08 3.36353e+08 0.00539583
BenchmarkBitmapAnd/131072/0 6.08446e+09 6.01076e+09 -0.0121134
BenchmarkBitmapVisitUInt64And/131072/0 3.0159e+09 3.01874e+09 0.000943126
VisitBits/8192 1.09519e+08 1.09382e+08 -0.00125118
BenchmarkBitmapVisitBitsetAnd/32768/2 1.82332e+07 1.8284e+07 0.00278681
BenchmarkBitmapVisitUInt8And/131072/2 3.25545e+08 3.25675e+08 0.000399539
BenchmarkBitmapVisitUInt64And/32768/1 2.49504e+09 2.49043e+09 -0.00184698
BenchmarkBitmapAnd/131072/1 3.49501e+09 3.49571e+09 0.000200727
BenchmarkBitmapVisitUInt8And/32768/0 3.3382e+08 3.34082e+08 0.000783797
BenchmarkBitmapVisitBitsetAnd/32768/1 1.82385e+07 1.8253e+07 0.00079466
BenchmarkBitmapVisitUInt8And/32768/1 3.23367e+08 3.23238e+08 -0.000397818
====================================== =========== =========== ============
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-626133990
https://issues.apache.org/jira/browse/ARROW-8553
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
kou commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-626408103
The GLib on Windows failure isn't Apache Arrow related problem. It's a glib2 gem and the latest GLib on MSYS2 problem. I'll fix it in a few days.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628970238
> I'm surprised by the results of buildbot, it shows almost no improvement. On my desktop with clang-10 and gcc-9, the old version is roughly 50m/sec and the new version is around 5gb/sec. Buildbot is roughly 4gb/sec for both versions.
Looks very strange. Just blind guess. Will both versions actually running same code?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628875060
@fsaintjacques @pitrou any more review needed on this or can it be merged?
Should the aligned case be optimized to use word-wise operations? If so we should open a JIRA issue about that
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422764992
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
Thank you very much for your fix.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-626133175
Benchmark result of `release/arrow-bit-util-benchmark`, on AMD EPYC 7251, gcc-7.5
Before this patch
```
-------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------
BitmapReader/8192 131771 ns 131691 ns 5350 bytes_per_second=118.649M/s
VisitBits/8192 126231 ns 126187 ns 5545 bytes_per_second=123.825M/s
VisitBitsUnrolled/8192 46107 ns 46085 ns 15374 bytes_per_second=339.045M/s
BitmapWriter/8192 89468 ns 89424 ns 7154 bytes_per_second=87.365M/s
FirstTimeBitmapWriter/8192 78184 ns 78144 ns 8994 bytes_per_second=99.9755M/s
GenerateBits/8192 84917 ns 84880 ns 8358 bytes_per_second=92.0416M/s
GenerateBitsUnrolled/8192 42952 ns 42933 ns 16229 bytes_per_second=181.971M/s
CopyBitmapWithoutOffset/8192 191 ns 191 ns 3612121 bytes_per_second=39.9542G/s
CopyBitmapWithOffset/8192 8654 ns 8651 ns 81364 bytes_per_second=903.041M/s
BenchmarkBitmapAnd/32768/0 4353 ns 4350 ns 161617 bytes_per_second=7.01486G/s
BenchmarkBitmapAnd/131072/0 16896 ns 16887 ns 41278 bytes_per_second=7.22851G/s
BenchmarkBitmapAnd/32768/0 4268 ns 4267 ns 162600 bytes_per_second=7.1523G/s
BenchmarkBitmapAnd/131072/0 17009 ns 16998 ns 41089 bytes_per_second=7.18163G/s
[XXXXXXXXX: unaligned bitmap operation]
BenchmarkBitmapAnd/32768/1 552753 ns 552530 ns 1271 bytes_per_second=56.558M/s
BenchmarkBitmapAnd/131072/1 2230810 ns 2229574 ns 316 bytes_per_second=56.0645M/s
BenchmarkBitmapAnd/32768/2 565404 ns 565094 ns 1266 bytes_per_second=55.3005M/s
BenchmarkBitmapAnd/131072/2 2215504 ns 2214782 ns 303 bytes_per_second=56.439M/s
BenchmarkBitmapVisitBitsetAnd/32768/0 1493021 ns 1492271 ns 469 bytes_per_second=20.9412M/s
BenchmarkBitmapVisitBitsetAnd/131072/0 5974576 ns 5972560 ns 117 bytes_per_second=20.929M/s
BenchmarkBitmapVisitBitsetAnd/32768/0 1492997 ns 1492431 ns 469 bytes_per_second=20.939M/s
BenchmarkBitmapVisitBitsetAnd/131072/0 5977490 ns 5975855 ns 117 bytes_per_second=20.9175M/s
BenchmarkBitmapVisitBitsetAnd/32768/1 1493233 ns 1492718 ns 469 bytes_per_second=20.935M/s
BenchmarkBitmapVisitBitsetAnd/131072/1 5975924 ns 5973199 ns 117 bytes_per_second=20.9268M/s
BenchmarkBitmapVisitBitsetAnd/32768/2 1492305 ns 1491810 ns 467 bytes_per_second=20.9477M/s
BenchmarkBitmapVisitBitsetAnd/131072/2 5987022 ns 5984924 ns 117 bytes_per_second=20.8858M/s
BenchmarkBitmapVisitUInt8And/32768/0 83420 ns 83384 ns 8454 bytes_per_second=374.771M/s
BenchmarkBitmapVisitUInt8And/131072/0 330943 ns 330824 ns 2091 bytes_per_second=377.844M/s
BenchmarkBitmapVisitUInt8And/32768/0 83640 ns 83612 ns 8347 bytes_per_second=373.749M/s
BenchmarkBitmapVisitUInt8And/131072/0 330971 ns 330857 ns 2115 bytes_per_second=377.807M/s
BenchmarkBitmapVisitUInt8And/32768/1 104218 ns 104185 ns 6702 bytes_per_second=299.948M/s
BenchmarkBitmapVisitUInt8And/131072/1 413367 ns 413210 ns 1691 bytes_per_second=302.509M/s
BenchmarkBitmapVisitUInt8And/32768/2 104633 ns 104591 ns 6796 bytes_per_second=298.783M/s
BenchmarkBitmapVisitUInt8And/131072/2 409852 ns 409681 ns 1679 bytes_per_second=305.115M/s
BenchmarkBitmapVisitUInt64And/32768/0 7205 ns 7202 ns 97108 bytes_per_second=4.2372G/s
BenchmarkBitmapVisitUInt64And/131072/0 28334 ns 28326 ns 24630 bytes_per_second=4.30952G/s
BenchmarkBitmapVisitUInt64And/32768/0 7205 ns 7202 ns 97100 bytes_per_second=4.23729G/s
BenchmarkBitmapVisitUInt64And/131072/0 28334 ns 28324 ns 24693 bytes_per_second=4.30981G/s
BenchmarkBitmapVisitUInt64And/32768/1 10736 ns 10732 ns 65164 bytes_per_second=2.84349G/s
BenchmarkBitmapVisitUInt64And/131072/1 36150 ns 36137 ns 19364 bytes_per_second=3.378G/s
BenchmarkBitmapVisitUInt64And/32768/2 10739 ns 10734 ns 65218 bytes_per_second=2.84299G/s
BenchmarkBitmapVisitUInt64And/131072/2 36137 ns 36127 ns 19342 bytes_per_second=3.37896G/s
```
After this patch
```
-------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------
BitmapReader/8192 130861 ns 130803 ns 5349 bytes_per_second=119.454M/s
VisitBits/8192 125543 ns 125498 ns 5621 bytes_per_second=124.504M/s
VisitBitsUnrolled/8192 45482 ns 45463 ns 15377 bytes_per_second=343.69M/s
BitmapWriter/8192 89930 ns 89889 ns 7383 bytes_per_second=86.9129M/s
FirstTimeBitmapWriter/8192 77582 ns 77547 ns 9009 bytes_per_second=100.745M/s
GenerateBits/8192 83244 ns 83210 ns 8393 bytes_per_second=93.8886M/s
GenerateBitsUnrolled/8192 42691 ns 42673 ns 15958 bytes_per_second=183.079M/s
CopyBitmapWithoutOffset/8192 191 ns 191 ns 3661358 bytes_per_second=39.9303G/s
CopyBitmapWithOffset/8192 8582 ns 8578 ns 81828 bytes_per_second=910.725M/s
BenchmarkBitmapAnd/32768/0 4078 ns 4077 ns 171965 bytes_per_second=7.48507G/s
BenchmarkBitmapAnd/131072/0 17269 ns 17262 ns 41975 bytes_per_second=7.07161G/s
BenchmarkBitmapAnd/32768/0 4084 ns 4082 ns 171407 bytes_per_second=7.47578G/s
BenchmarkBitmapAnd/131072/0 16135 ns 16129 ns 43454 bytes_per_second=7.5683G/s
[XXXXXXXXX: unaligned bitmap operation]
BenchmarkBitmapAnd/32768/1 7304 ns 7301 ns 95839 bytes_per_second=4.17989G/s
BenchmarkBitmapAnd/131072/1 28344 ns 28332 ns 24730 bytes_per_second=4.3085G/s
BenchmarkBitmapAnd/32768/2 7310 ns 7307 ns 95861 bytes_per_second=4.17663G/s
BenchmarkBitmapAnd/131072/2 28289 ns 28274 ns 24733 bytes_per_second=4.31734G/s
BenchmarkBitmapVisitBitsetAnd/32768/0 1495320 ns 1494883 ns 468 bytes_per_second=20.9047M/s
BenchmarkBitmapVisitBitsetAnd/131072/0 5986673 ns 5984492 ns 117 bytes_per_second=20.8873M/s
BenchmarkBitmapVisitBitsetAnd/32768/0 1495659 ns 1495153 ns 468 bytes_per_second=20.9009M/s
BenchmarkBitmapVisitBitsetAnd/131072/0 5985930 ns 5983460 ns 117 bytes_per_second=20.8909M/s
BenchmarkBitmapVisitBitsetAnd/32768/1 1495922 ns 1495411 ns 468 bytes_per_second=20.8973M/s
BenchmarkBitmapVisitBitsetAnd/131072/1 5981635 ns 5979222 ns 117 bytes_per_second=20.9057M/s
BenchmarkBitmapVisitBitsetAnd/32768/2 1494916 ns 1494250 ns 468 bytes_per_second=20.9135M/s
BenchmarkBitmapVisitBitsetAnd/131072/2 5986896 ns 5984091 ns 117 bytes_per_second=20.8887M/s
BenchmarkBitmapVisitUInt8And/32768/0 83954 ns 83920 ns 8442 bytes_per_second=372.377M/s
BenchmarkBitmapVisitUInt8And/131072/0 333061 ns 332921 ns 2081 bytes_per_second=375.465M/s
BenchmarkBitmapVisitUInt8And/32768/0 84111 ns 84071 ns 8339 bytes_per_second=371.709M/s
BenchmarkBitmapVisitUInt8And/131072/0 331619 ns 331460 ns 2104 bytes_per_second=377.119M/s
BenchmarkBitmapVisitUInt8And/32768/1 104625 ns 104585 ns 6699 bytes_per_second=298.801M/s
BenchmarkBitmapVisitUInt8And/131072/1 416265 ns 416111 ns 1680 bytes_per_second=300.401M/s
BenchmarkBitmapVisitUInt8And/32768/2 104083 ns 104044 ns 6794 bytes_per_second=300.353M/s
BenchmarkBitmapVisitUInt8And/131072/2 410391 ns 410236 ns 1679 bytes_per_second=304.703M/s
BenchmarkBitmapVisitUInt64And/32768/0 7215 ns 7213 ns 97053 bytes_per_second=4.2309G/s
BenchmarkBitmapVisitUInt64And/131072/0 28384 ns 28371 ns 24663 bytes_per_second=4.30262G/s
BenchmarkBitmapVisitUInt64And/32768/0 7213 ns 7210 ns 97340 bytes_per_second=4.23244G/s
BenchmarkBitmapVisitUInt64And/131072/0 28412 ns 28401 ns 24618 bytes_per_second=4.29805G/s
BenchmarkBitmapVisitUInt64And/32768/1 10703 ns 10698 ns 65420 bytes_per_second=2.85252G/s
BenchmarkBitmapVisitUInt64And/131072/1 36249 ns 36235 ns 19236 bytes_per_second=3.36889G/s
BenchmarkBitmapVisitUInt64And/32768/2 10704 ns 10700 ns 65459 bytes_per_second=2.85207G/s
BenchmarkBitmapVisitUInt64And/131072/2 36243 ns 36228 ns 19315 bytes_per_second=3.36946G/s
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422501522
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
I think this code should be safe for big endian machine. But no hardware to test.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628970734
Yes, that's what I'm looking into. Anyhow, this is a very nice improvement, good job @cyb70289.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628923180
@ursabot benchmark --suite-filter=arrow-bit-util-benchmark
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628968358
I'm surprised by the results of buildbot, it shows almost no improvement. On my desktop with clang-10 and gcc-9, the old version is roughly 50m/sec and the new version is around 5gb/sec. Buildbot is roughly 4gb/sec for both versions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques closed pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques closed pull request #7135:
URL: https://github.com/apache/arrow/pull/7135
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422632059
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
I've updated code to support big endian. RandomXor test errors are fixed.
https://travis-ci.org/github/apache/arrow/jobs/685282203
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628920728
The aligned version is auto-vectorized.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628922964
```
Usage: @ursabot [OPTIONS] COMMAND [ARGS]...
Ursabot
Options:
--help Show this message and exit.
Commands:
benchmark Run the benchmark suite in comparison mode.
build Trigger all tests registered for this pull request.
crossbow Trigger crossbow builds for this pull request
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628923053
@ursabot benchmark --help
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-626381392
FWIW the CI failure looks like a transient issue
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422619489
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
I see lots of errors from BitmapOp.RandomXor, which is a new test added in this patch. Endianness change may be necessary for SafeLoad/SafeStore.
I just requested IBM LinuxONE account, hope can get access to a big endian vm soon.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#discussion_r422496709
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri
}
}
-template <typename Op>
+template <template <typename> class BitOp, typename LogicalOp>
void UnalignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right,
int64_t right_offset, uint8_t* out, int64_t out_offset,
int64_t length) {
- Op op;
- auto left_reader = internal::BitmapReader(left, left_offset, length);
- auto right_reader = internal::BitmapReader(right, right_offset, length);
- auto writer = internal::BitmapWriter(out, out_offset, length);
- for (int64_t i = 0; i < length; ++i) {
- if (op(left_reader.IsSet(), right_reader.IsSet())) {
- writer.Set();
- } else {
- writer.Clear();
+ using Word = uint64_t;
+
+ left += left_offset / 8;
+ right += right_offset / 8;
+ out += out_offset / 8;
+
+ left_offset %= 8;
+ right_offset %= 8;
+ out_offset %= 8;
+
+ const int64_t min_offset = std::min({left_offset, right_offset, out_offset});
+ const int64_t min_nbytes = BitUtil::BytesForBits(length + min_offset);
+ int64_t nwords = min_nbytes / sizeof(Word);
+
+ // process in words, we may touch two words in each iteration
+ if (nwords > 1) {
+ BitOp<Word> op;
+ constexpr int64_t bits_per_word = sizeof(Word) * 8;
+ const Word out_mask = (1U << out_offset) - 1;
+
+ length -= (nwords - 1) * bits_per_word;
+ Word left_word0 = util::SafeLoadAs<Word>(left);
+ Word right_word0 = util::SafeLoadAs<Word>(right);
+ Word out_word0 = util::SafeLoadAs<Word>(out);
+
+ do {
Review comment:
Does the current implementation of this optimization work for both little and big endian?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628980113
[AMD64 Ubuntu 18.04 C++ Benchmark (#105803)](https://ci.ursalabs.org/#builders/73/builds/64) builder has been succeeded.
Revision: 9db5af011291f3ca8e0aa888f5387f0a6c7dcfc2
```diff
=========================== =========== =========== ============
benchmark baseline contender change
=========================== =========== =========== ============
BenchmarkBitmapAnd/32768/1 3.43484e+09 3.43486e+09 5.76714e-06
BenchmarkBitmapAnd/131072/0 6.13433e+09 6.21135e+09 0.0125565
BenchmarkBitmapAnd/32768/0 7.65693e+09 7.6917e+09 0.00454131
BenchmarkBitmapAnd/32768/2 3.43616e+09 3.43443e+09 -0.000503796
BenchmarkBitmapAnd/131072/1 3.51409e+09 3.52113e+09 0.00200249
BenchmarkBitmapAnd/131072/2 3.51636e+09 3.52036e+09 0.00113589
=========================== =========== =========== ============
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7135:
URL: https://github.com/apache/arrow/pull/7135#issuecomment-628923055
```
Usage: @ursabot benchmark [OPTIONS] [<baseline>]
Run the benchmark suite in comparison mode.
This command will run the benchmark suite for tip of the branch commit
against `<baseline>` (or master if not provided).
Examples:
# Run the all the benchmarks
@ursabot benchmark
# Compare only benchmarks where the name matches the /^Sum/ regex
@ursabot benchmark --benchmark-filter=^Sum
# Compare only benchmarks where the suite matches the /compute-/ regex.
# A suite is the C++ binary.
@ursabot benchmark --suite-filter=compute-
# Sometimes a new optimization requires the addition of new benchmarks to
# quantify the performance increase. When doing this be sure to add the
# benchmark in a separate commit before introducing the optimization.
#
# Note that specifying the baseline is the only way to compare using a new
# benchmark, since master does not contain the new benchmark and no
# comparison is possible.
#
# The following command compares the results of matching benchmarks,
# compiling against HEAD and the provided baseline commit, e.g. eaf8302.
# You can use this to quantify the performance improvement of new
# optimizations or to check for regressions.
@ursabot benchmark --benchmark-filter=MyBenchmark eaf8302
Options:
--suite-filter <regex> Regex filtering benchmark suites.
--benchmark-filter <regex> Regex filtering benchmarks.
--help Show this message and exit.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org