Posted to issues@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/05/08 09:43:00 UTC

[jira] [Commented] (ARROW-8553) [C++] Reimplement BitmapAnd using Bitmap::VisitWords

    [ https://issues.apache.org/jira/browse/ARROW-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102412#comment-17102412 ] 

Yibo Cai commented on ARROW-8553:
---------------------------------

I did a quick test, and the performance improvement with VisitWords is promising (>10x).

There is one issue that needs to be addressed. I would like to hear your comments, [~apitrou], [~bkietz].
VisitWords calls the visitor on each word, but the visitor does not know how many bits the first word holds; it may be less than a full word. See [code|https://github.com/apache/arrow/blob/6002ec388840de5622e39af85abdc57a2cccc9b2/cpp/src/arrow/util/bit_util.h#L960].
This makes it hard to use VisitWords for bitmap operations (and, or, ...), because I don't know how many valid bits to write to the output buffer for the first word, so the bit offset of later words cannot be determined either. VisitWords returns the bit length of the first word, but by then it is too late: all visitor calls have already finished.

I recommend adding a "valid bits" parameter to the visitor function that tells how many bits are valid in the current word. Only the first and last words may be smaller than a full word.
What's your opinion? Or are there better ways? Thanks.
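
To make the proposal concrete, here is a rough sketch of what a visitor with a "valid bits" argument could look like. This is not the existing Bitmap::VisitWords code: VisitWordPairs, LoadWord, GetBit, SetBitTo and BitmapAndSketch are hypothetical names made up for illustration, and words are loaded and stored bit by bit only to keep the sketch short (a real implementation would use word loads and shifts). The point is only that the visitor can track its output position by summing valid_bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not Arrow API): LSB-first bit access within each byte.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

inline void SetBitTo(uint8_t* bits, int64_t i, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] | mask);
  } else {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] & ~mask);
  }
}

// Gather `valid_bits` bits starting at bit `offset` into the low bits of a word.
// Bit-by-bit for clarity only.
inline uint64_t LoadWord(const uint8_t* bits, int64_t offset, int valid_bits) {
  uint64_t word = 0;
  for (int i = 0; i < valid_bits; ++i) {
    word |= static_cast<uint64_t>(GetBit(bits, offset + i)) << i;
  }
  return word;
}

// Hypothetical stand-in for VisitWords over two bitmaps. The first word may be
// partial (here: up to the next 64-bit boundary of `left`), and so may the last.
// The visitor is told how many bits of each word are valid.
template <typename Visitor>
void VisitWordPairs(const uint8_t* left, int64_t left_offset,
                    const uint8_t* right, int64_t right_offset,
                    int64_t length, Visitor&& visitor) {
  int64_t pos = 0;
  int64_t chunk = (64 - left_offset % 64) % 64;
  if (chunk == 0) chunk = 64;
  while (pos < length) {
    const int valid_bits = static_cast<int>(std::min<int64_t>(chunk, length - pos));
    visitor(LoadWord(left, left_offset + pos, valid_bits),
            LoadWord(right, right_offset + pos, valid_bits), valid_bits);
    pos += valid_bits;
    chunk = 64;  // every word after the first is full, except possibly the last
  }
}

// BitmapAnd written against the sketch above: the output bit position is known
// inside the visitor because each call reports its own valid bit count.
void BitmapAndSketch(const uint8_t* left, int64_t left_offset,
                     const uint8_t* right, int64_t right_offset,
                     int64_t length, uint8_t* out, int64_t out_offset) {
  int64_t written = 0;
  VisitWordPairs(left, left_offset, right, right_offset, length,
                 [&](uint64_t l, uint64_t r, int valid_bits) {
                   assert(valid_bits >= 1 && valid_bits <= 64);
                   const uint64_t result = l & r;
                   for (int i = 0; i < valid_bits; ++i) {
                     SetBitTo(out, out_offset + written + i, ((result >> i) & 1) != 0);
                   }
                   written += valid_bits;
                 });
}
```

With this shape, the same visitor body works for the first, middle and last words, and other binary operations (or, xor, ...) would only change the `l & r` line.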

> [C++] Reimplement BitmapAnd using Bitmap::VisitWords
> ----------------------------------------------------
>
>                 Key: ARROW-8553
>                 URL: https://issues.apache.org/jira/browse/ARROW-8553
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.17.0
>            Reporter: Antoine Pitrou
>            Assignee: Yibo Cai
>            Priority: Major
>
> Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using {{Bitmap::VisitWords}} instead would probably yield a manyfold performance increase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)