Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/09/28 04:54:00 UTC

[jira] [Commented] (ARROW-10058) [C++] Investigate performance of LevelsToBitmap without BMI2

    [ https://issues.apache.org/jira/browse/ARROW-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202999#comment-17202999 ] 

Yibo Cai commented on ARROW-10058:
----------------------------------

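I wrote a POC that uses a 4-bit lookup table (uint8[16][16]) to map [mask][data] directly to the PEXT result. BM_DefinitionLevelsToBitmapRepeatedMostPresent improves from 637M/s to 1074M/s; the AllMissing and AllPresent cases are essentially unchanged (within about 2%).
The POC patch is attached as [^opt-level-conv.diff]. I will propose a formal PR.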
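
For readers without the attachment, here is a minimal sketch of how a [mask][data] table of this shape could emulate PEXT. It is not the code from the patch: the names Pext4Table, MakePext4Table and Pext16 are hypothetical, and combining four nibbles by shifting each partial result by the popcount of the preceding mask nibbles is an assumption about how a 16-bit word might be handled.

```cpp
#include <cstdint>

namespace pext_sketch {  // hypothetical namespace, for illustration only

struct Pext4Table {
  uint8_t result[16][16];  // result[mask][data]: the 4-bit PEXT result
  uint8_t popcount[16];    // number of set bits in each 4-bit mask
};

// Build the table with a plain bit-by-bit PEXT over every 4-bit mask/data pair.
inline Pext4Table MakePext4Table() {
  Pext4Table t = {};
  for (unsigned mask = 0; mask < 16; ++mask) {
    for (unsigned data = 0; data < 16; ++data) {
      unsigned out = 0;
      unsigned n = 0;  // number of mask bits seen so far
      for (unsigned bit = 0; bit < 4; ++bit) {
        if (mask & (1u << bit)) {
          if (data & (1u << bit)) out |= 1u << n;
          ++n;
        }
      }
      t.result[mask][data] = static_cast<uint8_t>(out);
      t.popcount[mask] = static_cast<uint8_t>(n);
    }
  }
  return t;
}

// Emulate a 16-bit PEXT one nibble at a time, lowest nibble first.
// Each partial result is placed just above the bits already extracted.
inline uint16_t Pext16(uint16_t data, uint16_t mask) {
  static const Pext4Table table = MakePext4Table();
  uint32_t out = 0;
  unsigned shift = 0;
  for (int i = 0; i < 4; ++i) {
    const unsigned m = (mask >> (4 * i)) & 0xF;
    const unsigned d = (data >> (4 * i)) & 0xF;
    out |= static_cast<uint32_t>(table.result[m][d]) << shift;
    shift += table.popcount[m];
  }
  return static_cast<uint16_t>(out);
}

}  // namespace pext_sketch
```

With this sketch, pext_sketch::Pext16(data, mask) takes its arguments in the same order as the _pext_u32 intrinsic (source first, mask second). The table itself is only 256 bytes plus 16 popcount entries, so it should stay resident in L1 cache.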

Benchmark results from release/parquet-level-conversion-benchmark:
*Current code*
{code:bash}
---------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM_DefinitionLevelsToBitmapRepeatedAllMissing        1072 ns         1072 ns       651457 bytes_per_second=1.77856G/s
BM_DefinitionLevelsToBitmapRepeatedAllPresent        1226 ns         1226 ns       570829 bytes_per_second=1.55599G/s
BM_DefinitionLevelsToBitmapRepeatedMostPresent       3065 ns         3065 ns       228285 bytes_per_second=637.151M/s
{code}
*With lookup table*
{code:bash}
---------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM_DefinitionLevelsToBitmapRepeatedAllMissing        1093 ns         1093 ns       640348 bytes_per_second=1.74501G/s
BM_DefinitionLevelsToBitmapRepeatedAllPresent        1244 ns         1244 ns       564592 bytes_per_second=1.53301G/s
BM_DefinitionLevelsToBitmapRepeatedMostPresent       1817 ns         1817 ns       384456 bytes_per_second=1074.7M/s
{code}

> [C++] Investigate performance of LevelsToBitmap without BMI2
> ------------------------------------------------------------
>
>                 Key: ARROW-10058
>                 URL: https://issues.apache.org/jira/browse/ARROW-10058
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Antoine Pitrou
>            Priority: Major
>         Attachments: opt-level-conv.diff
>
>
> Currently, when Parquet nested data involves repetition levels, converting the levels to a bitmap goes through a slow scalar path unless the BMI2 instruction set is available and efficient (the BMI2 path uses the PEXT instruction to process 16 levels at once).
> It may be possible to emulate PEXT for 5- or 6-bit masks by using a lookup table, which would allow processing 5-6 levels at once.
> (also, it would be good to add nested reading benchmarks for non-trivial nesting; currently we only benchmark one-level struct and one-level list)


