You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/01 17:50:03 UTC

[GitHub] [arrow] pitrou opened a new pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

pitrou opened a new pull request #8320:
URL: https://github.com/apache/arrow/pull/8320


   Use a lookup table to emulate PEXT 5 bits at a time.
   Remove the slow scalar path.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702306644


   https://issues.apache.org/jira/browse/ARROW-10058


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702614856


   @pitrou I'm devoting most of my bandwidth to try to finish up the parquet read component this week, is it ok if I take a closer look next week (hopefully with enough time before an RC is cut?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704688666


   +1.  Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #8320:
URL: https://github.com/apache/arrow/pull/8320


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704482050


   sorry some personal issues came up.  hope to have time tonight to review this and other parquet related CLs


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702297553


   See JIRA issue for benchmarks. Would be nice to have benchmarks on other machines. @emkornfield 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704491476


   For the record, if I profile `BM_ReadStructOfListColumn/50`, I get the following hot spots (in cycles spent):
   * ~19% in `DefRepLevelsToListInfo`
   * ~15% in `DelimitRecords`
   * ~11% in `BitRunReader::NextRun`
   * ~10% in `SpacedExpand`
   * ~6% in `DictionaryConverter<int>::Copy`
   * ~5% in `PathWriteContext::AppendRepLevels`
   
   And `ExtractBitsSoftware` (the PEXT emulation) only takes ~1.1%, which seems good enough for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704460267


   Updated benchmarks on AMD Ryzen:
   ```
                            benchmark         baseline         contender  change %                                                                                                                                           counters
   0     BM_ReadListOfStructColumn/50  392.881 MiB/sec   564.029 MiB/sec    43.562     {'run_name': 'BM_ReadListOfStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23}
   10            BM_ReadListColumn/50  485.560 MiB/sec   675.023 MiB/sec    39.019             {'run_name': 'BM_ReadListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 42}
   7     BM_ReadStructOfListColumn/50  341.782 MiB/sec   462.097 MiB/sec    35.202     {'run_name': 'BM_ReadStructOfListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 20}
   3       BM_ReadListOfListColumn/50  447.657 MiB/sec   566.594 MiB/sec    26.569       {'run_name': 'BM_ReadListOfListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 39}
   23            BM_ReadListColumn/99    1.168 GiB/sec     1.365 GiB/sec    16.883            {'run_name': 'BM_ReadListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 102}
   4     BM_ReadListOfStructColumn/99  975.429 MiB/sec     1.095 GiB/sec    14.925     {'run_name': 'BM_ReadListOfStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 56}
   9     BM_ReadStructOfListColumn/99  798.058 MiB/sec   896.789 MiB/sec    12.371     {'run_name': 'BM_ReadStructOfListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 46}
   22      BM_ReadListOfListColumn/99    1.050 GiB/sec     1.168 GiB/sec    11.159       {'run_name': 'BM_ReadListOfListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 94}
   1      BM_ReadListOfStructColumn/1  654.576 MiB/sec   725.676 MiB/sec    10.862      {'run_name': 'BM_ReadListOfStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 38}
   19             BM_ReadListColumn/1  919.949 MiB/sec  1005.740 MiB/sec     9.326              {'run_name': 'BM_ReadListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 81}
   11     BM_ReadListOfStructColumn/0  835.259 MiB/sec   908.920 MiB/sec     8.819      {'run_name': 'BM_ReadListOfStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 49}
   17     BM_ReadStructOfListColumn/1  605.129 MiB/sec   649.556 MiB/sec     7.342      {'run_name': 'BM_ReadStructOfListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 35}
   8              BM_ReadListColumn/0    1.067 GiB/sec     1.145 GiB/sec     7.334              {'run_name': 'BM_ReadListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 92}
   5      BM_ReadStructOfListColumn/0  700.157 MiB/sec   740.414 MiB/sec     5.750      {'run_name': 'BM_ReadStructOfListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 41}
   6        BM_ReadListOfListColumn/0  929.109 MiB/sec   966.896 MiB/sec     4.067        {'run_name': 'BM_ReadListOfListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 82}
   14  BM_ReadStructOfStructColumn/50    1.537 GiB/sec     1.595 GiB/sec     3.772   {'run_name': 'BM_ReadStructOfStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 45}
   13          BM_ReadStructColumn/99    4.211 GiB/sec     4.330 GiB/sec     2.835          {'run_name': 'BM_ReadStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 252}
   20          BM_ReadStructColumn/50    1.155 GiB/sec     1.187 GiB/sec     2.755           {'run_name': 'BM_ReadStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 69}
   15   BM_ReadStructOfStructColumn/1    1.802 GiB/sec     1.849 GiB/sec     2.566    {'run_name': 'BM_ReadStructOfStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 53}
   12           BM_ReadStructColumn/1    1.798 GiB/sec     1.843 GiB/sec     2.521           {'run_name': 'BM_ReadStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 110}
   2   BM_ReadStructOfStructColumn/99    3.464 GiB/sec     3.530 GiB/sec     1.898  {'run_name': 'BM_ReadStructOfStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 100}
   16   BM_ReadStructOfStructColumn/0    6.021 GiB/sec     6.065 GiB/sec     0.724   {'run_name': 'BM_ReadStructOfStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 179}
   21           BM_ReadStructColumn/0    6.821 GiB/sec     6.812 GiB/sec    -0.137           {'run_name': 'BM_ReadStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 410}
   18       BM_ReadListOfListColumn/1  805.644 MiB/sec   801.480 MiB/sec    -0.517        {'run_name': 'BM_ReadListOfListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 70}
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702639541


   > is it ok if I take a closer look next week
   
   No problem.
   
   > but I would not expect that to be the case.
   
   Right. The emulation is probably much slower.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702300021


   I also notice that we call `internal::GreaterThanBitmap` for each 64 levels, which always goes through the dynamic dispatch indirection (meaning two function calls, I think). We could call `GreaterThanBitmapImpl` but that requires compiling a specialized version of `level_conversion_inc.h` for AVX2, otherwise we lose performance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702615585


   > I also notice that we call internal::GreaterThanBitmap for each 64 levels, which always goes through the dynamic dispatch indirection (meaning two function calls, I think). We could call GreaterThanBitmapImpl but that requires compiling a specialized version of level_conversion_inc.h for AVX2, otherwise we lose performance.
   
   yeah it isn't ideal, it is possible there is a better factoring in there but it seemed hard to do and isolate BMI2 special instructions, I guess if this isn't too much slower then BMI2 on intel we could potentially collapse everything, but I would not expect that to be the case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org