You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/01 17:50:03 UTC
[GitHub] [arrow] pitrou opened a new pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
pitrou opened a new pull request #8320:
URL: https://github.com/apache/arrow/pull/8320
Use a lookup table to emulate PEXT 5 bits at a time.
Remove the slow scalar path.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702306644
https://issues.apache.org/jira/browse/ARROW-10058
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702614856
@pitrou I'm devoting most of my bandwidth to try to finish up the parquet read component this week, is it ok if I take a closer look next week (hopefully with enough time before an RC is cut?)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704688666
+1. Thanks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield closed pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #8320:
URL: https://github.com/apache/arrow/pull/8320
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704482050
sorry some personal issues came up. hope to have time tonight to review this and other parquet related CLs
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702297553
See JIRA issue for benchmarks. Would be nice to have benchmarks on other machines. @emkornfield
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704491476
For the record, if I profile `BM_ReadStructOfListColumn/50`, I get the following hot spots (in cycles spent):
* ~19% in `DefRepLevelsToListInfo`
* ~15% in `DelimitRecords`
* ~11% in `BitRunReader::NextRun`
* ~10% in `SpacedExpand`
* ~6% in `DictionaryConverter<int>::Copy`
* ~5% in `PathWriteContext::AppendRepLevels`
And `ExtractBitsSoftware` (the PEXT emulation) only takes ~1.1%, which seems good enough for now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-704460267
Updated benchmarks on AMD Ryzen:
```
benchmark baseline contender change % counters
0 BM_ReadListOfStructColumn/50 392.881 MiB/sec 564.029 MiB/sec 43.562 {'run_name': 'BM_ReadListOfStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23}
10 BM_ReadListColumn/50 485.560 MiB/sec 675.023 MiB/sec 39.019 {'run_name': 'BM_ReadListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 42}
7 BM_ReadStructOfListColumn/50 341.782 MiB/sec 462.097 MiB/sec 35.202 {'run_name': 'BM_ReadStructOfListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 20}
3 BM_ReadListOfListColumn/50 447.657 MiB/sec 566.594 MiB/sec 26.569 {'run_name': 'BM_ReadListOfListColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 39}
23 BM_ReadListColumn/99 1.168 GiB/sec 1.365 GiB/sec 16.883 {'run_name': 'BM_ReadListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 102}
4 BM_ReadListOfStructColumn/99 975.429 MiB/sec 1.095 GiB/sec 14.925 {'run_name': 'BM_ReadListOfStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 56}
9 BM_ReadStructOfListColumn/99 798.058 MiB/sec 896.789 MiB/sec 12.371 {'run_name': 'BM_ReadStructOfListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 46}
22 BM_ReadListOfListColumn/99 1.050 GiB/sec 1.168 GiB/sec 11.159 {'run_name': 'BM_ReadListOfListColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 94}
1 BM_ReadListOfStructColumn/1 654.576 MiB/sec 725.676 MiB/sec 10.862 {'run_name': 'BM_ReadListOfStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 38}
19 BM_ReadListColumn/1 919.949 MiB/sec 1005.740 MiB/sec 9.326 {'run_name': 'BM_ReadListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 81}
11 BM_ReadListOfStructColumn/0 835.259 MiB/sec 908.920 MiB/sec 8.819 {'run_name': 'BM_ReadListOfStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 49}
17 BM_ReadStructOfListColumn/1 605.129 MiB/sec 649.556 MiB/sec 7.342 {'run_name': 'BM_ReadStructOfListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 35}
8 BM_ReadListColumn/0 1.067 GiB/sec 1.145 GiB/sec 7.334 {'run_name': 'BM_ReadListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 92}
5 BM_ReadStructOfListColumn/0 700.157 MiB/sec 740.414 MiB/sec 5.750 {'run_name': 'BM_ReadStructOfListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 41}
6 BM_ReadListOfListColumn/0 929.109 MiB/sec 966.896 MiB/sec 4.067 {'run_name': 'BM_ReadListOfListColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 82}
14 BM_ReadStructOfStructColumn/50 1.537 GiB/sec 1.595 GiB/sec 3.772 {'run_name': 'BM_ReadStructOfStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 45}
13 BM_ReadStructColumn/99 4.211 GiB/sec 4.330 GiB/sec 2.835 {'run_name': 'BM_ReadStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 252}
20 BM_ReadStructColumn/50 1.155 GiB/sec 1.187 GiB/sec 2.755 {'run_name': 'BM_ReadStructColumn/50', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 69}
15 BM_ReadStructOfStructColumn/1 1.802 GiB/sec 1.849 GiB/sec 2.566 {'run_name': 'BM_ReadStructOfStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 53}
12 BM_ReadStructColumn/1 1.798 GiB/sec 1.843 GiB/sec 2.521 {'run_name': 'BM_ReadStructColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 110}
2 BM_ReadStructOfStructColumn/99 3.464 GiB/sec 3.530 GiB/sec 1.898 {'run_name': 'BM_ReadStructOfStructColumn/99', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 100}
16 BM_ReadStructOfStructColumn/0 6.021 GiB/sec 6.065 GiB/sec 0.724 {'run_name': 'BM_ReadStructOfStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 179}
21 BM_ReadStructColumn/0 6.821 GiB/sec 6.812 GiB/sec -0.137 {'run_name': 'BM_ReadStructColumn/0', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 410}
18 BM_ReadListOfListColumn/1 805.644 MiB/sec 801.480 MiB/sec -0.517 {'run_name': 'BM_ReadListOfListColumn/1', 'run_type': 'iteration', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 70}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702639541
> is it ok if I take a closer look next week
No problem.
> but I would not expect that to be the case.
Right. The emulation is probably much slower.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702300021
I also notice that we call `internal::GreaterThanBitmap` for each 64 levels, which always goes through the dynamic dispatch indirection (meaning two function calls, I think). We could call `GreaterThanBitmapImpl` but that requires compiling a specialized version of `level_conversion_inc.h` for AVX2, otherwise we lose performance.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #8320: ARROW-10058: [C++] Improve repeated levels conversion without BMI2
Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8320:
URL: https://github.com/apache/arrow/pull/8320#issuecomment-702615585
> I also notice that we call internal::GreaterThanBitmap for each 64 levels, which always goes through the dynamic dispatch indirection (meaning two function calls, I think). We could call GreaterThanBitmapImpl but that requires compiling a specialized version of level_conversion_inc.h for AVX2, otherwise we lose performance.
yeah it isn't ideal, it is possible there is a better factoring in there but it seemed hard to do and isolate BMI2 special instructions, I guess if this isn't too much slower then BMI2 on intel we could potentially collapse everything, but I would not expect that to be the case.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org