Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 15:10:35 UTC
[GitHub] [arrow] pitrou opened a new pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
pitrou opened a new pull request #8342:
URL: https://github.com/apache/arrow/pull/8342
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703742864
@emkornfield If you can approve this quickly I can then exercise #8320 on it.
[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703712511
Benchmark numbers on AMD Ryzen:
```
BM_ReadStructColumn/0 1770216 ns 1768552 ns 391 bytes_per_second=6.62618G/s items_per_second=592.901M/s
BM_ReadStructColumn/1 6769274 ns 6762710 ns 104 bytes_per_second=1.73285G/s items_per_second=155.053M/s
BM_ReadStructColumn/50 10427316 ns 10419932 ns 65 bytes_per_second=1.12465G/s items_per_second=100.632M/s
BM_ReadStructColumn/99 2910162 ns 2907881 ns 239 bytes_per_second=4.03G/s items_per_second=360.598M/s
BM_ReadStructOfStructColumn/0 3916658 ns 3912652 ns 179 bytes_per_second=5.99018G/s items_per_second=267.996M/s
BM_ReadStructOfStructColumn/1 13437999 ns 13425673 ns 52 bytes_per_second=1.74572G/s items_per_second=78.1023M/s
BM_ReadStructOfStructColumn/50 15649910 ns 15638181 ns 44 bytes_per_second=1.49874G/s items_per_second=67.0523M/s
BM_ReadStructOfStructColumn/99 7022917 ns 7015954 ns 95 bytes_per_second=3.3406G/s items_per_second=149.456M/s
BM_ReadStructOfListColumn/0 17565225 ns 17553576 ns 40 bytes_per_second=683.621M/s items_per_second=59.7357M/s
BM_ReadStructOfListColumn/1 20743098 ns 20729469 ns 34 bytes_per_second=578.886M/s items_per_second=50.5838M/s
BM_ReadStructOfListColumn/50 35540775 ns 35528805 ns 19 bytes_per_second=337.754M/s items_per_second=29.5134M/s
BM_ReadStructOfListColumn/99 14650196 ns 14640694 ns 47 bytes_per_second=819.633M/s items_per_second=71.6207M/s
BM_ReadListColumn/0 7604549 ns 7601412 ns 92 bytes_per_second=1052.44M/s items_per_second=137.945M/s
BM_ReadListColumn/1 9118144 ns 9114307 ns 77 bytes_per_second=877.741M/s items_per_second=115.047M/s
BM_ReadListColumn/50 16743022 ns 16738957 ns 41 bytes_per_second=477.927M/s items_per_second=62.6428M/s
BM_ReadListColumn/99 6798257 ns 6794603 ns 104 bytes_per_second=1.14981G/s items_per_second=154.325M/s
BM_ReadListOfStructColumn/0 15266355 ns 15256162 ns 45 bytes_per_second=786.567M/s items_per_second=68.7313M/s
BM_ReadListOfStructColumn/1 19382854 ns 19372448 ns 36 bytes_per_second=619.436M/s items_per_second=54.1272M/s
BM_ReadListOfStructColumn/50 30720332 ns 30711989 ns 23 bytes_per_second=390.727M/s items_per_second=34.1422M/s
BM_ReadListOfStructColumn/99 12476630 ns 12467956 ns 56 bytes_per_second=962.467M/s items_per_second=84.1017M/s
BM_ReadListOfListColumn/0 8938648 ns 8934638 ns 77 bytes_per_second=895.392M/s items_per_second=117.361M/s
BM_ReadListOfListColumn/1 10470577 ns 10466690 ns 66 bytes_per_second=764.329M/s items_per_second=100.182M/s
BM_ReadListOfListColumn/50 18194899 ns 18190700 ns 39 bytes_per_second=439.785M/s items_per_second=57.6435M/s
BM_ReadListOfListColumn/99 7500037 ns 7496263 ns 93 bytes_per_second=1067.2M/s items_per_second=139.88M/s
```
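To put the rows above side by side, here is a minimal sketch that parses a few of them and expresses each `items_per_second` figure relative to `BM_ReadStructColumn/0`. The regular expression, the `parse` helper and the `SCALE` table are illustrative and not part of the PR. The sketch assumes that the `k`/`M`/`G` suffixes on `items_per_second` are decimal multipliers and that an "item" means the same thing in every benchmark, neither of which the thread states.

```python
import re

# Three rows copied verbatim from the table above.
SAMPLE = """\
BM_ReadStructColumn/0 1770216 ns 1768552 ns 391 bytes_per_second=6.62618G/s items_per_second=592.901M/s
BM_ReadListColumn/0 7604549 ns 7601412 ns 92 bytes_per_second=1052.44M/s items_per_second=137.945M/s
BM_ReadListOfListColumn/0 8938648 ns 8934638 ns 77 bytes_per_second=895.392M/s items_per_second=117.361M/s
"""

# Assumption: suffixes on items_per_second are powers of 1000.
SCALE = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}

# name, real time (ns), CPU time (ns), iterations, then the items_per_second counter.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<real>\d+)\s+ns\s+(?P<cpu>\d+)\s+ns\s+(?P<iters>\d+)"
    r".*items_per_second=(?P<rate>[\d.]+)(?P<unit>[kMG]?)/s"
)


def parse(text):
    """Return {benchmark name: {"cpu_ns": int, "items_per_s": float}}."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = {
                "cpu_ns": int(m["cpu"]),
                "items_per_s": float(m["rate"]) * SCALE[m["unit"]],
            }
    return rows


rows = parse(SAMPLE)
baseline = rows["BM_ReadStructColumn/0"]["items_per_s"]

# How many times lower each benchmark's item throughput is than the struct baseline.
slowdown = {name: baseline / r["items_per_s"] for name, r in rows.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

Under those assumptions, the two list benchmarks in this sample come out roughly 4x to 5x below the flat struct read.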
[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-704413585
+1, will merge.
[GitHub] [arrow] pitrou closed pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8342:
URL: https://github.com/apache/arrow/pull/8342
[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282
Benchmarks on AMD Ryzen:
```
BM_ReadStructColumn/0 1730334 ns 1728700 ns 403 bytes_per_second=6.77894G/s items_per_second=606.569M/s
BM_ReadStructColumn/1 6780341 ns 6774443 ns 103 bytes_per_second=1.72985G/s items_per_second=154.784M/s
BM_ReadStructColumn/50 10310423 ns 10303979 ns 68 bytes_per_second=1.1373G/s items_per_second=101.764M/s
BM_ReadStructColumn/99 2894992 ns 2892589 ns 243 bytes_per_second=4.0513G/s items_per_second=362.504M/s
BM_ReadStructOfStructColumn/0 3927370 ns 3922870 ns 177 bytes_per_second=5.97458G/s items_per_second=267.298M/s
BM_ReadStructOfStructColumn/1 13401458 ns 13386514 ns 52 bytes_per_second=1.75083G/s items_per_second=78.3308M/s
BM_ReadStructOfStructColumn/50 15523635 ns 15511854 ns 44 bytes_per_second=1.51094G/s items_per_second=67.5984M/s
BM_ReadStructOfStructColumn/99 6975243 ns 6968122 ns 99 bytes_per_second=3.36353G/s items_per_second=150.482M/s
BM_ReadStructOfListColumn/0 17324567 ns 17312395 ns 40 bytes_per_second=69.3134M/s items_per_second=6.0567M/s
BM_ReadStructOfListColumn/1 20369870 ns 20353461 ns 35 bytes_per_second=58.9571M/s items_per_second=5.15175M/s
BM_ReadStructOfListColumn/50 35235091 ns 35214778 ns 20 bytes_per_second=34.0761M/s items_per_second=2.97761M/s
BM_ReadStructOfListColumn/99 14671458 ns 14662895 ns 47 bytes_per_second=81.838M/s items_per_second=7.15111M/s
BM_ReadListColumn/0 7489433 ns 7485862 ns 95 bytes_per_second=106.866M/s items_per_second=14.0072M/s
BM_ReadListColumn/1 8942143 ns 8936655 ns 78 bytes_per_second=89.5176M/s items_per_second=11.7332M/s
BM_ReadListColumn/50 16543501 ns 16539589 ns 42 bytes_per_second=48.3681M/s items_per_second=6.3397M/s
BM_ReadListColumn/99 6786057 ns 6781524 ns 103 bytes_per_second=117.966M/s items_per_second=15.462M/s
BM_ReadListOfStructColumn/0 15026063 ns 15014904 ns 45 bytes_per_second=79.9194M/s items_per_second=6.98346M/s
BM_ReadListOfStructColumn/1 19258078 ns 19246280 ns 37 bytes_per_second=62.3488M/s items_per_second=5.44812M/s
BM_ReadListOfStructColumn/50 30258890 ns 30247716 ns 23 bytes_per_second=39.6718M/s items_per_second=3.46658M/s
BM_ReadListOfStructColumn/99 12519324 ns 12510907 ns 56 bytes_per_second=95.9148M/s items_per_second=8.38117M/s
BM_ReadListOfListColumn/0 8708917 ns 8703415 ns 81 bytes_per_second=9.19025M/s items_per_second=1.20458M/s
BM_ReadListOfListColumn/1 10363542 ns 10359340 ns 67 bytes_per_second=7.7212M/s items_per_second=1012.03k/s
BM_ReadListOfListColumn/50 18012436 ns 18007187 ns 39 bytes_per_second=4.44192M/s items_per_second=582.212k/s
BM_ReadListOfListColumn/99 7502575 ns 7499158 ns 92 bytes_per_second=10.6661M/s items_per_second=1.39802M/s
```
We see that nested structs are fine, but lists reduce performance by a large amount (around 10x for each list nesting level).
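As a rough consistency check on the table above, multiplying a row's CPU time by its `items_per_second` gives the number of items that row implies per iteration. The sketch below does this for three rows. It assumes the rate counters are derived from CPU time and that `M` is a decimal multiplier; the thread confirms neither, and the `implied_items` helper is illustrative only.

```python
# (benchmark name, CPU time in ns, items_per_second) copied from the table above.
ROWS = [
    ("BM_ReadStructColumn/0", 1728700, 606.569e6),
    ("BM_ReadListColumn/0", 7485862, 14.0072e6),
    ("BM_ReadListOfListColumn/0", 8703415, 1.20458e6),
]


def implied_items(cpu_ns, items_per_second):
    """Items per iteration implied by a row (assumes the rate is based on CPU time)."""
    return items_per_second * cpu_ns * 1e-9


implied = {name: implied_items(cpu_ns, rate) for name, cpu_ns, rate in ROWS}

for name, items in implied.items():
    print(f"{name}: ~{items:,.0f} items per iteration")
```

If the assumptions hold, the struct row implies about 2^20 items per iteration, while the list rows imply about 10x fewer for each list level. The times for the same benchmarks are close to those in the table posted in issuecomment-703712511, so the 10x gap here may come from the item and byte counters rather than from read time. That would fit with these numbers being withdrawn as wrong later in the thread, although the thread does not state the cause.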
[GitHub] [arrow] pitrou removed a comment on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou removed a comment on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282
[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703707643
(wrong benchmark numbers deleted)
[GitHub] [arrow] github-actions[bot] commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703713114
https://issues.apache.org/jira/browse/ARROW-10120