Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 15:28:49 UTC
[GitHub] [arrow] pitrou removed a comment on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks
pitrou removed a comment on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282
Benchmarks on AMD Ryzen:
```
Benchmark                              Time          CPU  Iterations  UserCounters...
BM_ReadStructColumn/0            1730334 ns   1728700 ns         403  bytes_per_second=6.77894G/s items_per_second=606.569M/s
BM_ReadStructColumn/1            6780341 ns   6774443 ns         103  bytes_per_second=1.72985G/s items_per_second=154.784M/s
BM_ReadStructColumn/50          10310423 ns  10303979 ns          68  bytes_per_second=1.1373G/s  items_per_second=101.764M/s
BM_ReadStructColumn/99           2894992 ns   2892589 ns         243  bytes_per_second=4.0513G/s  items_per_second=362.504M/s
BM_ReadStructOfStructColumn/0    3927370 ns   3922870 ns         177  bytes_per_second=5.97458G/s items_per_second=267.298M/s
BM_ReadStructOfStructColumn/1   13401458 ns  13386514 ns          52  bytes_per_second=1.75083G/s items_per_second=78.3308M/s
BM_ReadStructOfStructColumn/50  15523635 ns  15511854 ns          44  bytes_per_second=1.51094G/s items_per_second=67.5984M/s
BM_ReadStructOfStructColumn/99   6975243 ns   6968122 ns          99  bytes_per_second=3.36353G/s items_per_second=150.482M/s
BM_ReadStructOfListColumn/0     17324567 ns  17312395 ns          40  bytes_per_second=69.3134M/s items_per_second=6.0567M/s
BM_ReadStructOfListColumn/1     20369870 ns  20353461 ns          35  bytes_per_second=58.9571M/s items_per_second=5.15175M/s
BM_ReadStructOfListColumn/50    35235091 ns  35214778 ns          20  bytes_per_second=34.0761M/s items_per_second=2.97761M/s
BM_ReadStructOfListColumn/99    14671458 ns  14662895 ns          47  bytes_per_second=81.838M/s  items_per_second=7.15111M/s
BM_ReadListColumn/0              7489433 ns   7485862 ns          95  bytes_per_second=106.866M/s items_per_second=14.0072M/s
BM_ReadListColumn/1              8942143 ns   8936655 ns          78  bytes_per_second=89.5176M/s items_per_second=11.7332M/s
BM_ReadListColumn/50            16543501 ns  16539589 ns          42  bytes_per_second=48.3681M/s items_per_second=6.3397M/s
BM_ReadListColumn/99             6786057 ns   6781524 ns         103  bytes_per_second=117.966M/s items_per_second=15.462M/s
BM_ReadListOfStructColumn/0     15026063 ns  15014904 ns          45  bytes_per_second=79.9194M/s items_per_second=6.98346M/s
BM_ReadListOfStructColumn/1     19258078 ns  19246280 ns          37  bytes_per_second=62.3488M/s items_per_second=5.44812M/s
BM_ReadListOfStructColumn/50    30258890 ns  30247716 ns          23  bytes_per_second=39.6718M/s items_per_second=3.46658M/s
BM_ReadListOfStructColumn/99    12519324 ns  12510907 ns          56  bytes_per_second=95.9148M/s items_per_second=8.38117M/s
BM_ReadListOfListColumn/0        8708917 ns   8703415 ns          81  bytes_per_second=9.19025M/s items_per_second=1.20458M/s
BM_ReadListOfListColumn/1       10363542 ns  10359340 ns          67  bytes_per_second=7.7212M/s  items_per_second=1012.03k/s
BM_ReadListOfListColumn/50      18012436 ns  18007187 ns          39  bytes_per_second=4.44192M/s items_per_second=582.212k/s
BM_ReadListOfListColumn/99       7502575 ns   7499158 ns          92  bytes_per_second=10.6661M/s items_per_second=1.39802M/s
```
We see that nested structs are fine, but lists reduce throughput by a large amount (around 10x for each list nesting level).
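
For anyone who wants to check the per-level slowdown against the numbers above, here is a minimal sketch that parses the `items_per_second` counters and prints the ratios between nesting levels. The helper names (`parse_line`, `items_per_second`) are hypothetical and not part of Arrow or the benchmark library. The suffix base is an assumption: the `1012.03k/s` entry suggests the counters roll over at 1024, so that is the default here. The three rows compared all use the `M` suffix, so their ratios do not depend on that choice. The benchmarks may not read identical data, so treat the ratios as rough.

```python
import re

# Rows copied from the benchmark output above (the "/0" variants only).
LINES = """\
BM_ReadStructColumn/0 1730334 ns 1728700 ns 403 bytes_per_second=6.77894G/s items_per_second=606.569M/s
BM_ReadListColumn/0 7489433 ns 7485862 ns 95 bytes_per_second=106.866M/s items_per_second=14.0072M/s
BM_ReadListOfListColumn/0 8708917 ns 8703415 ns 81 bytes_per_second=9.19025M/s items_per_second=1.20458M/s
"""

# Matches counters such as "items_per_second=606.569M/s".
COUNTER = re.compile(r"(\w+)=([0-9.]+)([kMG]?)/s")


def parse_line(line, base=1024):
    """Return (benchmark name, {counter name: value per second}).

    `base` is the assumed multiplier behind the k/M/G suffixes.
    """
    name = line.split()[0]
    scale = {"": 1, "k": base, "M": base**2, "G": base**3}
    counters = {
        key: float(number) * scale[suffix]
        for key, number, suffix in COUNTER.findall(line)
    }
    return name, counters


def items_per_second(lines=LINES):
    """Map each benchmark name to its items_per_second counter."""
    return {
        name: counters["items_per_second"]
        for name, counters in map(parse_line, lines.splitlines())
    }


rates = items_per_second()
struct_to_list = rates["BM_ReadStructColumn/0"] / rates["BM_ReadListColumn/0"]
list_to_list_of_list = (
    rates["BM_ReadListColumn/0"] / rates["BM_ReadListOfListColumn/0"]
)
print(f"struct -> list:       {struct_to_list:.1f}x fewer items/s")
print(f"list -> list of list: {list_to_list_of_list:.1f}x fewer items/s")
```

The same parser works on the other rows if they are pasted into `LINES`; pass `base=1000` to `parse_line` if the counters turn out to be decimal.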