Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 15:12:56 UTC

[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282


   Benchmarks on AMD Ryzen:
   ```
   BM_ReadStructColumn/0             1730334 ns      1728700 ns          403 bytes_per_second=6.77894G/s items_per_second=606.569M/s
   BM_ReadStructColumn/1             6780341 ns      6774443 ns          103 bytes_per_second=1.72985G/s items_per_second=154.784M/s
   BM_ReadStructColumn/50           10310423 ns     10303979 ns           68 bytes_per_second=1.1373G/s items_per_second=101.764M/s
   BM_ReadStructColumn/99            2894992 ns      2892589 ns          243 bytes_per_second=4.0513G/s items_per_second=362.504M/s
   
   BM_ReadStructOfStructColumn/0     3927370 ns      3922870 ns          177 bytes_per_second=5.97458G/s items_per_second=267.298M/s
   BM_ReadStructOfStructColumn/1    13401458 ns     13386514 ns           52 bytes_per_second=1.75083G/s items_per_second=78.3308M/s
   BM_ReadStructOfStructColumn/50   15523635 ns     15511854 ns           44 bytes_per_second=1.51094G/s items_per_second=67.5984M/s
   BM_ReadStructOfStructColumn/99    6975243 ns      6968122 ns           99 bytes_per_second=3.36353G/s items_per_second=150.482M/s
   
   BM_ReadStructOfListColumn/0      17324567 ns     17312395 ns           40 bytes_per_second=69.3134M/s items_per_second=6.0567M/s
   BM_ReadStructOfListColumn/1      20369870 ns     20353461 ns           35 bytes_per_second=58.9571M/s items_per_second=5.15175M/s
   BM_ReadStructOfListColumn/50     35235091 ns     35214778 ns           20 bytes_per_second=34.0761M/s items_per_second=2.97761M/s
   BM_ReadStructOfListColumn/99     14671458 ns     14662895 ns           47 bytes_per_second=81.838M/s items_per_second=7.15111M/s
   
   BM_ReadListColumn/0               7489433 ns      7485862 ns           95 bytes_per_second=106.866M/s items_per_second=14.0072M/s
   BM_ReadListColumn/1               8942143 ns      8936655 ns           78 bytes_per_second=89.5176M/s items_per_second=11.7332M/s
   BM_ReadListColumn/50             16543501 ns     16539589 ns           42 bytes_per_second=48.3681M/s items_per_second=6.3397M/s
   BM_ReadListColumn/99              6786057 ns      6781524 ns          103 bytes_per_second=117.966M/s items_per_second=15.462M/s
   
   BM_ReadListOfStructColumn/0      15026063 ns     15014904 ns           45 bytes_per_second=79.9194M/s items_per_second=6.98346M/s
   BM_ReadListOfStructColumn/1      19258078 ns     19246280 ns           37 bytes_per_second=62.3488M/s items_per_second=5.44812M/s
   BM_ReadListOfStructColumn/50     30258890 ns     30247716 ns           23 bytes_per_second=39.6718M/s items_per_second=3.46658M/s
   BM_ReadListOfStructColumn/99     12519324 ns     12510907 ns           56 bytes_per_second=95.9148M/s items_per_second=8.38117M/s
   
   BM_ReadListOfListColumn/0         8708917 ns      8703415 ns           81 bytes_per_second=9.19025M/s items_per_second=1.20458M/s
   BM_ReadListOfListColumn/1        10363542 ns     10359340 ns           67 bytes_per_second=7.7212M/s items_per_second=1012.03k/s
   BM_ReadListOfListColumn/50       18012436 ns     18007187 ns           39 bytes_per_second=4.44192M/s items_per_second=582.212k/s
   BM_ReadListOfListColumn/99        7502575 ns      7499158 ns           92 bytes_per_second=10.6661M/s items_per_second=1.39802M/s
   ```
   
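   We see that nested structs are fine, but lists reduce throughput by a large amount: around 10x for each list nesting level (for example, BM_ReadListColumn/0 reports bytes_per_second=106.866M/s while BM_ReadListOfListColumn/0 reports 9.19025M/s).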
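
The per-level ratios can be checked directly from the bytes_per_second figures in the output above. The following is a minimal sketch, not part of the benchmark suite: the `slowdown` helper and the dictionary names are hypothetical, and only same-unit series are compared (M/s against M/s, G/s against G/s) so that no assumption about the G/M conversion factor is needed.

```python
# bytes_per_second values copied from the benchmark output above,
# keyed by the benchmark argument (0, 1, 50, 99).
read_list = {0: 106.866, 1: 89.5176, 50: 48.3681, 99: 117.966}            # M/s
read_list_of_list = {0: 9.19025, 1: 7.7212, 50: 4.44192, 99: 10.6661}     # M/s
read_struct = {0: 6.77894, 1: 1.72985, 50: 1.1373, 99: 4.0513}            # G/s
read_struct_of_struct = {0: 5.97458, 1: 1.75083, 50: 1.51094, 99: 3.36353}  # G/s


def slowdown(base, nested):
    """Return base/nested throughput ratio per benchmark argument.

    Hypothetical helper: both inputs must use the same unit.
    A ratio above 1 means the nested variant is slower.
    """
    return {arg: base[arg] / nested[arg] for arg in base}


list_ratios = slowdown(read_list, read_list_of_list)
struct_ratios = slowdown(read_struct, read_struct_of_struct)

for arg in sorted(list_ratios):
    print(f"arg={arg:>2}  list->list-of-list: {list_ratios[arg]:.2f}x  "
          f"struct->struct-of-struct: {struct_ratios[arg]:.2f}x")
```

Under these numbers, adding a second list level costs between 10x and 12x at every argument value, while adding a second struct level stays within roughly 0.75x to 1.2x.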

