Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 15:10:35 UTC

[GitHub] [arrow] pitrou opened a new pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou opened a new pull request #8342:
URL: https://github.com/apache/arrow/pull/8342


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703742864


   @emkornfield If you can approve this quickly, I can then exercise #8320 on it.





[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703712511


   Benchmark numbers on AMD Ryzen:
   ```
   Benchmark                                      Time             CPU   Iterations UserCounters...
   BM_ReadStructColumn/0                    1770216 ns      1768552 ns          391 bytes_per_second=6.62618G/s items_per_second=592.901M/s
   BM_ReadStructColumn/1                    6769274 ns      6762710 ns          104 bytes_per_second=1.73285G/s items_per_second=155.053M/s
   BM_ReadStructColumn/50                  10427316 ns     10419932 ns           65 bytes_per_second=1.12465G/s items_per_second=100.632M/s
   BM_ReadStructColumn/99                   2910162 ns      2907881 ns          239 bytes_per_second=4.03G/s items_per_second=360.598M/s
   
   BM_ReadStructOfStructColumn/0            3916658 ns      3912652 ns          179 bytes_per_second=5.99018G/s items_per_second=267.996M/s
   BM_ReadStructOfStructColumn/1           13437999 ns     13425673 ns           52 bytes_per_second=1.74572G/s items_per_second=78.1023M/s
   BM_ReadStructOfStructColumn/50          15649910 ns     15638181 ns           44 bytes_per_second=1.49874G/s items_per_second=67.0523M/s
   BM_ReadStructOfStructColumn/99           7022917 ns      7015954 ns           95 bytes_per_second=3.3406G/s items_per_second=149.456M/s
   
   BM_ReadStructOfListColumn/0             17565225 ns     17553576 ns           40 bytes_per_second=683.621M/s items_per_second=59.7357M/s
   BM_ReadStructOfListColumn/1             20743098 ns     20729469 ns           34 bytes_per_second=578.886M/s items_per_second=50.5838M/s
   BM_ReadStructOfListColumn/50            35540775 ns     35528805 ns           19 bytes_per_second=337.754M/s items_per_second=29.5134M/s
   BM_ReadStructOfListColumn/99            14650196 ns     14640694 ns           47 bytes_per_second=819.633M/s items_per_second=71.6207M/s
   
   BM_ReadListColumn/0                      7604549 ns      7601412 ns           92 bytes_per_second=1052.44M/s items_per_second=137.945M/s
   BM_ReadListColumn/1                      9118144 ns      9114307 ns           77 bytes_per_second=877.741M/s items_per_second=115.047M/s
   BM_ReadListColumn/50                    16743022 ns     16738957 ns           41 bytes_per_second=477.927M/s items_per_second=62.6428M/s
   BM_ReadListColumn/99                     6798257 ns      6794603 ns          104 bytes_per_second=1.14981G/s items_per_second=154.325M/s
   
   BM_ReadListOfStructColumn/0             15266355 ns     15256162 ns           45 bytes_per_second=786.567M/s items_per_second=68.7313M/s
   BM_ReadListOfStructColumn/1             19382854 ns     19372448 ns           36 bytes_per_second=619.436M/s items_per_second=54.1272M/s
   BM_ReadListOfStructColumn/50            30720332 ns     30711989 ns           23 bytes_per_second=390.727M/s items_per_second=34.1422M/s
   BM_ReadListOfStructColumn/99            12476630 ns     12467956 ns           56 bytes_per_second=962.467M/s items_per_second=84.1017M/s
   
   BM_ReadListOfListColumn/0                8938648 ns      8934638 ns           77 bytes_per_second=895.392M/s items_per_second=117.361M/s
   BM_ReadListOfListColumn/1               10470577 ns     10466690 ns           66 bytes_per_second=764.329M/s items_per_second=100.182M/s
   BM_ReadListOfListColumn/50              18194899 ns     18190700 ns           39 bytes_per_second=439.785M/s items_per_second=57.6435M/s
   BM_ReadListOfListColumn/99               7500037 ns      7496263 ns           93 bytes_per_second=1067.2M/s items_per_second=139.88M/s
   ```
   





[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-704413585


   +1, will merge.





[GitHub] [arrow] pitrou closed pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou closed pull request #8342:
URL: https://github.com/apache/arrow/pull/8342


   





[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282


   Benchmarks on AMD Ryzen:
   ```
   Benchmark                               Time             CPU   Iterations UserCounters...
   BM_ReadStructColumn/0             1730334 ns      1728700 ns          403 bytes_per_second=6.77894G/s items_per_second=606.569M/s
   BM_ReadStructColumn/1             6780341 ns      6774443 ns          103 bytes_per_second=1.72985G/s items_per_second=154.784M/s
   BM_ReadStructColumn/50           10310423 ns     10303979 ns           68 bytes_per_second=1.1373G/s items_per_second=101.764M/s
   BM_ReadStructColumn/99            2894992 ns      2892589 ns          243 bytes_per_second=4.0513G/s items_per_second=362.504M/s
   
   BM_ReadStructOfStructColumn/0     3927370 ns      3922870 ns          177 bytes_per_second=5.97458G/s items_per_second=267.298M/s
   BM_ReadStructOfStructColumn/1    13401458 ns     13386514 ns           52 bytes_per_second=1.75083G/s items_per_second=78.3308M/s
   BM_ReadStructOfStructColumn/50   15523635 ns     15511854 ns           44 bytes_per_second=1.51094G/s items_per_second=67.5984M/s
   BM_ReadStructOfStructColumn/99    6975243 ns      6968122 ns           99 bytes_per_second=3.36353G/s items_per_second=150.482M/s
   
   BM_ReadStructOfListColumn/0      17324567 ns     17312395 ns           40 bytes_per_second=69.3134M/s items_per_second=6.0567M/s
   BM_ReadStructOfListColumn/1      20369870 ns     20353461 ns           35 bytes_per_second=58.9571M/s items_per_second=5.15175M/s
   BM_ReadStructOfListColumn/50     35235091 ns     35214778 ns           20 bytes_per_second=34.0761M/s items_per_second=2.97761M/s
   BM_ReadStructOfListColumn/99     14671458 ns     14662895 ns           47 bytes_per_second=81.838M/s items_per_second=7.15111M/s
   
   BM_ReadListColumn/0               7489433 ns      7485862 ns           95 bytes_per_second=106.866M/s items_per_second=14.0072M/s
   BM_ReadListColumn/1               8942143 ns      8936655 ns           78 bytes_per_second=89.5176M/s items_per_second=11.7332M/s
   BM_ReadListColumn/50             16543501 ns     16539589 ns           42 bytes_per_second=48.3681M/s items_per_second=6.3397M/s
   BM_ReadListColumn/99              6786057 ns      6781524 ns          103 bytes_per_second=117.966M/s items_per_second=15.462M/s
   
   BM_ReadListOfStructColumn/0      15026063 ns     15014904 ns           45 bytes_per_second=79.9194M/s items_per_second=6.98346M/s
   BM_ReadListOfStructColumn/1      19258078 ns     19246280 ns           37 bytes_per_second=62.3488M/s items_per_second=5.44812M/s
   BM_ReadListOfStructColumn/50     30258890 ns     30247716 ns           23 bytes_per_second=39.6718M/s items_per_second=3.46658M/s
   BM_ReadListOfStructColumn/99     12519324 ns     12510907 ns           56 bytes_per_second=95.9148M/s items_per_second=8.38117M/s
   
   BM_ReadListOfListColumn/0         8708917 ns      8703415 ns           81 bytes_per_second=9.19025M/s items_per_second=1.20458M/s
   BM_ReadListOfListColumn/1        10363542 ns     10359340 ns           67 bytes_per_second=7.7212M/s items_per_second=1012.03k/s
   BM_ReadListOfListColumn/50       18012436 ns     18007187 ns           39 bytes_per_second=4.44192M/s items_per_second=582.212k/s
   BM_ReadListOfListColumn/99        7502575 ns      7499158 ns           92 bytes_per_second=10.6661M/s items_per_second=1.39802M/s
   ```
   
   We see that nested structs are fine, but lists reduce performance by a large amount (around 10x for each list nesting level).





[GitHub] [arrow] pitrou removed a comment on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou removed a comment on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703697282







[GitHub] [arrow] pitrou commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

pitrou commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703707643


   (wrong benchmark numbers deleted)





[GitHub] [arrow] github-actions[bot] commented on pull request #8342: ARROW-10120: [C++] Add two-level nested Parquet read to Arrow benchmarks

github-actions[bot] commented on pull request #8342:
URL: https://github.com/apache/arrow/pull/8342#issuecomment-703713114


   https://issues.apache.org/jira/browse/ARROW-10120

