Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/15 16:52:08 UTC

[GitHub] [arrow] pitrou commented on pull request #14100: ARROW-4709: [C++] Optimize for ordered JSON fields

pitrou commented on PR #14100:
URL: https://github.com/apache/arrow/pull/14100#issuecomment-1315595870

   Right, it seems the speedup is relatively minor. I get these results:
   ```
   --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (39)
   --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                        benchmark        baseline       contender  change %                                                                                                                                                                                                                            counters
    ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000 141.204 MiB/sec 163.772 MiB/sec    15.983  {'family_index': 5, 'per_family_instance_index': 24, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4544425.0}
    ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000 137.627 MiB/sec 158.607 MiB/sec    15.244  {'family_index': 5, 'per_family_instance_index': 26, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4544425.0}
     ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100 149.699 MiB/sec 167.002 MiB/sec    11.559   {'family_index': 5, 'per_family_instance_index': 12, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'json_size': 424102.0}
     ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100 146.855 MiB/sec 163.307 MiB/sec    11.202   {'family_index': 5, 'per_family_instance_index': 14, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'json_size': 424102.0}
                                           ChunkJSONLineDelimited       97.748902      104.282934     6.685                                      {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONLineDelimited', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7071295, 'json_size': 150361.0}
                                         ParseJSONBlockWithSchema 136.226 MiB/sec 138.375 MiB/sec     1.577                                        {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ParseJSONBlockWithSchema', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 664, 'json_size': 150361.0}
      ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10 193.421 MiB/sec 195.560 MiB/sec     1.106     {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'json_size': 483895.0}
      ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10 192.316 MiB/sec 194.350 MiB/sec     1.058     {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 287, 'json_size': 483895.0}
      ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10 179.680 MiB/sec 180.962 MiB/sec     0.714     {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 272, 'json_size': 484344.0}
                              ReadJSONBlockWithSchemaSingleThread 117.169 MiB/sec 117.609 MiB/sec     0.376                             {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaSingleThread', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'json_size': 15026882.0}
     ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100 143.278 MiB/sec 141.908 MiB/sec    -0.957   {'family_index': 5, 'per_family_instance_index': 13, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 424088.0}
    ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100 144.507 MiB/sec 142.969 MiB/sec    -1.064  {'family_index': 5, 'per_family_instance_index': 16, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 248, 'json_size': 425955.0}
    ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100 139.161 MiB/sec 137.485 MiB/sec    -1.204  {'family_index': 5, 'per_family_instance_index': 17, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 241, 'json_size': 422790.0}
     ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10 184.568 MiB/sec 182.084 MiB/sec    -1.346    {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 281, 'json_size': 482610.0}
    ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100 135.707 MiB/sec 133.874 MiB/sec    -1.351  {'family_index': 5, 'per_family_instance_index': 19, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 236, 'json_size': 422790.0}
     ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10 190.419 MiB/sec 187.804 MiB/sec    -1.373    {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 289, 'json_size': 482610.0}
     ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10  63.273 MiB/sec  62.386 MiB/sec    -1.402     {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'json_size': 530883.0}
     ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10  62.917 MiB/sec  62.032 MiB/sec    -1.406     {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
    ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100 141.608 MiB/sec 139.480 MiB/sec    -1.503  {'family_index': 5, 'per_family_instance_index': 18, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 425955.0}
     ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100 140.454 MiB/sec 138.317 MiB/sec    -1.521   {'family_index': 5, 'per_family_instance_index': 15, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 244, 'json_size': 424088.0}
     ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10 177.692 MiB/sec 174.088 MiB/sec    -2.028    {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'json_size': 485740.0}
   ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000 135.271 MiB/sec 132.217 MiB/sec    -2.258 {'family_index': 5, 'per_family_instance_index': 28, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
      ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10 180.369 MiB/sec 176.281 MiB/sec    -2.267     {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'json_size': 484344.0}
    ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100  48.815 MiB/sec  47.630 MiB/sec    -2.427   {'family_index': 5, 'per_family_instance_index': 23, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 425534.0}
     ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10 173.820 MiB/sec 169.515 MiB/sec    -2.477    {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 262, 'json_size': 485740.0}
                                           ChunkJSONPrettyPrinted 305.981 MiB/sec 298.223 MiB/sec    -2.536                                         {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkJSONPrettyPrinted', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1033, 'json_size': 215361.0}
    ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100  49.710 MiB/sec  48.375 MiB/sec    -2.685   {'family_index': 5, 'per_family_instance_index': 20, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'json_size': 430278.0}
    ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100  49.334 MiB/sec  47.915 MiB/sec    -2.878   {'family_index': 5, 'per_family_instance_index': 21, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 425534.0}
   ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000 130.316 MiB/sec 126.470 MiB/sec    -2.951 {'family_index': 5, 'per_family_instance_index': 29, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
    ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000 134.779 MiB/sec 130.799 MiB/sec    -2.953  {'family_index': 5, 'per_family_instance_index': 25, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 22, 'json_size': 4546025.0}
    ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100  49.652 MiB/sec  48.147 MiB/sec    -3.031   {'family_index': 5, 'per_family_instance_index': 22, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'json_size': 430278.0}
     ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10  62.658 MiB/sec  60.611 MiB/sec    -3.267    {'family_index': 5, 'per_family_instance_index': 11, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86, 'json_size': 530883.0}
   ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000 131.295 MiB/sec 126.847 MiB/sec    -3.388 {'family_index': 5, 'per_family_instance_index': 30, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 24, 'json_size': 4085536.0}
   ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000 126.625 MiB/sec 121.335 MiB/sec    -4.178 {'family_index': 5, 'per_family_instance_index': 31, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:10/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23, 'json_size': 4088946.0}
   ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000  43.432 MiB/sec  41.614 MiB/sec    -4.187  {'family_index': 5, 'per_family_instance_index': 34, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
    ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000 131.544 MiB/sec 125.765 MiB/sec    -4.393  {'family_index': 5, 'per_family_instance_index': 27, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:0/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'json_size': 4546025.0}
   ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000  43.140 MiB/sec  41.204 MiB/sec    -4.488  {'family_index': 5, 'per_family_instance_index': 33, 'run_name': 'ParseJSONFields/ordered:0/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
   ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000  43.505 MiB/sec  41.523 MiB/sec    -4.556  {'family_index': 5, 'per_family_instance_index': 32, 'run_name': 'ParseJSONFields/ordered:1/schema:1/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 457089.0}
   ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000  43.536 MiB/sec  41.527 MiB/sec    -4.615  {'family_index': 5, 'per_family_instance_index': 35, 'run_name': 'ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70, 'json_size': 454665.0}
   
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (2)
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                      benchmark        baseline       contender  change %                                                                                                                                                                                                                         counters
   ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10  63.104 MiB/sec  60.157 MiB/sec    -4.670 {'family_index': 5, 'per_family_instance_index': 10, 'run_name': 'ParseJSONFields/ordered:1/schema:0/sparsity:90/num_fields:10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88, 'json_size': 524228.0}
                   ReadJSONBlockWithSchemaMultiThread/real_time 752.152 MiB/sec 695.624 MiB/sec    -7.515                {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'ReadJSONBlockWithSchemaMultiThread/real_time', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 36, 'json_size': 15026882.0}
   ```
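
For reading the table: the `change %` column appears to be the relative difference of contender over baseline, and the parameters encoded in each benchmark name show where the gains are. The sketch below recomputes that column from a few rows copied from the output above. The helper names `change_percent` and `parse_params` are made up for illustration and are not part of the benchmark tooling; because the printed throughputs are rounded, recomputed values can differ from the table in the last digit.

```python
def change_percent(baseline: float, contender: float) -> float:
    """Relative change of contender over baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


def parse_params(name: str):
    """Split 'Family/key:value/...' into (family, {key: int(value)}).

    Path segments without a ':' (e.g. 'real_time') are ignored.
    """
    family, *parts = name.split("/")
    params = {}
    for part in parts:
        key, sep, value = part.partition(":")
        if sep:
            params[key] = int(value)
    return family, params


# (benchmark name, baseline MiB/sec, contender MiB/sec), copied from the table.
rows = [
    ("ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000", 141.204, 163.772),
    ("ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:10", 193.421, 195.560),
    ("ParseJSONFields/ordered:0/schema:0/sparsity:90/num_fields:1000", 43.536, 41.527),
    ("ReadJSONBlockWithSchemaMultiThread/real_time", 752.152, 695.624),
]

for name, baseline, contender in rows:
    family, params = parse_params(name)
    print(f"{family} {params}: {change_percent(baseline, contender):+.2f}%")
```

Bucketing the rows by `ordered`, `sparsity` and `num_fields` this way makes it easy to see that the double-digit gains are confined to ordered, non-sparse inputs with 100 or 1000 fields.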

