Posted to jira@arrow.apache.org by "Diana Clarke (Jira)" <ji...@apache.org> on 2021/07/06 19:04:00 UTC

[jira] [Updated] (ARROW-13266) [JS] Improve benchmark names

     [ https://issues.apache.org/jira/browse/ARROW-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diana Clarke updated ARROW-13266:
---------------------------------
    Description: 
1) I found the double usage of "name" confusing.

    {code}"name": "name: 'lat', length: 1,000,000, type: Float32",{code}
    
    Perhaps `column` instead?
    
    {code}"name": "column: 'lat', length: 1,000,000, type: Float32",{code}

2) The names could be more informative (and there are currently duplicates). I see the following in the JSON output.

{code}
    "name": "Table.from",
    "name": "readBatches",
    "name": "serialize",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "length: 1,000,000",
    "name": "name: 'lat', length: 1,000,000, type: Float32, test: gt, value: 0",
    "name": "name: 'lng', length: 1,000,000, type: Float32, test: gt, value: 0",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle",
{code}


  Yet I do see informative names in the code (like {{DataFrame Count By...}} & {{DataFrame Filter-Scan Count...}}):

    - https://github.com/apache/arrow/blob/5ca16287a389afceabdd4b487d2e43e62745abcc/js/perf/index.ts#L124
    - https://github.com/apache/arrow/blob/5ca16287a389afceabdd4b487d2e43e62745abcc/js/perf/index.ts#L114
 
   Perhaps add the suite name? And emit the values as a JSON object rather than as comma-separated values packed into one string?
   
   Something like this:
   
{code}
       ...
       "name": "DataFrame Count By",
       "values": {
           "column": "lng",
           "length": "1,000,000",
           "type": "Float32",
           "test": "gt",
           "value": "0"
       }
       ...
{code}
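
   For illustration only, here is a rough TypeScript sketch of how one of the existing flat name strings might be converted into that shape. The {{BenchmarkEntry}} interface and the {{parseValues}} and {{toEntry}} helpers are hypothetical names made up for this example, not part of the Arrow JS benchmark code, and the sketch assumes every parameter has the form "key: value".

```typescript
// Hypothetical shape for a structured benchmark entry. The field names
// follow the suggestion above; this is not an existing Arrow JS API.
interface BenchmarkEntry {
    name: string;                   // suite name, e.g. "DataFrame Count By"
    values: Record<string, string>; // benchmark parameters as key/value pairs
}

// Split "key: value, key: value" on ", " only when another "key: " follows,
// so "1,000,000" and "Dictionary<Int8, Utf8>" are not broken apart.
function parseValues(flat: string): Record<string, string> {
    const values: Record<string, string> = {};
    for (const part of flat.split(/, (?=[A-Za-z]+: )/)) {
        const sep = part.indexOf(": ");
        if (sep === -1) continue; // skip anything that is not "key: value"
        const rawKey = part.slice(0, sep);
        // Rename the inner "name" key to "column" to avoid the double "name".
        const key = rawKey === "name" ? "column" : rawKey;
        // Drop the single quotes wrapped around column names such as 'lng'.
        values[key] = part.slice(sep + 2).replace(/^'(.*)'$/, "$1");
    }
    return values;
}

// Combine the suite name with the parsed parameters.
function toEntry(suite: string, flat: string): BenchmarkEntry {
    return { name: suite, values: parseValues(flat) };
}

const entry = toEntry(
    "DataFrame Count By",
    "name: 'lng', length: 1,000,000, type: Float32, test: gt, value: 0",
);
console.log(JSON.stringify(entry, null, 4));
```

   In practice it would probably be simpler to build the object directly where the benchmark is defined instead of parsing the string afterwards; the parsing step is only here to show the mapping from the current names to the proposed ones.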

  was:
1) I found the double usage of "name" confusing.

    {code}"name": "name: 'lat', length: 1,000,000, type: Float32",{code}
    
    Perhaps `column` instead?
    
    {code}"name": "column: 'lat', length: 1,000,000, type: Float32",{code}

2) The names could be more informative (and there are currently duplicates). I see the following in the json.

{code}
    "name": "Table.from",
    "name": "readBatches",
    "name": "serialize",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'lat', length: 1,000,000, type: Float32",
    "name": "name: 'lng', length: 1,000,000, type: Float32",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "name: 'destination', length: 1,000,000, type: Dictionary<Int8, Utf8>",
    "name": "length: 1,000,000",
    "name": "name: 'lat', length: 1,000,000, type: Float32, test: gt, value: 0",
    "name": "name: 'lng', length: 1,000,000, type: Float32, test: gt, value: 0",
    "name": "name: 'origin', length: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle",
{code}


  Yet I do see informative names in the code (like `DataFrame Count By...` & `DataFrame Filter-Scan Count...`):

    - https://github.com/apache/arrow/blob/5ca16287a389afceabdd4b487d2e43e62745abcc/js/perf/index.ts#L124
    - https://github.com/apache/arrow/blob/5ca16287a389afceabdd4b487d2e43e62745abcc/js/perf/index.ts#L114
 
   Perhaps add the suite name? And make the values json rather than comma separated values as one string?
   
   Something like this:
   
{code}
       ...
       "name": "DataFrame Count By"
       "values": {
           "column": "lng",
           "length": "1,000,000",
           "type": "Float32",
           "test": "gt",
           "value": "0"
        }
        ...
{code}    


> [JS] Improve benchmark names
> ----------------------------
>
>                 Key: ARROW-13266
>                 URL: https://issues.apache.org/jira/browse/ARROW-13266
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Diana Clarke
>            Assignee: Diana Clarke
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)