You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/28 00:37:42 UTC

[GitHub] [arrow] alippai opened a new pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

alippai opened a new pull request #10181:
URL: https://github.com/apache/arrow/pull/10181


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841520624


   During benchmarking it I realized that we already use TextEncoder and TextDecoder only.
   `useNativeEncoders` is always evaluated to `true` in all supported Node versions and it takes precedence over `_Buffer`
   https://github.com/apache/arrow/blob/f47703e5237aca8cc081e140fd8a6120492649db/js/src/util/utf8.ts#L27-L34


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841511834


   https://github.com/apache/arrow/blob/master/js/test/data/tables/generate.py isn't this the test data generator script? I see only float and categorical here, no string. 
   
   @trxcllnt there is a good chance I can handle python, but which script are you referring to? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841529749


   @alippai if you change the conditional to this and re-run `yarn build -t es2015 -m cjs`, you can force it to use `Buffer.from`:
   ```ts
   const useNativeEncoders = !_Buffer && typeof TextDecoder === 'function' && typeof TextEncoder === 'function';
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829692597


   @trxcllnt the full suite ran in 359s vs 360s. Do you have specific testcase I should check or are you referring to microbenchmarks? I can do both


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828233314


   Sure, I'll do that. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841526434


   @domoritz what's happening? Am I reading the code wrong and it's actually hitting https://github.com/apache/arrow/blob/master/js/src/util/utf8.ts#L35-L38 somehow, or is it just variance in the benchmarks or perhaps a missing rebase?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841512097


   @domoritz I agree that some benchmarks should be added, I'll took into that this weekend.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841462855


   There are discussions about a 4.0.1 release on the mailing list. This pull request would need to get in soon to be included in the release. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841530218


   Node didn't used to make the TextEncoder/TextDecoder globals, so if we want to support the faster way in node we should figure out a way to control this at the packaging or import stages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841541780


   I'm closing this, as we agree that we don't want to remove the Buffer API.
   
   Additional ideas for speedup:
   Encode:
   Use TextEncoder..encodeInto() and pre-allocate the space needed for the column
   Decode:
   TextDecode the full column (if possible) and iterate on the substrings based on the lengths stored in arrow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828189114


   I saw in package.json it's already required


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829700953


   Maybe we should add benchmarks for different vector types to https://github.com/apache/arrow/blob/master/js/perf/index.js. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841512673


   @alippai that's the script. The categorical dictionaries are strings and will show up in the output like this:
   
   ```
   Get "tracks" values by index:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841535011


   We do have [separate entry-points](https://github.com/apache/arrow/blob/master/js/gulp/package-task.js#L49-L51) today, so that's totally fine. We'd need to have some shared export that can defer the lookup to runtime (like we're doing here).
   
   Not sure if that will play nice with deep-imports where they skip over the entrypoint to include just one file like:
   ```js
   var { Table } = require('apache-arrow/table');
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828076225


   This needs Node 11, right? I'm okay with breaking backwards compatibility here since we are modernizing Arrow for the v5 release anyway. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828053990


   https://issues.apache.org/jira/browse/ARROW-12578


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai closed pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai closed pull request #10181:
URL: https://github.com/apache/arrow/pull/10181


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623


   @alippai Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks.
   
   Related: we should probably move that data generation all to JS now that we have data generation in the JS tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828228160


   If we release 4.0.1, we assign 5.0.0 version and 4.0.1 version to the issue.
   If you think that we should release 4.0.1 with this, could you send an e-mail to dev@arrow.apache.org list? We can discuss it on the list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841601667


   I think there was some misunderstanding here. Despite the performance degration, we can still merge this change as it simplifies the setup and fixes Rollup builds. I re-opened a pull request in https://github.com/apache/arrow/pull/10332. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841497150


   I think we should add a benchmark before merging this pull request @trxcllnt. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829647060


   @alippai has this change been benchmarked against using `Buffer.from()`? We're using buffer here because it was dramatically faster than TextEncoder/TextDecoder in node, but it's possible that's no longer the case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841524268


   <details>
     <summary>Master</summary>
     <code>
   Parse "tracks":
   Table.from
    x 5,289 ops/sec ±10.58% (76 runs sampled)
      avg: 0.19ms
      1.14% of a frame @ 60FPS
   
   Parse "tracks":
   readBatches
    x 7,041 ops/sec ±1.47% (86 runs sampled)
      avg: 0.14ms
      0.84% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lat', length: 1000000, type: Float32
    x 33.08 ops/sec ±0.87% (57 runs sampled)
      avg: 30.23ms
      181.38% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lng', length: 1000000, type: Float32
    x 33.77 ops/sec ±0.29% (58 runs sampled)
      avg: 29.61ms
      177.66% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.30 ops/sec ±2.95% (5 runs sampled)
      avg: 3324.91ms
      19949.46% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.27 ops/sec ±7.83% (5 runs sampled)
      avg: 3650.97ms
      21905.82% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 47.13 ops/sec ±0.74% (61 runs sampled)
      avg: 21.22ms
      127.32% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 49.24 ops/sec ±0.41% (63 runs sampled)
      avg: 20.31ms
      121.86% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.25 ops/sec ±3.91% (5 runs sampled)
      avg: 4050.31ms
      24301.86% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.17 ops/sec ±16.15% (5 runs sampled)
      avg: 5885.89ms
      35315.34% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 531 ops/sec ±3.41% (71 runs sampled)
      avg: 1.88ms
      11.28% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 545 ops/sec ±3.38% (38 runs sampled)
      avg: 1.84ms
      11.04% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±10.36% (5 runs sampled)
      avg: 4446.8ms
      26680.8% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.19 ops/sec ±9.42% (5 runs sampled)
      avg: 5145.19ms
      30871.14% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 1,989,586 ops/sec ±0.43% (89 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 2,065,325 ops/sec ±1.10% (82 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,355,183 ops/sec ±0.79% (87 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,274,617 ops/sec ±1.27% (89 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Table Iterate "tracks":
   length: 1000000
    x 27.05 ops/sec ±1.87% (47 runs sampled)
      avg: 36.97ms
      221.82% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 420 ops/sec ±1.16% (86 runs sampled)
      avg: 2.38ms
      14.28% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 425 ops/sec ±1.14% (87 runs sampled)
      avg: 2.35ms
      14.1% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 61.64 ops/sec ±1.65% (62 runs sampled)
      avg: 16.22ms
      97.32% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 59.13 ops/sec ±1.54% (63 runs sampled)
      avg: 16.91ms
      101.46% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 67.39 ops/sec ±1.46% (68 runs sampled)
      avg: 14.84ms
      89.04% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 144 ops/sec ±2.11% (72 runs sampled)
      avg: 6.94ms
      41.64% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 151 ops/sec ±1.59% (77 runs sampled)
      avg: 6.63ms
      39.78% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 0.27 ops/sec ±10.69% (5 runs sampled)
      avg: 3682.97ms
      22097.82% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 22.87 ops/sec ±1.30% (42 runs sampled)
      avg: 43.72ms
      262.32% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 23.61 ops/sec ±0.75% (43 runs sampled)
      avg: 42.35ms
      254.1% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 38.42 ops/sec ±0.74% (66 runs sampled)
      avg: 26.03ms
      156.18% of a frame @ 60FPS
   </code>
   </details>
   
   <details>
     <summary>This branch</summary>
     <code>
   Parse "tracks":
   Table.from
    x 2,037 ops/sec ±18.31% (68 runs sampled)
      avg: 0.49ms
      2.94% of a frame @ 60FPS
   
   Parse "tracks":
   readBatches
    x 3,776 ops/sec ±3.49% (76 runs sampled)
      avg: 0.26ms
      1.56% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lat', length: 1000000, type: Float32
    x 21.61 ops/sec ±1.55% (39 runs sampled)
      avg: 46.28ms
      277.68% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lng', length: 1000000, type: Float32
    x 26.55 ops/sec ±0.79% (48 runs sampled)
      avg: 37.67ms
      226.02% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.19 ops/sec ±24.69% (5 runs sampled)
      avg: 5372.36ms
      32234.16% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±8.28% (5 runs sampled)
      avg: 4525.98ms
      27155.88% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 48.89 ops/sec ±1.02% (63 runs sampled)
      avg: 20.45ms
      122.7% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 49.09 ops/sec ±0.46% (63 runs sampled)
      avg: 20.37ms
      122.22% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.23 ops/sec ±7.37% (5 runs sampled)
      avg: 4350.49ms
      26102.94% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±6.16% (5 runs sampled)
      avg: 4543.31ms
      27259.86% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 587 ops/sec ±2.51% (74 runs sampled)
      avg: 1.7ms
      10.2% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 506 ops/sec ±3.18% (41 runs sampled)
      avg: 1.97ms
      11.82% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±10.54% (5 runs sampled)
      avg: 4582.27ms
      27493.62% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.20 ops/sec ±2.11% (5 runs sampled)
      avg: 5045.86ms
      30275.16% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 1,769,734 ops/sec ±0.39% (87 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 1,937,769 ops/sec ±0.54% (83 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,074,383 ops/sec ±0.45% (91 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,073,111 ops/sec ±0.47% (91 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Table Iterate "tracks":
   length: 1000000
    x 18.78 ops/sec ±1.00% (35 runs sampled)
      avg: 53.24ms
      319.44% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 383 ops/sec ±0.33% (90 runs sampled)
      avg: 2.61ms
      15.66% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 389 ops/sec ±0.95% (86 runs sampled)
      avg: 2.57ms
      15.42% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 58.41 ops/sec ±1.93% (60 runs sampled)
      avg: 17.12ms
      102.72% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 57.72 ops/sec ±1.97% (61 runs sampled)
      avg: 17.33ms
      103.98% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 65.08 ops/sec ±0.66% (67 runs sampled)
      avg: 15.37ms
      92.22% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 137 ops/sec ±0.43% (76 runs sampled)
      avg: 7.32ms
      43.92% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 142 ops/sec ±0.45% (79 runs sampled)
      avg: 7.05ms
      42.3% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 0.28 ops/sec ±15.00% (5 runs sampled)
      avg: 3595.08ms
      21570.48% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 27.00 ops/sec ±0.73% (48 runs sampled)
      avg: 37.04ms
      222.24% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 26.55 ops/sec ±1.14% (47 runs sampled)
      avg: 37.67ms
      226.02% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 40.91 ops/sec ±0.54% (54 runs sampled)
      avg: 24.44ms
      146.64% of a frame @ 60FPS
   </code>
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841535011


   We do have [separate entry-points](https://github.com/apache/arrow/blob/master/js/gulp/package-task.js#L49-L51) today, so that's totally fine. We'd need to have some shared export that can defer the lookup to runtime (like we're doing here).
   
   Not sure if that will play nice with deep-imports where they skip over the entrypoint to include just one file like:
   ```js
   const { Table } = require('apache-arrow/table');
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841529879


   Performance: I ran microbenchmarks for TextEncoder vs Buffer+utils and Buffer based is 50% percent faster for small, ascii strings. As we are using larger strings or we switch to eg. Japanese language the difference gets smaller, but Buffer is still definitely faster.
   
   We might want to fix the "dead code" or remove it in 4.0.1 (and re-add it in a future release)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828217551


   https://github.com/apache/arrow/blob/3a6f6053c74eb698208395091009ac50be9dc29e/js/package.json#L109


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841533090


   What do you think about shipping different entry files for node and browser and solve this issue compile time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841599204


   String decoding is really really slow. Would you want to help with the materialization (https://issues.apache.org/jira/browse/ARROW-10220) @alippai?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841530218


   Node didn't used to make the TextEncoder/TextDecoder globals, so if we want to support node we should figure out a way to control this at the packaging or import stages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841519307


   Caching keys is https://issues.apache.org/jira/browse/ARROW-10220 (cc @bmschmidt). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841520624


   During bencharking it I realized that we already use TextEncoder and TextDecoder only.
   `useNativeEncoders` is always evaluated to `true` in all supported Node versions and it takes precedence over `_Buffer`
   https://github.com/apache/arrow/blob/f47703e5237aca8cc081e140fd8a6120492649db/js/src/util/utf8.ts#L27-L34


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841524268


   <details>
     <summary>Master</summary>
     <pre>
   Parse "tracks":
   Table.from
    x 5,289 ops/sec ±10.58% (76 runs sampled)
      avg: 0.19ms
      1.14% of a frame @ 60FPS
   
   Parse "tracks":
   readBatches
    x 7,041 ops/sec ±1.47% (86 runs sampled)
      avg: 0.14ms
      0.84% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lat', length: 1000000, type: Float32
    x 33.08 ops/sec ±0.87% (57 runs sampled)
      avg: 30.23ms
      181.38% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lng', length: 1000000, type: Float32
    x 33.77 ops/sec ±0.29% (58 runs sampled)
      avg: 29.61ms
      177.66% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.30 ops/sec ±2.95% (5 runs sampled)
      avg: 3324.91ms
      19949.46% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.27 ops/sec ±7.83% (5 runs sampled)
      avg: 3650.97ms
      21905.82% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 47.13 ops/sec ±0.74% (61 runs sampled)
      avg: 21.22ms
      127.32% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 49.24 ops/sec ±0.41% (63 runs sampled)
      avg: 20.31ms
      121.86% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.25 ops/sec ±3.91% (5 runs sampled)
      avg: 4050.31ms
      24301.86% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.17 ops/sec ±16.15% (5 runs sampled)
      avg: 5885.89ms
      35315.34% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 531 ops/sec ±3.41% (71 runs sampled)
      avg: 1.88ms
      11.28% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 545 ops/sec ±3.38% (38 runs sampled)
      avg: 1.84ms
      11.04% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±10.36% (5 runs sampled)
      avg: 4446.8ms
      26680.8% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.19 ops/sec ±9.42% (5 runs sampled)
      avg: 5145.19ms
      30871.14% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 1,989,586 ops/sec ±0.43% (89 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 2,065,325 ops/sec ±1.10% (82 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,355,183 ops/sec ±0.79% (87 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,274,617 ops/sec ±1.27% (89 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Table Iterate "tracks":
   length: 1000000
    x 27.05 ops/sec ±1.87% (47 runs sampled)
      avg: 36.97ms
      221.82% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 420 ops/sec ±1.16% (86 runs sampled)
      avg: 2.38ms
      14.28% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 425 ops/sec ±1.14% (87 runs sampled)
      avg: 2.35ms
      14.1% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 61.64 ops/sec ±1.65% (62 runs sampled)
      avg: 16.22ms
      97.32% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 59.13 ops/sec ±1.54% (63 runs sampled)
      avg: 16.91ms
      101.46% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 67.39 ops/sec ±1.46% (68 runs sampled)
      avg: 14.84ms
      89.04% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 144 ops/sec ±2.11% (72 runs sampled)
      avg: 6.94ms
      41.64% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 151 ops/sec ±1.59% (77 runs sampled)
      avg: 6.63ms
      39.78% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 0.27 ops/sec ±10.69% (5 runs sampled)
      avg: 3682.97ms
      22097.82% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 22.87 ops/sec ±1.30% (42 runs sampled)
      avg: 43.72ms
      262.32% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 23.61 ops/sec ±0.75% (43 runs sampled)
      avg: 42.35ms
      254.1% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 38.42 ops/sec ±0.74% (66 runs sampled)
      avg: 26.03ms
      156.18% of a frame @ 60FPS
   </pre>
   </details>
   
   <details>
     <summary>This branch</summary>
     <pre>
   Parse "tracks":
   Table.from
    x 2,037 ops/sec ±18.31% (68 runs sampled)
      avg: 0.49ms
      2.94% of a frame @ 60FPS
   
   Parse "tracks":
   readBatches
    x 3,776 ops/sec ±3.49% (76 runs sampled)
      avg: 0.26ms
      1.56% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lat', length: 1000000, type: Float32
    x 21.61 ops/sec ±1.55% (39 runs sampled)
      avg: 46.28ms
      277.68% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'lng', length: 1000000, type: Float32
    x 26.55 ops/sec ±0.79% (48 runs sampled)
      avg: 37.67ms
      226.02% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.19 ops/sec ±24.69% (5 runs sampled)
      avg: 5372.36ms
      32234.16% of a frame @ 60FPS
   
   Get "tracks" values by index:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±8.28% (5 runs sampled)
      avg: 4525.98ms
      27155.88% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 48.89 ops/sec ±1.02% (63 runs sampled)
      avg: 20.45ms
      122.7% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 49.09 ops/sec ±0.46% (63 runs sampled)
      avg: 20.37ms
      122.22% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.23 ops/sec ±7.37% (5 runs sampled)
      avg: 4350.49ms
      26102.94% of a frame @ 60FPS
   
   Iterate "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±6.16% (5 runs sampled)
      avg: 4543.31ms
      27259.86% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 587 ops/sec ±2.51% (74 runs sampled)
      avg: 1.7ms
      10.2% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 506 ops/sec ±3.18% (41 runs sampled)
      avg: 1.97ms
      11.82% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.22 ops/sec ±10.54% (5 runs sampled)
      avg: 4582.27ms
      27493.62% of a frame @ 60FPS
   
   Slice toArray "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 0.20 ops/sec ±2.11% (5 runs sampled)
      avg: 5045.86ms
      30275.16% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lat', length: 1000000, type: Float32
    x 1,769,734 ops/sec ±0.39% (87 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'lng', length: 1000000, type: Float32
    x 1,937,769 ops/sec ±0.54% (83 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,074,383 ops/sec ±0.45% (91 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Slice "tracks" vectors:
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 2,073,111 ops/sec ±0.47% (91 runs sampled)
      avg: 0ms
      0% of a frame @ 60FPS
   
   Table Iterate "tracks":
   length: 1000000
    x 18.78 ops/sec ±1.00% (35 runs sampled)
      avg: 53.24ms
      319.44% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
    x 383 ops/sec ±0.33% (90 runs sampled)
      avg: 2.61ms
      15.66% of a frame @ 60FPS
   
   DataFrame Count By "tracks":
   name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
    x 389 ops/sec ±0.95% (86 runs sampled)
      avg: 2.57ms
      15.42% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 58.41 ops/sec ±1.93% (60 runs sampled)
      avg: 17.12ms
      102.72% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 57.72 ops/sec ±1.97% (61 runs sampled)
      avg: 17.33ms
      103.98% of a frame @ 60FPS
   
   DataFrame Filter-Scan Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 65.08 ops/sec ±0.66% (67 runs sampled)
      avg: 15.37ms
      92.22% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 137 ops/sec ±0.43% (76 runs sampled)
      avg: 7.32ms
      43.92% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 142 ops/sec ±0.45% (79 runs sampled)
      avg: 7.05ms
      42.3% of a frame @ 60FPS
   
   DataFrame Direct Count "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 0.28 ops/sec ±15.00% (5 runs sampled)
      avg: 3595.08ms
      21570.48% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
    x 27.00 ops/sec ±0.73% (48 runs sampled)
      avg: 37.04ms
      222.24% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
    x 26.55 ops/sec ±1.14% (47 runs sampled)
      avg: 37.67ms
      226.02% of a frame @ 60FPS
   
   DataFrame Filter-Iterate "tracks":
   name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
    x 40.91 ops/sec ±0.54% (54 runs sampled)
      avg: 24.44ms
      146.64% of a frame @ 60FPS
   </pre>
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841541780


   I'm closing this, as we agree that we don't want to remove the TextEncoder API.
   
   Additional ideas for speedup:
   Encode:
   Use TextEncoder..encodeInto() and pre-allocate the space needed for the column
   Decode:
   TextDecode the full column (if possible) and iterate on the substrings based on the lengths stored in arrow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623


   @alippai Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks. I can send you the file I generated if it's a pain to run the Python script.
   
   Related: we should probably move that data generation all to JS now that we have data generation in the JS tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828219343


   @kou I saw you removed the release tag 4.0.1. I don't know whether this is a bug or an improvement, however it fixes a regression which was introduced in 4.0.0 so including it in 4.0.1 (if released) would make sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841527729


   Idk what causes the differences. @trxcllnt just ran a benchmark to compare buffer vs no buffer. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841514959


   Dictionary may work as well, indeed. My understanding was that the buffer -> string conversions happen only once for every dictionary value and it's cloned later. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS

Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623


   Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks. We should probably move that data generation all to JS now that we have data generation in the JS tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org