You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/28 00:37:42 UTC
[GitHub] [arrow] alippai opened a new pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
alippai opened a new pull request #10181:
URL: https://github.com/apache/arrow/pull/10181
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841520624
During benchmarking it I realized that we already use TextEncoder and TextDecoder only.
`useNativeEncoders` is always evaluated to `true` in all supported Node versions and it takes precedence over `_Buffer`
https://github.com/apache/arrow/blob/f47703e5237aca8cc081e140fd8a6120492649db/js/src/util/utf8.ts#L27-L34
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841511834
https://github.com/apache/arrow/blob/master/js/test/data/tables/generate.py isn't this the test data generator script? I see only float and categorical here, no string.
@trxcllnt there is a good chance I can handle python, but which script are you referring to?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841529749
@alippai if you change the conditional to this and re-run `yarn build -t es2015 -m cjs`, you can force it to use `Buffer.from`:
```ts
const useNativeEncoders = !_Buffer && typeof TextDecoder === 'function' && typeof TextEncoder === 'function';
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829692597
@trxcllnt the full suite ran in 359s vs 360s. Do you have specific testcase I should check or are you referring to microbenchmarks? I can do both
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828233314
Sure, I'll do that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841526434
@domoritz what's happening? Am I reading the code wrong and it's actually hitting https://github.com/apache/arrow/blob/master/js/src/util/utf8.ts#L35-L38 somehow, or is it just variance in the benchmarks or perhaps a missing rebase?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841512097
@domoritz I agree that some benchmarks should be added, I'll took into that this weekend.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841462855
There are discussions about a 4.0.1 release on the mailing list. This pull request would need to get in soon to be included in the release.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841530218
Node didn't used to make the TextEncoder/TextDecoder globals, so if we want to support the faster way in node we should figure out a way to control this at the packaging or import stages.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841541780
I'm closing this, as we agree that we don't want to remove the Buffer API.
Additional ideas for speedup:
Encode:
Use TextEncoder..encodeInto() and pre-allocate the space needed for the column
Decode:
TextDecode the full column (if possible) and iterate on the substrings based on the lengths stored in arrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828189114
I saw in package.json it's already required
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829700953
Maybe we should add benchmarks for different vector types to https://github.com/apache/arrow/blob/master/js/perf/index.js.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841512673
@alippai that's the script. The categorical dictionaries are strings and will show up in the output like this:
```
Get "tracks" values by index:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841535011
We do have [separate entry-points](https://github.com/apache/arrow/blob/master/js/gulp/package-task.js#L49-L51) today, so that's totally fine. We'd need to have some shared export that can defer the lookup to runtime (like we're doing here).
Not sure if that will play nice with deep-imports where they skip over the entrypoint to include just one file like:
```js
var { Table } = require('apache-arrow/table');
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828076225
This needs Node 11, right? I'm okay with breaking backwards compatibility here since we are modernizing Arrow for the v5 release anyway.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828053990
https://issues.apache.org/jira/browse/ARROW-12578
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai closed pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai closed pull request #10181:
URL: https://github.com/apache/arrow/pull/10181
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623
@alippai Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks.
Related: we should probably move that data generation all to JS now that we have data generation in the JS tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
kou commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828228160
If we release 4.0.1, we assign 5.0.0 version and 4.0.1 version to the issue.
If you think that we should release 4.0.1 with this, could you send an e-mail to dev@arrow.apache.org list? We can discuss it on the list.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841601667
I think there was some misunderstanding here. Despite the performance degration, we can still merge this change as it simplifies the setup and fixes Rollup builds. I re-opened a pull request in https://github.com/apache/arrow/pull/10332.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841497150
I think we should add a benchmark before merging this pull request @trxcllnt.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-829647060
@alippai has this change been benchmarked against using `Buffer.from()`? We're using buffer here because it was dramatically faster than TextEncoder/TextDecoder in node, but it's possible that's no longer the case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841524268
<details>
<summary>Master</summary>
<code>
Parse "tracks":
Table.from
x 5,289 ops/sec ±10.58% (76 runs sampled)
avg: 0.19ms
1.14% of a frame @ 60FPS
Parse "tracks":
readBatches
x 7,041 ops/sec ±1.47% (86 runs sampled)
avg: 0.14ms
0.84% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lat', length: 1000000, type: Float32
x 33.08 ops/sec ±0.87% (57 runs sampled)
avg: 30.23ms
181.38% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lng', length: 1000000, type: Float32
x 33.77 ops/sec ±0.29% (58 runs sampled)
avg: 29.61ms
177.66% of a frame @ 60FPS
Get "tracks" values by index:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.30 ops/sec ±2.95% (5 runs sampled)
avg: 3324.91ms
19949.46% of a frame @ 60FPS
Get "tracks" values by index:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.27 ops/sec ±7.83% (5 runs sampled)
avg: 3650.97ms
21905.82% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 47.13 ops/sec ±0.74% (61 runs sampled)
avg: 21.22ms
127.32% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 49.24 ops/sec ±0.41% (63 runs sampled)
avg: 20.31ms
121.86% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.25 ops/sec ±3.91% (5 runs sampled)
avg: 4050.31ms
24301.86% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.17 ops/sec ±16.15% (5 runs sampled)
avg: 5885.89ms
35315.34% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 531 ops/sec ±3.41% (71 runs sampled)
avg: 1.88ms
11.28% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 545 ops/sec ±3.38% (38 runs sampled)
avg: 1.84ms
11.04% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±10.36% (5 runs sampled)
avg: 4446.8ms
26680.8% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.19 ops/sec ±9.42% (5 runs sampled)
avg: 5145.19ms
30871.14% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 1,989,586 ops/sec ±0.43% (89 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 2,065,325 ops/sec ±1.10% (82 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,355,183 ops/sec ±0.79% (87 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,274,617 ops/sec ±1.27% (89 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Table Iterate "tracks":
length: 1000000
x 27.05 ops/sec ±1.87% (47 runs sampled)
avg: 36.97ms
221.82% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 420 ops/sec ±1.16% (86 runs sampled)
avg: 2.38ms
14.28% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 425 ops/sec ±1.14% (87 runs sampled)
avg: 2.35ms
14.1% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 61.64 ops/sec ±1.65% (62 runs sampled)
avg: 16.22ms
97.32% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 59.13 ops/sec ±1.54% (63 runs sampled)
avg: 16.91ms
101.46% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 67.39 ops/sec ±1.46% (68 runs sampled)
avg: 14.84ms
89.04% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 144 ops/sec ±2.11% (72 runs sampled)
avg: 6.94ms
41.64% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 151 ops/sec ±1.59% (77 runs sampled)
avg: 6.63ms
39.78% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 0.27 ops/sec ±10.69% (5 runs sampled)
avg: 3682.97ms
22097.82% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 22.87 ops/sec ±1.30% (42 runs sampled)
avg: 43.72ms
262.32% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 23.61 ops/sec ±0.75% (43 runs sampled)
avg: 42.35ms
254.1% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 38.42 ops/sec ±0.74% (66 runs sampled)
avg: 26.03ms
156.18% of a frame @ 60FPS
</code>
</details>
<details>
<summary>This branch</summary>
<code>
Parse "tracks":
Table.from
x 2,037 ops/sec ±18.31% (68 runs sampled)
avg: 0.49ms
2.94% of a frame @ 60FPS
Parse "tracks":
readBatches
x 3,776 ops/sec ±3.49% (76 runs sampled)
avg: 0.26ms
1.56% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lat', length: 1000000, type: Float32
x 21.61 ops/sec ±1.55% (39 runs sampled)
avg: 46.28ms
277.68% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lng', length: 1000000, type: Float32
x 26.55 ops/sec ±0.79% (48 runs sampled)
avg: 37.67ms
226.02% of a frame @ 60FPS
Get "tracks" values by index:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.19 ops/sec ±24.69% (5 runs sampled)
avg: 5372.36ms
32234.16% of a frame @ 60FPS
Get "tracks" values by index:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±8.28% (5 runs sampled)
avg: 4525.98ms
27155.88% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 48.89 ops/sec ±1.02% (63 runs sampled)
avg: 20.45ms
122.7% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 49.09 ops/sec ±0.46% (63 runs sampled)
avg: 20.37ms
122.22% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.23 ops/sec ±7.37% (5 runs sampled)
avg: 4350.49ms
26102.94% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±6.16% (5 runs sampled)
avg: 4543.31ms
27259.86% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 587 ops/sec ±2.51% (74 runs sampled)
avg: 1.7ms
10.2% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 506 ops/sec ±3.18% (41 runs sampled)
avg: 1.97ms
11.82% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±10.54% (5 runs sampled)
avg: 4582.27ms
27493.62% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.20 ops/sec ±2.11% (5 runs sampled)
avg: 5045.86ms
30275.16% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 1,769,734 ops/sec ±0.39% (87 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 1,937,769 ops/sec ±0.54% (83 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,074,383 ops/sec ±0.45% (91 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,073,111 ops/sec ±0.47% (91 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Table Iterate "tracks":
length: 1000000
x 18.78 ops/sec ±1.00% (35 runs sampled)
avg: 53.24ms
319.44% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 383 ops/sec ±0.33% (90 runs sampled)
avg: 2.61ms
15.66% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 389 ops/sec ±0.95% (86 runs sampled)
avg: 2.57ms
15.42% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 58.41 ops/sec ±1.93% (60 runs sampled)
avg: 17.12ms
102.72% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 57.72 ops/sec ±1.97% (61 runs sampled)
avg: 17.33ms
103.98% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 65.08 ops/sec ±0.66% (67 runs sampled)
avg: 15.37ms
92.22% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 137 ops/sec ±0.43% (76 runs sampled)
avg: 7.32ms
43.92% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 142 ops/sec ±0.45% (79 runs sampled)
avg: 7.05ms
42.3% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 0.28 ops/sec ±15.00% (5 runs sampled)
avg: 3595.08ms
21570.48% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 27.00 ops/sec ±0.73% (48 runs sampled)
avg: 37.04ms
222.24% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 26.55 ops/sec ±1.14% (47 runs sampled)
avg: 37.67ms
226.02% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 40.91 ops/sec ±0.54% (54 runs sampled)
avg: 24.44ms
146.64% of a frame @ 60FPS
</code>
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841535011
We do have [separate entry-points](https://github.com/apache/arrow/blob/master/js/gulp/package-task.js#L49-L51) today, so that's totally fine. We'd need to have some shared export that can defer the lookup to runtime (like we're doing here).
Not sure if that will play nice with deep-imports where they skip over the entrypoint to include just one file like:
```js
const { Table } = require('apache-arrow/table');
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841529879
Performance: I ran microbenchmarks for TextEncoder vs Buffer+utils and Buffer based is 50% percent faster for small, ascii strings. As we are using larger strings or we switch to eg. Japanese language the difference gets smaller, but Buffer is still definitely faster.
We might want to fix the "dead code" or remove it in 4.0.1 (and re-add it in a future release)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828217551
https://github.com/apache/arrow/blob/3a6f6053c74eb698208395091009ac50be9dc29e/js/package.json#L109
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841533090
What do you think about shipping different entry files for node and browser and solve this issue compile time?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841599204
String decoding is really really slow. Would you want to help with the materialization (https://issues.apache.org/jira/browse/ARROW-10220) @alippai?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841530218
Node didn't used to make the TextEncoder/TextDecoder globals, so if we want to support node we should figure out a way to control this at the packaging or import stages.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841519307
Caching keys is https://issues.apache.org/jira/browse/ARROW-10220 (cc @bmschmidt).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841520624
During bencharking it I realized that we already use TextEncoder and TextDecoder only.
`useNativeEncoders` is always evaluated to `true` in all supported Node versions and it takes precedence over `_Buffer`
https://github.com/apache/arrow/blob/f47703e5237aca8cc081e140fd8a6120492649db/js/src/util/utf8.ts#L27-L34
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841524268
<details>
<summary>Master</summary>
<pre>
Parse "tracks":
Table.from
x 5,289 ops/sec ±10.58% (76 runs sampled)
avg: 0.19ms
1.14% of a frame @ 60FPS
Parse "tracks":
readBatches
x 7,041 ops/sec ±1.47% (86 runs sampled)
avg: 0.14ms
0.84% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lat', length: 1000000, type: Float32
x 33.08 ops/sec ±0.87% (57 runs sampled)
avg: 30.23ms
181.38% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lng', length: 1000000, type: Float32
x 33.77 ops/sec ±0.29% (58 runs sampled)
avg: 29.61ms
177.66% of a frame @ 60FPS
Get "tracks" values by index:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.30 ops/sec ±2.95% (5 runs sampled)
avg: 3324.91ms
19949.46% of a frame @ 60FPS
Get "tracks" values by index:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.27 ops/sec ±7.83% (5 runs sampled)
avg: 3650.97ms
21905.82% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 47.13 ops/sec ±0.74% (61 runs sampled)
avg: 21.22ms
127.32% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 49.24 ops/sec ±0.41% (63 runs sampled)
avg: 20.31ms
121.86% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.25 ops/sec ±3.91% (5 runs sampled)
avg: 4050.31ms
24301.86% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.17 ops/sec ±16.15% (5 runs sampled)
avg: 5885.89ms
35315.34% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 531 ops/sec ±3.41% (71 runs sampled)
avg: 1.88ms
11.28% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 545 ops/sec ±3.38% (38 runs sampled)
avg: 1.84ms
11.04% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±10.36% (5 runs sampled)
avg: 4446.8ms
26680.8% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.19 ops/sec ±9.42% (5 runs sampled)
avg: 5145.19ms
30871.14% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 1,989,586 ops/sec ±0.43% (89 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 2,065,325 ops/sec ±1.10% (82 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,355,183 ops/sec ±0.79% (87 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,274,617 ops/sec ±1.27% (89 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Table Iterate "tracks":
length: 1000000
x 27.05 ops/sec ±1.87% (47 runs sampled)
avg: 36.97ms
221.82% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 420 ops/sec ±1.16% (86 runs sampled)
avg: 2.38ms
14.28% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 425 ops/sec ±1.14% (87 runs sampled)
avg: 2.35ms
14.1% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 61.64 ops/sec ±1.65% (62 runs sampled)
avg: 16.22ms
97.32% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 59.13 ops/sec ±1.54% (63 runs sampled)
avg: 16.91ms
101.46% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 67.39 ops/sec ±1.46% (68 runs sampled)
avg: 14.84ms
89.04% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 144 ops/sec ±2.11% (72 runs sampled)
avg: 6.94ms
41.64% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 151 ops/sec ±1.59% (77 runs sampled)
avg: 6.63ms
39.78% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 0.27 ops/sec ±10.69% (5 runs sampled)
avg: 3682.97ms
22097.82% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 22.87 ops/sec ±1.30% (42 runs sampled)
avg: 43.72ms
262.32% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 23.61 ops/sec ±0.75% (43 runs sampled)
avg: 42.35ms
254.1% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 38.42 ops/sec ±0.74% (66 runs sampled)
avg: 26.03ms
156.18% of a frame @ 60FPS
</pre>
</details>
<details>
<summary>This branch</summary>
<pre>
Parse "tracks":
Table.from
x 2,037 ops/sec ±18.31% (68 runs sampled)
avg: 0.49ms
2.94% of a frame @ 60FPS
Parse "tracks":
readBatches
x 3,776 ops/sec ±3.49% (76 runs sampled)
avg: 0.26ms
1.56% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lat', length: 1000000, type: Float32
x 21.61 ops/sec ±1.55% (39 runs sampled)
avg: 46.28ms
277.68% of a frame @ 60FPS
Get "tracks" values by index:
name: 'lng', length: 1000000, type: Float32
x 26.55 ops/sec ±0.79% (48 runs sampled)
avg: 37.67ms
226.02% of a frame @ 60FPS
Get "tracks" values by index:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.19 ops/sec ±24.69% (5 runs sampled)
avg: 5372.36ms
32234.16% of a frame @ 60FPS
Get "tracks" values by index:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±8.28% (5 runs sampled)
avg: 4525.98ms
27155.88% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 48.89 ops/sec ±1.02% (63 runs sampled)
avg: 20.45ms
122.7% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 49.09 ops/sec ±0.46% (63 runs sampled)
avg: 20.37ms
122.22% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.23 ops/sec ±7.37% (5 runs sampled)
avg: 4350.49ms
26102.94% of a frame @ 60FPS
Iterate "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±6.16% (5 runs sampled)
avg: 4543.31ms
27259.86% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 587 ops/sec ±2.51% (74 runs sampled)
avg: 1.7ms
10.2% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 506 ops/sec ±3.18% (41 runs sampled)
avg: 1.97ms
11.82% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.22 ops/sec ±10.54% (5 runs sampled)
avg: 4582.27ms
27493.62% of a frame @ 60FPS
Slice toArray "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 0.20 ops/sec ±2.11% (5 runs sampled)
avg: 5045.86ms
30275.16% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lat', length: 1000000, type: Float32
x 1,769,734 ops/sec ±0.39% (87 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'lng', length: 1000000, type: Float32
x 1,937,769 ops/sec ±0.54% (83 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,074,383 ops/sec ±0.45% (91 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Slice "tracks" vectors:
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 2,073,111 ops/sec ±0.47% (91 runs sampled)
avg: 0ms
0% of a frame @ 60FPS
Table Iterate "tracks":
length: 1000000
x 18.78 ops/sec ±1.00% (35 runs sampled)
avg: 53.24ms
319.44% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>
x 383 ops/sec ±0.33% (90 runs sampled)
avg: 2.61ms
15.66% of a frame @ 60FPS
DataFrame Count By "tracks":
name: 'destination', length: 1000000, type: Dictionary<Int8, Utf8>
x 389 ops/sec ±0.95% (86 runs sampled)
avg: 2.57ms
15.42% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 58.41 ops/sec ±1.93% (60 runs sampled)
avg: 17.12ms
102.72% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 57.72 ops/sec ±1.97% (61 runs sampled)
avg: 17.33ms
103.98% of a frame @ 60FPS
DataFrame Filter-Scan Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 65.08 ops/sec ±0.66% (67 runs sampled)
avg: 15.37ms
92.22% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 137 ops/sec ±0.43% (76 runs sampled)
avg: 7.32ms
43.92% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 142 ops/sec ±0.45% (79 runs sampled)
avg: 7.05ms
42.3% of a frame @ 60FPS
DataFrame Direct Count "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 0.28 ops/sec ±15.00% (5 runs sampled)
avg: 3595.08ms
21570.48% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lat', length: 1000000, type: Float32, test: gt, value: 0
x 27.00 ops/sec ±0.73% (48 runs sampled)
avg: 37.04ms
222.24% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'lng', length: 1000000, type: Float32, test: gt, value: 0
x 26.55 ops/sec ±1.14% (47 runs sampled)
avg: 37.67ms
226.02% of a frame @ 60FPS
DataFrame Filter-Iterate "tracks":
name: 'origin', length: 1000000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
x 40.91 ops/sec ±0.54% (54 runs sampled)
avg: 24.44ms
146.64% of a frame @ 60FPS
</pre>
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841541780
I'm closing this, as we agree that we don't want to remove the TextEncoder API.
Additional ideas for speedup:
Encode:
Use TextEncoder..encodeInto() and pre-allocate the space needed for the column
Decode:
TextDecode the full column (if possible) and iterate on the substrings based on the lengths stored in arrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt edited a comment on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt edited a comment on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623
@alippai Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks. I can send you the file I generated if it's a pain to run the Python script.
Related: we should probably move that data generation all to JS now that we have data generation in the JS tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-828219343
@kou I saw you removed the release tag 4.0.1. I don't know whether this is a bug or an improvement, however it fixes a regression which was introduced in 4.0.0 so including it in 4.0.1 (if released) would make sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] domoritz commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
domoritz commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841527729
Idk what causes the differences. @trxcllnt just ran a benchmark to compare buffer vs no buffer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alippai commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
alippai commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-841514959
Dictionary may work as well, indeed. My understanding was that the buffer -> string conversions happen only once for every dictionary value and it's cloned later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] trxcllnt commented on pull request #10181: ARROW-12578: [JS] Remove Buffer in favor of TextEncoder API in NodeJS
Posted by GitBox <gi...@apache.org>.
trxcllnt commented on pull request #10181:
URL: https://github.com/apache/arrow/pull/10181#issuecomment-830286623
Yeah, we have a few string columns in the test data we generate via Python for the micro-benchmarks. We should probably move that data generation all to JS now that we have data generation in the JS tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org