Posted to jira@arrow.apache.org by "Paul Taylor (Jira)" <ji...@apache.org> on 2022/03/31 19:22:00 UTC
[jira] [Commented] (ARROW-15852) [JS] Table getByteLength and indexOf don't work
[ https://issues.apache.org/jira/browse/ARROW-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515537#comment-17515537 ]
Paul Taylor commented on ARROW-15852:
-------------------------------------
[~timhigins] Thanks for the report. In running your code, I discovered an oversight we made in the v7.0 refactor.
That said, I think your {{indexOf()}} call is incorrect. {{indexOf()}} is the inverse of {{get()}}, so this should evaluate to true: {{table.indexOf(table.get(0)) === 0}}
In your case (looking up the index of a row), you want to pass the entire row contents to the {{table.indexOf()}} call like this:
{code:javascript}
const { tableFromArrays } = require('apache-arrow');

const t = tableFromArrays({
  a: [0, 1, 2],
  b: ["foo", "bar", "baz"]
});

// Size in bytes of the rows at index 0 and 1.
console.log(t.getByteLength(0));
console.log(t.getByteLength(1));

// Pass the full row contents to look up a row's index.
console.log(t.indexOf({a: 0, b: "foo"}));
console.log(t.indexOf({a: 1, b: "bar"}));
console.log(t.indexOf({a: 2, b: "baz"}));
{code}
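To make the {{get()}}/{{indexOf()}} relationship concrete, here is a minimal sketch built on a hypothetical {{MiniTable}} class. It is not the apache-arrow implementation, and the class name, its fields, and its handling of partial rows are assumptions made purely for illustration. It only shows the contract described above: {{indexOf()}} takes a whole row and returns the index that {{get()}} would return that row for.

```javascript
// Hypothetical stand-in for a table; NOT the apache-arrow implementation.
class MiniTable {
  constructor(columns) {
    this.columns = columns; // e.g. { a: [0, 1, 2], b: ['foo', 'bar', 'baz'] }
    this.names = Object.keys(columns);
    this.numRows = this.names.length ? columns[this.names[0]].length : 0;
  }

  // Return row `index` as a plain object, or null when out of range.
  get(index) {
    if (index < 0 || index >= this.numRows) return null;
    const row = {};
    for (const name of this.names) row[name] = this.columns[name][index];
    return row;
  }

  // Return the index of the first row whose every column equals `row`, else -1.
  indexOf(row) {
    if (row === null || typeof row !== 'object') return -1;
    for (let i = 0; i < this.numRows; i++) {
      if (this.names.every((name) => this.columns[name][i] === row[name])) return i;
    }
    return -1;
  }
}

const t = new MiniTable({ a: [0, 1, 2], b: ['foo', 'bar', 'baz'] });

// indexOf() undoes get(): looking up the row fetched at index 0 yields 0.
const roundTrip = t.indexOf(t.get(0)) === 0;

// The whole row is needed; in this sketch a partial row matches nothing.
const fullRow = t.indexOf({ a: 1, b: 'bar' });
const partialRow = t.indexOf({ a: 1 });

console.log(roundTrip, fullRow, partialRow);
```

In this sketch a lookup that passes only part of a row fails to match, which is why the example above passes both {{a}} and {{b}}.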
> [JS] Table getByteLength and indexOf don't work
> -----------------------------------------------
>
> Key: ARROW-15852
> URL: https://issues.apache.org/jira/browse/ARROW-15852
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 7.0.0
> Reporter: Timothy Higinbottom
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The functions table.getByteLength() and table.indexOf() don't return the correct values.
> They are bound dynamically to the Table class, in a way I don't fully understand, with the following code:
> [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L378-L390]
> The other functions bound that way, get(), set(), and isValid(), all seem to work. However, getByteLength() and indexOf() return the placeholder/sentinel values of 0 and -1 respectively, which are defined in the no-op code here: [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L207-L221]. I assume those no-op definitions exist to generate the right type definitions, and thus documentation.
> It's fairly simple for a user to implement the right logic themselves (at least for getByteLength) and it's a quick patch to define the functions normally instead of on the prototype, e.g.:
>
> {code:java}
> /**
>  * Get the size in bytes of an element by index.
>  * @param index The index at which to get the byteLength.
>  */
> // @ts-ignore
> public getByteLength(index: number): number { return this.data[index].byteLength; }
>
> /**
>  * Get the size in bytes of a table.
>  */
> // @ts-ignore
> public getByteLength(): number {
>     return this.data.map((batch) => batch.byteLength).reduce((sum, newLength) => sum + newLength);
> }
> {code}
> I'd be happy to send this as a PR if that's an OK alternative to the way it's currently implemented.
> Here's a GitHub repo with a minimal reproduction of the issue in Node.js:
> [https://github.com/alexkreidler/apache-arrow-js-small-bug]
>
> And an Observable notebook for the browser (although I couldn't get ESM working): [https://observablehq.com/@08027ecfa2b2f7bb/arrow-7-canary]
>
> Thanks to all for your work on Arrow!
>