Posted to jira@arrow.apache.org by "Paul Taylor (Jira)" <ji...@apache.org> on 2022/03/31 19:22:00 UTC

[jira] [Commented] (ARROW-15852) [JS] Table getByteLength and indexOf don't work

    [ https://issues.apache.org/jira/browse/ARROW-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515537#comment-17515537 ] 

Paul Taylor commented on ARROW-15852:
-------------------------------------

[~timhigins] Thanks for the report. In running your code, I discovered an oversight we made in the v7.0 refactor.

That said, I think your {{indexOf()}} call is incorrect. {{indexOf()}} is the inverse of {{get()}}, so this expression should evaluate to true: {{table.indexOf(table.get(0)) === 0}}

In your case (looking up the index of a row), you want to pass the entire row contents to the {{table.indexOf()}} call like this:
{code:javascript}
const { tableFromArrays } = require('apache-arrow');

const t = tableFromArrays({
  a: [0, 1, 2],
  b: ["foo", "bar", "baz"]
});

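// Byte length of the element at each index.
console.log(t.getByteLength(0));
console.log(t.getByteLength(1));

// indexOf() takes the full row contents, mirroring what get() returns.
console.log(t.indexOf({a: 0, b: "foo"}));
console.log(t.indexOf({a: 1, b: "bar"}));
console.log(t.indexOf({a: 2, b: "baz"}));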
{code}
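
As a rough illustration of that inverse relationship, here is a self-contained sketch. The {{RowStore}} class and its fields are hypothetical stand-ins written for this example; they are not the apache-arrow implementation, and the real {{Table}} may compare rows differently.

```javascript
// Hypothetical stand-in for a table; NOT the apache-arrow implementation.
class RowStore {
  constructor(columns) {
    this.names = Object.keys(columns);
    this.columns = columns;
    this.numRows = this.names.length ? columns[this.names[0]].length : 0;
  }

  // Return row `index` as a plain object, or null when out of range.
  get(index) {
    if (index < 0 || index >= this.numRows) return null;
    const row = {};
    for (const name of this.names) row[name] = this.columns[name][index];
    return row;
  }

  // Return the index of the first row whose every field equals `row`, else -1.
  indexOf(row) {
    for (let i = 0; i < this.numRows; i++) {
      if (this.names.every((name) => this.columns[name][i] === row[name])) return i;
    }
    return -1;
  }
}

const store = new RowStore({ a: [0, 1, 2], b: ['foo', 'bar', 'baz'] });

// get() followed by indexOf() should land back on the same index.
const roundTrips = [0, 1, 2].map((i) => store.indexOf(store.get(i)) === i);
console.log(roundTrips);                        // [ true, true, true ]
console.log(store.indexOf({ a: 1, b: 'bar' })); // 1
console.log(store.indexOf({ a: 1 }));           // -1: a partial row does not match
```

In this sketch a lookup only succeeds when every column of the row is supplied, which is why passing the entire row contents matters.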

> [JS] Table getByteLength and indexOf don't work
> -----------------------------------------------
>
>                 Key: ARROW-15852
>                 URL: https://issues.apache.org/jira/browse/ARROW-15852
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>    Affects Versions: 7.0.0
>            Reporter: Timothy Higinbottom
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The functions table.getByteLength() and table.indexOf() don't return the correct values.
> They are bound dynamically to the Table class, in a way I don't fully understand, with the following code:
> [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L378-L390]
> The other functions bound this way, get(), set(), and isValid(), all seem to work. However, getByteLength() and indexOf() return the placeholder/sentinel values of 0 and -1 respectively, which are defined in the no-op code here: [https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L207-L221]. I assume that code exists to generate the right type definitions, and thus documentation.
> It's fairly simple for a user to implement the right logic themselves (at least for getByteLength) and it's a quick patch to define the functions normally instead of on the prototype, e.g.:
>  
> {code:java}
>     /**
>      * Get the size in bytes of an element by index.
>      * @param index The index at which to get the byteLength.
>      */
>     // @ts-ignore
>     public getByteLength(index: number): number { return this.data[index].byteLength; }
>     /**
>      * Get the size in bytes of a table.
>      */
>     //@ts-ignore
>     public getByteLength(): number { 
>         return this.data.map((batch) => batch.byteLength).reduce((sum, newLength) => sum + newLength);
>     }
> {code}
> I'd be happy to send this as a PR if that's an OK alternative to the way it's currently implemented. 
> Here's a GitHub repo with a minimal reproduction of the issue in Node.js:
> [https://github.com/alexkreidler/apache-arrow-js-small-bug]
>  
> And an Observable notebook for the browser (although I couldn't get ESM working): [https://observablehq.com/@08027ecfa2b2f7bb/arrow-7-canary]
>  
> Thanks to all for your work on Arrow!
>  
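
The placeholder-plus-prototype pattern the description refers to can be sketched roughly as follows. This is a hypothetical illustration only: {{SketchTable}} and {{implementations}} are invented names, and the sketch does not reproduce the actual table.ts code or claim to show the exact cause of the bug. It only shows how a method keeps returning its sentinel when its prototype binding is missing.

```javascript
// Hypothetical illustration; NOT the actual apache-arrow source.
class SketchTable {
  constructor(rows) {
    this.rows = rows;
  }

  // Placeholders declared in the class body so the methods appear in
  // type definitions and docs. They return sentinel values.
  get(index) { return null; }
  indexOf(row) { return -1; }
}

// Real implementations, attached to the prototype after the class body.
const implementations = {
  get(index) { return this.rows[index] ?? null; },
  // indexOf is deliberately left out here to mimic a forgotten binding.
};

for (const [name, fn] of Object.entries(implementations)) {
  SketchTable.prototype[name] = fn; // replaces the placeholder of the same name
}

const sketch = new SketchTable(['x', 'y']);
console.log(sketch.get(1));       // y  (real implementation)
console.log(sketch.indexOf('y')); // -1 (placeholder sentinel still in place)
```

Under these assumptions, any method missing from the binding step silently falls back to its no-op body, which would match the 0 and -1 results reported above.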


