Posted to jira@arrow.apache.org by "Paul Taylor (Jira)" <ji...@apache.org> on 2021/02/05 05:51:00 UTC

[jira] [Comment Edited] (ARROW-10901) [JS] toArray delivers double length arrays in some cases

    [ https://issues.apache.org/jira/browse/ARROW-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279363#comment-17279363 ] 

Paul Taylor edited comment on ARROW-10901 at 2/5/21, 5:50 AM:
--------------------------------------------------------------

toArray() on the Numeric vector types returns a zero-copy TypedArray view over the underlying `data` buffer for vectors/single-chunk columns. For multi-chunk columns it copies the data from each chunk into a single contiguous buffer.
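The view-versus-copy distinction above can be sketched with plain typed arrays. This is only an illustration under stated assumptions: `toContiguous` is a hypothetical helper, not an Arrow JS function, and each chunk is modeled as a bare `Int32Array` rather than a real Arrow chunk.

```typescript
// Hypothetical helper (not part of Arrow JS): each chunk is modeled as a plain Int32Array.
function toContiguous(chunks: Int32Array[]): Int32Array {
  if (chunks.length === 1) {
    // Single chunk: return a new view over the same underlying buffer (zero-copy).
    const c = chunks[0];
    return new Int32Array(c.buffer, c.byteOffset, c.length);
  }
  // Multiple chunks: allocate one contiguous buffer and copy each chunk into it.
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Int32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

In the single-chunk case, writes through the returned array are visible in the source chunk because both share one buffer. In the multi-chunk case the result is an independent copy.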

toArray() is a method to deserialize values from their binary Arrow representation to JS values, potentially at the cost of additional copy/deserialization. For example, Utf8Vector will return an Array of strings, DateVector will return an Array of Dates, etc.

If you want the numeric values of an IntVector, you can use the `.values` getter directly. This returns the underlying Vector's binary data as a JS typed array of the appropriate byte-width, excepting the 64-bit cases.

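Not every environment implements `BigInt64Array` and `BigUint64Array`. Since we want to support those environments, we've opted to return the 32-bit variants (`Int32Array` or `Uint32Array`) for the 64-bit Vector types. Each 64-bit value then occupies two consecutive 32-bit elements, which is why the returned array is twice as long as the column.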

If you're targeting only environments with BigInts, the `Int64Vector` and `Uint64Vector` have additional `.values64` getters that return `BigInt64Array` or `BigUint64Array` respectively. These getters will throw an error if called in an environment without BigInts.
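As a rough sketch of how the 32-bit layout relates to the 64-bit one, the snippet below reinterprets a buffer of 32-bit words as 64-bit integers using only standard typed arrays. `as64` is a hypothetical helper, not an Arrow JS API. It assumes an environment with BigInt support, and it relies on the platform's byte order, so on the usual little-endian platforms the low 32-bit word of each value comes first.

```typescript
// Hypothetical helper (not part of Arrow JS): view pairs of 32-bit words as 64-bit integers.
function as64(words: Int32Array): BigInt64Array {
  if (typeof BigInt64Array === "undefined") {
    throw new Error("BigInt64Array is not available in this environment");
  }
  if (words.length % 2 !== 0) {
    throw new Error("expected an even number of 32-bit words");
  }
  // A BigInt64Array view needs an 8-byte-aligned offset; copy first if the input is misaligned.
  const aligned = words.byteOffset % 8 === 0 ? words : words.slice();
  // Each 64-bit value spans two 32-bit elements, so the result has half the length.
  return new BigInt64Array(aligned.buffer, aligned.byteOffset, aligned.length / 2);
}
```

For a column of 10 Int64 values, the 32-bit array has length 20, and `as64` returns a 10-element view over the same bytes when the input is aligned.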


was (Author: paul.e.taylor):
toArray() on the Numeric vector types returns a zero-copy TypedArray view over the underlying `data` buffer for vectors/single-chunk columns. For multi-chunk columns it copies the data from each chunk into a single contiguous buffer.

toArray() is a method to deserialize values from their binary Arrow representation to JS values. For example, Utf8Vector will return an Array of strings, DateVector will return an Array of Dates, etc.

If you want the numeric values of an IntVector, you can use the `.values` getter directly. This returns the underlying Vector's binary data as a JS typed array of the appropriate byte-width, excepting the 64-bit cases.

Not every environment implements the `BigInt64Array` and `BigUint64Array`. Since we want to support those environments, we've opted to return the 32-bit variants of the 64-bit Vector types.

If you're targeting only environments with BigInts, the `Int64Vector` and `Uint64Vector` have additional `.values64` getters that return `BigInt64Array` or `BigUint64Array` respectively. These getters will throw an error if called in an environment without BigInts.

> [JS] toArray delivers double length arrays in some cases
> --------------------------------------------------------
>
>                 Key: ARROW-10901
>                 URL: https://issues.apache.org/jira/browse/ARROW-10901
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>    Affects Versions: 2.0.0
>            Reporter: roland
>            Priority: Major
>         Attachments: Screen Shot 2020-12-14 at 3.34.24 PM.png, Screen Shot 2020-12-14 at 3.38.54 PM.png
>
>
> When calling `toArray` on a column, one would expect a column of length 10 to give back an array of length 10. Instead, it sometimes gives back an array of length 20.
> I think this happens for element types like Int64, where it's not guaranteed that JS can actually fit the number into its Float (which, iirc, is not exactly 64-bit).
> At the same time, if I call `toArray`, I would expect the numbers to stay the same.


