Posted to issues@arrow.apache.org by "bluehat974 (via GitHub)" <gi...@apache.org> on 2023/09/25 13:42:42 UTC

[GitHub] [arrow] bluehat974 opened a new issue, #37856: Improve JS documentation on how to read/deserialize arrow data

bluehat974 opened a new issue, #37856:
URL: https://github.com/apache/arrow/issues/37856

   ### Describe the enhancement requested
   
   cc @domoritz 
   
   The current JS documentation does not clearly explain how to read and manipulate data with Apache Arrow JS.
   
   Apache Arrow JS is used in several JS environments (DuckDB Wasm, ObservableHQ, Arquero),
   and people keep asking how to read the data properly, but there is no clear answer:
   https://github.com/duckdb/duckdb-wasm/pull/1418
   
   Some documentation exists on reading Arrow data or deserializing it to JSON:
   https://duckdb.org/docs/api/wasm/query.html#arrow-table-to-json
   https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets
   
   but these examples should be consolidated into the official Apache Arrow JS documentation:
   https://github.com/apache/arrow/blob/main/js/README.md
   
   Some ideas for code examples to add to the documentation:
   * The best way to read data without deserializing it to JSON
   * How to take advantage of JS Proxy to read data faster instead of deserializing to JSON
   * If serialization is required, how to do it properly
   * How to convert columns to rows
   * How to read nested types (STRUCT, MAP, DICTIONARY...)
   * How to cast between Arrow types (e.g. from DECIMAL to DOUBLE)
   * How to cast Arrow types (LONG, DOUBLE, DECIMAL) to the desired JS type (bigint, number, string...)
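   
   As an illustration of the "columns to rows" item, here is a minimal sketch of the conversion itself. It deliberately does not use the Apache Arrow JS API: plain typed arrays stand in for column values, and `columnsToRows` is a hypothetical helper name, not a function of the library.
   
   ```typescript
   // Hypothetical helper (not part of apache-arrow): turns column-oriented data
   // into an array of row objects. Plain arrays and typed arrays stand in for
   // the values of Arrow columns.
   type Column = ArrayLike<unknown>;
   
   function columnsToRows(columns: Record<string, Column>): Record<string, unknown>[] {
       const names = Object.keys(columns);
       const numRows = names.length === 0 ? 0 : columns[names[0]].length;
   
       // Every column must have the same number of values.
       for (const name of names) {
           if (columns[name].length !== numRows) {
               throw new RangeError(`Column "${name}" has ${columns[name].length} values, expected ${numRows}.`);
           }
       }
   
       const rows: Record<string, unknown>[] = [];
       for (let i = 0; i < numRows; i++) {
           const row: Record<string, unknown> = {};
           for (const name of names) {
               row[name] = columns[name][i];
           }
           rows.push(row);
       }
       return rows;
   }
   
   // Usage: two rows built from three columns.
   const exampleRows = columnsToRows({
       id: new Int32Array([1, 2]),
       score: new Float64Array([0.5, 1.5]),
       label: ['a', 'b'],
   });
   ```
   
   Materializing every row copies all values, so this is only worthwhile when row objects are really needed; reading values column by column avoids the copy.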
   
   
   
   ### Component(s)
   
   Documentation, JavaScript


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [JS] Improve JS documentation on how to read/deserialize arrow data [arrow]

Posted by "kevinschaich (via GitHub)" <gi...@apache.org>.
kevinschaich commented on issue #37856:
URL: https://github.com/apache/arrow/issues/37856#issuecomment-2083316617

   100% agree with the points mentioned above. I'm also curious whether there is built-in Arrow functionality to handle casting to native JavaScript types.
   
   My workaround:
   
   ```typescript
   import { Table } from 'apache-arrow'
   import { mapValues } from 'lodash'
   
   export const arrowTableToRecords = (arrow: Table): Record<string, any>[] => {
       // The line below does not handle BigInts, and the prototype can't be
       // overridden because it refers to a private symbol:
       // const after = arrow.toArray().map((row) => row.toJSON())
   
       return arrow.toArray().map((obj: object) => {
           return mapValues(obj, (v: any) => {
               if (typeof v === 'bigint') {
                   if (v < Number.MIN_SAFE_INTEGER || v > Number.MAX_SAFE_INTEGER) {
                       throw new TypeError(`${v} is not safe to convert to a number.`)
                   }
                   return Number(v)
               }
               return v
           })
       })
   }
   ```
   
   LMK if others have a better way to do this.
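   
   A possible variant of the same idea, sketched without lodash and without throwing: values outside the safe integer range fall back to a decimal string instead of raising an error. Both `bigintToJs` and `normalizeRow` are hypothetical names, and the sketch assumes rows have already been turned into plain objects.
   
   ```typescript
   // Hypothetical helper (not part of apache-arrow): narrows a bigint to a
   // number when that is lossless, otherwise falls back to a decimal string.
   function bigintToJs(value: bigint): number | string {
       const min = BigInt(Number.MIN_SAFE_INTEGER);
       const max = BigInt(Number.MAX_SAFE_INTEGER);
       return value >= min && value <= max ? Number(value) : value.toString();
   }
   
   // Applies the conversion to every bigint field of a plain row object and
   // leaves all other values untouched.
   function normalizeRow(row: Record<string, unknown>): Record<string, unknown> {
       const out: Record<string, unknown> = {};
       for (const [key, v] of Object.entries(row)) {
           out[key] = typeof v === 'bigint' ? bigintToJs(v) : v;
       }
       return out;
   }
   
   // Usage: `id` fits in a number, `big` does not and becomes a string.
   const normalized = normalizeRow({
       id: BigInt(42),
       big: BigInt('9007199254740993'),
       name: 'a',
   });
   ```
   
   Whether a string fallback or an exception is preferable depends on the consumer; a string keeps the data intact but changes the field's type from row to row.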




Re: [I] [JS] Improve JS documentation on how to read/deserialize arrow data [arrow]

Posted by "domoritz (via GitHub)" <gi...@apache.org>.
domoritz commented on issue #37856:
URL: https://github.com/apache/arrow/issues/37856#issuecomment-2083369395

   I'm thinking about adding a way to tell Arrow that you want data returned in more compatible types (e.g. arrays of numbers instead of bigints, numbers instead of decimal objects). It's not there yet, but I think `toArray` often doesn't generate what people want.
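   
   For the decimal case, the arithmetic can be sketched independently of the library. In the Arrow format a DECIMAL value is an integer plus a scale, i.e. `unscaled / 10^scale`. The sketch below assumes the unscaled integer is already available as a `bigint`; how a given Apache Arrow JS version exposes it is not covered here, and `decimalToString` and `decimalToNumber` are hypothetical names.
   
   ```typescript
   // Hypothetical helper (not part of apache-arrow): renders an unscaled
   // integer and a scale as an exact decimal string, e.g. (12345n, 2) -> "123.45".
   function decimalToString(unscaled: bigint, scale: number): string {
       if (!Number.isInteger(scale) || scale < 0) {
           throw new RangeError(`scale must be a non-negative integer, got ${scale}`);
       }
       const negative = unscaled < BigInt(0);
       const magnitude = negative ? -unscaled : unscaled;
       // Left-pad so there is at least one digit before the decimal point.
       const digits = magnitude.toString().padStart(scale + 1, '0');
       const intPart = digits.slice(0, digits.length - scale);
       const fracPart = digits.slice(digits.length - scale);
       const text = scale === 0 ? intPart : `${intPart}.${fracPart}`;
       return negative ? `-${text}` : text;
   }
   
   // Lossy conversion to a JS number (a double); precision beyond what a
   // double can represent is rounded away.
   function decimalToNumber(unscaled: bigint, scale: number): number {
       return Number(decimalToString(unscaled, scale));
   }
   
   // Usage
   const priceText = decimalToString(BigInt(12345), 2);
   const priceNumber = decimalToNumber(BigInt(12345), 2);
   ```
   
   Going through the string keeps the exact value available for display, while the number form is convenient for charts and arithmetic where rounding is acceptable.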

