Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/17 17:06:27 UTC

[GitHub] [arrow] trxcllnt edited a comment on pull request #2035: ARROW-2116: [JS] implement IPC writers

trxcllnt edited a comment on pull request #2035:
URL: https://github.com/apache/arrow/pull/2035#issuecomment-694369864


   @t829702 no, the Arrow JSON IPC representation exists only for integration testing between the different Arrow implementations. It is _not_ an optimized or ergonomic way to interact with Arrow.
   
   This [`csv-to-arrow-js` example](https://github.com/trxcllnt/csv-to-arrow-js/blob/f2596045474ce1742e3089da48a5c83a6005be90/index.js#L28-L38) is closer to what you'd need. It uses a CSV parsing library to yield rows as JSON objects, then transforms the JSON rows into Arrow RecordBatches, which are then serialized and flushed to stdout.
   
   There are a few strategies to convert arbitrary JavaScript types into Arrow tables, and the strategy you pick depends on your needs. They all use the Builder classes under the hood, and generally follow this pattern:
   1. Define the types of the data you will be constructing
   2. Construct a Builder for that type (via `Builder.new()`, or related stream equivalents)
   3. Write values to the Builder
   4. Flush the Builder to yield a Vector of the values written up to that point
   5. Repeat steps 3 and 4 as necessary for all your data
   6. Once all your data has been written, call `builder.finish()`, then flush one last time to yield the final chunk
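   
   The steps above can be sketched without any dependencies. This is only an illustration of the write/flush/finish lifecycle: `ToyBuilder` is a hypothetical stand-in written for this example, not the `apache-arrow` Builder API, and its "vectors" are plain objects. The flush threshold of 2 rows is likewise arbitrary.
   
   ```javascript
   // Hypothetical stand-in for an Arrow Builder (NOT the apache-arrow API).
   // It only mimics the append / flush / finish lifecycle described above.
   class ToyBuilder {
     constructor(type) {
       this.type = type;       // the declared type of the values (step 1)
       this.values = [];       // values written since the last flush
       this.finished = false;
     }
     get length() {
       return this.values.length;
     }
     // Write one value; returns the builder so calls can be chained.
     append(value) {
       if (this.finished) throw new Error('cannot append after finish()');
       this.values.push(value);
       return this;
     }
     // Flush: hand back everything written so far as one chunk, then reset.
     toVector() {
       const chunk = { type: this.type, data: this.values };
       this.values = [];
       return chunk;
     }
     // Mark the builder as done; does not flush by itself.
     finish() {
       this.finished = true;
       return this;
     }
   }
   
   const builder = new ToyBuilder('int32');    // steps 1 and 2
   const chunks = [];
   for (const value of [1, 2, 3, 4, 5]) {
     builder.append(value);                    // step 3
     if (builder.length >= 2) {
       chunks.push(builder.toVector());        // step 4, repeated by the loop (step 5)
     }
   }
   chunks.push(builder.finish().toVector());   // step 6: last partial chunk
   
   console.log(chunks.map((c) => c.data));     // three chunks: [1, 2], [3, 4], [5]
   ```
   
   With the real library, the chunks would be Arrow Vectors rather than plain objects, and the type in step 1 would be an Arrow DataType.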
   
   See [this comment](https://github.com/apache/arrow/blob/7a532edeabc6f30838e5a53dfef35f37fdf99737/js/src/builder.ts#L88-L104) on the Builder constructor for a basic example, or the [`throughIterable()` implementation](https://github.com/apache/arrow/blob/7a532edeabc6f30838e5a53dfef35f37fdf99737/js/src/builder.ts#L501-L510) for how to handle things like flushing after reaching a row count or byte size `highWaterMark`. The [Builder tests](https://github.com/apache/arrow/blob/7a532edeabc6f30838e5a53dfef35f37fdf99737/js/test/unit/builders/builder-tests.ts#L234-L269) also include examples of using iterables, asyncIterables, DOM, and node streams.
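   
   As a rough illustration of flushing on a row count or byte size `highWaterMark`, here is a dependency-free sketch. `chunkIterable`, its option names, and the fixed `byteWidth` per value are assumptions made for this example. It is only loosely modeled on the idea behind `throughIterable()`, not a copy of it, and it yields plain arrays instead of Vectors.
   
   ```javascript
   // Hypothetical helper (NOT the apache-arrow API): splits an iterable into
   // chunks, flushing whenever the pending chunk reaches the highWaterMark.
   // 'count' measures pending rows; 'bytes' measures rows * byteWidth.
   function* chunkIterable(source, options = {}) {
     const { queueingStrategy = 'count', highWaterMark = 1000, byteWidth = 4 } = options;
     let pending = [];
     const size = () =>
       queueingStrategy === 'bytes' ? pending.length * byteWidth : pending.length;
   
     for (const value of source) {
       pending.push(value);
       if (size() >= highWaterMark) {
         yield pending;      // flush a full chunk
         pending = [];
       }
     }
     if (pending.length > 0) {
       yield pending;        // flush whatever is left at the end
     }
   }
   
   // Flush every 2 rows.
   const byCount = [...chunkIterable([1, 2, 3, 4, 5], { highWaterMark: 2 })];
   
   // Flush once 12 bytes are pending (3 values at an assumed 4 bytes each).
   const byBytes = [...chunkIterable([1, 2, 3, 4, 5], {
     queueingStrategy: 'bytes',
     highWaterMark: 12,
   })];
   
   console.log(byCount); // [[1, 2], [3, 4], [5]]
   console.log(byBytes); // [[1, 2, 3], [4, 5]]
   ```
   
   Because it is a generator, chunks are produced lazily as the source is consumed, so the whole input never has to be held in memory at once.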
   
   I also have this [higher-level example](https://codepen.io/trxcllnt/pen/NWPMpPN?editors=0010) that uses the `Vector.from()` method to convert existing in-memory JSON data into an Arrow StructVector (and `Vector.from()` [uses](https://github.com/apache/arrow/blob/7a532edeabc6f30838e5a53dfef35f37fdf99737/js/src/vector/index.ts#L120-L121) the Builder classes internally).

