Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/16 16:29:02 UTC

[jira] [Commented] (ARROW-1990) [JS] Add "DataFrame" object

    [ https://issues.apache.org/jira/browse/ARROW-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327337#comment-16327337 ] 

ASF GitHub Bot commented on ARROW-1990:
---------------------------------------

TheNeuralBit opened a new pull request #1482: ARROW-1990: [JS] Add "DataFrame" object
URL: https://github.com/apache/arrow/pull/1482
 
 
   This PR moves the `Table` class out of the Vector hierarchy and adds optimized dataframe operations to it. It currently implements an optimized `scan()` method, `filter(predicate)`, `count()`, and `countBy(column_name)`; `countBy` only works on dictionary-encoded columns.
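
   To illustrate how a lazy `filter(predicate)` can feed `scan()` and `count()` without copying rows, here is a minimal sketch. `MiniTable`, `Col`, and the `bind()` step are hypothetical names invented for this illustration, and plain typed arrays stand in for Arrow vectors; this is not the code in the PR.

   ```js
   // Hypothetical miniature of a column-oriented table; not the Arrow JS classes.
   class Col {
     constructor(name) { this.name = name; }
     // gteq() returns an unbound predicate. bind(columns) resolves the column
     // once, then the returned function tests a single row index.
     gteq(value) {
       const name = this.name;
       return {
         bind(columns) {
           const data = columns[name];
           return (row) => data[row] >= value;
         },
       };
     }
   }
   const col = (name) => new Col(name);

   class MiniTable {
     constructor(columns, length, predicate = null) {
       this.columns = columns;
       this.length = length;
       this.predicate = predicate;
     }
     // filter() copies no rows; it only records the predicate.
     // (This sketch keeps a single predicate and does not chain filters.)
     filter(predicate) {
       return new MiniTable(this.columns, this.length, predicate);
     }
     // scan() calls next(row) for every row index that passes the predicate.
     scan(next) {
       const test = this.predicate ? this.predicate.bind(this.columns) : null;
       for (let row = 0; row < this.length; row++) {
         if (!test || test(row)) next(row);
       }
     }
     // count() is a scan that only increments a counter.
     count() {
       let n = 0;
       this.scan(() => { n++; });
       return n;
     }
   }

   const mini = new MiniTable({ lat: Float64Array.from([-10, 0, 25.5, -3]) }, 4);
   const total = mini.count();
   const northern = mini.filter(col('lat').gteq(0)).count();
   console.log(total, northern); // 4 2
   ```

   Binding the predicate to its column once per scan, rather than looking the column up by name on every row, is the kind of saving this sketch assumes an optimized scan would aim for.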
   
   Some usage examples, based on the file generated by `js/test/data/tables/generate.py`:
   ``` js
   > let table = Table.from([fs.readFileSync('./test/data/tables/generated.arrow')]);
   undefined
   > table.count()
   1000000
   > table.filter(col('lat').gteq(0)).count()
   499718
   > table.countBy('origin').asJSON()
   { Charlottesville: 166839,
     'New York': 166251,
     'San Francisco': 166642,
     Seattle: 166659,
     'Terre Haute': 166756,
     'Washington, DC': 166853 }
   > table.filter(col('lng').gteq(0)).countBy('origin').asJSON()
   { Charlottesville: 83109,
     'New York': 83221,
     'San Francisco': 83515,
     Seattle: 83362,
     'Terre Haute': 83314,
     'Washington, DC': 83479 }
   ```
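
   One plausible reason `countBy` is limited to dictionary-encoded columns is that counting can then run over small integer indices, with the values decoded only once per distinct key. The sketch below assumes that approach; `countByDictionary` and the `dictionary`/`indices` layout are hypothetical stand-ins and not the Arrow JS API.

   ```js
   // Hypothetical stand-in for a dictionary-encoded column: not the Arrow JS API.
   // `dictionary` holds each distinct value once; `indices` holds one
   // dictionary index per row.
   function countByDictionary(dictionary, indices, predicate) {
     // One counter per distinct value, addressed by dictionary index.
     const counts = new Uint32Array(dictionary.length);
     for (let row = 0; row < indices.length; row++) {
       // Skip rows rejected by the optional row-level predicate.
       if (predicate && !predicate(row)) continue;
       counts[indices[row]]++;
     }
     // Decode dictionary values once per distinct key, not once per row.
     const result = {};
     dictionary.forEach((key, i) => { result[key] = counts[i]; });
     return result;
   }

   const origin = {
     dictionary: ['Seattle', 'New York'],
     indices: Int32Array.from([0, 1, 0, 0, 1]),
   };
   const lng = Float64Array.from([-1.5, 2.0, 3.5, -0.5, 4.0]);

   const all = countByDictionary(origin.dictionary, origin.indices);
   const east = countByDictionary(origin.dictionary, origin.indices, (row) => lng[row] >= 0);
   console.log(all);  // { Seattle: 3, 'New York': 2 }
   console.log(east); // { Seattle: 1, 'New York': 2 }
   ```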



> [JS] Add "DataFrame" object
> ---------------------------
>
>                 Key: ARROW-1990
>                 URL: https://issues.apache.org/jira/browse/ARROW-1990
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: JavaScript
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: Major
>              Labels: pull-request-available
>
> Add a TypeScript class that can perform optimized dataframe operations on an Arrow {{Table}} and/or {{StructVector}}. Initially this should include operations like filtering, counting, and scanning. Eventually this class could include more operations like sorting, count by/group by, etc.


