Posted to issues@arrow.apache.org by "Yosuke Shiro (Jira)" <ji...@apache.org> on 2019/12/31 01:04:00 UTC

[jira] [Resolved] (ARROW-7474) [Ruby] Save CSV files faster

     [ https://issues.apache.org/jira/browse/ARROW-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yosuke Shiro resolved ARROW-7474.
---------------------------------
    Fix Version/s: 1.0.0
       Resolution: Fixed

Issue resolved by pull request 6106
[https://github.com/apache/arrow/pull/6106]

> [Ruby] Save CSV files faster
> ----------------------------
>
>                 Key: ARROW-7474
>                 URL: https://issues.apache.org/jira/browse/ARROW-7474
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Ruby
>            Reporter: kojix2
>            Assignee: Kouhei Sutou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>         Attachments: arrow.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi developers,
> Saving an Arrow::Table in CSV format can be slow.
> Here is an ad hoc benchmark:
>  
> {code:ruby}
> require 'arrow'
> require 'csv'
> require 'gr/plot'
>
> # Load a tab-separated file as an Arrow::Table.
> t = Arrow::Table.load('some_nice.tsv', format: :csv, delimiter: "\t".ord)
> n = 1.step(1000, 100).to_a
> arrow_save_times = []
> csv_save_times = []
> n.each do |i|
>   # Time Arrow::Table#save on the first i rows.
>   t2 = t.slice(0, i)
>   start = Time.now
>   t2.save('test.csv')
>   arrow_save_times << p(Time.now - start)
>
>   # Time Ruby's CSV library on the same i rows, so both
>   # measurements cover the same amount of data.
>   records = t2.raw_records
>   start = Time.now
>   CSV.open('test2.csv', 'w') do |csv|
>     records.each do |r|
>       csv << r
>     end
>   end
>   csv_save_times << p(Time.now - start)
> end
>
> # Plot both timing series against the number of rows.
> GR.stem([n, arrow_save_times], [n, csv_save_times],
>         labels: ["arrow", "CSV"], xlabel: "lines", ylabel: "time", location: 2)
> GR.savefig("arrow.png")
> gets # Wait for Enter so the plot window stays open.
> {code}
>  
>  
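
The benchmark quoted above times each save with Time.now, which follows the wall clock and can jump if the system time is adjusted. A minimal sketch of a steadier measurement, using only Ruby's standard library, might look like the following. The helper name elapsed_seconds and the synthetic records are assumptions made for illustration; they are not part of Red Arrow or of the original report, and the real data would come from Arrow::Table#raw_records.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not part of Red Arrow or the csv library):
# returns the seconds the given block took, measured on the monotonic
# clock so that system clock adjustments cannot skew the result.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Synthetic stand-in for the table's raw records (assumption: in the
# report these rows would come from Arrow::Table#raw_records).
records = Array.new(1000) { |i| [i, "name#{i}", i * 0.5] }

# Same row counts as the quoted benchmark: 1, 101, ..., 901.
row_counts = 1.step(1000, 100).to_a

# Time a plain CSV write of the first `count` rows for each row count.
csv_save_times = row_counts.map do |count|
  Tempfile.create(['bench', '.csv']) do |file|
    elapsed_seconds do
      CSV.open(file.path, 'w') do |csv|
        records.first(count).each { |record| csv << record }
      end
    end
  end
end

row_counts.zip(csv_save_times) do |count, seconds|
  puts format('%4d rows: %.6f s', count, seconds)
end
```

The same helper could wrap the Arrow::Table#save call to obtain the Arrow timings. Writing to a temporary file keeps repeated runs from leaving test files behind.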



--
This message was sent by Atlassian Jira
(v8.3.4#803005)