Posted to jira@arrow.apache.org by "Håkon Magne Holmen (Jira)" <ji...@apache.org> on 2022/10/02 22:05:00 UTC

[jira] [Created] (ARROW-17913) feather.read_table 150x slower when reading columns in newer versions

Håkon Magne Holmen created ARROW-17913:
------------------------------------------

             Summary: feather.read_table 150x slower when reading columns in newer versions
                 Key: ARROW-17913
                 URL: https://issues.apache.org/jira/browse/ARROW-17913
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 9.0.0, 8.0.0, 7.0.0
         Environment: python 3.9, ubuntu 20.04
            Reporter: Håkon Magne Holmen


h3. Description

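Passing an explicit {{columns}} argument to {{feather.read_table}} is drastically slower on Arrow 7.0.0 through 9.0.0 than it was on 6.0.0: roughly 150x in the benchmarks below (33.8 ms on 9.0.0 versus 224 µs on 6.0.0). Reading the whole file without {{columns}} takes about the same time on both versions.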

Profiling the code below shows that the bottleneck is somewhere in the {{read_names}} function of {{pyarrow._feather.FeatherReader}}.
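
For anyone who wants to collect a comparable profile, the following is a minimal sketch using the standard library's {{cProfile}}. The helper name {{profile_call}} is hypothetical and not part of pyarrow, and the commented usage assumes the {{test.feather}} file and {{df}} from the setup code in the Example section.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=10):
    """Run fn() under cProfile and return the top entries by cumulative time.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    profiler = cProfile.Profile()
    profiler.runcall(fn)

    # Render the statistics into a string instead of printing to stdout.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


# Assumed usage, once 'test.feather' and df exist (see the setup code):
# from pyarrow import feather
# print(profile_call(lambda: feather.read_table(
#     'test.feather', columns=list(df.columns), memory_map=True)))
```

Sorting by cumulative time puts the calls that account for most of the wall time near the top of the report, which makes a slow reader method easier to spot.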

h5. Example

Setup code:

{code}
import pandas as pd
from pyarrow import feather

rows, cols = (1_000_000, 10)
data = {f'c{c}': range(rows) for c in range(cols)}
df = pd.DataFrame(data=data)

feather.write_feather(df, 'test.feather', compression="uncompressed")
{code}

Benchmarks Arrow 9.0.0:

{code}
%timeit feather.read_table('test.feather', memory_map=True)
%timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=True)

> 178 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
33.8 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{code}

Benchmarks Arrow 6.0.0:

{code}
%timeit feather.read_table('test.feather', memory_map=True)
%timeit feather.read_table('test.feather', columns=list(df.columns), memory_map=True)

> 173 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
224 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
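
The {{%timeit}} lines above need IPython. As a sketch of how the same comparison could be run from a plain script, the helpers below use the standard library's {{timeit}} module. The names {{best_seconds_per_call}} and {{compare_reads}} are hypothetical, and the helper reports the best run rather than the mean ± std. dev. that {{%timeit}} prints, so the numbers will not match the ones above exactly.

```python
import timeit


def best_seconds_per_call(fn, number=10, repeat=5):
    """Return the best observed per-call wall time of fn() in seconds."""
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number


def compare_reads(path, columns, number=10, repeat=5):
    """Time feather.read_table with and without an explicit column list.

    Returns (seconds_all, seconds_selected, ratio).
    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Imported lazily so the timing helper above is usable without pyarrow.
    from pyarrow import feather

    t_all = best_seconds_per_call(
        lambda: feather.read_table(path, memory_map=True),
        number, repeat)
    t_cols = best_seconds_per_call(
        lambda: feather.read_table(path, columns=columns, memory_map=True),
        number, repeat)
    return t_all, t_cols, t_cols / t_all


# Assumed usage, once 'test.feather' and df exist (see the setup code):
# t_all, t_cols, ratio = compare_reads('test.feather', list(df.columns))
# print(f"all: {t_all:.6f} s, columns: {t_cols:.6f} s, ratio: {ratio:.1f}x")
```

Running this under each installed pyarrow version gives one ratio per version, which should make the regression easy to compare across releases.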



