Posted to dev@arrow.apache.org by "Bob (Jira)" <ji...@apache.org> on 2019/10/14 17:08:00 UTC
[jira] [Created] (ARROW-6876) Reading parquet file becomes really slow for 0.15.0
Bob created ARROW-6876:
--------------------------
Summary: Reading parquet file becomes really slow for 0.15.0
Key: ARROW-6876
URL: https://issues.apache.org/jira/browse/ARROW-6876
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0
Environment: python3.7
Reporter: Bob
Hi,
I just noticed that reading a parquet file with pandas became much slower after I upgraded to 0.15.0.
Example:
*With 0.14.1*
In [4]: %timeit df = pd.read_parquet(path)
2.02 s ± 47.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
*With 0.15.0*
In [5]: %timeit df = pd.read_parquet(path)
22.9 s ± 478 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The file is about 15 MB in size. Both runs were on the same machine with the same versions of Python and pandas.
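To reproduce the comparison outside IPython, a minimal sketch along these lines could be run once in each environment. The helper names `time_call` and `time_read_parquet` are hypothetical and not part of pandas or pyarrow. The sketch assumes the pyarrow engine is the one in use, and its timing loop only approximates what `%timeit` reports.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=1):
    """Time func() and return (mean, std dev) in seconds per call."""
    # timeit.repeat returns one total per run; each total covers `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


def time_read_parquet(path, repeat=7):
    """Print library versions, then time pd.read_parquet(path) with pyarrow."""
    # Imported lazily so time_call stays usable without these packages.
    import pandas as pd
    import pyarrow

    # Recording the versions makes results from two environments comparable.
    print("pandas", pd.__version__, "pyarrow", pyarrow.__version__)
    mean, std = time_call(
        lambda: pd.read_parquet(path, engine="pyarrow"), repeat=repeat
    )
    print(f"{mean:.3f} s ± {std:.3f} s per loop ({repeat} runs)")
    return mean, std
```

Calling `time_read_parquet(path)` with the same file under 0.14.1 and under 0.15.0 should show whether the slowdown persists without IPython involved.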
Have you received similar complaints? What could be the issue here?
Thanks a lot.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)