Posted to jira@arrow.apache.org by "Adrien Hoarau (Jira)" <ji...@apache.org> on 2020/12/15 16:45:00 UTC
[jira] [Commented] (ARROW-10919) [Python] Wrong values with Table
slicing and conversion to/from pandas ExtensionArray
[ https://issues.apache.org/jira/browse/ARROW-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249789#comment-17249789 ]
Adrien Hoarau commented on ARROW-10919:
---------------------------------------
The problem seems to be in how the slice offset is handled whenever the slice is a view of the original structure. Forcing the slice to be a copy rather than a reference to the original structure seems to solve it (though that is obviously not ideal). My current workaround is to round-trip the slice to and from Parquet. Open to better suggestions :)
> [Python] Wrong values with Table slicing and conversion to/from pandas ExtensionArray
> -------------------------------------------------------------------------------------
>
> Key: ARROW-10919
> URL: https://issues.apache.org/jira/browse/ARROW-10919
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 2.0.0
> Environment: INSTALLED VERSIONS
> ------------------
> commit : b5958ee1999e9aead1938c0bba2b674378807b3d
> python : 3.8.6.final.0
> python-bits : 64
> OS : Linux
> OS-release : 5.4.0-58-generic
> Version : #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020
> machine : x86_64
> processor : x86_64
> byteorder : little
> LC_ALL : None
> LANG : en_US.UTF-8
> LOCALE : en_US.UTF-8
> pandas : 1.1.5
> numpy : 1.19.4
> pytz : 2020.4
> dateutil : 2.8.1
> pip : 20.2.1
> setuptools : 49.2.1
> Cython : None
> pytest : 5.4.3
> hypothesis : None
> sphinx : None
> blosc : None
> feather : None
> xlsxwriter : None
> lxml.etree : None
> html5lib : None
> pymysql : None
> psycopg2 : None
> jinja2 : None
> IPython : None
> pandas_datareader: None
> bs4 : None
> bottleneck : None
> fsspec : 0.8.4
> fastparquet : None
> gcsfs : None
> matplotlib : None
> numexpr : None
> odfpy : None
> openpyxl : None
> pandas_gbq : None
> pyarrow : 2.0.0
> pytables : None
> pyxlsb : None
> s3fs : 0.4.2
> scipy : None
> sqlalchemy : None
> tables : None
> tabulate : None
> xarray : None
> xlrd : None
> xlwt : None
> numba : None
> Reporter: Adrien Hoarau
> Priority: Major
> Attachments: Screenshot from 2020-12-15 13-28-38.png
>
>
>
> {code:python}
> import pandas as pd
> from pyarrow import Table
> df = pd.DataFrame({'int_na': [0, None, 2, 3, None, 5, 6, None, 8]}, dtype=pd.Int64Dtype())
> print(df)
> {code}
>    int_na
> 0       0
> 1    <NA>
> 2       2
> 3       3
> 4    <NA>
> 5       5
> 6       6
> 7    <NA>
> 8       8
> {code:python}
> Table.from_pandas(df).slice(2, None).to_pandas()
> {code}
>    int_na
> 0       2
> 1    <NA>
> 2       1
> 3       5
> 4    <NA>
> 5       1
> 6       8
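
The values in the sliced output above are consistent with one possible reading of the offset problem: the data is read at the slice offset, but the validity mask is read from position 0. The pure-Python sketch below models that reading only. It uses plain lists in place of Arrow buffers, the `read` helper is hypothetical, and the placeholder value `1` for null slots is an assumption; none of this is confirmed as the actual cause in pyarrow or pandas.

```python
# Illustrative model only: plain lists stand in for the column's buffers.
FILL = 1  # assumed placeholder stored in null slots (not confirmed by the report)

values = [0, FILL, 2, 3, FILL, 5, 6, FILL, 8]
valid = [True, False, True, True, False, True, True, False, True]
offset, length = 2, 7  # slice(2, None) on a nine-row column


def read(values, valid, data_offset, mask_offset, length):
    """Combine a window of values with a window of the validity mask."""
    data = values[data_offset:data_offset + length]
    mask = valid[mask_offset:mask_offset + length]
    return [v if ok else None for v, ok in zip(data, mask)]


# Offset applied to both data and mask: the result one would expect.
expected = read(values, valid, offset, offset, length)
# Offset applied to the data only: reproduces the values in the report.
observed = read(values, valid, offset, 0, length)

print(expected)  # [2, 3, None, 5, 6, None, 8]
print(observed)  # [2, None, 1, 5, None, 1, 8]
```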
--
This message was sent by Atlassian Jira
(v8.3.4#803005)