Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/08/07 18:39:01 UTC
[jira] [Resolved] (ARROW-1247) [Python] pyarrow causes python to crash errors on parquet.dll
[ https://issues.apache.org/jira/browse/ARROW-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-1247.
---------------------------------
Resolution: Cannot Reproduce
Assignee: Wes McKinney
Thanks! If you run into a reproducible failure, please reopen the issue so we can investigate.
> [Python] pyarrow causes python to crash errors on parquet.dll
> -------------------------------------------------------------
>
> Key: ARROW-1247
> URL: https://issues.apache.org/jira/browse/ARROW-1247
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.1
> Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
> Reporter: Aditi Breed
> Assignee: Wes McKinney
>
> Hello,
> I have a script that fetches data and stores it in a pandas DataFrame.
> I compute three aggregations of the data (MEAN, STDEV, MAX), each of which is converted to an Arrow table and saved to disk as a Parquet file.
> This code works fine for 100-500 records but fails on larger volumes. I also know the code itself works, because another developer runs the same code on a mirrored machine (in terms of hardware) without problems.
> The dataset I am trying to save has on the order of millions of records.
> The code fails at the line pq.write_table(arrowTable, filePath).
> Here is the code:
> arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>
> begintime = datetime.now()
> begintime_str = begintime.strftime("%Y%m%d%I%M%S")
>
> filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
> print('Begin Saving File')
> pq.write_table(arrowTable, filePath)
> print('Done Saving File')
>
> print('Appending FilePath to List')
> self.listspDF.append(filePath)
> print('Done Appending FilePath to List')
>
> Python crashes with a "python has to close" error.
> The detailed error follows:
> ------------------
> Problem Event Name: APPCRASH
> Application Name: python.exe
> Application Version: 3.5.2150.1013
> Application Timestamp: 577be340
> Fault Module Name: parquet.dll
> Fault Module Version: 0.0.0.0
> Fault Module Timestamp: 59403662
> Exception Code: c0000005
> Exception Offset: 000000000005f990
> OS Version: 6.3.9600.2.0.0.400.8
> Locale ID: 1033
> Read our privacy statement online:
> http://go.microsoft.com/fwlink/?linkid=280262
> If the online privacy statement is not available, please read our privacy statement offline:
> C:\Windows\system32\en-US\erofflps.txt
> --------------------------------------------
> I have tried updating Python and pyarrow, with no luck.
> The Python version is as follows:
> import sys
> print (sys.version)
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> The results of pip freeze are as follows:
> alabaster==0.7.9
> anaconda-clean==1.0
> anaconda-client==1.5.1
> anaconda-navigator==1.3.1
> argcomplete==1.0.0
> astroid==1.4.7
> astropy==2.0
> Babel==2.3.4
> backports.shutil-get-terminal-size==1.0.0
> beautifulsoup4==4.5.1
> bitarray==0.8.1
> blaze==0.10.1
> bokeh==0.12.2
> boto==2.42.0
> Bottleneck==1.2.1
> cffi==1.7.0
> chest==0.2.3
> click==6.6
> cloudpickle==0.2.1
> clyent==1.2.2
> colorama==0.3.7
> comtypes==1.1.2
> conda==4.3.22
> conda-build==2.0.2
> configobj==5.0.6
> contextlib2==0.5.3
> cryptography==1.5
> cycler==0.10.0
> Cython==0.24.1
> cytoolz==0.8.0
> dask==0.11.0
> datashape==0.5.2
> decorator==4.0.10
> dill==0.2.5
> docutils==0.12
> dynd===c328ab7
> et-xmlfile==1.0.1
> fastcache==1.0.2
> filelock==2.0.6
> Flask==0.11.1
> Flask-Cors==2.1.2
> gevent==1.1.2
> greenlet==0.4.10
> h5py==2.7.0
> HeapDict==1.0.0
> idna==2.1
> imageio==2.2.0
> imagesize==0.7.1
> ipykernel==4.5.0
> ipython==5.1.0
> ipython-genutils==0.1.0
> ipywidgets==5.2.2
> itsdangerous==0.24
> jdcal==1.2
> jedi==0.9.0
> Jinja2==2.8
> jsonschema==2.5.1
> jupyter==1.0.0
> jupyter-client==4.4.0
> jupyter-console==5.0.0
> jupyter-core==4.2.0
> lazy-object-proxy==1.2.1
> llvmlite==0.19.0
> locket==0.2.0
> lxml==3.6.4
> MarkupSafe==0.23
> matplotlib==2.0.2
> menuinst==1.4.1
> mistune==0.7.3
> mpmath==0.19
> multipledispatch==0.4.8
> nb-anacondacloud==1.2.0
> nb-conda==2.0.0
> nb-conda-kernels==2.0.0
> nbconvert==4.2.0
> nbformat==4.1.0
> nbpresent==3.0.2
> networkx==1.11
> nltk==3.2.1
> nose==1.3.7
> notebook==4.2.3
> numba==0.34.0
> numexpr==2.6.2
> numpy==1.13.1
> odo==0.5.0
> openpyxl==2.3.2
> pandas==0.20.2
> partd==0.3.6
> path.py==0.0.0
> pathlib2==2.1.0
> patsy==0.4.1
> pep8==1.7.0
> pickleshare==0.7.4
> Pillow==3.3.1
> pkginfo==1.3.2
> ply==3.9
> prompt-toolkit==1.0.3
> psutil==4.3.1
> py==1.4.31
> py4j==0.10.4
> pyarrow==0.4.1
> pyasn1==0.1.9
> pycosat==0.6.1
> pycparser==2.14
> pycrypto==2.6.1
> pycurl==7.43.0
> pyflakes==1.3.0
> Pygments==2.1.3
> pyidealdata==0.7.0
> pylint==1.5.4
> pyodbc==4.0.17
> pyOpenSSL==16.2.0
> pyparsing==2.1.4
> pyspark==2.1.0+hadoop2.7
> pytest==2.9.2
> python-dateutil==2.5.3
> pytz==2016.6.1
> PyUber==1.4.4
> PyWavelets==0.5.2
> pywin32==220
> PyYAML==3.12
> pyzmq==15.4.0
> QtAwesome==0.3.3
> qtconsole==4.2.1
> QtPy==1.1.2
> requests==2.14.2
> rope-py3k==0.9.4.post1
> ruamel-yaml===-VERSION
> scikit-image==0.13.0
> scikit-learn==0.18.2
> scipy==0.19.1
> simplegeneric==0.8.1
> singledispatch==3.4.0.3
> six==1.10.0
> snowballstemmer==1.2.1
> sockjs-tornado==1.0.3
> sphinx==1.4.6
> spyder==3.0.0
> SQLAlchemy==1.0.13
> statsmodels==0.8.0
> sympy==1.0
> tables==3.2.2
> toolz==0.8.0
> tornado==4.4.1
> traitlets==4.3.0
> unicodecsv==0.14.1
> wcwidth==0.1.7
> Werkzeug==0.11.11
> widgetsnbextension==1.2.6
> win-unicode-console==0.5
> wrapt==1.10.6
> xlrd==1.0.0
> XlsxWriter==0.9.3
> xlwings==0.10.0
> xlwt==1.1.2
> I was wondering if someone could shed light on why pyarrow would not work on a particular machine?
> Thanks,
> Adu
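
For readers triaging a similar crash, the hexadecimal fields in the APPCRASH report quoted above can be decoded with a few lines of Python. This is a minimal sketch, not part of the original report. It assumes that the two "Timestamp" fields are link-time stamps stored as seconds since the Unix epoch, and the small table of exception codes is illustrative rather than exhaustive. The helper names (decode_exception, decode_timestamp) are hypothetical.

```python
from datetime import datetime, timezone

# Fields copied from the APPCRASH report quoted above.
REPORT = {
    "Fault Module Name": "parquet.dll",
    "Application Timestamp": "577be340",
    "Fault Module Timestamp": "59403662",
    "Exception Code": "c0000005",
}

# A few well-known NTSTATUS values; illustrative, not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exception(code_hex):
    """Map a hexadecimal exception code to a symbolic name, if known."""
    return NTSTATUS_NAMES.get(int(code_hex, 16), "unknown")


def decode_timestamp(stamp_hex):
    """Interpret a hex timestamp as seconds since the Unix epoch (UTC).

    Assumption: the report shows the module's link-time stamp.
    """
    return datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)


if __name__ == "__main__":
    print(REPORT["Fault Module Name"],
          decode_exception(REPORT["Exception Code"]))
    print("application built:",
          decode_timestamp(REPORT["Application Timestamp"]).isoformat())
    print("fault module built:",
          decode_timestamp(REPORT["Fault Module Timestamp"]).isoformat())
```

Under these assumptions, the exception code c0000005 corresponds to an access violation raised inside parquet.dll, and the application timestamp decodes to 5 July 2016, which is consistent with the build date shown in sys.version. This narrows down where the fault occurred but does not by itself explain why only one machine is affected.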
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)