Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/08/07 18:39:01 UTC

[jira] [Resolved] (ARROW-1247) [Python] pyarrow causes python to crash errors on parquet.dll

     [ https://issues.apache.org/jira/browse/ARROW-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1247.
---------------------------------
    Resolution: Cannot Reproduce
      Assignee: Wes McKinney

Thanks! If you run into a reproducible failure, please reopen the issue so we can investigate.

> [Python] pyarrow causes python to crash errors on parquet.dll
> -------------------------------------------------------------
>
>                 Key: ARROW-1247
>                 URL: https://issues.apache.org/jira/browse/ARROW-1247
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.4.1
>         Environment: Python Version:
> 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Windows Edition: Windows Server 2012 R2
>            Reporter: Aditi Breed
>            Assignee: Wes McKinney
>
> Hello,
> I have a script that fetches data and stores it in a pandas DataFrame.
> I compute three aggregations of the data (MEAN, STDEV and MAX). Each one is converted to an Arrow table and saved to disk as a Parquet file.
> This code works fine for 100-500 records but fails for larger volumes. I also know this code works, because another developer runs the same code on a mirrored machine (identical hardware) without problems.
> The dataset I am trying to save has on the order of millions of records.
> The failure occurs at the line pq.write_table(arrowTable, filePath).
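Since the write succeeds for a few hundred rows but crashes for millions, one way to narrow this down is to bisect the row count, running each attempt in a separate Python process so that a native crash in parquet.dll kills only that child and not the driver script. The sketch below shows only the bisection and the subprocess handling, using the standard library. The function names are hypothetical, and it assumes the failure is monotonic in the row count (every size above some threshold crashes).

```python
import subprocess
import sys
from typing import Callable


def crashes_in_subprocess(script: str, timeout: float = 600.0) -> bool:
    """Run `script` in a fresh Python interpreter; return True on a nonzero exit.

    A native access violation terminates only the child process, so the
    parent survives and can keep searching. Raises subprocess.TimeoutExpired
    if the child runs longer than `timeout` seconds.
    """
    result = subprocess.run([sys.executable, "-c", script], timeout=timeout)
    return result.returncode != 0


def find_first_failing_size(fails: Callable[[int], bool], good: int, bad: int) -> int:
    """Binary search for the smallest n in (good, bad] with fails(n) True.

    Assumes fails(good) is False, fails(bad) is True, and that failures are
    monotonic in n (every size above the threshold also fails).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if fails(mid):
            bad = mid   # the threshold is at or below mid
        else:
            good = mid  # the threshold is above mid
    return bad
```

In practice `fails(n)` would call `crashes_in_subprocess` with a script that builds the DataFrame and writes only its first `n` rows with `pq.write_table`; that script is not shown here. With `good=500` (known to work) and `bad` set to the full row count, the search needs about log2(bad - good) child runs.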
> Here is the code:
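>     # Imports assumed by this excerpt (pyarrow 0.4.1, pandas 0.20.2);
>     # self, agg and SaveFileLoc come from the surrounding script (not shown).
>     from datetime import datetime
>     import pyarrow as pa
>     import pyarrow.parquet as pq
>
>     # Convert the aggregated pandas DataFrame to an Arrow table
>     arrowTable = pa.Table.from_pandas(self.grpByMeanDS2)
>
>     # Timestamp used to make the output file name unique
>     begintime = datetime.now()
>     begintime_str = begintime.strftime("%Y%m%d%I%M%S")
>
>     filePath = SaveFileLoc + "\\Raw\\" + agg + "Data" + begintime_str + ".parq"
>     print('Begin Saving File')
>     pq.write_table(arrowTable, filePath)  # Python crashes here for large tables
>     print('Done Saving File')
>
>     print('Appending FilePath to List')
>     self.listspDF.append(filePath)
>     print('Done Appending FilePath to List')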
> 	
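Separately from the crash: `%I` in the `strftime` format used above is the 12-hour clock hour, so the timestamped file name cannot distinguish 1 AM from 1 PM on the same day, and a later file could overwrite an earlier one. `%H` is the 24-hour hour. A small standard-library check, using a hypothetical helper `timestamp_suffix`:

```python
from datetime import datetime


def timestamp_suffix(moment: datetime, fmt: str = "%Y%m%d%H%M%S") -> str:
    """Format a file-name timestamp; by default %H gives the 24-hour hour (00-23)."""
    return moment.strftime(fmt)


morning = datetime(2017, 8, 7, 1, 30, 0)
afternoon = datetime(2017, 8, 7, 13, 30, 0)

# With the 12-hour %I, both moments produce the same suffix.
print(timestamp_suffix(morning, "%Y%m%d%I%M%S"))    # 20170807013000
print(timestamp_suffix(afternoon, "%Y%m%d%I%M%S"))  # 20170807013000

# With the 24-hour %H (the default here), they differ.
print(timestamp_suffix(morning))    # 20170807013000
print(timestamp_suffix(afternoon))  # 20170807133000
```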
> Python crashes with a "python has to close" error.
> The detailed error follows:
> ------------------
> Problem Event Name:     APPCRASH
> Application Name:       python.exe
> Application Version:    3.5.2150.1013
> Application Timestamp:  577be340
> Fault Module Name:      parquet.dll
> Fault Module Version:   0.0.0.0
> Fault Module Timestamp: 59403662
> Exception Code:         c0000005
> Exception Offset:       000000000005f990
> OS Version:             6.3.9600.2.0.0.400.8
> Locale ID:              1033
> --------------------------------------------
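For reference, Exception Code c0000005 in the report above is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION: native code inside parquet.dll read or wrote memory it did not own. That terminates the whole process without a Python traceback, which is why only the Windows crash dialog appears. The sketch below decodes such codes; the lookup table covers only a few well-known values and is not exhaustive. It also accepts an integer, since on Windows a process killed by an unhandled exception typically exits with that exception code (sometimes shown as a negative signed value).

```python
from typing import Tuple, Union

# A few well-known NTSTATUS values (not an exhaustive list).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0x80000003: "STATUS_BREAKPOINT",
}

# The top two bits of a 32-bit NTSTATUS value encode its severity.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(code: Union[str, int]) -> Tuple[str, str]:
    """Decode a hex string (as in a crash report) or an integer exit code."""
    if isinstance(code, str):
        value = int(code, 16)
    else:
        value = code & 0xFFFFFFFF  # normalize a negative signed exit code
    name = KNOWN_STATUS.get(value, "unknown")
    severity = SEVERITY[(value >> 30) & 0x3]
    return name, severity


print(decode_ntstatus("c0000005"))  # ('STATUS_ACCESS_VIOLATION', 'error')
```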
> I have tried updating Python and pyarrow, with no luck.
> Here is the Python version:
>     import sys
>     print(sys.version)
>     3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
> Here is the output of pip freeze:
> 	alabaster==0.7.9
> 	anaconda-clean==1.0
> 	anaconda-client==1.5.1
> 	anaconda-navigator==1.3.1
> 	argcomplete==1.0.0
> 	astroid==1.4.7
> 	astropy==2.0
> 	Babel==2.3.4
> 	backports.shutil-get-terminal-size==1.0.0
> 	beautifulsoup4==4.5.1
> 	bitarray==0.8.1
> 	blaze==0.10.1
> 	bokeh==0.12.2
> 	boto==2.42.0
> 	Bottleneck==1.2.1
> 	cffi==1.7.0
> 	chest==0.2.3
> 	click==6.6
> 	cloudpickle==0.2.1
> 	clyent==1.2.2
> 	colorama==0.3.7
> 	comtypes==1.1.2
> 	conda==4.3.22
> 	conda-build==2.0.2
> 	configobj==5.0.6
> 	contextlib2==0.5.3
> 	cryptography==1.5
> 	cycler==0.10.0
> 	Cython==0.24.1
> 	cytoolz==0.8.0
> 	dask==0.11.0
> 	datashape==0.5.2
> 	decorator==4.0.10
> 	dill==0.2.5
> 	docutils==0.12
> 	dynd===c328ab7
> 	et-xmlfile==1.0.1
> 	fastcache==1.0.2
> 	filelock==2.0.6
> 	Flask==0.11.1
> 	Flask-Cors==2.1.2
> 	gevent==1.1.2
> 	greenlet==0.4.10
> 	h5py==2.7.0
> 	HeapDict==1.0.0
> 	idna==2.1
> 	imageio==2.2.0
> 	imagesize==0.7.1
> 	ipykernel==4.5.0
> 	ipython==5.1.0
> 	ipython-genutils==0.1.0
> 	ipywidgets==5.2.2
> 	itsdangerous==0.24
> 	jdcal==1.2
> 	jedi==0.9.0
> 	Jinja2==2.8
> 	jsonschema==2.5.1
> 	jupyter==1.0.0
> 	jupyter-client==4.4.0
> 	jupyter-console==5.0.0
> 	jupyter-core==4.2.0
> 	lazy-object-proxy==1.2.1
> 	llvmlite==0.19.0
> 	locket==0.2.0
> 	lxml==3.6.4
> 	MarkupSafe==0.23
> 	matplotlib==2.0.2
> 	menuinst==1.4.1
> 	mistune==0.7.3
> 	mpmath==0.19
> 	multipledispatch==0.4.8
> 	nb-anacondacloud==1.2.0
> 	nb-conda==2.0.0
> 	nb-conda-kernels==2.0.0
> 	nbconvert==4.2.0
> 	nbformat==4.1.0
> 	nbpresent==3.0.2
> 	networkx==1.11
> 	nltk==3.2.1
> 	nose==1.3.7
> 	notebook==4.2.3
> 	numba==0.34.0
> 	numexpr==2.6.2
> 	numpy==1.13.1
> 	odo==0.5.0
> 	openpyxl==2.3.2
> 	pandas==0.20.2
> 	partd==0.3.6
> 	path.py==0.0.0
> 	pathlib2==2.1.0
> 	patsy==0.4.1
> 	pep8==1.7.0
> 	pickleshare==0.7.4
> 	Pillow==3.3.1
> 	pkginfo==1.3.2
> 	ply==3.9
> 	prompt-toolkit==1.0.3
> 	psutil==4.3.1
> 	py==1.4.31
> 	py4j==0.10.4
> 	pyarrow==0.4.1
> 	pyasn1==0.1.9
> 	pycosat==0.6.1
> 	pycparser==2.14
> 	pycrypto==2.6.1
> 	pycurl==7.43.0
> 	pyflakes==1.3.0
> 	Pygments==2.1.3
> 	pyidealdata==0.7.0
> 	pylint==1.5.4
> 	pyodbc==4.0.17
> 	pyOpenSSL==16.2.0
> 	pyparsing==2.1.4
> 	pyspark==2.1.0+hadoop2.7
> 	pytest==2.9.2
> 	python-dateutil==2.5.3
> 	pytz==2016.6.1
> 	PyUber==1.4.4
> 	PyWavelets==0.5.2
> 	pywin32==220
> 	PyYAML==3.12
> 	pyzmq==15.4.0
> 	QtAwesome==0.3.3
> 	qtconsole==4.2.1
> 	QtPy==1.1.2
> 	requests==2.14.2
> 	rope-py3k==0.9.4.post1
> 	ruamel-yaml===-VERSION
> 	scikit-image==0.13.0
> 	scikit-learn==0.18.2
> 	scipy==0.19.1
> 	simplegeneric==0.8.1
> 	singledispatch==3.4.0.3
> 	six==1.10.0
> 	snowballstemmer==1.2.1
> 	sockjs-tornado==1.0.3
> 	sphinx==1.4.6
> 	spyder==3.0.0
> 	SQLAlchemy==1.0.13
> 	statsmodels==0.8.0
> 	sympy==1.0
> 	tables==3.2.2
> 	toolz==0.8.0
> 	tornado==4.4.1
> 	traitlets==4.3.0
> 	unicodecsv==0.14.1
> 	wcwidth==0.1.7
> 	Werkzeug==0.11.11
> 	widgetsnbextension==1.2.6
> 	win-unicode-console==0.5
> 	wrapt==1.10.6
> 	xlrd==1.0.0
> 	XlsxWriter==0.9.3
> 	xlwings==0.10.0
> 	xlwt==1.1.2
> Could someone shed light on why pyarrow would not work on one particular machine?
> Thanks,
> Adu
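Since the same code reportedly works on a mirrored machine, one concrete check is to diff the pip freeze output from both machines. A minimal standard-library sketch follows. It assumes each input holds one `name==version` entry per line; entries pinned with `===`, such as `dynd===c328ab7` in the list above, are handled too. The function names are illustrative only. Note that pip freeze lists only Python packages, so differences in native libraries outside them would not show up.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches "name==version" or "name===version" (pip's arbitrary-equality form).
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*={2,3}\s*(\S+)\s*$")


def parse_freeze(lines: Iterable[str]) -> Dict[str, str]:
    """Map lowercased package name -> pinned version; other lines are skipped."""
    pins = {}
    for line in lines:
        match = _PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def diff_freeze(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, version_a, version_b) for every package that differs."""
    missing = "<not installed>"
    rows = []
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name, missing), b.get(name, missing)
        if va != vb:
            rows.append((name, va, vb))
    return rows
```

Typical use would read two saved freeze files, for example `diff_freeze(parse_freeze(open("machine_a.txt")), parse_freeze(open("machine_b.txt")))`, where the file names are placeholders.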


