Posted to issues@spark.apache.org by "Bryan Cutler (Jira)" <ji...@apache.org> on 2021/03/02 18:12:00 UTC
[jira] [Commented] (SPARK-34463) toPandas failed with error: buffer
source array is read-only when Arrow with self-destruct is enabled
[ https://issues.apache.org/jira/browse/SPARK-34463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293893#comment-17293893 ]
Bryan Cutler commented on SPARK-34463:
--------------------------------------
As David said, whether this fails depends on which pandas operation is run on the result. I'm not sure why `value_counts()` triggers this error, but other operations should work. I believe you could also work around it by making a copy of the DataFrame yourself. I think this example shows that the self-destruct feature should be clearly documented as experimental and used only if you know exactly what you are doing.
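The copy workaround mentioned above might look roughly like the sketch below. It is not taken from the Spark code base: `detach_copy` is a hypothetical helper name, and the small pandas DataFrame built from a read-only NumPy array is only a stand-in for what `toPandas()` returns with self-destruct enabled, so the snippet runs without a Spark session. Whether a plain deep copy avoids the error in every pandas version is an assumption, not something verified in this thread.

```python
import numpy as np
import pandas as pd


def detach_copy(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return a deep copy of `pdf` so its columns live in newly allocated memory.

    Hypothetical helper for illustration; not part of PySpark or pandas.
    """
    return pdf.copy(deep=True)


# Stand-in for `spark_df.toPandas()`: a 'label' column built from a NumPy
# array that has been marked read-only, using the same values as the
# reproduction in the issue, (id + 1) % 2 for id in 0..12.
labels = np.array([(i + 1) % 2 for i in range(13)], dtype="int64")
labels.setflags(write=False)
pdf = pd.DataFrame({"label": pd.Series(labels, copy=False)})

# Copy first, then run the operation that failed in the report.
safe_pdf = detach_copy(pdf)
counts = safe_pdf["label"].value_counts()
print(counts)
```

With a real Spark DataFrame the same idea would be `detach_copy(spark_df.toPandas())`. Note that the copy gives back some of the memory savings that self-destruct is meant to provide.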
> toPandas failed with error: buffer source array is read-only when Arrow with self-destruct is enabled
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-34463
> URL: https://issues.apache.org/jira/browse/SPARK-34463
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Weichen Xu
> Priority: Major
>
> Environment:
> apache/spark master
> pandas version > 1.0.5
> Reproduce code:
> {code:java}
> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
> spark.conf.set('spark.sql.execution.arrow.pyspark.selfDestruct.enabled', True)
> spark.createDataFrame(sc.parallelize([(i,) for i in range(13)], 1), 'id long').selectExpr('IF(id % 3==0, id+1, NULL) AS f1', '(id+1) % 2 AS label').toPandas()['label'].value_counts()
> {code}
> Get error like:
> {quote}Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/base.py", line 1033, in value_counts
> dropna=dropna,
> File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 820, in value_counts
> keys, counts = value_counts_arraylike(values, dropna)
> File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/algorithms.py", line 865, in value_counts_arraylike
> keys, counts = f(values, dropna)
> File "pandas/_libs/hashtable_func_helper.pxi", line 1098, in pandas._libs.hashtable.value_count_int64
> File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
> File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
> ValueError: buffer source array is read-only
> {quote}