Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/01/18 09:30:00 UTC
[jira] [Assigned] (SPARK-37930) Fix DataFrame select subset with duplicated columns
[ https://issues.apache.org/jira/browse/SPARK-37930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37930:
------------------------------------
Assignee: Apache Spark
> Fix DataFrame select subset with duplicated columns
> ---------------------------------------------------
>
> Key: SPARK-37930
> URL: https://issues.apache.org/jira/browse/SPARK-37930
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: dch nguyen
> Assignee: Apache Spark
> Priority: Major
>
> pandas
> {code:python}
> >>> pdf
>    a
> 0  1
> 1  2
> 2  3
> 3  4
> >>> pdf[['a', 'a']]
>    a  a
> 0  1  1
> 1  2  2
> 2  3  3
> 3  4  4
> {code}
> pandas-on-Spark
> {code:python}
> >>> psdf
>    a
> 0  1
> 1  2
> 2  3
> 3  4
> >>> psdf[['a', 'a']]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/u02/spark/python/pyspark/pandas/frame.py", line 12077, in __repr__
>     pdf = self._get_or_create_repr_pandas_cache(max_display_count)
>   File "/u02/spark/python/pyspark/pandas/frame.py", line 12068, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/u02/spark/python/pyspark/pandas/frame.py", line 12063, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/u02/spark/python/pyspark/pandas/utils.py", line 576, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/u02/spark/python/pyspark/pandas/internal.py", line 1055, in to_pandas_frame
>     return InternalFrame.restore_index(pdf, **self.arguments_for_restore_index)
>   File "/u02/spark/python/pyspark/pandas/internal.py", line 1156, in restore_index
>     pdf.columns = pd.Index(
>   File "/u02/venv3.9-2/lib/python3.9/site-packages/pandas/core/generic.py", line 5500, in __setattr__
>     return object.__setattr__(self, name, value)
>   File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
>   File "/u02/venv3.9-2/lib/python3.9/site-packages/pandas/core/generic.py", line 766, in _set_axis
>     self._mgr.set_axis(axis, labels)
>   File "/u02/venv3.9-2/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
>     self._validate_set_axis(axis, new_labels)
>   File "/u02/venv3.9-2/lib/python3.9/site-packages/pandas/core/internals/base.py", line 57, in _validate_set_axis
>     raise ValueError(
> ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements
> {code}
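The report prints `pdf` but does not show how it was built, so the following is only a minimal pandas sketch under stated assumptions. It constructs a frame matching the printed output, checks the expected result of selecting the same label twice, and then triggers the same kind of "Length mismatch" error by assigning two labels to a four-column frame. The `wide` frame and its column names are hypothetical and chosen only to match the counts in the error message. It is not the internal frame that pandas-on-Spark builds, and the sketch is not a diagnosis of the bug.

```python
import pandas as pd

# Assumed construction of the frame printed in the report.
pdf = pd.DataFrame({"a": [1, 2, 3, 4]})

# Expected behavior in pandas: selecting the same label twice
# yields a frame with two columns, both labelled "a".
subset = pdf[["a", "a"]]
print(subset)

# Illustration of the error class seen in the traceback: assigning
# 2 column labels to a frame that has 4 columns raises ValueError.
# `wide` is a hypothetical stand-in, not the pandas-on-Spark internals.
wide = pd.DataFrame([[0, 1, 1, 1]], columns=["w", "x", "y", "z"])
try:
    wide.columns = pd.Index(["a", "a"])
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
print(mismatch_message)
```

Running the same selection against a pandas-on-Spark frame (for example one created with `pyspark.pandas.from_pandas(pdf)`) on an affected version should fail when the result is displayed, as in the traceback above, because the error is raised from `__repr__`.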
--
This message was sent by Atlassian Jira
(v8.20.1#820001)