Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/01/14 06:02:00 UTC
[jira] [Commented] (SPARK-37882) pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
[ https://issues.apache.org/jira/browse/SPARK-37882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475967#comment-17475967 ]
Hyukjin Kwon commented on SPARK-37882:
--------------------------------------
[~mattvan83] mind providing a self-contained reproducer?
> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
> ---------------------------------------------------------------------
>
> Key: SPARK-37882
> URL: https://issues.apache.org/jira/browse/SPARK-37882
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Environment: Ubuntu 18.04
> Reporter: Matthieu Vanhoutte
> Priority: Major
>
> Hello,
> When trying to convert a pandas DataFrame
> {code:python}
> ss_corpus_dataframe{code}
> (containing one column whose cells each hold a two-dimensional NumPy array) into a pandas-on-Spark DataFrame with the following code:
> {code:python}
> df = ps.from_pandas(ss_corpus_dataframe){code}
> I got the following error:
> {code:python}
> Traceback (most recent call last):
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
> result = await app(self.scope, self.receive, self.send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
> return await self.app(scope, receive, send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
> raise exc from None
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 78, in __call__
> await self.app(scope, inner_receive, inner_send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
> await super().__call__(scope, receive, send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
> await self.middleware_stack(scope, receive, send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
> raise exc
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
> await self.app(scope, receive, _send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
> raise exc
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
> await self.app(scope, receive, sender)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
> await route.handle(scope, receive, send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
> await self.app(scope, receive, send)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
> response = await func(request)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
> raw_response = await run_endpoint_function(
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
> return await dependant.call(**values)
> File "./app/routers/semantic_searches.py", line 60, in create_semantic_search
> date_time_sem_search, clean_query, output_dict, error_code = await apply_semantic_search_async(query=query, api_sent_embed_url=settings.api_sent_embed_address, ss_corpus_dataframe=ss_corpus_dataframe.dataframe, id_matrices=id_matrices, top_k=75, similarity_score_thresh=0.5)
> File "./app/backend/semantic_search/sts_tf_semantic_search.py", line 134, in apply_semantic_search_async
> df = ps.from_pandas(ss_corpus_dataframe)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/namespace.py", line 143, in from_pandas
> return DataFrame(pobj)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/frame.py", line 520, in __init__
> internal = InternalFrame.from_pandas(pdf)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1460, in from_pandas
> ) = InternalFrame.prepare_pandas_frame(pdf)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1533, in prepare_pandas_frame
> spark_type = infer_pd_series_spark_type(reset_index[col], dtype)
> File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/typedef/typehints.py", line 329, in infer_pd_series_spark_type
> return from_arrow_type(pa.Array.from_pandas(pser).type)
> File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
> File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
> File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
> File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values{code}
> Would it be possible to add support for converting multi-dimensional array values from pandas to pandas-on-Spark?
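Until such support exists, one possible workaround (a sketch, assuming the column holds numeric 2-D arrays; the column name "embeddings" is hypothetical) is to pre-convert each array to nested Python lists, which Arrow can represent as list<list<double>>:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's ss_corpus_dataframe: a single
# column whose cells each hold a two-dimensional NumPy array.
ss_corpus_dataframe = pd.DataFrame(
    {"embeddings": [np.ones((2, 3)), np.zeros((2, 3))]}
)

# Convert each 2-D array into nested Python lists; Arrow can represent the
# resulting column as a nested list type instead of rejecting the ndarray.
converted = ss_corpus_dataframe.assign(
    embeddings=ss_corpus_dataframe["embeddings"].map(lambda a: a.tolist())
)

# With pyspark installed, the conversion would then be expected to succeed:
#   import pyspark.pandas as ps
#   df = ps.from_pandas(converted)
```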
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org