Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/01/14 06:02:00 UTC

[jira] [Commented] (SPARK-37882) pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values

    [ https://issues.apache.org/jira/browse/SPARK-37882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475967#comment-17475967 ] 

Hyukjin Kwon commented on SPARK-37882:
--------------------------------------

[~mattvan83] mind providing a self-contained reproducer?
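For reference, a minimal self-contained reproducer might look like the sketch below. The column name {{embeddings}} and the array shape are illustrative assumptions, not taken from the original report:

```python
import numpy as np
import pandas as pd

# A DataFrame with one column whose cells each hold a 2-D numpy array --
# the shape that triggers "Can only convert 1-dimensional array values"
# when pyarrow tries to infer the column's Arrow type.
pdf = pd.DataFrame({
    "doc_id": [0, 1],
    "embeddings": [np.zeros((3, 4)), np.ones((3, 4))],
})

# With pyspark installed, the failing call from the traceback would be:
#   import pyspark.pandas as ps
#   df = ps.from_pandas(pdf)  # raises pyarrow.lib.ArrowInvalid
print(pdf["embeddings"].iloc[0].ndim)  # each cell is 2-dimensional
```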

> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
> ---------------------------------------------------------------------
>
>                 Key: SPARK-37882
>                 URL: https://issues.apache.org/jira/browse/SPARK-37882
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>         Environment: Ubuntu 18.04
>            Reporter: Matthieu Vanhoutte
>            Priority: Major
>
> Hello,
> When trying to convert a pandas DataFrame
> {code:java}
> ss_corpus_dataframe{code}
> (containing one column whose cells hold two-dimensional numpy arrays) into a pandas-on-spark DataFrame with the following code:
> {code:java}
> df = ps.from_pandas(ss_corpus_dataframe){code}
> I get the following error:
> {code:java}
> Traceback (most recent call last):
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 375, in run_asgi
>     result = await app(self.scope, self.receive, self.send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
>     return await self.app(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
>     raise exc from None
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/uvicorn/middleware/message_logger.py", line 78, in __call__
>     await self.app(scope, inner_receive, inner_send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
>     await super().__call__(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
>     await self.middleware_stack(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
>     raise exc
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
>     await self.app(scope, receive, _send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
>     raise exc
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
>     await self.app(scope, receive, sender)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 656, in __call__
>     await route.handle(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 259, in handle
>     await self.app(scope, receive, send)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/starlette/routing.py", line 61, in app
>     response = await func(request)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
>     raw_response = await run_endpoint_function(
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
>     return await dependant.call(**values)
>   File "./app/routers/semantic_searches.py", line 60, in create_semantic_search
>     date_time_sem_search, clean_query, output_dict, error_code = await apply_semantic_search_async(query=query, api_sent_embed_url=settings.api_sent_embed_address, ss_corpus_dataframe=ss_corpus_dataframe.dataframe, id_matrices=id_matrices, top_k=75, similarity_score_thresh=0.5)
>   File "./app/backend/semantic_search/sts_tf_semantic_search.py", line 134, in apply_semantic_search_async
>     df = ps.from_pandas(ss_corpus_dataframe)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/namespace.py", line 143, in from_pandas
>     return DataFrame(pobj)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/frame.py", line 520, in __init__
>     internal = InternalFrame.from_pandas(pdf)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1460, in from_pandas
>     ) = InternalFrame.prepare_pandas_frame(pdf)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/internal.py", line 1533, in prepare_pandas_frame
>     spark_type = infer_pd_series_spark_type(reset_index[col], dtype)
>   File "/home/matthieu/anaconda3/envs/sts-transformers-gpu-fresh/lib/python3.8/site-packages/pyspark/pandas/typedef/typehints.py", line 329, in infer_pd_series_spark_type
>     return from_arrow_type(pa.Array.from_pandas(pser).type)
>   File "pyarrow/array.pxi", line 904, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 302, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values{code}
> Would it be possible to add support for converting multi-dimensional array values from pandas to pandas-on-spark?
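Until such support exists, one possible workaround is to store each 2-D array as nested Python lists before calling {{from_pandas}}, since Arrow can represent list-of-list values even though it rejects 2-D ndarray cells. This is a hedged sketch, not code from the report:

```python
import numpy as np
import pandas as pd

pdf = pd.DataFrame({
    "embeddings": [np.arange(6).reshape(2, 3), np.ones((2, 3))],
})

# Convert each 2-D ndarray cell into nested lists; Arrow infers a
# list<list<double>> type for such values instead of failing.
pdf["embeddings"] = pdf["embeddings"].apply(lambda a: a.tolist())

# With pyspark installed, ps.from_pandas(pdf) should now succeed.
print(type(pdf["embeddings"].iloc[0]))  # each cell is a plain list
```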



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org