Posted to reviews@spark.apache.org by "bjornjorgensen (via GitHub)" <gi...@apache.org> on 2023/04/08 19:27:13 UTC
[GitHub] [spark] bjornjorgensen commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect
bjornjorgensen commented on PR #40525:
URL: https://github.com/apache/spark/pull/40525#issuecomment-1500961292
@itholic Thank you, great work :)
After this PR, `from pyspark import pandas as ps` now fails:
ModuleNotFoundError Traceback (most recent call last)
File /opt/spark/python/pyspark/sql/connect/utils.py:45, in require_minimum_grpc_version()
44 try:
---> 45 import grpc
46 except ImportError as error:
ModuleNotFoundError: No module named 'grpc'
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
Cell In[1], line 11
9 import pyarrow
10 from pyspark import SparkConf, SparkContext
---> 11 from pyspark import pandas as ps
12 from pyspark.sql import SparkSession
13 from pyspark.sql.functions import col, concat, concat_ws, expr, lit, trim
File /opt/spark/python/pyspark/pandas/__init__.py:59
50 warnings.warn(
51 "'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to "
52 "set this environment variable to '1' in both driver and executor sides if you use "
(...)
55 "already launched."
56 )
57 os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
---> 59 from pyspark.pandas.frame import DataFrame
60 from pyspark.pandas.indexes.base import Index
61 from pyspark.pandas.indexes.category import CategoricalIndex
File /opt/spark/python/pyspark/pandas/frame.py:88
85 from pyspark.sql.window import Window
87 from pyspark import pandas as ps # For running doctests and reference resolution in PyCharm.
---> 88 from pyspark.pandas._typing import (
89 Axis,
90 DataFrameOrSeries,
91 Dtype,
92 Label,
93 Name,
94 Scalar,
95 T,
96 GenericColumn,
97 )
98 from pyspark.pandas.accessors import PandasOnSparkFrameMethods
99 from pyspark.pandas.config import option_context, get_option
File /opt/spark/python/pyspark/pandas/_typing.py:25
22 from pandas.api.extensions import ExtensionDtype
24 from pyspark.sql.column import Column as PySparkColumn
---> 25 from pyspark.sql.connect.column import Column as ConnectColumn
26 from pyspark.sql.dataframe import DataFrame as PySparkDataFrame
27 from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
File /opt/spark/python/pyspark/sql/connect/column.py:19
1 #
2 # Licensed to the Apache Software Foundation (ASF) under one or more
3 # contributor license agreements. See the NOTICE file distributed with
(...)
15 # limitations under the License.
16 #
17 from pyspark.sql.connect.utils import check_dependencies
---> 19 check_dependencies(__name__)
21 import datetime
22 import decimal
File /opt/spark/python/pyspark/sql/connect/utils.py:35, in check_dependencies(mod_name)
33 require_minimum_pandas_version()
34 require_minimum_pyarrow_version()
---> 35 require_minimum_grpc_version()
File /opt/spark/python/pyspark/sql/connect/utils.py:47, in require_minimum_grpc_version()
45 import grpc
46 except ImportError as error:
---> 47 raise ImportError(
48 "grpc >= %s must be installed; however, " "it was not found." % minimum_grpc_version
49 ) from error
50 if LooseVersion(grpc.__version__) < LooseVersion(minimum_grpc_version):
51 raise ImportError(
52 "gRPC >= %s must be installed; however, "
53 "your version was %s." % (minimum_grpc_version, grpc.__version__)
54 )
ImportError: grpc >= 1.48.1 must be installed; however, it was not found.
`pip install grpc`
Collecting grpc
Downloading grpc-1.0.0.tar.gz (5.2 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-vp4d8s4c/grpc_c0f1992ad8f7456b8ac09ecbaeb81750/setup.py", line 33, in <module>
raise RuntimeError(HINT)
RuntimeError: Please install the official package with: pip install grpcio
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Note: you may need to restart the kernel to use updated packages.
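(As the hint in the output says, the `grpc` name on PyPI is just a placeholder package that raises this error; the actual gRPC Python package is published as `grpcio`.)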
After `pip install grpcio`, it works.
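For anyone hitting the same thing, a quick sanity check (illustration only; 1.48.1 is the minimum version the error message above asks for):
# Illustration only: confirm the import chain works once grpcio is installed.
import grpc
print(grpc.__version__)  # the check in pyspark/sql/connect/utils.py wants >= 1.48.1

from pyspark import pandas as ps

psdf = ps.DataFrame({"x": [1, 2, 3]})  # spins up a default SparkSession if needed
print(psdf.sum())
(I believe `pip install "pyspark[connect]"` would also pull in the right gRPC packages, but I have not verified that here.)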
I don't think every pandas user who tries the pandas API on Spark will use Spark Connect. So can we change this back?
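From the traceback, the import that drags in the grpc requirement is `from pyspark.sql.connect.column import Column as ConnectColumn` in `pyspark/pandas/_typing.py`, which runs `check_dependencies()` at import time. If a full revert is not wanted, maybe something along these lines could work instead (just a sketch from my side, not a patch; the real `GenericColumn` definition may differ):
# Sketch only: make the Spark Connect import in pyspark/pandas/_typing.py
# optional, so plain pandas-on-Spark stays importable without grpcio.
from typing import Union

from pyspark.sql.column import Column as PySparkColumn

try:
    # This is the import that currently fails via check_dependencies() -> grpc.
    from pyspark.sql.connect.column import Column as ConnectColumn
except ImportError:
    ConnectColumn = None  # Spark Connect extras (grpcio etc.) not installed

if ConnectColumn is not None:
    GenericColumn = Union[PySparkColumn, ConnectColumn]
else:
    # Fall back to the classic Column type only.
    GenericColumn = PySparkColumn
That way classic users are not forced to install grpcio just to import `pyspark.pandas`, while Connect users who do have grpcio installed still get both Column types.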
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org