You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/10/20 21:43:56 UTC

[PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

ueshin opened a new pull request, #43469:
URL: https://github.com/apache/spark/pull/43469

   ### What changes were proposed in this pull request?
   
   Apply the observed metrics to `Observation` object.
   
   ### Why are the changes needed?
   
   When using `Observation`, the observed metrics should be applied to the object.
   
   For example, in vanilla PySpark:
   
   ```py
   >>> df = spark.createDataFrame([["Alice", 2], ["Bob", 5]], ["name", "age"])
   >>> observation = Observation("my metrics")
   >>> observed_df = df.observe(observation, count(lit(1)).alias("count"), max(col("age")))
   >>> observed_df.count()
   2
   >>>
   >>> observation.get
   {'count': 2, 'max(age)': 5}
   ```
   
   whereas in Spark Connect, currently it fails with `PySparkNotImplementedError`:
   
   ```py
   >>> observation.get
   Traceback (most recent call last):
   ...
       raise PySparkNotImplementedError(
   pyspark.errors.exceptions.base.PySparkNotImplementedError: [NOT_IMPLEMENTED] Observation support for Spark Connect is not implemented.
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the `Observation` will see the changes.
   
   ### How was this patch tested?
   
   Added/modified the related tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43469:
URL: https://github.com/apache/spark/pull/43469#discussion_r1367653717


##########
python/pyspark/sql/connect/client/core.py:
##########
@@ -945,7 +956,7 @@ def explain_string(self, plan: pb2.Plan, explain_mode: str = "extended") -> str:
         return result
 
     def execute_command(
-        self, command: pb2.Command
+        self, command: pb2.Command, observations: Dict[str, Observation]

Review Comment:
   I think observations should be optional.



##########
python/pyspark/sql/tests/connect/client/test_client.py:
##########
@@ -44,7 +44,7 @@ def test_user_agent_passthrough(self):
         client._stub = mock
 
         command = proto.Command()
-        client.execute_command(command)
+        client.execute_command(command, observations={})

Review Comment:
   Shall we avoid pass empty observations?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43469:
URL: https://github.com/apache/spark/pull/43469#issuecomment-1777209092

   @MaxGekk  python linter is fine in both OSS master and my local env:
   
   ```
   (spark_dev_311) ➜  spark git:(master) dev/lint-python
   starting python compilation test...
   python compilation succeeded.
   
   starting black test...
   black checks passed.
   
   starting flake8 test...
   flake8 checks passed.
   
   starting mypy annotations test...
   annotations passed mypy checks.
   
   starting mypy examples test...
   examples passed mypy checks.
   
   starting mypy data test...
   annotations passed data checks.
   
   
   all lint-python tests passed!
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43469:
URL: https://github.com/apache/spark/pull/43469#issuecomment-1776210723

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43469:
URL: https://github.com/apache/spark/pull/43469#issuecomment-1777072652

   Can this PR cause the errors?
   ```
   annotations failed mypy checks:
   python/pyspark/sql/connect/client/core.py:808: error: "ObservedMetrics" has no attribute "keys"  [attr-defined]
   Found 1 error in 1 file (checked 709 source files)
   1
   Error: Process completed with exit code 1.
   ```
   and
   ```
   python/pyspark/sql/tests/connect/test_connect_basic.py.test_observe
   keys
   
   Traceback (most recent call last):
     File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 1782, in test_observe
       self.connect.read.table(self.tbl_name)
     File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1730, in toPandas
       return self._session.client.to_pandas(query, self._plan.observations)
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 849, in to_pandas
       table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1301, in _execute_and_fetch
       for response in self._execute_and_fetch_as_iterator(req, observations):
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1279, in _execute_and_fetch_as_iterator
       self._handle_error(error)
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1527, in _handle_error
       raise error
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1272, in _execute_and_fetch_as_iterator
       yield from handle_response(b)
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1205, in handle_response
       for observed_metrics in self._build_observed_metrics(b.observed_metrics):
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 808, in <genexpr>
       return (PlanObservedMetrics(x.name, [v for v in x.values], list(x.keys)) for x in metrics)
   AttributeError: keys
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43469: [SPARK-45619][CONNECT][PYTHON] Apply the observed metrics to Observation object
URL: https://github.com/apache/spark/pull/43469


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org