Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/07/17 10:25:49 UTC
[GitHub] [spark] zhengruifeng commented on pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client
zhengruifeng commented on PR #42040:
URL: https://github.com/apache/spark/pull/42040#issuecomment-1637817969
In https://github.com/apache/spark/pull/39925, we introduced a new mechanism to resolve an expression against a specified plan.
However, the analyzer sometimes eliminates the plan ID, and then some expressions cannot be resolved correctly. This issue is the No. 1 blocker for PS on Connect.
So far, I have investigated the two examples [in the ticket](https://issues.apache.org/jira/browse/SPARK-43611) and checked each rule applied to them.
Example 1:
```
>>> import pyspark.pandas as ps
>>> psdf1 = ps.DataFrame({"A": [1, 2, 3]})
>>> psdf2 = ps.DataFrame({"B": [1, 2, 3]})
>>> psdf1.append(psdf2)
```
Example 2:
```
import pyspark.pandas as ps
import pandas as pd
pdf = pd.DataFrame({"A": [None, 3, None, None], "B": [2, 4, None, 3], "C": [None, None, None, 1], "D": [0, 1, 5, 4],}, columns=["A", "B", "C", "D"],)
psdf = ps.from_pandas(pdf)
psdf.backfill()
```
In the draft, I modified two rules to retain the plan ID. (Actually, I modified [ResolveNaturalAndUsingJoin](https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3302-L3316) in https://github.com/apache/spark/commit/167bbca49c1c12ccd349d4330862c136b38d4522.)
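To make the failure mode concrete, here is a toy model in plain Python. It is not Spark code: `Node`, `PLAN_ID`, and both rewrite functions are hypothetical names used only for illustration. It assumes the plan ID lives in a per-node tag map, and shows that a rule which builds a replacement node loses the ID unless it explicitly carries the tags over.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PLAN_ID = "plan_id"  # hypothetical tag key, for illustration only


@dataclass
class Node:
    """Toy logical plan node: a name, its children, and a tag map."""
    name: str
    children: Tuple["Node", ...] = ()
    tags: Dict[str, int] = field(default_factory=dict)


def rewrite_dropping_tags(node: Node) -> Node:
    """Toy rule: replaces 'UsingJoin' with Project(Join(...)) and forgets its tags."""
    children = tuple(rewrite_dropping_tags(c) for c in node.children)
    if node.name == "UsingJoin":
        # The replacement node is built from scratch, so its tag map is empty.
        return Node("Project", (Node("Join", children),))
    return Node(node.name, children, dict(node.tags))


def rewrite_retaining_tags(node: Node) -> Node:
    """Same toy rule, but the replacement node inherits the original tags."""
    children = tuple(rewrite_retaining_tags(c) for c in node.children)
    if node.name == "UsingJoin":
        return Node("Project", (Node("Join", children),), dict(node.tags))
    return Node(node.name, children, dict(node.tags))


def find_by_plan_id(node: Node, plan_id: int) -> Optional[Node]:
    """Depth-first search for the node tagged with the given plan ID."""
    if node.tags.get(PLAN_ID) == plan_id:
        return node
    for child in node.children:
        found = find_by_plan_id(child, plan_id)
        if found is not None:
            return found
    return None


plan = Node(
    "UsingJoin",
    (Node("Relation", tags={PLAN_ID: 1}), Node("Relation", tags={PLAN_ID: 2})),
    {PLAN_ID: 3},
)

lost = find_by_plan_id(rewrite_dropping_tags(plan), 3)   # ID 3 is no longer findable
kept = find_by_plan_id(rewrite_retaining_tags(plan), 3)  # ID 3 now points at the Project
```

In this sketch, an expression bound to plan ID 3 has nothing to resolve against after the first rewrite, while the second rewrite keeps the lookup working. The drawback is the same one raised in this comment: every rule that replaces a node would need the same treatment.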
I am wondering whether there is a more graceful approach to fixing this issue. Otherwise, I'm afraid I will have to touch more rules.
cc @cloud-fan @HyukjinKwon @itholic
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org