Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/07/17 10:25:49 UTC

[GitHub] [spark] zhengruifeng commented on pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client

zhengruifeng commented on PR #42040:
URL: https://github.com/apache/spark/pull/42040#issuecomment-1637817969

   In https://github.com/apache/spark/pull/39925, we introduced a new mechanism to resolve expressions against a specified plan.
   
   However, the analyzer sometimes eliminates the plan ID, and then some expressions cannot be resolved correctly. This issue is the No. 1 blocker for PS on Connect.
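
To make the failure mode concrete, here is a toy sketch in plain Python. It is not Catalyst code: `Node`, `PLAN_ID`, `find_by_plan_id`, and `rewrite_join` are hypothetical names invented for this illustration. It only assumes that a plan ID is metadata attached to a plan node, and that a rule which builds a replacement node without carrying that metadata over makes later lookups by plan ID fail.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PLAN_ID = "plan_id"  # hypothetical tag key, used only in this toy model


@dataclass
class Node:
    """A minimal plan node: a name, child nodes, and a bag of tags."""
    name: str
    children: Tuple["Node", ...] = ()
    tags: Dict[str, int] = field(default_factory=dict)


def find_by_plan_id(root: Node, plan_id: int) -> Optional[Node]:
    """Depth-first search for the node tagged with the given plan ID."""
    if root.tags.get(PLAN_ID) == plan_id:
        return root
    for child in root.children:
        found = find_by_plan_id(child, plan_id)
        if found is not None:
            return found
    return None


def rewrite_join(node: Node) -> Node:
    """Toy rule: replace 'UsingJoin' with 'Project(Join)' and forget its tags."""
    children = tuple(rewrite_join(c) for c in node.children)
    if node.name == "UsingJoin":
        # The replacement nodes are built from scratch, so the plan ID is lost.
        return Node("Project", (Node("Join", children),))
    return Node(node.name, children, dict(node.tags))


plan = Node("UsingJoin", (Node("Relation"), Node("Relation")), {PLAN_ID: 7})
before = find_by_plan_id(plan, 7)
after = find_by_plan_id(rewrite_join(plan), 7)
print(before is not None, after is not None)  # True False
```

After the rewrite, nothing in the tree carries plan ID 7 any more, so an expression that refers to that plan has no node to resolve against.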
   
   So far, I have investigated the two examples [in the ticket](https://issues.apache.org/jira/browse/SPARK-43611) and checked each rule applied to them.
   
   example 1:
   ```
   >>> import pyspark.pandas as ps
   >>> psdf1 = ps.DataFrame({"A": [1, 2, 3]})
   >>> psdf2 = ps.DataFrame({"B": [1, 2, 3]})
   >>> psdf1.append(psdf2)
   ```
   
   example 2:
   ```
   import pyspark.pandas as ps
   import pandas as pd
   
   pdf = pd.DataFrame({"A": [None, 3, None, None], "B": [2, 4, None, 3], "C": [None, None, None, 1], "D": [0, 1, 5, 4],}, columns=["A", "B", "C", "D"],)
   psdf = ps.from_pandas(pdf)
   psdf.backfill()
   ```
   
   In the draft, I modified two rules to retain the plan ID. (Actually, I modified [ResolveNaturalAndUsingJoin](https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3302-L3316) in https://github.com/apache/spark/commit/167bbca49c1c12ccd349d4330862c136b38d4522.)
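
As a rough picture of what "retaining the plan ID" in a single rule means, the following toy Python sketch copies the tag onto the replacement node. It is not the code from the commit: `Node`, `PLAN_ID`, and `rewrite_join_keep_id` are hypothetical names, and putting the ID on the new top-level node is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PLAN_ID = "plan_id"  # hypothetical tag key, used only in this toy model


@dataclass
class Node:
    """A minimal plan node: a name, child nodes, and a bag of tags."""
    name: str
    children: Tuple["Node", ...] = ()
    tags: Dict[str, int] = field(default_factory=dict)


def rewrite_join_keep_id(node: Node) -> Node:
    """Toy rule: replace 'UsingJoin' with 'Project(Join)', keeping the plan ID."""
    children = tuple(rewrite_join_keep_id(c) for c in node.children)
    if node.name == "UsingJoin":
        replacement = Node("Project", (Node("Join", children),))
        # Carry the plan ID over to the node that replaces the original one.
        if PLAN_ID in node.tags:
            replacement.tags[PLAN_ID] = node.tags[PLAN_ID]
        return replacement
    return Node(node.name, children, dict(node.tags))


plan = Node("UsingJoin", (Node("Relation"), Node("Relation")), {PLAN_ID: 7})
rewritten = rewrite_join_keep_id(plan)
print(rewritten.name, rewritten.tags)  # Project {'plan_id': 7}
```

The drawback is visible even in the toy version: every rule that replaces a node needs the same extra step, which is why fixing rules one by one does not scale well.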
   
   I am wondering whether there is a more graceful approach to fix this issue. Otherwise, I'm afraid I will have to touch more rules.
   
   cc @cloud-fan @HyukjinKwon @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

