Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/28 04:54:19 UTC

[GitHub] [spark] HyukjinKwon commented on issue #24234: [WIP][SPARK_26022][PYTHON][DOCS] PySpark Comparison with Pandas

HyukjinKwon commented on issue #24234: [WIP][SPARK_26022][PYTHON][DOCS] PySpark Comparison with Pandas
URL: https://github.com/apache/spark/pull/24234#issuecomment-477447661
 
 
   Hi, @gatorsmile, @BryanCutler, @ueshin, @rxin, @viirya, @thunterdb 
   
   I realised the differences are too vast, so I tried to narrow it down to:
   
   1. Describing fundamental differences
   2. Common DataFrame-related API usage
   3. Notable differences
   
   A few concerns from me:
   
   - To compare the two, this has to describe both in detail, and those details can change soon in both Pandas and PySpark. I tried to avoid details that are likely to change soon.
   - Since it is a comparison, it's very easy for me to be biased toward one side. I tried my best to avoid this too.
   - It's too vast to compare. At a high level, both are similar; however, in the details, many things are completely different.

