Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/04 02:11:44 UTC

[GitHub] [spark] HyukjinKwon edited a comment on pull request #32036: [SPARK-34890][PYTHON] Port/integrate Koalas main codes into PySpark

HyukjinKwon edited a comment on pull request #32036:
URL: https://github.com/apache/spark/pull/32036#issuecomment-812957328

Yeah, I have actually thought a lot about this, and discussed it with some other people offline.

> since we already have pre-existing Pandas integration, having unrelated options referring to pandas could be confusing.

This is a very good point. I am thinking about using a different name, such as pandas-on-Spark, internally. For example, we could have a configuration such as `spark.pandas-on-spark.blahblah`.

My current thought is to stick with `pyspark.pandas` because:
- I checked a few references such as Modin (probably the case most similar to ours), which uses `modin.pandas`.
- To the best of my knowledge, Spark components generally do not have their own separate names. For example, [Shark](https://github.com/amplab/shark) became (or rather was superseded by) Spark SQL.
- Koalas might not be a good name in the long run either (as far as I know, it was more related to branding?). It might be best to make clear what the component is in its name.

However, I am open to change and to other names if many people think that `pyspark.koalas` or other alternatives are better.
