Posted to reviews@spark.apache.org by holdenk <gi...@git.apache.org> on 2017/03/09 22:40:20 UTC

[GitHub] spark issue #16805: [SPARK-19353][CORE] Generalize PipedRDD to use I/O forma...

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16805

Thanks for your interest in contributing to the Apache Spark project, especially in areas which can improve our interop with non-JVM languages.

That being said, this change is huge and, in its current state, not something that can be reasonably reviewed. If you can make a branch with just the relevant changes rebased on master, that would greatly help. It seems like this is based on an internal branch which is out of sync with the current Spark master.

You may wish to look at the Spark Improvement Proposal process as a way to share your design document with the dev list, or you could just send your current design document to the dev list.

In the meantime, revisiting the changes that are included in this branch would be a good task to explore in parallel. I would suggest closing this PR until you've got that ready, or tagging it as WIP so reviewers know it's OK to skip.

From the description of the change, it would be important to verify that PySpark and SparkR also do not suffer a performance regression with the proposed refactor.

Thanks again for your work :)
