
Posted to reviews@spark.apache.org by mateiz <gi...@git.apache.org> on 2014/03/16 23:31:04 UTC

[GitHub] spark pull request: SPARK-1251 Support for optimizing and executin...

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/146#issuecomment-37773571
  
    Also a few comments on the doc page (http://www.cs.berkeley.edu/%7Emarmbrus/sparkdocs/_site/sql-programming-guide.html):
    * Put Spark SQL closer to the top of the programming guide and API doc menus, say under Spark Streaming. I think it will be read more often than some of the other ones.
    * In the API docs, list Spark SQL Core first, then Hive Support, and then Catalyst Optimizer.
    * Each package should have a package.scala file containing package-level documentation (see e.g. http://www.cs.berkeley.edu/%7Emarmbrus/sparkdocs/_site/api/streaming/index.html#org.apache.spark.streaming.package)
    * The doc should explain the types of the various things returned (e.g. what an ExecutedQuery is, what loadFile returns).
    * Rename the Hive Metastore Support section to just Hive Support, since it supports HiveQL as well.
    * It would be cool to include an example of running MLlib or something similar on top of SQL data.
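
    For the package.scala point, here is a rough sketch of the convention the streaming docs use: Scaladoc attaches a doc comment written on a package object to the package itself, so the comment shows up on that package's API index page. The doc text below is placeholder wording, not actual Spark SQL documentation; only the file layout is the point.

    ```scala
    // package.scala (sketch): the doc text is a hypothetical placeholder.
    package org.apache.spark

    /**
     * Placeholder package-level documentation for Spark SQL.
     *
     * Scaladoc uses the comment on a package object as the documentation
     * for the package itself, so this text appears on the package index page.
     */
    package object sql {
      // Nothing is defined here; the object only carries the doc comment.
    }
    ```

    The Hive Support and Catalyst Optimizer packages could each get their own package.scala in the same style.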
