Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/07/08 07:28:11 UTC

[jira] [Comment Edited] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

    [ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367345#comment-15367345 ] 

Cheng Lian edited comment on SPARK-16303 at 7/8/16 7:27 AM:
------------------------------------------------------------

Thanks for working on this! I'd suggest sending out the PR first so that people can comment on it. If you think it's still a work in progress, you can add a {{\[WIP\]}} tag to the PR title.

bq. ... I suggest having everything till the 'Data Sources' section in one single source file. ... I suggest using separate methods for each meaningful block. ...

Totally agree.

bq. ... As far as I read, it is impossible to overlap examples in the plugin that we use to extract code snippets from the source files.

Actually, I've added support for overlapped snippets in [PR #13972|https://github.com/apache/spark/pull/13972]. Please check [this PR comment|https://github.com/apache/spark/pull/13972#issuecomment-229543341] for more details. That change was motivated by exactly the imports issue you mentioned.
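
For reference, the extraction markers look roughly like the following (a hedged sketch: the exact label syntax should be double-checked against PR #13972, and the labels {{imports}} and {{create_df}} are made up for illustration):

{code}
// $example on:imports$
import org.apache.spark.sql.SparkSession
// $example off:imports$

// $example on:create_df$
val df = spark.read.json("examples/src/main/resources/people.json")
// $example off:create_df$
{code}

The programming guide can then pull in a labeled snippet with an {{include_example}} tag that names both the label and the source file, so overlapping regions (e.g. shared imports) can be included in several places.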

bq. I noticed that the current java version is 1.7 in the parent pom. Is it possible to update the examples submodule to 1.8? I believe that lambdas will simplify the Java code and make it more readable.

I agree that Java 8 features can make the Java example code a lot easier to write. However, I believe the default Jenkins PR builder still uses Java 7, so example code written in Java 8 may hit compilation errors unless you apply some Maven profile tricks. On the other hand, even if we add Java 8 examples, Java 7 examples are still necessary since Java 7 is still quite popular.

bq. What is the correct way to load RDDs? There are different alternatives. For instance, via {{spark.sparkContext}}, or via DataFrames/Datasets. I assume that the first way makes more sense in section "Interoperating with RDDs" rather than creating DataFrames/Datasets, getting RDDs and then converting back.

Yes, the first approach makes more sense for the "Interoperating with RDDs" section.
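
That first option can be sketched as follows. This is a hedged sketch, not the final example code: the {{Person}} case class, the {{parsePerson}} helper, and the {{people.txt}} path are illustrative assumptions, and the Spark-specific part is shown in comments because it needs a live {{SparkSession}} named {{spark}}:

{code:scala}
case class Person(name: String, age: Int)

// Pure parsing helper: turns one CSV line into a Person.
def parsePerson(line: String): Person = {
  val attrs = line.split(",")
  Person(attrs(0), attrs(1).trim.toInt)
}

// Hypothetical usage in the example (requires a SparkSession `spark`):
// import spark.implicits._
// val peopleDF = spark.sparkContext
//   .textFile("examples/src/main/resources/people.txt")
//   .map(parsePerson)
//   .toDF()
{code}

Keeping the parsing logic in a small named method also fits the "separate methods for each meaningful block" suggestion above.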

bq. Is it fine to re-use encoders?

Yes.
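
For instance, a single implicit encoder instance can back multiple Datasets. A minimal sketch, assuming a live {{SparkSession}} named {{spark}} (so it is not runnable standalone):

{code}
import org.apache.spark.sql.{Encoder, Encoders}

// One encoder instance, reused by every Dataset[Long] below.
implicit val longEncoder: Encoder[Long] = Encoders.scalaLong

val ds1 = spark.createDataset(Seq(1L, 2L, 3L))
val ds2 = ds1.map(_ * 2) // the same implicit encoder is picked up again
{code}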

bq. If I use the {{getValuesMap\[T\]()}} method, then I will have a Dataset of {{Map\[String, T\]}} as a result. It seems that Maps are unsupported right now in Datasets.

We do support {{Dataset\[Map\[K, V\]\]}}, but there are no pre-defined implicit encoders for it in {{SQLImplicits}} because the number of permutations of all common key/value data types is too large. Users have to define them explicitly, e.g.:

{code}
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

implicit val e1: Encoder[Map[Int, String]] = ExpressionEncoder()
implicit val e2: Encoder[Map[Long, Double]] = ExpressionEncoder()
{code}



> Update SQL examples and programming guide for Scala and Java language bindings
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16303
>                 URL: https://issues.apache.org/jira/browse/SPARK-16303
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, Examples
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Anton Okolnychyi
>
> We need to update SQL examples code under the {{examples}} sub-project, and then replace hard-coded snippets in the SQL programming guide with snippets automatically extracted from actual source files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
