Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2018/01/21 23:03:00 UTC

[jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella

    [ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333725#comment-16333725 ] 

Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:02 PM:
----------------------------------------------------------------

[~sameerag]

Here are some ideas for the release notes (which go to spark-website in the announcements)

For SparkR, new in 2.3.0:

SQL changes:

SQL functions, cubing & nested structure

collect_list, collect_set, split_string, repeat_string, rollup, cube,
 explode_outer, posexplode_outer, %<=>%, !, not, create_array, create_map, grouping_bit, grouping_id,
 input_file_name, alias, trunc, date_trunc, map_keys, map_values, current_date, current_timestamp, trim/trimString,
 dayofweek, unionByName

to_json (map or array of maps)

Data Source -  multiLine (json/csv)
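
A rough sketch of a few of the items above, in case it helps when drafting the notes. It assumes a local Spark 2.3 session; the data frame and its column names are made up for illustration.

```r
library(SparkR)
sparkR.session()

# Toy data (hypothetical): department and member name.
df <- createDataFrame(data.frame(
  dept = c("a", "a", "b"),
  name = c("x", "y", "z"),
  stringsAsFactors = FALSE
))

# collect_list: gather the names of each department into an array column.
head(agg(groupBy(df, "dept"), names = collect_list(df$name)))

# rollup: counts per (dept, name), subtotals per dept, and a grand total.
head(count(rollup(df, "dept", "name")))

# %<=>%: null-safe equality between two columns.
head(select(df, df$dept %<=>% df$name))

# to_json on a map column.
people <- sql("SELECT map('name', 'Bob') AS people")
head(select(people, to_json(people$people)))
```

For the multiLine item, my understanding is that the option is passed straight through the reader, e.g. read.json(path, multiLine = TRUE), where path points to a file of your own.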

 

ML changes:

Decision Tree (regression and classification)

Constrained Logistic Regression
 offset in SparkR GLM [https://github.com/apache/spark/pull/18831]
 stringIndexerOrderType
 handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes, spark.gbt, spark.decisionTree, spark.randomForest)
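
A minimal sketch of the decision tree wrapper with handleInvalid, assuming a local Spark 2.3 session. The iris data and the parameter values are only for illustration.

```r
library(SparkR)
sparkR.session()

# createDataFrame replaces the dots in the iris column names with
# underscores (Petal.Length becomes Petal_Length).
df <- createDataFrame(iris)

# Classification tree; handleInvalid = "keep" is meant to keep rows whose
# string features or labels were not seen at fit time instead of erroring.
model <- spark.decisionTree(df, Species ~ Petal_Length + Petal_Width,
                            type = "classification", maxDepth = 3,
                            handleInvalid = "keep")

summary(model)
head(predict(model, df))
```

Passing type = "regression" with a numeric label gives the regression variant.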

 

SS changes:

Structured Streaming API for withWatermark, trigger (once, processingTime), partitionBy

stream-stream join
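
A small sketch of withWatermark and the processingTime trigger together, assuming a local Spark 2.3 session. The built-in "rate" source is used only so the example needs no external input; the durations are arbitrary.

```r
library(SparkR)
sparkR.session()

# The rate source emits rows with `timestamp` and `value` columns.
events <- read.stream("rate", rowsPerSecond = 1)

# Tolerate events that arrive up to 10 seconds late.
events <- withWatermark(events, "timestamp", "10 seconds")

# Count events per 5-second event-time window.
counts <- count(groupBy(events, window(events$timestamp, "5 seconds")))

# Write to the console, firing a micro-batch every 5 seconds.
q <- write.stream(counts, "console", outputMode = "update",
                  trigger.processingTime = "5 seconds")

awaitTermination(q, 15000)  # timeout in milliseconds
stopQuery(q)
```

Replacing trigger.processingTime with trigger.once = TRUE should process whatever is available in a single batch and then stop.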

 

Documentation:

major overhaul and simplification of the API doc for SQL functions

 


was (Author: felixcheung):
[~sameerag]

Here are some ideas for the release notes (that goes to spark-website in the announcements)

For SparkR, new in 2.3.0:

SQL changes:

SQL functions, cubing & nested structure

collect_list, collect_set, split_string, repeat_string, rollup, cube
 explode_outer posexplode_outer, %<=>%, !, not, create_array, create_map, grouping_bit, grouping_id
 input_file_name, alias, trunc, date_trunc, map_keys, map_values, current_date, current_timestamp, trim/trimString,
 dayofweek, unionByName,

to_json (map or array of maps)

Data Source -  multiLine (json/csv)

 

ML changes:

Decision Tree (regression and classification)

Constrained Logistic Regression
offset in SparkR GLM https://github.com/apache/spark/pull/18831
stringIndexerOrderType
handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes, spark.gbt, spark.decisionTree, spark.randomForest)

 

SS changes:

Structured Streaming API for withWatermark, trigger (once, processingTime), partitionBy

stream-stream join

 

Documentation:

major overhaul and simplification of API doc

 

> Spark R 2.3 QA umbrella
> -----------------------
>
>                 Key: SPARK-23114
>                 URL: https://issues.apache.org/jira/browse/SPARK-23114
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, SparkR
>            Reporter: Joseph K. Bradley
>            Assignee: Felix Cheung
>            Priority: Critical
>
> This JIRA lists tasks for the next Spark release's QA period for SparkR.
> The list below gives an overview of what is involved, and the corresponding JIRA issues are linked below that.
> h2. API
> * Audit new public APIs (from the generated html doc)
> ** relative to Spark Scala/Java APIs
> ** relative to popular R libraries
> h2. Documentation and example code
> * For new algorithms, create JIRAs for updating the user guide sections & examples
> * Update Programming Guide
> * Update website


