Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2018/01/21 23:03:00 UTC
[jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333725#comment-16333725 ]
Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:02 PM:
----------------------------------------------------------------
[~sameerag]
Here are some ideas for the release notes (which go into the announcement on spark-website):
For SparkR, new in 2.3.0:
SQL changes:
SQL functions, cubing & nested structure
collect_list, collect_set, split_string, repeat_string, rollup, cube
explode_outer, posexplode_outer, %<=>%, !, not, create_array, create_map, grouping_bit, grouping_id
input_file_name, alias, trunc, date_trunc, map_keys, map_values, current_date, current_timestamp, trim/trimString,
dayofweek, unionByName
to_json (map or array of maps)
Data Source - multiLine option (JSON/CSV)
ML changes:
Decision Tree (regression and classification)
Constrained Logistic Regression
offset in SparkR GLM [https://github.com/apache/spark/pull/18831]
stringIndexerOrderType
handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes, spark.gbt, spark.decisionTree, spark.randomForest)
SS changes:
Structured Streaming API for withWatermark, trigger (once, processingTime), partitionBy
stream-stream join
Documentation:
major overhaul and simplification of API doc for SQL functions
> Spark R 2.3 QA umbrella
> -----------------------
>
> Key: SPARK-23114
> URL: https://issues.apache.org/jira/browse/SPARK-23114
> Project: Spark
> Issue Type: Umbrella
> Components: Documentation, SparkR
> Reporter: Joseph K. Bradley
> Assignee: Felix Cheung
> Priority: Critical
>
> This JIRA lists tasks for the next Spark release's QA period for SparkR.
> The list below gives an overview of what is involved, and the corresponding JIRA issues are linked below that.
> h2. API
> * Audit new public APIs (from the generated html doc)
> ** relative to Spark Scala/Java APIs
> ** relative to popular R libraries
> h2. Documentation and example code
> * For new algorithms, create JIRAs for updating the user guide sections & examples
> * Update Programming Guide
> * Update website
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)