Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2016/10/18 08:33:58 UTC
[jira] [Commented] (SPARK-14938) Use Datasets.as to improve internal implementation
[ https://issues.apache.org/jira/browse/SPARK-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584871#comment-15584871 ]
zhengruifeng commented on SPARK-14938:
--------------------------------------
This JIRA is out of date. I think it's time to close it.
> Use Datasets.as to improve internal implementation
> --------------------------------------------------
>
> Key: SPARK-14938
> URL: https://issues.apache.org/jira/browse/SPARK-14938
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: zhengruifeng
>
> As discussed in [https://github.com/apache/spark/pull/11915], we can use {{Dataset.as}} API instead of RDD operations.
> From:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w).rdd.map {
>   case Row(label: Double, feature: Double, weight: Double) =>
>     (label, feature, weight)
> }
> {code}
> To:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w)
>   .as[(Double, Double, Double)].rdd
> {code}
> From:
> {code}
> dataset.select(col($(featuresCol)), col($(labelCol)).cast(DoubleType), col($(censorCol)))
>   .rdd.map {
>     case Row(features: Vector, label: Double, censor: Double) =>
>       AFTPoint(features, label, censor)
>   }
> {code}
> To:
> {code}
> val sqlContext = dataset.sqlContext
> import sqlContext.implicits._
> dataset.select(col($(featuresCol)).as("features"),
>     col($(labelCol)).cast(DoubleType).as("label"),
>     col($(censorCol)).as("censor")).as[AFTPoint].rdd
> {code}
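
For readers unfamiliar with the pattern, here is a minimal self-contained sketch of the tuple-encoder variant quoted above. It assumes a local SparkSession; the column names and toy data are made up for illustration and are not from the Spark ML sources under discussion.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object AsEncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("Dataset.as sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for (label, feature, weight) columns.
    val dataset = Seq((1.0, 2.0, 0.5), (0.0, 3.0, 1.0))
      .toDF("label", "feature", "weight")

    // Instead of pattern-matching each Row by hand, let the implicit
    // tuple encoder perform the Row -> (Double, Double, Double) conversion.
    val triples = dataset
      .select(col("label").cast(DoubleType), col("feature"), col("weight"))
      .as[(Double, Double, Double)]
      .rdd

    triples.collect().foreach(println)
    spark.stop()
  }
}
{code}

The encoder approach also fails earlier and more clearly: a column/type mismatch is reported by {{as}} at analysis time, rather than surfacing as a {{MatchError}} inside the RDD map at run time.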
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org