Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2016/10/18 08:33:58 UTC
[jira] [Commented] (SPARK-14938) Use Datasets.as to improve internal implementation
[ https://issues.apache.org/jira/browse/SPARK-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584871#comment-15584871 ]
zhengruifeng commented on SPARK-14938:
--------------------------------------
This JIRA is out of date. I think it's time to close it.
> Use Datasets.as to improve internal implementation
> --------------------------------------------------
>
> Key: SPARK-14938
> URL: https://issues.apache.org/jira/browse/SPARK-14938
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: zhengruifeng
>
> As discussed in [https://github.com/apache/spark/pull/11915], we can use {{Dataset.as}} API instead of RDD operations.
> From:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w).rdd.map {
>   case Row(label: Double, feature: Double, weight: Double) =>
>     (label, feature, weight)
> }
> {code}
> To:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w)
>   .as[(Double, Double, Double)].rdd
> {code}
> From:
> {code}
> dataset.select(col($(featuresCol)), col($(labelCol)).cast(DoubleType), col($(censorCol)))
>   .rdd.map {
>     case Row(features: Vector, label: Double, censor: Double) =>
>       AFTPoint(features, label, censor)
>   }
> {code}
> To:
> {code}
> val sqlContext = dataset.sqlContext
> import sqlContext.implicits._
> dataset.select(col($(featuresCol)).as("features"),
>     col($(labelCol)).cast(DoubleType).as("label"),
>     col($(censorCol)).as("censor")).as[AFTPoint].rdd
> {code}
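
For readers unfamiliar with the pattern, here is a minimal self-contained sketch of the tuple-encoder variant quoted above. It assumes a local SparkSession; the column names and toy data are made up for illustration and are not from the Spark ML sources under discussion.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object AsEncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("Dataset.as sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for (label, feature, weight) columns.
    val dataset = Seq((1.0, 2.0, 0.5), (0.0, 3.0, 1.0))
      .toDF("label", "feature", "weight")

    // Instead of pattern-matching each Row by hand, let the implicit
    // tuple encoder perform the Row -> (Double, Double, Double) conversion.
    val triples = dataset
      .select(col("label").cast(DoubleType), col("feature"), col("weight"))
      .as[(Double, Double, Double)]
      .rdd

    triples.collect().foreach(println)
    spark.stop()
  }
}
{code}

The encoder approach also fails earlier and more clearly: a column/type mismatch is reported by {{as}} at analysis time, rather than surfacing as a {{MatchError}} inside the RDD map at run time.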
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org